opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-03-28 13:43:12 -04:00

Author	SHA1	Message	Date
Jeff Roberson	91e31c3c08	Consistently use busy and vm_page_valid() rather than touching page bits directly. This improves API compliance, asserts, etc. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23283	2020-01-23 04:54:49 +00:00
Jeff Roberson	1eb13fce84	Block the thread lock in sched_throw() and use cpu_switch() to unblock it. The introduction of lockless switch in r355784 created a race to re-use the exiting thread that was only possible to hit on a hypervisor. Reported/Tested by: rlibby Discussed with: rlibby, jhb	2020-01-23 03:36:50 +00:00
Gleb Smirnoff	ad3980121b	DEVICE_POLLING is an alternative to network interrupts and also needs to enter epoch. Assert that in the netisr_poll() and do the work for the idle poll routine.	2020-01-23 01:30:50 +00:00
Gleb Smirnoff	511d1afb6b	Enter the network epoch for interrupt handlers of INTR_TYPE_NET. Provide tunable to limit how many times handlers may be executed without reentering epoch. Differential Revision: https://reviews.freebsd.org/D23242	2020-01-23 01:24:47 +00:00
Gleb Smirnoff	c4eb66309f	Add ie_hflags to struct intr_event, which accumulates flags from all handlers on this event. For now handle only IH_ENTROPY in that manner.	2020-01-23 01:20:59 +00:00
Conrad Meyer	4577cf3744	cpufreq(4): Add support for Intel Speed Shift Intel Speed Shift is Intel's technology to control frequency in hardware, with hints from software. Let's get a working version of this in the tree and we can refine it from here. Submitted by: bwidawsk, scottph Reviewed by: bcr (manpages), myself Discussed with: jhb, kib (earlier versions) With feedback from: Greg V, gallatin, freebsdnewbie AT freenet.de Relnotes: yes Differential Revision: https://reviews.freebsd.org/D18028	2020-01-22 23:28:42 +00:00
Hans Petter Selasky	1f69a50940	Make sure the VNET is properly set when calling tcp_drop() from the ktls taskqueue callback function. A valid VNET is needed when updating statistics. panic() tcp_state_change() tcp_drop() ktls_reset_send_tag() taskqueue_run_locked() taskqueue_thread_loop() Sponsored by: Mellanox Technologies	2020-01-21 11:43:25 +00:00
Mateusz Guzik	6403455301	cache: revert r352613 now that vhold does not take locks	2020-01-20 19:52:23 +00:00
Mateusz Guzik	8bba93c7e0	cache: make numcachehv use counter(9) on all archs Requested by: kib	2020-01-20 14:42:11 +00:00
Jeff Roberson	d6e13f3b4d	Don't hold the object lock while calling getpages. The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23033	2020-01-19 23:47:32 +00:00
Mateusz Guzik	a9099e5b10	vfs: switch vop_stdunlock to call lockmgr_unlock Since the flags argument is now always 0 the new call provides the same behavior.	2020-01-19 21:41:34 +00:00
Jeff Roberson	811d05fcb7	Provide an API for interlocked refcount sleeps. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D22908	2020-01-19 18:18:17 +00:00
Mateusz Guzik	28479aaae2	vfs: allow v_holdcnt to transition 0->1 without the interlock Since r356672 ("vfs: rework vnode list management") there is nothing to do apart from altering freevnodes count, but this much can be safely done based on the result of atomic_fetchadd. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23186	2020-01-19 17:47:04 +00:00
Mateusz Guzik	059cb4843b	cache: counter_u64_add_protected -> counter_u64_add Fixes booting on RISC-V where it does happen to not be equivalent. Reported by: lwhsu	2020-01-19 17:05:26 +00:00
Mateusz Guzik	1399033590	cache: convert numcachehv to counter(9) on 64-bit platforms	2020-01-19 05:37:27 +00:00
Mateusz Guzik	512fa9a4e0	vfs: plug a conditional assignment of lo_name in getnewvnode It only matters for witness. No functional changes.	2020-01-19 05:36:45 +00:00
Kyle Evans	05d7dd739c	sysent targets: further cleanup and deduplication r355473 vastly improved the readability and cleanliness of these Makefiles. Every single one of them follows the same pattern and duplicates the exact same logic. Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and include a common sysent.mk to handle the rest. This makes it less tedious to make sweeping changes. Some default values are provided for GENERATED/SYSENT_*; almost all of these just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use effectively the same filenames with an arbitrary prefix. Most ABIs will be able to get away with just setting GENERATED_PREFIX and including ^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile is the notable exception, as it doesn't take a SYSENT_CONF and the generated files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits the pattern enough to use the common version. Reviewed by: brooks, imp Nice!: emaste Differential Revision: https://reviews.freebsd.org/D23197	2020-01-18 20:37:45 +00:00
Mateusz Guzik	2d0c620272	vfs: distribute freevnodes counter per-cpu It gets rolled up to the global when deferred requeueing is performed. A dedicated read routine makes sure to return a value only off by a certain amount. This soothes a global serialisation point for all 0<->1 hold count transitions. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23235	2020-01-18 01:29:02 +00:00
Mateusz Guzik	d3cc535474	vfs: provide F_ISUNIONSTACK as a kludge for libc Prior to introduction of this op libc's readdir would call fstatfs(2), in effect unnecessarily copying kilobytes of data just to check fs name and a mount flag. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D23162	2020-01-17 14:42:25 +00:00
Mateusz Guzik	1ad72b270c	vfs: shorten lock hold time in vdbatch_process	2020-01-17 14:39:00 +00:00
Gleb Smirnoff	66c6c556b6	Change argument order of epoch_call() to more natural, first function, then its argument. Reviewed by: imp, cem, jhb	2020-01-17 06:10:24 +00:00
Mateusz Guzik	66f67d5e5e	vfs: increment numvnodes without the vnode list lock unless under pressure The vnode list lock is only needed to reclaim free vnodes or kick the vnlru thread (or to block and not miss a wake up (but note the sleep has a timeout so this would not be a correctness issue)). Try to get away without the lock by just doing an atomic increment. The lock is contended e.g., during poudriere -j 104 where about half of all acquires come from vnode allocation code. Note the entire scheme needs a rewrite, the above just reduces its SMP impact. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23140	2020-01-16 21:45:21 +00:00
Mateusz Guzik	b7f50b9ad1	vfs: refactor vnode allocation Semantics are almost identical. Some code is deduplicated and there are fewer memory accesses. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D23158	2020-01-16 21:43:13 +00:00
Mateusz Guzik	875cfc082d	vfs: reimplement vlrureclaim to actually use LRU Take advantage of global ordering introduced in r356672. Reviewed by: mckusick (previous version) Differential Revision: https://reviews.freebsd.org/D23067	2020-01-16 10:44:02 +00:00
Jeff Roberson	a81c400e75	Simplify VM and UMA startup by eliminating boot pages. Instead use careful ordering to allocate early pages in the same way boot pages were but only as needed. After the KVA allocator has started up we allocate the KVA that we consumed during boot. This also makes the boot pages freeable since they have vm_page structures allocated with the rest of memory. Parts of this patch were written and tested by markj. Reviewed by: glebius, markj Differential Revision: https://reviews.freebsd.org/D23102	2020-01-16 05:01:21 +00:00
Kirk McKusick	bbb1e07d65	Peter Holm reports that his test that does an umount(8) on an active mount point while numerous tests are running that are writing to files on that mount point causes the umount(8) to hang forever. The unmount(2) system call is handled in the kernel by the dounmount() function. The cause of the hang is that prior to dounmount() calling VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT flag indicates that VFS_SYNC() should not return until all the dirty buffers associated with the mount point have been written to disk. Because user processes are allowed to continue writing and can do so faster than the data can be written to disk, the call to VFS_SYNC() can never finish. Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes when they request to do a write thus having a finite number of dirty buffers to write that cannot be expanded. There is no need to call VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs to flush everything again anyway after suspending writes, to catch anything that was dirtied between the VFS_SYNC() and writes being suspended. The fix is to simply remove the unnecessary call to VFS_SYNC() from dounmount(). Reported by: Peter Holm Analysis by: Chuck Silvers Tested by: Peter Holm MFC after: 7 days Sponsored by: Netflix	2020-01-15 18:53:32 +00:00
Gleb Smirnoff	9074694339	Since this code uses if_ref()/if_rele() it must include if_var.h explicitly, not via header pollution.	2020-01-15 03:39:11 +00:00
Gleb Smirnoff	3264dcadc9	- Move global network epoch definition to epoch.h, as more different subsystems tend to need to know about it, and including if_var.h is huge header pollution for them. Polluting possible non-network users with single symbol seems much lesser evil. - Remove non-preemptible network epoch. Not used yet, and unlikely to get used in close future.	2020-01-15 03:34:21 +00:00
Mateusz Guzik	cda3176851	vfs: in vop_stdadd_writecount only vlazy vnodes on mounts using msync The only reason to vlazy there is to (overzealously) ensure all vnodes which need to be visited by msync scan can be found there. In particular this is of no use to zfs and tmpfs. While here depessimize the check.	2020-01-15 01:34:05 +00:00
Ryan Libby	51871224c0	malloc: remove assumptions about MINALLOCSIZE Remove assumptions about the minimum MINALLOCSIZE, in order to allow testing of smaller MINALLOCSIZE. A following patch will lower the MINALLOCSIZE, but not so much that the present patch is required for correctness at these sites. Reviewed by: jeff, markj Sponsored by: Dell EMC Isilon	2020-01-14 02:14:02 +00:00
Konstantin Belousov	fedab1b499	Code must not unlock a mutex while owning the thread lock. Reviewed by: hselasky, markj Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D23150	2020-01-13 14:30:19 +00:00
Mateusz Guzik	0c236d3d52	vfs: per-cpu batched requeuing of free vnodes Constant requeuing adds significant lock contention in certain workloads. Lessen the problem by batching it. Per-cpu areas are locked in order to synchronize against UMA freeing memory. vnode's v_mflag is converted to short to prevent the struct from growing. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 122.38s user 1780.45s system 6242% cpu 30.480 total patched: 144.84s user 985.90s system 4856% cpu 23.282 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22998	2020-01-13 02:39:41 +00:00
Mateusz Guzik	cc3593fbd9	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997	2020-01-13 02:37:25 +00:00
Mateusz Guzik	57083d2576	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995	2020-01-13 02:34:02 +00:00
Conrad Meyer	365cd52245	Fix a typo in r356667 comment No functional change. Reported by: bdragon Approved by: csprng(markm), earlier version X-MFC-With: r356667	2020-01-12 23:52:16 +00:00
Conrad Meyer	86def3dcd6	getrandom(2): Add Linux GRND_INSECURE API flag Treat it as a synonym for GRND_NONBLOCK. The reasoning is this: We have two choices for handling Linux's GRND_INSECURE API flag. 1. We could ignore it completely (like GRND_RANDOM). However, this might produce the surprising result of GRND_INSECURE requests blocking, when the Linux API does not block. 2. Alternatively, we could treat GRND_INSECURE requests as requests for GRND_NONBLOCK. Here, the surprising result for Linux programs is that invocations with unseeded random(4) will produce EAGAIN, rather than garbage. Honoring the flag in the way Linux does seems fraught. If we actually use the output of a random(4) implementation prior to seeding, we leak some entropy (in an information theory and also practical sense) from what will be the initial seed to attackers (or allow attackers to arbitrarily DoS initial seeding, if we don't leak). This seems unacceptable -- it defeats the purpose of blocking on initial seeding. Secondary to that concern, before seeding we may have arbitrarily little entropy collected; producing output from zero or a handful of entropy bits does not seem particularly useful to userspace. If userspace can accept garbage, insecure, non-random bytes, they can create their own insecure garbage with srandom(time(NULL)) or similar. Any program which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG bytes. So asking the kernel to produce such an output from the secure getrandom(2) API seems inane. For now, we've elected to emulate GRND_INSECURE as an alternative spelling of GRND_NONBLOCK (2). Consider this API not-quite stable for now. We guarantee it will never block. But we will attempt to monitor actual port uptake of this bizarre API and may revise our plans for the unseeded behavior (prior stable/13 branching).
Approved by: csprng(markm), manpages(bcr) See also: https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/ See also: https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/ Differential Revision: https://reviews.freebsd.org/D23130	2020-01-12 20:47:38 +00:00
Edward Tomasz Napierala	ca603bb1ee	Add kern_getpriority(), make Linuxulator use it. Reviewed by: kib, emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22842	2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala	7a0ef283e6	Add kern_setpriority(), use it in Linuxulator. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22841	2020-01-12 13:38:51 +00:00
Mateusz Guzik	d199ad3b44	Add "panicked" boolean which can be tested instead of panicstr The test is performed all the time and reading entire panicstr to do it wastes space.	2020-01-12 06:09:10 +00:00
Mateusz Guzik	879e0604ee	Add KERNEL_PANICKED macro for use in place of direct panicstr tests	2020-01-12 06:07:54 +00:00
Mateusz Guzik	91de98e6d4	vfs: only recalculate watermarks when limits are changing Previously they would get recalculated all the time, in particular in: getnewvnode -> vcheckspace -> vspace	2020-01-11 23:00:57 +00:00
Mateusz Guzik	e6ae744e0e	vfs: deduplicate vnode allocation logic This creates a dedicated routine (vn_alloc) to allocate vnodes. As a side effect code duplication with getnewvnode_reserve is eliminated. Add vn_free for symmetry.	2020-01-11 22:59:44 +00:00
Mateusz Guzik	b52d50cf69	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes won't block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations.	2020-01-11 22:58:14 +00:00
Mateusz Guzik	6928306764	vfs: incomplete pass at converting more ints to u_long Most notably numvnodes and freevnodes were u_long, but parameters used to govern them remained as ints.	2020-01-11 22:56:20 +00:00
Mateusz Guzik	bf62296f35	vfs: add missing CTLFLAG_MPSAFE annotations This covers all kern/vfs_*.c files.	2020-01-11 22:55:12 +00:00
Kyle Evans	1171c633fb	Set .ORDER for makesyscalls generated files When either makesyscalls.lua or syscalls.master changes, all of the ${GENERATED} targets are now out-of-date. With make jobs > 1, this means we will run the makesyscalls script in parallel for the same ABI, generating the same set of output files. Prior to r356603, there is a large window for interlacing output for some of the generated files that we were generating in-place rather than staging in a temp dir. After that, we still shouldn't need to run the script more than once per-ABI as the first invocation should update all of them. Add .ORDER to do so cleanly. Reviewed by: brooks Discussed with: sjg Differential Revision: https://reviews.freebsd.org/D23099	2020-01-10 18:24:17 +00:00
Mark Johnston	dc727127f1	Change malloc_domain() to return the allocation size to the caller. Otherwise the malloc type accounting in malloc_domainset(9) is wrong after r355203. Reviewed by: rlibby Reported by: kaktus Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D23095	2020-01-09 15:02:48 +00:00
Kyle Evans	6a38cd3a54	kern/Makefile: systrace_args.c is also generated	2020-01-09 06:10:25 +00:00
Kyle Evans	39eae263cd	shmfd: posix_fallocate(2): only take rangelock for section we need Other mechanisms that resize the shmfd grab a write lock from 0 to OFF_MAX for safety, so we still get proper synchronization of shmfd->shm_size in effect. There's no need to block readers/writers of earlier segments when we're just reserving more space, so narrow the scope -- it would likely be safe to narrow it completely to just the section of the range that extends beyond our current size, but this likely isn't worth it since the size isn't stable until the writelock is granted the first time. Suggested by: cem (passing comment)	2020-01-09 04:03:17 +00:00
Kyle Evans	f10405323a	posixshm: implement posix_fallocate(2) Linux expects to be able to use posix_fallocate(2) on a memfd. Other places would use this with shm_open(2) to act as a smarter ftruncate(2). Test has been added to go along with this. Reviewed by: kib (earlier version) Differential Revision: https://reviews.freebsd.org/D23042	2020-01-08 19:08:44 +00:00

17117 commits