syscallenter() has a slow path to handle syscall auditing and dtrace
syscall tracing. It uses AUDIT_SYSCALL_ENTER() to check whether to take
the slow path, but this macro also has side effects: it writes the audit
log entry. When systrace (dtrace syscall tracing) is enabled, this
would get short-circuited, and we end up not writing audit log entries.
Introduce a pure macro to check whether auditing is enabled, use it in
syscallenter() instead of AUDIT_SYSCALL_ENTER().
Approved by: so
Security: FreeBSD-EN-25:02.audit
Reviewed by: kib
Reported by: Joe Duin <jd@firexfly.com>
Fixes: 2f7292437d ("Merge audit and systrace checks")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D48448
(cherry picked from commit f78fe93085)
(cherry picked from commit 4b9ba274d7)
The sockaddr passed to ktrcapfail() may be smaller than
sizeof(struct sockaddr), and the trailing bytes in the sockaddr
structure will be uninitialized, whereupon they get copied out to
userspace.
Approved by: so
Security: FreeBSD-SA-25:04.ktrace
PR: 283673
Reviewed by: jfree, emaste
Reported by: Yichen Chai <yichen.chai@gmail.com>
Reported by: Zhuo Ying Jiang Li <zyj20@cl.cam.ac.uk>
Fixes: 9bec841312 ("ktrace: Record detailed ECAPMODE violations")
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D48499
(cherry picked from commit 5b86888bae)
(cherry picked from commit 99d5ee8738)
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.
Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.
To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:
vfs.root.mountfrom="p9fs:sharename"
for non-root filesystems add something like this to /etc/fstab:
sharename /mnt p9fs rw 0 0
In both examples, substitute the share name used on the bhyve command
line.
The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations. This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.
Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
This commit also includes the original refactoring changes
This change allows the kernel to operate with the default netisr cpu-affinity settings while having RSS compiled in. Normally, RSS changes quite a bit of the behaviour of the kernel dispatch service - this change allows for reducing impact on incompatible hardware while preserving the option to boost throughput speeds based on packet flow CPU affinity.
Make sure to compile the following options in the kernel:
options RSS
As well as setting the following sysctls:
net.inet.rss.enabled: 1
net.isr.bindthreads: 1
net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)
And optionally (to force a 1:1 mapping between CPUs and buckets):
net.inet.rss.bits: 3 (for 8 CPUs)
net.inet.rss.bits: 2 (for 4 CPUs)
etc.
Set pin_default_swi to 0 by default in the RSS case.
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date: Thu Sep 14 16:13:01 2023 +0000
vfs: don't provoke recycling non-free vnodes without a good reason
If the total number of free vnodes is at or above target, there is no
point creating more of them.
This commit was done as a performance optimization but ends up
causing slowdowns when doing operations on many files.
Approved by: re (cperciva)
Requested by: re (cperciva)
(cherry picked from commit ab05a1cf32)
(cherry picked from commit 2ca9c96dc0)
[MFC note: This is not a direct cherry-pick due to API changes in HEAD
which are not present in stable/14.]
We need to call into INTRNG to activate all interrupts on platforms that
use it. Currently, interrupts are only activated in the nexus drivers for
INTRNG platforms, but this does not handle other bus devices such as
gpiobus that manage their own IRQ space.
Reported by: cperciva
Reviewed by: cperciva, jhb
Approved by: re (kib)
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D47282
(cherry picked from commit c85855a72d)
(cherry picked from commit c54bdf84d5)
so_peerlabel can only be used when the socket is not listening.
Reviewed by: markj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46755
(cherry picked from commit 2fb778fab8)
If vattr.va_size is 0, we will end up accessing invalid memory. This is
mostly harmless (because malloc(0) still allocates some memory), but it
triggers a KASAN report.
PR: 282268
Reviewed by: christos, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47240
(cherry picked from commit b5149b2653)
Some drivers, in particular anything which creates an ifnet during
attach, need to have the current VNET set, as if_attach_internal() and
its callees access VNET-global variables.
device_probe_and_attach() handles this, but this is not the only way to
arrive in DEVICE_ATTACH. In particular, bus drivers may invoke
device_attach() directly, as does devctl2's DEV_ENABLE ioctl handler.
So, set the current VNET in device_attach() instead.
I believe it is always safe to use vnet0, as devctl2 ioctls are not
permitted within a jail.
PR: 282168
Reviewed by: zlei, kevans, bz, imp, glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47174
(cherry picked from commit f4e35c044c)
There has been a documented case in the exports(5) man
page forever, which specifies that the -maproot or -mapall
may have a single user entry, followed by a ':'.
This case is defined as specifying no groups (aka cr_ngroups == 0).
This patch fixes the NFS server so that it handles this case correctly.
After MFC'ng this patch to stable/13 and stable/14, I propose that
this unusual case be deprecated and no longer allowed in FreeBSD15.
At that point, this patch can be reverted.
(cherry picked from commit caa309c881)
This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket. This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.
The interface is copied from OpenBSD, and this implementation aims to be
compatible. Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket. In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets. More concretely, when
setting the option one passes the following struct:
struct splice {
int fd;
off_t max;
struct timveval idle;
};
where "fd" refers to the socket to which the first socket is to be
spliced, and two setsockopt(SO_SPLICE) calls are required to set up a
bi-directional splice.
select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer. Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.
A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period. Once unspliced,
the socket behaves normally from userspace's perspective. The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again. Userspace can also manually trigger
unsplicing by splicing to -1.
Splicing work is handled by dedicated threads, similar to KTLS. A
worker thread is assigned at splice creation time. At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.
Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket. so_splice_xfer() does the hard
work of moving data between socket buffers.
Co-authored by: gallatin
Reviewed by: brooks (interface bits)
MFC after: 3 months
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D46411
(cherry picked from commit a1da7dc1cd)
Files loaded "-t firmware" (or module_type="firmware"). They are
registered with the firmware system using the full path to the file.
There's only one firmware per file, and it is the entire file. We do an
extra firmware_get() on any firmware we find here to prevent them from
ever being unloaded (we can't handle that case sanely).
Sponsored by: Netflix
Reviewed by: tsoome, jhb
Differential Revision: https://reviews.freebsd.org/D43522
(cherry picked from commit 479905a1ed)
When we can't find a .ko module to satisfy the firmware request, try
harder by looking for a file to read in directly. We compose this file's
name by appending the imagename parameter to the firmware path
(currently hard-wired to be /boot/firmware, future plans are for a
path). Allow this file to be unloaded when firmware_put() releases the
last reference, but we don't need to do the indirection and dance we
need to do when unloading the .ko that will unregister the firmware.
Sponsored by: Netflix
Reviewed by: manu, jhb
Differential Revision: https://reviews.freebsd.org/D43555
(cherry picked from commit c7b1e980ae)
Do not clear knotes from the TTY until it gets dealloc'ed, unless the
TTY is being revoked, in that case delete the knotes when closed is
called on the TTY.
When knotes are cleared from a knlist, those knotes become detached from
the knlist. And when an event is triggered on a detached knote there
isn't an associated knlist and therefore no lock will be taken when the
event is triggered.
This becomes a problem when a detached knote is triggered on a TTY since
the mutex for a TTY is also used as the lock for its knlists. This
scenario ends up calling the TTY event handlers without the TTY lock
being held and tripping on asserts in the event handlers.
PR: 272151
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D41605
(cherry picked from commit acd5638e26)