Commit graph

20067 commits

Author SHA1 Message Date
Mark Johnston
26312c233e audit: Fix short-circuiting in syscallenter()
syscallenter() has a slow path to handle syscall auditing and dtrace
syscall tracing.  It uses AUDIT_SYSCALL_ENTER() to check whether to take
the slow path, but this macro also has side effects: it writes the audit
log entry.  When systrace (dtrace syscall tracing) is enabled, this
would get short-circuited, and we end up not writing audit log entries.

Introduce a pure macro to check whether auditing is enabled, use it in
syscallenter() instead of AUDIT_SYSCALL_ENTER().
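
The short-circuit problem can be reproduced in a small userland sketch. Every name below is a hypothetical stand-in rather than the kernel's actual macro or variable; the sketch only shows why a side-effecting check must not sit on the right-hand side of `||`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not names from the FreeBSD source. */
static bool systrace_enabled;
static bool audit_enabled;
static int audit_records_written;

/* Side-effecting check, modelled on the behaviour described for AUDIT_SYSCALL_ENTER(). */
static bool
audit_enter_with_side_effect(void)
{
	if (!audit_enabled)
		return (false);
	audit_records_written++;	/* stands in for writing the audit log entry */
	return (true);
}

/* Pure check: reports whether auditing is on and writes nothing. */
static bool
audit_is_enabled(void)
{
	return (audit_enabled);
}

/* Buggy shape: when systrace_enabled is true, || skips the right-hand side. */
static void
syscall_enter_buggy(void)
{
	if (systrace_enabled || audit_enter_with_side_effect()) {
		/* slow path; the audit record may never have been written */
	}
}

/* Fixed shape: decide with pure checks, then perform the side effect explicitly. */
static void
syscall_enter_fixed(void)
{
	if (systrace_enabled || audit_is_enabled()) {
		(void)audit_enter_with_side_effect();
	}
}
```

With both flags set, the buggy variant leaves `audit_records_written` untouched, while the fixed variant increments it once per call.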

Approved by:	so
Security:	FreeBSD-EN-25:02.audit
Reviewed by:	kib
Reported by:	Joe Duin <jd@firexfly.com>
Fixes:		2f7292437d ("Merge audit and systrace checks")
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D48448

(cherry picked from commit f78fe93085)
(cherry picked from commit 4b9ba274d7)
2025-01-30 07:25:19 +01:00
Mark Johnston
3717a36932 ktrace: Fix uninitialized memory disclosure
The sockaddr passed to ktrcapfail() may be smaller than
sizeof(struct sockaddr), and the trailing bytes in the sockaddr
structure will be uninitialized, whereupon they get copied out to
userspace.
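
One common way to avoid this class of disclosure is to zero the full-sized destination before copying the shorter source into it. The sketch below assumes that approach for illustration; `copy_sockaddr_zeroed` is a hypothetical helper, and the actual change may be structured differently.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Copy a possibly short socket address into a full-sized record.
 * The destination is zeroed first, so the bytes past `len` never carry
 * stale data.  Hypothetical helper, not a kernel function.
 */
static void
copy_sockaddr_zeroed(struct sockaddr *dst, const void *src, size_t len)
{
	if (len > sizeof(*dst))
		len = sizeof(*dst);	/* never write past the destination */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
}
```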

Approved by:	so
Security:	FreeBSD-SA-25:04.ktrace
PR:		283673
Reviewed by:	jfree, emaste
Reported by:	Yichen Chai <yichen.chai@gmail.com>
Reported by:	Zhuo Ying Jiang Li <zyj20@cl.cam.ac.uk>
Fixes:		9bec841312 ("ktrace: Record detailed ECAPMODE violations")
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D48499

(cherry picked from commit 5b86888bae)
(cherry picked from commit 99d5ee8738)
2025-01-30 07:25:17 +01:00
Doug Rabson
4aa850d6cb Add an implementation of the 9P filesystem
This is derived from swills@'s fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.

Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.

To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:

	vfs.root.mountfrom="p9fs:sharename"

for non-root filesystems add something like this to /etc/fstab:

	sharename /mnt p9fs rw 0 0

In both examples, substitute the share name used on the bhyve command
line.

The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations.  This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.

Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
2025-01-10 10:30:32 +01:00
Mark Johnston
de1b92e2b4 kern: Make fileops and filterops tables const where possible
No functional change intended.

MFC after:	1 week

(cherry picked from commit ef9ffb8594)
2024-12-16 16:15:42 +01:00
Stephan de Wit
5e72057985 rss: add sysctl enable toggle
This commit also includes the original refactoring changes

This change allows the kernel to operate with the default netisr cpu-affinity settings while having RSS compiled in. Normally, RSS changes quite a bit of the behaviour of the kernel dispatch service - this change allows for reducing impact on incompatible hardware while preserving the option to boost throughput speeds based on packet flow CPU affinity.

Make sure to compile the following options in the kernel:

    options  RSS

As well as setting the following sysctls:

    net.inet.rss.enabled: 1
    net.isr.bindthreads: 1
    net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)

And optionally (to force a 1:1 mapping between CPUs and buckets):

    net.inet.rss.bits: 3 (for 8 CPUs)
    net.inet.rss.bits: 2 (for 4 CPUs)

etc.
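
The two examples above follow from the bucket count being a power of two: 3 bits give 8 buckets and 2 bits give 4. A minimal sketch of that arithmetic follows; `rss_bits_for_cpus` is a hypothetical helper, and rounding up for CPU counts that are not powers of two is an assumption (in that case the mapping is no longer 1:1).

```c
/*
 * Smallest number of bits b such that (1 << b) >= ncpus.
 * For power-of-two CPU counts this yields exactly one bucket per CPU.
 */
static unsigned int
rss_bits_for_cpus(unsigned int ncpus)
{
	unsigned int bits = 0;

	while (bits < 31 && (1u << bits) < ncpus)
		bits++;
	return (bits);
}
```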

Set pin_default_swi to 0 by default in the RSS case.
2024-12-11 11:10:51 +01:00
Kirk McKusick
c16d0924db Revert commit 8733bc277a
Author: Mateusz Guzik <mjg@FreeBSD.org>
Date:   Thu Sep 14 16:13:01 2023 +0000

    vfs: don't provoke recycling non-free vnodes without a good reason

    If the total number of free vnodes is at or above target, there is no
    point creating more of them.

This commit was done as a performance optimization but ends up
causing slowdowns when doing operations on many files.

Approved by:	re (cperciva)
Requested by:   re (cperciva)

(cherry picked from commit ab05a1cf32)
(cherry picked from commit 2ca9c96dc0)
2024-11-15 15:23:13 -08:00
Konstantin Belousov
19d23cb8ac vm_object: do not assume that un_pager.devp.dev is cdev
PR:	282533
Approved by:	re (cperciva)

(cherry picked from commit 580340dbda)
(cherry picked from commit 92a9501b6b)
2024-11-13 20:06:36 +02:00
Konstantin Belousov
300d034b3c device_pager: rename the un_pager.devp.dev field to handle
(cherry picked from commit f0c07fe3d0)
(cherry picked from commit c57dc755fa)

Approved by:	re (cperciva)
2024-11-13 20:06:28 +02:00
Andrew Turner
4b4c7b1924 bus: Activate INTRNG interrupts in common code
[MFC note: This is not a direct cherry-pick due to API changes in HEAD
which are not present in stable/14.]

We need to call into INTRNG to activate all interrupts on platforms that
use it.  Currently, interrupts are only activated in the nexus drivers for
INTRNG platforms, but this does not handle other bus devices such as
gpiobus that manage their own IRQ space.

Reported by:	cperciva
Reviewed by:	cperciva, jhb
Approved by:	re (kib)
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D47282

(cherry picked from commit c85855a72d)
(cherry picked from commit c54bdf84d5)
2024-11-03 08:15:34 -08:00
Michael Tuexen
1e980fdf7a getsockopt: improve locking for SOL_SOCKET level socket options
Ensure SOLISTENING() is done inside SOCK_LOCK()/SOCK_UNLOCK()
for getsockopt() handling of SOL_SOCKET-level socket options.

Reviewed by:		markj, rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46881

(cherry picked from commit 3326ab87cc)
2024-10-31 12:37:55 +01:00
Michael Tuexen
66c7d5365a MAC: improve handling of listening sockets
so_peerlabel can only be used when the socket is not listening.

Reviewed by:		markj
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D46755

(cherry picked from commit 2fb778fab8)
2024-10-31 12:32:36 +01:00
Mark Johnston
9e3e111d74 linker: Handle a truncated hints file properly
If vattr.va_size is 0, we will end up accessing invalid memory.  This is
mostly harmless (because malloc(0) still allocates some memory), but it
triggers a KASAN report.
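
The general remedy is to check the file size before reading any field from the buffer. The sketch below assumes, purely for illustration, that the buffer begins with an `int` version field; `read_hints_version` is a hypothetical helper and not the kernel's parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Read a leading int "version" field from a buffer, refusing to touch
 * memory when the buffer is too short to hold it.  The layout (an int
 * at offset 0) is an assumption made for this example.
 */
static bool
read_hints_version(const void *buf, size_t size, int *versionp)
{
	if (buf == NULL || size < sizeof(*versionp))
		return (false);	/* empty or truncated file */
	memcpy(versionp, buf, sizeof(*versionp));
	return (true);
}
```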

PR:		282268
Reviewed by:	christos, imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47240

(cherry picked from commit b5149b2653)
2024-10-30 13:21:06 +00:00
Mark Johnston
2e80ea70b9 bus: Set the current VNET in device_attach()
Some drivers, in particular anything which creates an ifnet during
attach, need to have the current VNET set, as if_attach_internal() and
its callees access VNET-global variables.

device_probe_and_attach() handles this, but this is not the only way to
arrive in DEVICE_ATTACH.  In particular, bus drivers may invoke
device_attach() directly, as does devctl2's DEV_ENABLE ioctl handler.
So, set the current VNET in device_attach() instead.

I believe it is always safe to use vnet0, as devctl2 ioctls are not
permitted within a jail.

PR:		282168
Reviewed by:	zlei, kevans, bz, imp, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D47174

(cherry picked from commit f4e35c044c)
2024-10-26 12:58:50 +00:00
Konstantin Belousov
ec8d60f0d9 devices: report iommu data for the device in the dev. sysctl tree
(cherry picked from commit b08d332da0)
2024-10-24 05:44:40 +03:00
Konstantin Belousov
2cda503184 device: add generic named per-device property
(cherry picked from commit cb83af64f1)
2024-10-24 05:44:40 +03:00
Rick Macklem
f6e1add0c9 nfsd: Fix handling of credentials with cr_ngroups == 0
There has been a documented case in the exports(5) man
page forever, which specifies that the -maproot or -mapall
may have a single user entry, followed by a ':'.
This case is defined as specifying no groups (aka cr_ngroups == 0).

This patch fixes the NFS server so that it handles this case correctly.

After MFC'ing this patch to stable/13 and stable/14, I propose that
this unusual case be deprecated and no longer allowed in FreeBSD15.
At that point, this patch can be reverted.

(cherry picked from commit caa309c881)
2024-10-23 18:06:26 -07:00
Mark Johnston
4601b3e04d socket: Only log splice structs to ktrace if KTR_STRUCT is configured
Fixes:	a1da7dc1cd ("socket: Implement SO_SPLICE")
(cherry picked from commit 283bf3b4b1)
2024-10-17 15:49:12 +00:00
Siva Mahadevan
032014aaae socket: wrap ktrsplice call with KTRACE ifdef
This fixes a build error when the kernel is built without KTRACE
support.

Reviewed by:	emaste, markj
Fixes:		a1da7dc1cd ("socket: Implement SO_SPLICE")
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1426

(cherry picked from commit 75cd1e534c)
2024-10-17 15:49:11 +00:00
Mark Johnston
93ff7dbaea socket: Implement SO_SPLICE
This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket.  This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.

The interface is copied from OpenBSD, and this implementation aims to be
compatible.  Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket.  In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets.  More concretely, when
setting the option one passes the following struct:

    struct splice {
	    int fd;
	    off_t max;
	    struct timeval idle;
    };

where "fd" refers to the socket to which the first socket is to be
spliced, and two setsockopt(SO_SPLICE) calls are required to set up a
bi-directional splice.

select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer.  Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.

A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period.  Once unspliced,
the socket behaves normally from userspace's perspective.  The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again.  Userspace can also manually trigger
unsplicing by splicing to -1.
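
A userland sketch of how the option might be used is shown below. It mirrors the struct exactly as written in this message under a local name (`splice_request`); the system header may name the fields differently, and treating a zero `max` or zero `idle` as "no limit" is an assumption. The `setsockopt()` calls are compiled only where `SO_SPLICE` is defined.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <string.h>

/* Local mirror of the struct from the commit message; layout is an assumption. */
struct splice_request {
	int		fd;	/* socket to splice to, or -1 to unsplice */
	off_t		max;	/* byte limit; 0 assumed to mean no limit */
	struct timeval	idle;	/* idle timeout; zero assumed to mean none */
};

/* Build a zero-initialised request with the given sink, byte limit and timeout. */
static struct splice_request
make_splice_request(int sink_fd, off_t max_bytes, time_t idle_secs)
{
	struct splice_request req;

	memset(&req, 0, sizeof(req));
	req.fd = sink_fd;
	req.max = max_bytes;
	req.idle.tv_sec = idle_secs;
	return (req);
}

/*
 * Splice sockets a and b in both directions, which takes one call per
 * direction.  Returns 0 on success, -1 on failure or when the platform
 * does not define SO_SPLICE.
 */
static int
splice_both_ways(int a, int b)
{
#ifdef SO_SPLICE
	struct splice_request ab = make_splice_request(b, 0, 0);
	struct splice_request ba = make_splice_request(a, 0, 0);

	if (setsockopt(a, SOL_SOCKET, SO_SPLICE, &ab, sizeof(ab)) != 0)
		return (-1);
	return (setsockopt(b, SOL_SOCKET, SO_SPLICE, &ba, sizeof(ba)));
#else
	(void)a;
	(void)b;
	return (-1);
#endif
}
```

Passing `make_splice_request(-1, 0, 0)` to the same option would correspond to the manual unsplice described above.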

Splicing work is handled by dedicated threads, similar to KTLS.  A
worker thread is assigned at splice creation time.  At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.

Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket.  so_splice_xfer() does the hard
work of moving data between socket buffers.

Co-authored by:	gallatin
Reviewed by:	brooks (interface bits)
MFC after:	3 months
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D46411

(cherry picked from commit a1da7dc1cd)
2024-10-17 15:48:36 +00:00
Warner Losh
d36ba3989c firmware: unbreak armv7
Use proper format specifiers (with casts) and don't redefine flags.

Fixes:			c7b1e980ae
Sponsored by:		Netflix

(cherry picked from commit 3a3afbec38)
2024-10-16 08:19:21 -06:00
Warner Losh
903ddc7f14 firmware: Allow binary files to be loaded by /boot/loader
Files can be loaded with "-t firmware" (or module_type="firmware").  They are
registered with the firmware system using the full path to the file.
There's only one firmware per file, and it is the entire file. We do an
extra firmware_get() on any firmware we find here to prevent them from
ever being unloaded (we can't handle that case sanely).

Sponsored by:		Netflix
Reviewed by:		tsoome, jhb
Differential Revision:	https://reviews.freebsd.org/D43522

(cherry picked from commit 479905a1ed)
2024-10-16 08:19:21 -06:00
Warner Losh
2ca7b03d62 firmware: load binary firmware files
When we can't find a .ko module to satisfy the firmware request, try
harder by looking for a file to read in directly. We compose this file's
name by appending the imagename parameter to the firmware path
(currently hard-wired to be /boot/firmware, future plans are for a
path). Allow this file to be unloaded when firmware_put() releases the
last reference, but we don't need to do the indirection and dance we
need to do when unloading the .ko that will unregister the firmware.
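
The name composition described above can be sketched in userland as follows. `firmware_file_path` and `FIRMWARE_DIR` are hypothetical names, and joining the two parts with a "/" separator is an assumption about how the path is built.

```c
#include <stddef.h>
#include <stdio.h>

/* Hard-wired firmware directory, as stated in the commit message. */
#define FIRMWARE_DIR "/boot/firmware"

/*
 * Compose "<FIRMWARE_DIR>/<imagename>" into buf.
 * Returns 0 on success, -1 if the result does not fit in buflen bytes.
 */
static int
firmware_file_path(char *buf, size_t buflen, const char *imagename)
{
	int n;

	n = snprintf(buf, buflen, "%s/%s", FIRMWARE_DIR, imagename);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);	/* encoding error or truncated output */
	return (0);
}
```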

Sponsored by:		Netflix
Reviewed by:		manu, jhb
Differential Revision:	https://reviews.freebsd.org/D43555

(cherry picked from commit c7b1e980ae)
2024-10-16 08:19:21 -06:00
Warner Losh
dfd6b1ee5b subr_firmware: Sort includes
Sponsored by:		Netflix

(cherry picked from commit 4b62b42a8d)
2024-10-16 08:19:21 -06:00
Konstantin Belousov
cdc72c4960 kinfo_vmentry: report cdev name for device mappings
(cherry picked from commit ac9b565b1a)
2024-10-15 18:03:59 +03:00
Konstantin Belousov
1e2317b3bf shm_alloc(): cleanup
(cherry picked from commit e578fd853a)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
5dc4245028 sys/user.h: report posix shm mappings
(cherry picked from commit a8c641bbcb)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
64de7b789b posix shm: add shm_get_path(9)
(cherry picked from commit bda73e441f)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
ec35a9c65d posix shm: mark backing objects with SHM_POSIXSHM flag
(cherry picked from commit a10870ecea)
2024-10-15 17:50:17 +03:00
Konstantin Belousov
987c8e9afa kinfo_{vmobject,vmentry}: move copy of paths into the vnode handling scope
(cherry picked from commit 71a66883b5)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
32b3b01083 ptrace(PT_VM_ENTRY): report max protection
(cherry picked from commit e90b2b7d6c)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
a1ae801880 kinfo_vmentry: report max protection
(cherry picked from commit 409c2fa385)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
ba8062dc22 kinfo_vmentry: report mappings of the SysV shm segments
(cherry picked from commit d3dd6bd403)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
30a5ba9ded sysvshm: add shmobjinfo() function to find key/seq of the segment backed by obj
(cherry picked from commit b72029589e)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
1ef669ae41 vm_object: add OBJ_SYSVSHM flag to indicate SysV shm backing object
(cherry picked from commit f186252e0d)
2024-10-15 17:50:16 +03:00
Konstantin Belousov
2bb9d4f758 sysv_ipc: remove sys/cdefs.h include
(cherry picked from commit 8771dc950a)
2024-10-15 17:50:15 +03:00
Jamie Gritton
16e1424d24 jail: expose children.max and children.cur via sysctl
Submitted by:   Igor Ostapenko <igor.ostapenko_pm.me>
Differential Revision:  <https://reviews.freebsd.org/D43565>

(cherry picked from commit ab0841bdbe)
2024-10-13 16:45:58 -07:00
Robert Wing
012c194f39 tty: delete knotes when TTY is revoked
Do not clear knotes from the TTY until it gets dealloc'ed, unless the
TTY is being revoked; in that case, delete the knotes when close is
called on the TTY.

When knotes are cleared from a knlist, those knotes become detached from
the knlist. And when an event is triggered on a detached knote there
isn't an associated knlist and therefore no lock will be taken when the
event is triggered.

This becomes a problem when a detached knote is triggered on a TTY since
the mutex for a TTY is also used as the lock for its knlists. This
scenario ends up calling the TTY event handlers without the TTY lock
being held and tripping on asserts in the event handlers.

PR:             272151
Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D41605

(cherry picked from commit acd5638e26)
2024-10-10 20:28:20 +00:00
Zhenlei Huang
54c79d3ae4 khelp: Sprinkle const qualifiers where appropriate
No functional change intended.

MFC after:	1 week

(cherry picked from commit 89937323bd)
2024-10-08 12:44:54 +08:00
Zhenlei Huang
18aa2a81c4 hhook: Sprinkle const qualifiers where appropriate
No functional change intended.

MFC after:	1 week

(cherry picked from commit 941f8aceac)
2024-10-08 12:44:54 +08:00
Konstantin Belousov
d9aa256201 sysctl: add KERN_PROC_RLIMIT_USAGE
(cherry picked from commit c85d3064c4)
2024-10-05 10:08:56 +03:00
Konstantin Belousov
a23e9b154d Regen
2024-10-05 10:08:56 +03:00
Konstantin Belousov
7c41d08320 Add getrlimitusage(2)
(cherry picked from commit f028f44ef3)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
e05087ee1c Add proc_nfiles(9)
(cherry picked from commit 9c3e516ad0)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
4a337ee7ef uifree(9): report non-zero values for all shared resources
(cherry picked from commit af96ccc6a5)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
c15b2e046e sys_pipe: consistently use cr_ruidinfo for accounting of pipebuf
(cherry picked from commit a52b30ff98)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
a8c663bb42 pipespace_new(): decrease uidinfo pipebuf usage if reservation check failed
(cherry picked from commit 40769168a5)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
6536b979b8 pipe: use pipe subsystem KVA counter instead of pipe_map size
(cherry picked from commit d6074f73af)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
d532d9926e pipes: reserve configured percentage of buffers zone to superuser
(cherry picked from commit 7672cbef2c)
2024-10-05 10:08:55 +03:00
Konstantin Belousov
b7eecc86c3 kernel: add RLIMIT_PIPEBUF
(cherry picked from commit 3458bbd397)
2024-10-05 10:08:54 +03:00
Zhenlei Huang
44a6f9c9a0 subr_bus: Stop checking for failures from malloc(M_WAITOK)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45852

(cherry picked from commit 99e3bb555c)
2024-09-30 12:44:14 +08:00