18408 commits

Author SHA1 Message Date
Konstantin Belousov
802cf4ab0e namei: add NDPREINIT() macro
Its intent is to initialize the future part of struct nameidata that
should be used across several namei() calls and VOPs.  Right now it is a NOP.

Reviewed by:	mckusick
Discussed with:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30041
2021-06-23 23:46:15 +03:00
Warner Losh
ddfc9c4c59 newbus: Move from bus_child_{pnpinfo,location}_src to bus_child_{pnpinfo,location} with sbuf
The upper layers now all go through a layer that ties into these
information functions and translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.

Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.

Document these new interfaces with man pages, an oversight from before.

Reviewed by:		jhb, bcr
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29937
2021-06-22 20:52:06 -06:00
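The truncation problem described in the newbus commit above can be illustrated in plain userland C. This is only a sketch: `struct growbuf`, `growbuf_printf()`, and both `location_*` helpers are hypothetical names invented for the illustration and are not the sbuf(9) or newbus interfaces. It contrasts a fixed `char *` plus length buffer, where the callee has to decide what to do on truncation, with a caller-owned buffer that is expanded as text is appended.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Old shape: fixed buffer; snprintf() cuts the text off at len - 1 bytes. */
int
location_fixed(char *buf, size_t len, int slot, int function)
{
	int need = snprintf(buf, len, "slot=%d function=%d", slot, function);

	/* -1 signals truncation; the caller still has to decide what to do. */
	return (need >= 0 && (size_t)need < len ? 0 : -1);
}

/* Hypothetical growable buffer, a stand-in for the idea behind struct sbuf. */
struct growbuf {
	char	*data;
	size_t	 len;
};

int
growbuf_printf(struct growbuf *gb, const char *fmt, ...)
{
	va_list ap;
	char *p;
	int need;

	va_start(ap, fmt);
	need = vsnprintf(NULL, 0, fmt, ap);	/* measure the output first */
	va_end(ap);
	if (need < 0)
		return (-1);
	p = realloc(gb->data, gb->len + (size_t)need + 1);
	if (p == NULL)
		return (-1);
	gb->data = p;
	va_start(ap, fmt);
	vsnprintf(gb->data + gb->len, (size_t)need + 1, fmt, ap);
	va_end(ap);
	gb->len += (size_t)need;
	return (0);
}

/* New shape: the callee only appends; the buffer expands as needed. */
int
location_growbuf(struct growbuf *gb, int slot, int function)
{
	return (growbuf_printf(gb, "slot=%d function=%d", slot, function));
}
```

With an 8-byte fixed buffer the first helper reports truncation, while the second keeps the full string in a buffer the caller owns and later frees.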
Edward Tomasz Napierala
06250515cf imgact_elf: compute auxv buffer size instead of using magic value
The new buffer is somewhat larger, but there should be no functional
changes.

Reviewed By:	kib, imp
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30821
2021-06-21 17:07:07 +01:00
Colin Percival
fe51b5a76d kern_tslog: Include tslog data from loader
The i386 loader (and hopefully others to come) now passes tslog data
as a "preloaded module".  Include this in the data returned by the
debug.tslog sysctl.

Reviewed by:	kevans
2021-06-20 20:09:47 -07:00
Warner Losh
0a99422970 Move mips and arm to 1000Hz by default.
armv6 and armv7 systems already were 1000Hz. The other armv5 were a
mix of 100 and 1000. This changes them to 1000. Should there be
issues, we can add options HZ=100 at the drop of a hat to the systems
that have bad performance.

mips is a lot more complicated. But most of the systems are already
1000Hz. The hardware exceptions are all fast enough to run at
1000Hz. MALTA is our primary emulator, and history has shown emulators
tend to like 100Hz better, so run those systems at 100Hz. As with arm,
any system that shows a huge performance regression can be reverted to
100Hz easily.

This was going to be committed well in advance of the 13 branch, but
it was delayed and forgotten til now.

Discussed on:	#bsdmips ages ago
Sponsored by:	Netflix
2021-06-16 20:00:14 -06:00
John Baldwin
faf0224ff2 ktls: Don't mark existing received mbufs notready for TOE TLS.
The TOE driver might receive decrypted TLS records that are enqueued
to the socket buffer after ktls_try_toe() returns and before
ktls_enable_rx() locks the receive buffer to call sb_mark_notready().
In that case, sb_mark_notready() would incorrectly treat the decrypted
TLS record as an encrypted record and schedule it for decryption.
This always resulted in the connection being dropped as the data in
the control message did not look like a valid TLS header.

To fix, don't try to handle software decryption of existing buffers in
the socket buffer for TOE TLS in ktls_enable_rx().  If a TOE TLS
driver needs to decrypt existing data in the socket buffer, the driver
will need to manage that in its tod_alloc_tls_session method.

Sponsored by:	Chelsio Communications
2021-06-15 17:45:21 -07:00
Konstantin Belousov
a12e901a5a Add a knob to disable dequeueing SIGCHLD on waiting for live process
It seems that Linux does not dequeue siginfo for SIGCHLD when wait*(2)
reports status of the running process.  In particular, sigwaitinfo(2)
and other signal querying syscalls can observe the siginfo after wait.

FreeBSD dequeued siginfo from the beginning, so we cannot change the
default ABI to be more compatible.  Still, add a knob to switch to
the other behavior for debugging purposes.

Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
bc38762474 Add a knob to not drop signal with default ignored or ignored actions
Traditionally, BSD drops signals with the default action during send,
not even putting them to the destination process queue.  This semantic
is not shared with other operating systems (Linux), which do queue
such signals.  In particular, sigtimedwait(2) and related syscalls can
observe the delivery.

Add a global knob kern.sig_discard_ign which can be set to false to force
enqueuing of the signals with default action.  Also add an ABI flag to
indicate that signals should be queued.

Note that it is not practical to run with the knob turned on, because almost
all software that cares about the delivery of such signals is aware of the
difference and misbehaves if the signals are actually queued.  The purpose
of the knob as is is to allow easier diagnosis of programs that need
adjustments, to confirm the cause of a problem.

Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
acced8b043 sigwait: add comment explaining EINTR/ERESTART details
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:19 +03:00
Konstantin Belousov
afb36e289c sigwait(2) and sigtimedwait(2) must not be restarted.
Reported by:	dchagin
Reviewed by:	dchagin, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30675
2021-06-16 02:00:18 +03:00
Mark Johnston
a100217489 Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols.  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:32 -04:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Andrew Gallatin
ed5e13cfc2 ktls: Fix interaction with RATELIMIT
uipc_ktls.c was missing opt_ratelimit.h, so it was
never noticing that RATELIMIT was enabled.  Once it was
enabled, it failed to compile as ktls_modify_txrtlmt()
had accrued a compilation error when it was not being
compiled in.

Sponsored by: Netflix
2021-06-14 10:51:16 -04:00
Dmitry Chagin
e884512ad1 Split kern_poll() on two counterparts.
kern_poll_kfds() operates purely on kernel data (kfds points to an
array in the kernel), while kern_poll() operates on a user-supplied pollfd
array.  Move the nfds check to kern_poll_maxfds().

No functional changes; this is for future use in the Linux emulation layer.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D30690
MFC after:		2 weeks
2021-06-10 15:11:25 +03:00
Dmitry Chagin
f570a6723e Fix copyright, remove "all rights reserved".
The eventfd code was written by me; rdivacky@'s copyright applies only
to the epoll part of the Linuxulator code. Roman is ok with retiring his
copyright from sys/kern/sys_eventfd.c and removing the 'All rights reserved.'
lines from sys/compat/linux/linux_event.[c|h] and sys/kern/sys_eventfd.c.

Reviewed by:		kib, emaste
Approved by:		rdivacky
Differential Revision:	https://reviews.freebsd.org/D30677
MFC after:		2 weeks
2021-06-08 08:18:00 +03:00
Mark Johnston
887c753c9f Fix handling of D_GIANTOK
It was meant to suppress only the printf(), not the subsequent injection
of Giant-protected thunks for various file operations.

Fixes:		fbeb4ccac9
Reported by:	pho
Tested by:	pho
MFC after:	6 days
Pointy hat:	markj
2021-06-07 16:45:50 -04:00
Mark Johnston
fbeb4ccac9 Suppress D_NEEDGIANT warnings for some drivers
During boot we warn that the kbd and openfirm drivers are Giant-locked
and may be deleted.  Generally, the warning helps signal that certain
old drivers are not being maintained and are subject to removal, but
this doesn't really apply to certain drivers which are harder to
detangle from Giant.

Add a flag, D_GIANTOK, that devices can specify to suppress the
misleading warning.  Use it in the kbd and openfirm drivers.

Reviewed by:	imp, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30649
2021-06-06 16:44:46 -04:00
Konstantin Belousov
2d423f7671 sysent: allow ABI to disable setid on exec.
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:52 +03:00
Konstantin Belousov
19e6043a44 kern_exec.c: Add execve_nosetid() helper
Reviewed by:	dchagin
Tested by:	trasz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D28154
2021-06-06 21:42:41 +03:00
Jason A. Harmening
59409cb90f Add a generic mechanism for preventing forced unmount
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount.  Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
2021-06-05 18:20:36 -07:00
wiklam
43521b46fc Correcting comment about "sched_interact_score".
Reviewed by:	jrtc@, imp@
Pull Request:	https://github.com/freebsd/freebsd-src/pull/431

Sponsored by:		Netflix
2021-06-02 21:50:57 -06:00
Warner Losh
9f3d1a98dd regen after tweaks to getgroups and setgroups
Sponsored by:		Netflix
2021-06-02 13:24:50 -06:00
Moritz Buhl
4bc2174a1b kern: fail getgroup and setgroup with negative int
Found using
https://github.com/NetBSD/src/blob/trunk/tests/lib/libc/sys/t_getgroups.c

getgroups/setgroups want an int and therefore casting it to u_int
resulted in `getgroups(-1, ...)` not returning -1 / errno = EINVAL.

imp@ updated syscall.master and made changes markj@ suggested

PR:			189941
Tested by:		imp@
Reviewed by:		markj@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/407
Differential Revision:	https://reviews.freebsd.org/D30617
2021-06-02 13:22:57 -06:00
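The effect of the u_int cast described in the getgroups/setgroups commit above can be reproduced in a few lines of standalone C. This is a minimal sketch, not the kernel code: `check_count_unsigned()` and `check_count_signed()` are hypothetical helpers that only model the size check, with `ngrp` standing for the number of groups the process actually has.

```c
#include <assert.h>
#include <errno.h>

/* Buggy shape: the count is unsigned, so -1 arrives as UINT_MAX. */
int
check_count_unsigned(unsigned int gidsetsize, unsigned int ngrp)
{
	if (gidsetsize == 0)
		return (0);		/* caller only wants the group count */
	if (gidsetsize < ngrp)
		return (EINVAL);	/* buffer too small */
	return (0);			/* UINT_MAX slips through here */
}

/* Fixed shape: keep the count signed and reject negatives up front. */
int
check_count_signed(int gidsetsize, unsigned int ngrp)
{
	if (gidsetsize < 0)
		return (EINVAL);
	if (gidsetsize == 0)
		return (0);
	if ((unsigned int)gidsetsize < ngrp)
		return (EINVAL);
	return (0);
}
```

Passing `(unsigned int)-1` to the first helper is accepted as a very large buffer, while the second returns EINVAL for -1, which matches the behavior the test case referenced in the commit expects.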
Mateusz Guzik
c9f8dcda85 kqueue: replace kq_ncallouts loop with atomic_fetchadd 2021-06-02 15:14:58 +00:00
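The kqueue commit above carries no body, so the following is only a generic sketch of the technique its subject names, written with C11 atomics rather than the kernel's atomic(9) primitives. The counter `ncallouts`, the limit `calloutmax`, and all three functions are hypothetical names for the illustration. It shows a bounded counter reserved first with a compare-and-swap retry loop and then with a single fetch-and-add that is undone when the limit was already reached.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_int ncallouts;		/* hypothetical shared counter */
static const int calloutmax = 2;	/* hypothetical limit, small for the demo */

/* Loop form: retry the compare-and-swap until it succeeds or the limit is hit. */
int
reserve_loop(void)
{
	int cur = atomic_load(&ncallouts);

	do {
		if (cur >= calloutmax)
			return (ENOMEM);
	} while (!atomic_compare_exchange_weak(&ncallouts, &cur, cur + 1));
	return (0);
}

/* Fetch-add form: bump unconditionally, undo if the old value was at the limit. */
int
reserve_fetchadd(void)
{
	int old = atomic_fetch_add(&ncallouts, 1);

	if (old >= calloutmax) {
		atomic_fetch_sub(&ncallouts, 1);
		return (ENOMEM);
	}
	return (0);
}

/* Give a reservation back. */
void
release_callout(void)
{
	atomic_fetch_sub(&ncallouts, 1);
}
```

The fetch-add form has no retry loop, at the cost of letting the counter briefly exceed the limit before the rollback; whether that trade-off is acceptable depends on how other readers use the counter.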
Rich Ercolani
a19ae1b099 vfs: fix MNT_SYNCHRONOUS check in vn_write
ca1ce50b2b ("vfs: add more safety against concurrent forced
unmount to vn_write") has a side effect of only checking MNT_SYNCHRONOUS
if O_FSYNC is set.

Reviewed By: mjg
Differential Revision: https://reviews.freebsd.org/D30610
2021-06-02 13:42:02 +00:00
Kyle Evans
2d741f33bd kern: ether_gen_addr: randomize on default hostuuid, too
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.

Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.

Reviewed by:	bz, gbe, imp, kbowling, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D29788
2021-06-01 22:59:21 -05:00
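The fallback described in the ether_gen_addr commit above, generating a random locally administered (LA) unicast MAC address, comes down to two bit operations on the first octet. The sketch below is a userland illustration and not the kernel routine: `hostuuid_usable()` and `random_la_unicast()` are hypothetical names, `rand()` is a placeholder for a proper random source, and the textual form of the all-zero UUID is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETHER_ADDR_LEN	6
/* Assumed textual form of the default (all zero) hostuuid. */
#define ZERO_HOSTUUID	"00000000-0000-0000-0000-000000000000"

/* Treat a missing, empty, or all-zero hostuuid as unusable for hashing. */
int
hostuuid_usable(const char *uuid)
{
	return (uuid != NULL && uuid[0] != '\0' &&
	    strcmp(uuid, ZERO_HOSTUUID) != 0);
}

/*
 * Fill addr with a random locally administered unicast MAC address.
 * rand() is a placeholder for whatever random source the caller has.
 */
void
random_la_unicast(uint8_t addr[ETHER_ADDR_LEN])
{
	for (int i = 0; i < ETHER_ADDR_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the I/G bit: unicast, not multicast */
	addr[0] |= 0x02;	/* set the U/L bit: locally administered */
}
```

A caller would check `hostuuid_usable()` first and take the random path only when it returns 0.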
Mark Johnston
283e60fb31 ktrace: Fix an inverted comparison added in commit f3851b235
Fixes:		f3851b235 ("ktrace: Fix a race with fork()")
Reported by:	dchagin, phk
2021-06-01 09:15:35 -04:00
Konstantin Belousov
d3f7975fcb thread_reap_barrier(): remove unused variable
Noted by:	alc
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
2021-05-31 23:03:42 +03:00
Konstantin Belousov
f62c7e54e9 Add thread_reap_barrier()
Reviewed by:	hselasky,markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
3a68546d23 quiesce_cpus(): add special handling for PDROP
Currently passing PDROP to the quiesce_cpus() function does not make sense.
Add special meaning for it, by not waiting for the idle thread to schedule.

Also avoid allocating u_int[MAXCPU] on the stack.

Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
845d77974b kern_thread.c: wrap too long lines
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies/NVidia Networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30468
2021-05-31 18:09:22 +03:00
Konstantin Belousov
e266a0f7f0 kern linker: do not allow more than one kldload and kldunload syscalls simultaneously
kld_sx is dropped e.g. for executing sysinits, which allows a user
to initiate kldunload while the module is not yet fully initialized.

Reviewed by:	markj
Differential revision:	https://reviews.freebsd.org/D30456
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-05-31 18:09:22 +03:00
Konstantin Belousov
27006229f7 vinvalbuf: do not panic if we were unable to flush dirty buffers
Return EBUSY instead and let the caller handle the issue.

For vgone()/vnode reclamation, the caller first does vinvalbuf(V_SAVE),
which returns EBUSY if dirty buffers were not flushed. The caller then
calls vinvalbuf(0) due to the non-zero return, which gets rid of all dirty
buffers without dependencies.

PR:	238565
Reviewed by:	asomers, mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30555
2021-05-31 01:20:53 +03:00
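The two-step caller pattern described in the vinvalbuf commit above can be modeled with a toy in userland C. Everything here is hypothetical: `struct toy_vnode`, `toy_vinvalbuf()`, and `toy_reclaim()` are invented for the sketch, the flag value is arbitrary, and the model ignores buffer dependencies. It only shows the control flow: try to save dirty data, and on a non-zero return fall back to discarding it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define V_SAVE	0x1	/* flag value chosen for the sketch only */

struct toy_vnode {
	int	dirty;		/* dirty buffers still attached */
	bool	can_flush;	/* whether writing them out succeeds */
};

/* Model: with V_SAVE, flush or report EBUSY; without it, discard. */
int
toy_vinvalbuf(struct toy_vnode *vp, int flags)
{
	if ((flags & V_SAVE) != 0) {
		if (vp->dirty > 0 && !vp->can_flush)
			return (EBUSY);	/* report instead of panicking */
		vp->dirty = 0;		/* flushed */
		return (0);
	}
	vp->dirty = 0;			/* discarded */
	return (0);
}

/* Caller pattern: try to save first, then drop whatever is left. */
int
toy_reclaim(struct toy_vnode *vp)
{
	int error = toy_vinvalbuf(vp, V_SAVE);

	if (error != 0)
		error = toy_vinvalbuf(vp, 0);
	return (error);
}
```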
Jason A. Harmening
a4b07a2701 VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Also, add stdbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h.  Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
2021-05-30 14:53:47 -07:00
Jason A. Harmening
271fcf1c28 Revert commits 6d3e78ad6c and 54256e7954
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason.  Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
2021-05-29 17:48:02 -07:00
Mateusz Guzik
3cf75ca220 vfs: retire unused vn_seqc_write_begin_unheld* 2021-05-29 22:04:09 +00:00
Mateusz Guzik
d81aefa8b7 vfs: use the sentinel trick in locked lookup path parsing 2021-05-29 22:04:09 +00:00
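The commit above has no body, so this is a sketch of the general sentinel technique for scanning a path component, not a description of what the kernel code does. All function names are hypothetical. The idea is that a separator known to exist at the end of the buffer lets the scan loop test one condition per byte instead of two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain scan: every step tests for both the separator and the terminator. */
size_t
component_len_plain(const char *p)
{
	size_t n = 0;

	while (p[n] != '/' && p[n] != '\0')
		n++;
	return (n);
}

/*
 * Sentinel scan: the caller guarantees a '/' at or after p, so the loop
 * needs only one comparison per byte.
 */
size_t
component_len_sentinel(const char *p)
{
	size_t n = 0;

	while (p[n] != '/')
		n++;
	return (n);
}

/*
 * Overwrite the terminating NUL with '/', scan, then restore it.
 * buf must be writable and len must equal strlen(buf).
 */
size_t
first_component_len(char *buf, size_t len)
{
	size_t n;

	buf[len] = '/';		/* plant the sentinel */
	n = component_len_sentinel(buf);
	buf[len] = '\0';	/* restore the terminator */
	return (n);
}
```

The buffer has to be writable and private to the parser while the sentinel is in place, since the string is temporarily not NUL-terminated.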
Mateusz Guzik
478c52f1e3 vfs: slightly rework vn_rlimit_fsize 2021-05-29 22:04:09 +00:00
Mateusz Guzik
9bfddb3ac4 fd: use PROC_WAIT_UNLOCKED when clearing p_fd/p_pd 2021-05-29 22:04:09 +00:00
Jason A. Harmening
6d3e78ad6c VFS_QUOTACTL(9): allow implementation to indicate busy state changes
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9).  The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Reviewed By:	kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
2021-05-29 14:05:39 -07:00
Mark Johnston
f3851b235b ktrace: Fix a race with fork()
ktrace(2) may toggle trace points in any of
1. a single process
2. all members of a process group
3. all descendents of the processes in 1 or 2

In the first two cases, we do not permit the operation if the process is
being forked or not visible. However, in case 3 we did not enforce this
restriction for descendents. As a result, the assertions about the child
in ktrprocfork() may be violated.

Move these checks into ktrops() so that they are applied consistently.

Allow KTROP_CLEAR for nascent processes. Otherwise, there is a window
where we cannot clear trace points for a nascent child if they are
inherited from the parent.

Reported by:	syzbot+d96676592978f137e05c@syzkaller.appspotmail.com
Reported by:	syzbot+7c98fcf84a4439f2817f@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30481
2021-05-27 15:52:20 -04:00
Mark Johnston
e00bae5c18 kevent: Prohibit negative change and event list lengths
Previously, a negative change list length would be treated the same as
an empty change list.  A negative event list length would result in
bogus copyouts.  Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.

Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30480
2021-05-27 15:52:20 -04:00
Mark Johnston
f885100773 ktrace: Handle negative array sizes in ktrstructarray
ktrstructarray() may be used to create copies of kevent(2) change and
event arrays.  It is called before parameter validation is done and so
should check for bogus array lengths before allocating a copy.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30479
2021-05-27 15:52:20 -04:00
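The check described in the ktrstructarray commit above, rejecting a bogus length before allocating a copy, can be sketched in standalone C. `copy_array()` is a hypothetical helper, not the kernel function, and it uses `malloc()` where the kernel would use its own allocator. The point of the sketch is that a negative `int` converted to `size_t` becomes a huge value, so the sign has to be tested before any size arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate an array of n elements of elem_size bytes each.
 * Returns NULL for a negative or overflowing length, or if malloc() fails.
 */
void *
copy_array(const void *src, int n, size_t elem_size)
{
	size_t bytes;
	void *dst;

	if (n < 0)
		return (NULL);		/* bogus length: refuse before sizing */
	if (elem_size != 0 && (size_t)n > SIZE_MAX / elem_size)
		return (NULL);		/* n * elem_size would overflow */
	bytes = (size_t)n * elem_size;
	dst = malloc(bytes != 0 ? bytes : 1);
	if (dst != NULL && bytes != 0)
		memcpy(dst, src, bytes);
	return (dst);
}
```

The same sign test applied to both list lengths is what the preceding kevent commit describes, with EINVAL returned to the caller instead of NULL.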
Edward Tomasz Napierala
905d192d6f Unstaticize parts of coredumping code
This makes it possible to call __elfN(size_segments) and __elfN(puthdr)
from Linux coredump code.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D30455
2021-05-26 11:51:57 +01:00
John Baldwin
6b313a3a60 Include the trailer in the original dst_iov.
This avoids creating a duplicate copy on the stack just to
append the trailer.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30139
2021-05-25 16:59:19 -07:00
John Baldwin
21e3c1fbe2 Assume OCF is the only KTLS software backend.
This removes support for loadable software backends.  The KTLS OCF
support is now always included in kernels with KERN_TLS and the
ktls_ocf.ko module has been removed.  The software encryption routines
now take an mbuf directly and use the TLS mbuf as the crypto buffer
when possible.

Bump __FreeBSD_version for software backends in ports.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30138
2021-05-25 16:59:19 -07:00
John Baldwin
883a0196b6 crypto: Add a new type of crypto buffer for a single mbuf.
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, and generating an S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30136
2021-05-25 16:59:18 -07:00
John Baldwin
6663f8a23e sglist: Add sglist_append_single_mbuf().
This function appends the contents of a single mbuf to an sglist
rather than an entire mbuf chain.

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30135
2021-05-25 16:59:18 -07:00
John Baldwin
aa341db39b Rename m_unmappedtouio() to m_unmapped_uiomove().
This function doesn't only copy data into a uio but instead is a
variant of uiomove() similar to uiomove_fromphys().

Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30444
2021-05-25 16:59:18 -07:00
John Baldwin
3f9dac85cc Extend m_copyback() to support unmapped mbufs.
Reviewed by:	gallatin, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D30133
2021-05-25 16:59:18 -07:00