Commit graph

148113 commits

Author SHA1 Message Date
Doug Moore
ac0572e660 radix_tree: compute slot from keybarr
The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that
computes the 'slot' value when keybarr() fails, which is exactly when
slot() would next be invoked. Computing things this way saves space in
search loops.

This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.

Reviewed by:	alc
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41235
2023-07-30 15:12:06 -05:00
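A minimal sketch of the idea described in ac0572e660, under simplifying assumptions: subtracting the node's owner key and shifting by the node's level yields a value that is out of range when the search has failed at this node, and is the child slot otherwise. The names (kb_node, kb_keybarr) and the width of 4 bits are hypothetical and chosen for illustration; this is not the committed vm_radix code.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB_WIDTH	4			/* assumed bits consumed per level */
#define KB_COUNT	(1u << KB_WIDTH)	/* children per node */

/* Hypothetical, stripped-down node: only the fields the check needs. */
struct kb_node {
	uint64_t owner;		/* key prefix covered by this node */
	unsigned clev;		/* shift count for this node's level */
};

/*
 * Return true if index cannot be found under node.  Otherwise return
 * false and store the child slot, so no separate slot() call is needed.
 */
static bool
kb_keybarr(const struct kb_node *node, uint64_t index, unsigned *slot)
{
	/* Unsigned wraparound makes keys below owner look very large. */
	index = (index - node->owner) >> node->clev;
	if (index >= KB_COUNT)
		return (true);
	*slot = (unsigned)index;
	return (false);
}
```

With owner 0x1200 and clev 4, the key 0x1234 passes the check and lands in slot 3, while 0x1334 and 0x0034 are rejected without touching the slot.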
Mark Johnston
5ad29bc8d4 amd64: Fix TLB invalidation routines in !SMP kernels
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels.  Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work.  Specifically, I
see occasional inexplicable crashes in userspace when PCIDs are enabled.

Apply a minimal patch to fix the problem.  While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D41230
2023-07-30 11:12:35 -04:00
Doug Moore
38f5cb1bfb radix_tree: redefine the clev field
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.

For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the callee, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT, and not 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.

This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.

Reviewed by:	kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41226
2023-07-30 01:20:07 -05:00
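The arithmetic in 38f5cb1bfb can be sketched as follows, assuming a width of 4 bits and 64-bit keys. The "by_level" helpers model the old convention (clev is a level number), the "by_shift" helpers model the new one (clev is already multiplied by the width). All names are hypothetical, and the guard in trimkey_by_level only stands in for the shift test the message mentions.

```c
#include <stdint.h>

#define RT_WIDTH	4				/* assumed bits per level */
#define RT_COUNT	((uint64_t)1 << RT_WIDTH)
#define RT_MASK		(RT_COUNT - 1)

/* Old convention: multiply the level by the width on every use. */
static unsigned
slot_by_level(uint64_t index, unsigned level)
{
	return ((unsigned)((index >> (level * RT_WIDTH)) & RT_MASK));
}

/* New convention: clev is the shift count itself. */
static unsigned
slot_by_shift(uint64_t index, unsigned clev)
{
	return ((unsigned)((index >> clev) & RT_MASK));
}

/* Old convention: caller passes level + 1, so the shift can reach 64. */
static uint64_t
trimkey_by_level(uint64_t index, unsigned level)
{
	unsigned shift = level * RT_WIDTH;

	return (shift >= 64 ? 0 : index & -((uint64_t)1 << shift));
}

/* New convention: shift COUNT instead of 1; the shift count stays below 64. */
static uint64_t
trimkey_by_shift(uint64_t index, unsigned clev)
{
	return (index & -(RT_COUNT << clev));
}
```

At the top level (clev 60 in this sketch) RT_COUNT << clev wraps to zero in unsigned arithmetic, so the trimmed key is zero without any extra test.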
Dmitry Chagin
dbac8474fe vfs: Deleting a doubled inclusion of sys/capsicum.h
Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41223
MFC after:		1 week
2023-07-29 11:21:58 +03:00
Dmitry Chagin
67116c6905 linux(4): Fix control message size calculation
To determine the size in bytes needed to hold a control message
and its contents of length len, CMSG_SPACE should be used.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41224
MFC after:		1 week
2023-07-29 11:21:35 +03:00
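For the distinction 67116c6905 relies on, here is a small sketch using the native socket macros (not the linux(4) emulation code itself): CMSG_SPACE() gives the buffer bytes to reserve for a control message, including trailing padding, while CMSG_LEN() gives the value stored in cmsg_len. The wrapper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/* Buffer bytes to reserve for one control message with len payload bytes. */
static size_t
cmsg_buffer_bytes(size_t len)
{
	return (CMSG_SPACE(len));
}

/* Value for the cmsg_len field: header plus payload, no trailing padding. */
static size_t
cmsg_len_field(size_t len)
{
	return (CMSG_LEN(len));
}
```

Sizing a buffer with cmsg_len_field() instead of cmsg_buffer_bytes() can leave it short by the alignment padding whenever the payload length is not a multiple of the platform's alignment.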
Kevin Bowling
38588749af e1000: HWCSUM exemption fixes
Also disable IPV6 checksum offload.

Spell hw->mac.type < e1000_82543 as e1000_82542.  Confusingly, chips
like 82540 and 82541 come later and do not have these issues.  There
is no functional change here, as the enum was defined in such a way
it worked correctly.  But this reads literally.

MFC after:	1 week
2023-07-28 18:17:35 -07:00
Michael Tuexen
c620788150 sctp: keep sb_acc and sb_ccc in sync
PR:		260116
MFC after:	1 week
2023-07-28 15:16:23 +02:00
Michael Tuexen
b279e84a47 sctp: improve consistency
This is simplifying a patch to address PR 260116.

PR:		260116
MFC after:	1 week
2023-07-28 14:36:11 +02:00
Alan Cox
3d7c37425e amd64 pmap: Catch up with pctrie changes
Recent changes to the pctrie code make it necessary to initialize the
kernel pmap's rangeset for PKU.
2023-07-28 15:13:13 -05:00
Warner Losh
e474a8e243 cam: Log errors from passthru commands
Since a30ecd42b8 we've logged almost all unexpected errors from
commands. However, some passthru commands were not logged via devctl. To
fix this, pass all requests through passerror (which calls
cam_periph_error), but flag those requests that didn't want error
recovery as SF_NO_RECOVERY, like we do for device probing. By doing this
we get identical behavior to the current code, but log these errors.

We have had hangs on drives that seem to show no error. Vendor analysis
of the drive found an illegal command that happened to hang the drive. In
verifying their analysis, we discovered that the pass through commands
from things like smartctl that encountered errors or timeouts weren't
logged.

Sponsored by:		Netflix
Reviewed by:		ken, mav
Differential Revision:	https://reviews.freebsd.org/D41167
2023-07-28 12:11:21 -06:00
Doug Moore
2d2bcba7ba Every path in a radix trie ends with a leaf or a NULL. By replacing
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.

This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zeroing everything being proper initialization, this
will break those tries.

Reviewed by:	alc, kib
Tested by:	pho (as part of a larger change)
Differential Revision:	https://reviews.freebsd.org/D41171
2023-07-28 11:39:52 -05:00
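A rough sketch of the technique in 2d2bcba7ba, assuming leaves are tagged by setting the low pointer bit and ignoring path compression: when every empty slot holds a "NULL leaf" (the tag bit with no pointer), the descent loop needs only the leaf test. The names, the fixed width of 4, and the convention that a value points at its own key are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NL_WIDTH	4
#define NL_COUNT	(1u << NL_WIDTH)
#define NL_MASK		(NL_COUNT - 1)
#define NL_ISLEAF	((uintptr_t)1)		/* low bit tags a leaf */
#define NL_NULL		NL_ISLEAF		/* leaf tag, no value */

struct nl_node {
	unsigned shift;				/* key bits to skip at this node */
	uintptr_t child[NL_COUNT];		/* tagged child pointers */
};

/* Proper initialization: every slot starts as a NULL leaf, not as 0. */
static void
nl_node_init(struct nl_node *node, unsigned shift)
{
	node->shift = shift;
	for (unsigned i = 0; i < NL_COUNT; i++)
		node->child[i] = NL_NULL;
}

static uintptr_t
nl_leaf(uint64_t *val)
{
	return ((uintptr_t)val | NL_ISLEAF);
}

static uint64_t *
nl_lookup(uintptr_t root, uint64_t index)
{
	uintptr_t p = root;

	/* One test per iteration: internal nodes never hold plain NULL. */
	while ((p & NL_ISLEAF) == 0) {
		struct nl_node *node = (struct nl_node *)p;

		p = node->child[(index >> node->shift) & NL_MASK];
	}
	uint64_t *val = (uint64_t *)(p & ~NL_ISLEAF);
	return (val != NULL && *val == index ? val : NULL);
}
```

The caveat in the message shows up directly here: a node that was merely zeroed has children equal to 0, which nl_lookup() would treat as an internal node and dereference.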
Mark Johnston
dd24d475d5 arm64: Add constants for decoding ISS fields for WF* exceptions
WFI and WFIT trap to EL2 when executed in a vmm guest.  (Currently
WFE/WFET are not configured to trap.)  We only handle WFI at the moment,
so these constants are useful when handling the exception.

Reviewed by:	andrew
MFC after:	1 week
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D41199
2023-07-28 09:34:38 -04:00
Andrew Turner
53e1af5a10 arm64: Decode the ID_AA64PFR2_EL1 register
No fields have been defined, but it has been documented in the
Architecture Reference Manual.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40897
2023-07-28 12:53:02 +01:00
Andrew Turner
8c111e5b37 arm64: Update the ID_AA64PFR1_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40896
2023-07-28 12:53:02 +01:00
Andrew Turner
0766dde9b5 arm64: Update the ID_AA64PFR0_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40895
2023-07-28 12:53:02 +01:00
Andrew Turner
22235b631b arm64: Decode the ID_AA64MMFR4_EL1 register
No fields have been defined, but it has been documented in the
Architecture Reference Manual.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40894
2023-07-28 12:53:02 +01:00
Andrew Turner
c65679143f arm64: Decode the ID_AA64MMFR3_EL1 register
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40893
2023-07-28 12:53:02 +01:00
Andrew Turner
2134cfe793 arm64: Don't use hex for ID_AA64MMFR2_EL1_op/CR*
It breaks a future macro that creates the alternative register name
for old compilers.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40892
2023-07-28 12:53:02 +01:00
Andrew Turner
284f91de8b arm64: Update the ID_AA64MMFR1_EL1 fields
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40891
2023-07-28 12:53:01 +01:00
Andrew Turner
b21402d058 arm64: Update the ID_AA64MMFR0_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40890
2023-07-28 12:53:01 +01:00
Andrew Turner
de01309926 arm64: Update the ID_AA64ISAR1_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40889
2023-07-28 12:53:01 +01:00
Andrew Turner
4182f58172 arm64: Update the ID_AA64ISAR0_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40888
2023-07-28 12:53:01 +01:00
Andrew Turner
6fd44e5f53 arm64: Update the ID_AA64DFR0_EL1 fields
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D40887
2023-07-28 12:53:01 +01:00
Kristof Provost
680ad06f90 mroute: avoid calling if_allmulti with the lock held
Avoid locking issues when if_allmulti() calls the driver's if_ioctl,
because that may acquire sleepable locks (while we hold a non-sleepable
rwlock).

Fortunately there's no pressing need to hold the mroute lock while we
do this, so we can postpone the call slightly, until after we've
released the lock.

This avoids the following WITNESS warning (with iflib drivers):

	lock order reversal: (sleepable after non-sleepable)
	 1st 0xffffffff82f64960 IPv4 multicast forwarding (IPv4 multicast forwarding, rw) @ /usr/src/sys/netinet/ip_mroute.c:1050
	 2nd 0xfffff8000480f180 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4525
	lock order IPv4 multicast forwarding -> iflib ctx lock attempted at:
	#0 0xffffffff80bbd6ce at witness_checkorder+0xbbe
	#1 0xffffffff80b56d10 at _sx_xlock+0x60
	#2 0xffffffff80c9ce5c at iflib_if_ioctl+0x2dc
	#3 0xffffffff80c7c395 at if_setflag+0xe5
	#4 0xffffffff82f60a0e at del_vif_locked+0x9e
	#5 0xffffffff82f5f0d5 at X_ip_mrouter_set+0x265
	#6 0xffffffff80bfd402 at sosetopt+0xc2
	#7 0xffffffff80c02105 at kern_setsockopt+0xa5
	#8 0xffffffff80c02054 at sys_setsockopt+0x24
	#9 0xffffffff81046be8 at amd64_syscall+0x138
	#10 0xffffffff8101930b at fast_syscall_common+0xf8

See also:	https://redmine.pfsense.org/issues/12079
Reviewed by:	mjg
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D41209
2023-07-28 11:32:39 +02:00
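The pattern in 680ad06f90 (record what has to be done while the lock is held, make the call after releasing it) can be sketched with a toy lock that merely records whether it is held. Everything here is hypothetical: toy_lock, toy_vif and toy_allmulti_off() stand in for the mroute rwlock, the vif entry and the driver call, and none of it is the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
	bool held;			/* only tracks state, does not block */
};

struct toy_vif {
	struct toy_lock lock;
	void *ifp;			/* attached interface, or NULL */
};

static int toy_calls;			/* times the "driver" call ran */
static bool toy_called_locked;		/* set if it ever ran under the lock */

/* Stand-in for a call that may sleep and so must not run under the lock. */
static void
toy_allmulti_off(struct toy_vif *vif, void *ifp)
{
	(void)ifp;
	toy_calls++;
	if (vif->lock.held)
		toy_called_locked = true;
}

static void
toy_del_vif(struct toy_vif *vif)
{
	void *ifp;

	vif->lock.held = true;		/* acquire */
	ifp = vif->ifp;			/* remember the pending work */
	vif->ifp = NULL;
	vif->lock.held = false;		/* release */

	if (ifp != NULL)
		toy_allmulti_off(vif, ifp);	/* deferred until unlocked */
}
```

The state change still happens under the lock; only the potentially sleeping call moves outside it.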
Dmitry Chagin
4281dab8bc linux(4): Add elf_hwcap2 to x86
On x86, Linux exposes the user-controlled (by tunables) processor
capabilities via AT_HWCAP2.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41165
MFC after:		2 weeks
2023-07-28 11:56:59 +03:00
Dmitry Chagin
5440e7017a i386: Don't use static DPCPU and VNET defines in i386 modules
As of c84617e8, a fix similar to 4802a2cb and b6ea4c5a should be
applied to i386 too.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41195
2023-07-28 11:55:31 +03:00
Warner Losh
7872131605 cam: Fail 2/0 asc/ascq return code
This asc/ascq code 2/0 ("No seek complete") is a fatal error on modern
drives indicating a sensor failure. One of our vendors noticed we
retried 2/0 so many times in their failure analysis and asked why (no
other OS does). They've indicated that this failure means the
track couldn't be located (something that's not going to change, except
if the environment changes significantly, which won't happen on a
timescale useful to retries).

Sponsored by:		Netflix
2023-07-27 22:28:01 -06:00
Jessica Clarke
8a6ab0f71f Pre-quote macros passed to .incbin to avoid unwanted substitution
Currently for the MFS, firmware and VDSO template assembly files we pass
the path to include with .incbin unquoted and use __XSTRING within the
assembly file to stringify it. However, __XSTRING doesn't just perform a
single level of expansion, it performs the normal full expansion of the
macro, and so if the path itself happens to tokenise to something that
includes a defined macro in it that will itself be substituted. For
example, with #define MACRO 1, a path like /path/containing/MACRO/in/it
will expand to /path/containing/1/in/it and then, when stringified, end
up as "/path/containing/1/in/it", not the intended string. Normally,
macros have names that start or end with underscores and are unlikely
to appear in a tokenised path (even if technically they could), but now
that we've switched to GNU C as of commit ec41a96daa ("sys: Switch the
kernel's C standard from C99 to GNU99.") there are a few new macros
defined which don't start or end with underscores: unix, which is always
defined to 1, and i386, which is defined to 1 on i386. The former
probably doesn't appear in user paths in practice, but the latter has
been seen to and is likely quite common in the wild.

Fix this by defining the macro pre-quoted instead of using __XSTRING.
Note that technically we don't need to do this for vdso_wrap.S today as
all the paths passed to it are safe file names with no user-controlled
prefix but we should do it anyway for consistency and robustness against
future changes.

This allows make tinderbox to pass when built with source and object
directories inside ~/path-with-unix, which would otherwise expand to
~/path-with-1 and break.

PR:	272744
Fixes:	ec41a96daa ("sys: Switch the kernel's C standard from C99 to GNU99.")
2023-07-28 05:08:43 +01:00
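The substitution described in 8a6ab0f71f can be reproduced in a few lines. This sketch uses local DEMO_STRING/DEMO_XSTRING macros as stand-ins for __STRING/__XSTRING, and the message's own example of a macro named MACRO defined to 1; the demo_ identifiers are hypothetical.

```c
#include <string.h>

#define DEMO_STRING(x)	#x
#define DEMO_XSTRING(x)	DEMO_STRING(x)	/* expands x fully, then stringifies */

#define MACRO	1	/* plays the role of predefined names like unix or i386 */

/* Unquoted path: its tokens are subject to macro expansion. */
#define DEMO_PATH_RAW		/path/containing/MACRO/in/it
/* Pre-quoted path: a string literal is never rescanned for macros. */
#define DEMO_PATH_QUOTED	"/path/containing/MACRO/in/it"

static const char demo_stringified[] = DEMO_XSTRING(DEMO_PATH_RAW);
static const char demo_prequoted[] = DEMO_PATH_QUOTED;

/* Nonzero when stringification altered the path. */
static int
demo_path_was_rewritten(void)
{
	return (strcmp(demo_stringified, demo_prequoted) != 0);
}
```

Here demo_stringified ends up naming /path/containing/1/in/it, while demo_prequoted keeps the path as written, which is the behavior the fix relies on.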
Kevin Bowling
cbcab907f8 e1000: Corrections for lem(4)/em(4) txcsum offload
Explicitly set ipcss/ipcse/ipcso for IPv6 per intel SDM as indicated in
inline comments.

Fix and consolidate 82543/82547 hwcsum exemption.

While here rearrange and expand some commentary.
2023-07-27 15:58:05 -07:00
Przemyslaw Lewandowski
04d4e34538 iflib: Fix panic during driver reload stress test
During a driver reload stress test, after 50-300 reloads a panic occurs.
After adding sleeps in between loading and unloading the driver, the
issue does not occur.  It's possible that loading/unloading too fast may
cause the gt_taskqueue pointer to be freed earlier than expected;
checking for a null pointer first fixes it.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	erj@
Tested by:	jeffrey.e.pieper@intel.com
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D39457
2023-07-27 15:47:12 -07:00
Kirk McKusick
831b1ff791 UFS/FFS: Migrate to modern uintXX_t from u_intXX_t.
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t. While here also migrate u_char to uint8_t.
Where other kernel interfaces allow, migrate u_long to uint64_t.

No functional changes intended.

MFC-after:    1 week
Sponsored-by: The FreeBSD Foundation
2023-07-27 15:27:36 -07:00
Mark Johnston
1083a8cd85 pcpu: Remove unused definitions of ALT_STACK_SIZE
This was added originally for the sparc64 port and apparently copied to
other platforms.  No functional change intended.

MFC after:	1 week
2023-07-27 16:02:03 -04:00
Mark Johnston
ca6cd604c8 kmsan: Use the correct origin bytes in kmsan_check_arg()
Upon discovering a violation kmsan_check_arg() passes a pointer to
function parameter shadow state to kmsan_report_hook().
kmsan_report_hook() uses that address to find the origin cells, assuming
that the passed address belongs to the kernel map.  This has two
problems:
1) Function parameter origin state is also located in TLS, not in the
   origin map, but kmsan_report_hook() doesn't know this.
2) KMSAN TLS for thread0 is statically allocated and thus isn't shadowed
   (because the kernel itself is not shadowed).

These bugs could result in inaccuracies in KMSAN reports, or a page
fault when trying to report a KMSAN violation (which by default panics
the kernel anyway).

Fix the problem by making callers of kmsan_report_hook() provide a
pointer to origin cells.

Sponsored by:	The FreeBSD Foundation
2023-07-27 16:02:03 -04:00
Mark Johnston
640e5cb304 kmsan: Add a comment explaining why KMSAN doesn't shadow above KERNBASE
Sponsored by:	The FreeBSD Foundation
2023-07-27 16:01:58 -04:00
Mark Johnston
96c2538121 opencrypto: Respect alignment constraints in xor_and_encrypt()
Copy operands to an aligned buffer before performing operations which
require alignment.  Otherwise it's possible for this code to trigger an
alignment fault on armv7.

Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D41211
2023-07-27 15:44:52 -04:00
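One common way to respect alignment constraints, in the spirit of 96c2538121, is to copy operands into aligned temporaries with memcpy() before operating on whole words. The sketch below shows that general technique on a plain XOR; xor_unaligned() is a hypothetical helper and does not reproduce xor_and_encrypt().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * XOR len bytes of a and b into dst.  None of the pointers needs any
 * particular alignment: words are staged through aligned locals.
 */
static void
xor_unaligned(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t len)
{
	size_t i = 0;

	for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t wa, wb;

		memcpy(&wa, a + i, sizeof(wa));	/* aligned copies */
		memcpy(&wb, b + i, sizeof(wb));
		wa ^= wb;
		memcpy(dst + i, &wa, sizeof(wa));
	}
	for (; i < len; i++)			/* leftover tail bytes */
		dst[i] = a[i] ^ b[i];
}
```

Casting a + i to uint64_t * and dereferencing it would be shorter, but that is the kind of access that can fault on strict-alignment hardware.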
Mark Johnston
1be56e0bb1 arm/unwind: Check stack pointer boundaries before dereferencing
If the unwinder somehow ends up with a stack pointer that lies outside
the stack, then an attempt to dereference can lead to a fault, which
causes the kernel to panic again and unwind the stack, which leads to a
fault...

Add kstack_contains() checks at points where we dereference the stack
pointer.  This avoids the aforementioned infinite loop in one case I hit
where some OpenSSL assembly code apparently confuses the unwinder.

Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Differential Revision:	https://reviews.freebsd.org/D41210
2023-07-27 15:44:00 -04:00
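The kind of check 1be56e0bb1 adds can be sketched as an overflow-safe range test. range_within_stack() below is a hypothetical userland analogue that takes the stack bounds as plain numbers; it is not the kernel's kstack_contains(), whose exact signature is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the len bytes starting at addr lie entirely inside the
 * stack [base, base + size).  Written so that no addition can overflow.
 */
static bool
range_within_stack(uintptr_t base, size_t size, uintptr_t addr, size_t len)
{
	if (len > size)
		return (false);
	return (addr >= base && addr - base <= size - len);
}
```

An unwinder would call such a check before each dereference of the candidate stack pointer and stop unwinding when it fails, instead of faulting.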
Michael Tuexen
cf32543fa4 tcp: document that conditional fields in tcpcb should be at the end
Reviewed by: 	rscheff, Peter Lei
Sponsored by:	Netflix, Inc.
2023-07-27 09:02:19 +02:00
Shailend Chand
74861578d9 gve: Fix Tx tcpdump panic
Ringing the doorbell before making the BPF call can result in the
mbuf being freed before the BPF call.

Reviewed-by:		markj
MFC-after:		3 days
Differential Revision: https://reviews.freebsd.org/D41189
2023-07-26 22:36:42 -07:00
Alan Cox
5ec2d94ade vm_mmap_object: Update the spelling of true/false
Since fitit is already a bool, use true/false instead of TRUE/FALSE.

MFC after:	2 weeks
2023-07-27 00:25:53 -05:00
Gleb Smirnoff
e3ba0d6add inpcb: do not copy so_options into inp_flags2
Since f71cb9f748 a socket stays connected with its inpcb throughout the
latter's lifetime, so there is no reason to complicate things and copy
these flags.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D41198
2023-07-26 20:35:42 -07:00
Gleb Smirnoff
a43e7a96b6 inpcb: use internal flag to mark pcbs that are inserted into lbgroup
Using INP_REUSEPORT_LB is unsafe, as it is basically a copy of socket's
SO_REUSEPORT_LB flag, which can be cleared by userland after bind().

Reviewed by:		markj
Reported by:		syzbot+e7d2e451f89fb444319b@syzkaller.appspotmail.com
Differential Revision:	https://reviews.freebsd.org/D41197
2023-07-26 20:35:30 -07:00
Konstantin Belousov
474708c334 fork1(): properly track the state of the pg_killsx lock
Reported by:	dchagin
Fixes:	232b922cb3
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-07-27 02:33:58 +03:00
Michael Tuexen
ab65c64bc4 tcp: fix handling of <RST,ACK> segments in SYN-RCVD for RACK and BBR
This deals with TCP endpoints in the SYN-RCVD state coming from the
SYN-SENT state.

Reviewed by:		rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D41203
2023-07-26 16:22:13 +02:00
Richard Scheffenegger
b352ef58c2 tcp: Handle <RST,ACK> in SYN-RCVD
Patch base stack to correctly handle the RST bit independently
of other header flags per TCP RFC.

MFC after: 1 week
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40982
2023-07-27 00:42:26 +02:00
Kirk McKusick
6f0ca273a3 Add diagnostics to fsck_ffs(8) for journaled soft-updates debugging.
MFC-after:    1 week
Sponsored-by: The FreeBSD Foundation
2023-07-26 14:50:02 -07:00
Konstantin Belousov
232b922cb3 killpg(): close a race with fork(), part 2
When we are sending a terminating signal to the group, killpg() needs
to guarantee that all group members are to be terminated (it does not
need to ensure that they are terminated on return from killpg()).  The
pg_killsx change eliminates the largest window there, but still, if
a multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continues executing and terminates the process
This scenario still allows the child to escape.

Fix it by single-threading forking parent if a conflict with pg_killsx
is noted.  We try to lock pg_killsx without sleeping, and failure to
acquire it means that a parallel killpg(2) is executed.  Then, stop
other threads from running and, in particular, from receiving signals,
to avoid the situation explained above.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41128
2023-07-26 18:13:02 +03:00
Konstantin Belousov
dfe172484d sigtd(): prefer non-stopped thread as a target for signal queue
This should improve signal delivery latency and better expose the
process state to the executing threads.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41128
2023-07-26 18:12:55 +03:00
Konstantin Belousov
aaa924138a Revert "killpg(): close a race with fork(), part 2"
This reverts commits 81a37995c7 and
565a343ae3.

There is still a leakage of the p_killpg_cnt, some but not all sources
of which were identified.

Second, and more important, is that there is a fundamental issue with
blocked signals having KSI_KILLPG flag set.  Queueing of such signal
increments p_killpg_cnt, but it cannot be decremented until the signal
is delivered.  If, for instance, a single-threaded process with blocked
signal receives killpg-kill and executes fork(2), the fork enter check
returns with ERESTART.  And since signal is blocked, the condition
cannot be cleared.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D41128
2023-07-26 18:12:55 +03:00
Marius Strobl
4ef1c6f75d base: Remove support for the VTOC8 partitioning scheme
The removal of the sparc64 support in February 2020 obsoleted the
VTOC8 partitioning scheme as no other FreeBSD platform makes use
of it. Moreover, the code is bitrotting as nothing defines e.g.
LOADER_VTOC8_SUPPORT any more and, thus, should go now, too. With
this change, the following commits are reverted as far as VTOC8
is concerned and where parts haven't already been deleted
along with prior sparc64 removals:
094fcb157d
a7d366e958
ba8d50d08b

The alignment example d9711c28ef
added to the VTOC8 section of gpart.8 is folded into the MBR one.

This should finally conclude the deorbit of sparc64-specific bits.

        We had joy, we had fun
        we ran Unix on a Sun.
        But that source and the song
        of FreeBSD have all gone.

Credits to Michael Bueker for the original "Unix on a Sun" and Rod
McKuen for the "Seasons in the Sun" lyrics.
2023-07-26 13:16:12 +02:00
Marius Strobl
29fe5efc8a ofw(4): Add my copyright and additional history for ofw_reg_to_paddr()
The ofw_reg_to_paddr() in this file is the powerpc OF_decode_addr()
formerly added in 812403402e. However,
the latter function in turn was based on the sparc64 counterpart I
previously added in 2b2250b149.
2023-07-26 13:14:22 +02:00