The computation of keybarr(), the function that determines when a
search has failed at a non-leaf node, can be done in a way that also
computes the 'slot' value when keybarr() finds no barrier, which is
exactly when slot() would next be invoked. Computing things this way saves space in
search loops.
This reduces the amd64 coding of the search loop in vm_radix_lookup
from 40 bytes to 28 bytes.
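A minimal sketch of the idea. The node layout, the WIDTH/COUNT values and the names below are simplified assumptions, not the actual vm_radix/pctrie definitions; the point is that the barrier test and the slot computation share one subtraction and one shift.

```c
#include <stdbool.h>
#include <stdint.h>

#define WIDTH	4			/* assumed: key bits consumed per level */
#define COUNT	(1 << WIDTH)		/* children per node */

/* Hypothetical, simplified interior node. */
struct node {
	uint64_t owner;		/* key prefix shared by every key below this node */
	unsigned clev;		/* shift of this node's digit, already scaled by WIDTH */
};

/*
 * Return true if 'index' cannot lie below 'node', so the search fails here.
 * Otherwise store the child slot for 'index' in *slot and return false:
 * the value that decides the barrier test is the slot itself.
 */
static bool
keybarr(const struct node *node, uint64_t index, int *slot)
{
	/* Keys below 'owner' wrap around to large values and also fail. */
	index = (index - node->owner) >> node->clev;
	if (index >= COUNT)
		return (true);
	*slot = (int)index;
	return (false);
}
```

In a lookup loop the caller can then index the child array with the returned slot directly, instead of calling a separate slot() after the barrier test.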
Reviewed by: alc
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41235
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels. Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work. Specifically, I
see occasional inexplicable crashes in userspace when PCIDs are enabled.
Apply a minimal patch to fix the problem. While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41230
The clev field in the node struct is almost always multiplied by
WIDTH; occasionally, it is incremented and then multiplied by
WIDTH. Instructions can be saved by storing it always multiplied by
WIDTH.
For the computation of slot(), this just eliminates a
multiplication. For trimkey(), where the caller always adds one to
clev before passing it as an argument, this change has the callee, not
the caller, do that. Trimkey() handles it not by adding WIDTH to the
input parameter, but by shifting COUNT rather than 1. That produces the
same result, and it relieves keybarr of the need to test to avoid
shifting by more than 63 bits, since level is always <= 63.
This takes 3 instructions and 14 bytes out of the basic lookup loop on
amd64.
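A sketch of slot() and trimkey() with clev pre-scaled by WIDTH. WIDTH = 4 and the function signatures are assumptions for illustration; the real pctrie code may differ in detail.

```c
#include <stdint.h>

#define WIDTH	4			/* assumed: key bits per level */
#define COUNT	(1 << WIDTH)		/* children per node */

/* Digit of 'index' at a node whose clev is already scaled by WIDTH. */
static unsigned
slot(uint64_t index, unsigned clev)
{
	return ((unsigned)(index >> clev) & (COUNT - 1));
}

/*
 * Clear every key bit at or below the node's digit. Shifting COUNT
 * (rather than 1) by clev yields 1 << (clev + WIDTH), which the caller
 * used to obtain by adding one to an unscaled level. The shift count is
 * clev itself, so it never reaches 64; with WIDTH = 4 the largest clev
 * is 60, where COUNT << 60 wraps to 0 and the mask clears the whole key.
 */
static uint64_t
trimkey(uint64_t index, unsigned clev)
{
	return (index & -((uint64_t)COUNT << clev));
}
```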
Reviewed by: kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41226
To determine the size in bytes needed to hold a control message
and its contents of length len, CMSG_SPACE should be used.
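For illustration, a minimal sketch (not taken from the patched code) that passes a descriptor over a UNIX-domain socket: CMSG_SPACE() sizes the control buffer, including trailing padding, while CMSG_LEN() fills in cmsg_len. The helper name send_fd() is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a UNIX-domain socket; 0 on success. */
int
send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	/* The union guarantees cmsghdr alignment for the byte buffer. */
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];	/* header + data + padding */
	} control;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&control, 0, sizeof(control));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));	/* header + data, no padding */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}
```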
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41224
MFC after: 1 week
Also disable IPv6 checksum offload.
Spell hw->mac.type < e1000_82543 as hw->mac.type == e1000_82542.
Confusingly, chips like the 82540 and 82541 come later and do not have
these issues. There is no functional change here, as the enum was
defined in such a way that the old comparison worked correctly, but the
new one says literally what is meant.
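To make the ordering concrete, a sketch using an abridged copy of the MAC type enum. The exact member list is an assumption (only the relative order matters), and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Abridged; relative order assumed to follow enum e1000_mac_type. */
enum mac_type_sketch {
	e1000_undefined = 0,
	e1000_82542,
	e1000_82543,
	e1000_82544,
	e1000_82540,	/* lower model number, but later in the enum */
	e1000_82545,
	e1000_82541,	/* likewise */
};

/* Old spelling: reads as "every chip older than the 82543". */
static bool
old_test(enum mac_type_sketch type)
{
	return (type < e1000_82543);
}

/* New spelling: names the one MAC type the workaround is for. */
static bool
new_test(enum mac_type_sketch type)
{
	return (type == e1000_82542);
}
```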
MFC after: 1 week
Since a30ecd42b8 we've logged almost all unexpected errors from
commands. However, some passthru commands were not logged via devctl. To
fix this, pass all requests through passerror (which calls
cam_periph_error), but flag those requests that didn't want error
recovery as SF_NO_RECOVERY, like we do for device probing. By doing this
we get identical behavior to the current code, but log these errors.
We have had hangs on drives that seem to show no error. Vendor analysis
of the drive found an illegal command that happened to hang the drive. In
verifying their analysis, we discovered that pass-through commands
from tools like smartctl that encountered errors or timeouts weren't
logged.
Sponsored by: Netflix
Reviewed by: ken, mav
Differential Revision: https://reviews.freebsd.org/D41167
NULL (non-leaf) pointers with NULL leaves, there is a NULL test
removed from every iteration of an index-based search loop.
This speeds up radix trie searches by a few percent. If there are any
radix tries that are not initialized with the init() function, but
instead depend on zero-filling being a proper initialization, this
will break those tries.
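A sketch of a search loop over a toy trie in which empty child slots hold a "NULL leaf" instead of a NULL pointer. The low-bit leaf tag, the node layout and all names are assumptions for illustration. Note that node_init() must fill empty slots explicitly, which is why zero-filled tries would break.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH		4
#define COUNT		(1 << WIDTH)
#define ISLEAF		((uintptr_t)1)	/* assumed: low tag bit marks a leaf */
#define NULL_LEAF	ISLEAF		/* a leaf that holds no value */

/* Hypothetical node; children are tagged pointers. */
struct node {
	uint64_t owner;
	unsigned clev;			/* already scaled by WIDTH */
	uintptr_t child[COUNT];
};

/* Plain zeroing is not enough: empty slots must hold NULL_LEAF. */
static void
node_init(struct node *n, uint64_t owner, unsigned clev)
{
	n->owner = owner;
	n->clev = clev;
	for (int i = 0; i < COUNT; i++)
		n->child[i] = NULL_LEAF;
}

/* Values are stored tagged; in this sketch each value is its own key. */
static uint64_t *
lookup(uintptr_t root, uint64_t key)
{
	uintptr_t p = root;

	/* One test per iteration (is it a leaf?); no separate NULL check. */
	while ((p & ISLEAF) == 0) {
		const struct node *n = (const struct node *)p;
		uint64_t idx = (key - n->owner) >> n->clev;

		if (idx >= COUNT)
			return (NULL);
		p = n->child[idx];
	}
	/* An empty (NULL) leaf is handled once, after the loop. */
	uint64_t *val = (uint64_t *)(p & ~ISLEAF);
	return (val != NULL && *val == key ? val : NULL);
}
```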
Reviewed by: alc, kib
Tested by: pho (as part of a larger change)
Differential Revision: https://reviews.freebsd.org/D41171
WFI and WFIT trap to EL2 when executed in a vmm guest. (Currently
WFE/WFET are not configured to trap.) We only handle WFI at the moment,
so these constants are useful when handling the exception.
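As an illustration, a sketch of how an exception handler might classify the trapped instruction. The field layout (EC in ESR_ELx bits [31:26], EC 0x01 for a trapped WFx, and the TI field in ISS bits [1:0]) follows our reading of the Arm Architecture Reference Manual; the macro and function names are hypothetical, not the constants added here.

```c
#include <stdint.h>

#define ESR_EC_SHIFT		26
#define ESR_EC_MASK		(0x3fU << ESR_EC_SHIFT)
#define EC_WFX			0x01U	/* trapped WFI/WFE/WFIT/WFET */
#define ISS_WFX_TI_MASK		0x3U	/* TI field, ISS bits [1:0] */

enum wfx_kind {
	WFX_NOT_WFX = -1,
	WFX_WFI = 0,
	WFX_WFE = 1,
	WFX_WFIT = 2,
	WFX_WFET = 3,
};

/* Classify a trapped WFx instruction from an ESR_EL2 value. */
static enum wfx_kind
decode_wfx(uint32_t esr)
{
	if (((esr & ESR_EC_MASK) >> ESR_EC_SHIFT) != EC_WFX)
		return (WFX_NOT_WFX);
	return ((enum wfx_kind)(esr & ISS_WFX_TI_MASK));
}
```

A handler that only emulates WFI could treat WFX_WFIT the same way and leave the WFE forms untrapped, matching the configuration described above.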
Reviewed by: andrew
MFC after: 1 week
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D41199
No fields have been defined, but it has been documented in the
Architecture Reference Manual.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40897
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
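Decimal values matter because they can be token-pasted into the generic S<op0>_<op1>_C<n>_C<m>_<op2> system-register name that assemblers accept even for registers they do not know by name; a hex value such as 0xd would paste as "C0xd" instead of "C13". A sketch of such a macro follows. The macro names are hypothetical, and TPIDR_EL0 (op0=3, op1=3, CRn=13, CRm=0, op2=2) is used only because its encoding is well known, not because it is one of the registers changed here.

```c
/* Illustrative field definitions, in decimal. */
#define TPIDR_EL0_op0	3
#define TPIDR_EL0_op1	3
#define TPIDR_EL0_CRn	13
#define TPIDR_EL0_CRm	0
#define TPIDR_EL0_op2	2

/* Paste the five fields into the S<op0>_<op1>_C<n>_C<m>_<op2> form. */
#define SYSREG_NAME_PASTE(op0, op1, crn, crm, op2)			\
	S##op0##_##op1##_C##crn##_C##crm##_##op2
/* Extra level so that the reg##_xxx macros expand before pasting. */
#define SYSREG_NAME_EXPAND(op0, op1, crn, crm, op2)			\
	SYSREG_NAME_PASTE(op0, op1, crn, crm, op2)
#define SYSREG_NAME(reg)						\
	SYSREG_NAME_EXPAND(reg##_op0, reg##_op1, reg##_CRn, reg##_CRm,	\
	    reg##_op2)

#define STR_(x)		#x
#define STR(x)		STR_(x)

/* Yields "S3_3_C13_C0_2", usable with mrs/msr on older assemblers. */
static const char tpidr_el0_name[] = STR(SYSREG_NAME(TPIDR_EL0));
```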
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40896
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40895
No fields have been defined, but it has been documented in the
Architecture Reference Manual.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40894
It breaks a future macro that creates the alternative register name
for old compilers.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40892
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40890
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40889
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40888
While here move to decimal for the _op and _CR definitions to be used
by a future macro to define the register when the assembler doesn't
know about it.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40887
Avoid locking issues when if_allmulti() calls the driver's if_ioctl,
because that may acquire sleepable locks (while we hold a non-sleepable
rwlock).
Fortunately there's no pressing need to hold the mroute lock while we
do this, so we can postpone the call slightly, until after we've
released the lock.
This avoids the following WITNESS warning (with iflib drivers):
lock order reversal: (sleepable after non-sleepable)
1st 0xffffffff82f64960 IPv4 multicast forwarding (IPv4 multicast forwarding, rw) @ /usr/src/sys/netinet/ip_mroute.c:1050
2nd 0xfffff8000480f180 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4525
lock order IPv4 multicast forwarding -> iflib ctx lock attempted at:
#0 0xffffffff80bbd6ce at witness_checkorder+0xbbe
#1 0xffffffff80b56d10 at _sx_xlock+0x60
#2 0xffffffff80c9ce5c at iflib_if_ioctl+0x2dc
#3 0xffffffff80c7c395 at if_setflag+0xe5
#4 0xffffffff82f60a0e at del_vif_locked+0x9e
#5 0xffffffff82f5f0d5 at X_ip_mrouter_set+0x265
#6 0xffffffff80bfd402 at sosetopt+0xc2
#7 0xffffffff80c02105 at kern_setsockopt+0xa5
#8 0xffffffff80c02054 at sys_setsockopt+0x24
#9 0xffffffff81046be8 at amd64_syscall+0x138
#10 0xffffffff8101930b at fast_syscall_common+0xf8
See also: https://redmine.pfsense.org/issues/12079
Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41209
On x86, Linux exposes the user-controlled (via tunables) processor
capabilities through AT_HWCAP2.
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41165
MFC after: 2 weeks
This asc/ascq code 2/0 ("No seek complete") is a fatal error on modern
drives indicating a sensor failure. One of our vendors noticed we
retried 2/0 so many times in their failure analysis and asked why (no
other OS does). They've indicated that this failure means the
track couldn't be located (something that's not going to change, except
if the environment changes significantly, which won't happen on a
timescale useful to retries).
Sponsored by: Netflix
Currently for the MFS, firmware and VDSO template assembly files we pass
the path to include with .incbin unquoted and use __XSTRING within the
assembly file to stringify it. However, __XSTRING doesn't just perform a
single level of expansion, it performs the normal full expansion of the
macro, and so if the path itself happens to tokenise to something that
includes a defined macro in it that will itself be substituted. For
example, with #define MACRO 1, a path like /path/containing/MACRO/in/it
will expand to /path/containing/1/in/it and then, when stringified, end
up as "/path/containing/1/in/it", not the intended string. Normally,
macros have names that start or end with underscores and are unlikely
to appear in a tokenised path (even if technically they could), but now
that we've switched to GNU C as of commit ec41a96daa ("sys: Switch the
kernel's C standard from C99 to GNU99.") there are a few new macros
defined which don't start or end with underscores: unix, which is always
defined to 1, and i386, which is defined to 1 on i386. The former
probably doesn't appear in user paths in practice, but the latter has
been seen to and is likely quite common in the wild.
Fix this by defining the macro pre-quoted instead of using __XSTRING.
Note that technically we don't need to do this for vdso_wrap.S today as
all the paths passed to it are safe file names with no user-controlled
prefix but we should do it anyway for consistency and robustness against
future changes.
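The failure mode can be reproduced with the preprocessor alone. In this sketch, STRING_/XSTRING_ are local stand-ins for <sys/cdefs.h>'s __STRING/__XSTRING, and MACRO stands in for a predefined name such as unix or i386; the path is made up.

```c
/* Local stand-ins for __STRING/__XSTRING. */
#define STRING_(x)	#x
#define XSTRING_(x)	STRING_(x)	/* expands x fully, then stringifies */

#define MACRO		1		/* like "unix" or "i386" in GNU C modes */

/* Old scheme: unquoted path, stringified at the point of use. */
#define PATH_UNQUOTED	/path/containing/MACRO/in/it
static const char old_path[] = XSTRING_(PATH_UNQUOTED);	/* MACRO became 1 */

/* New scheme: the path is defined pre-quoted, so nothing inside expands. */
#define PATH_QUOTED	"/path/containing/MACRO/in/it"
static const char new_path[] = PATH_QUOTED;
```

In the build, the quoting would be added where the -D definition is constructed, so the .incbin directive receives a string literal that the preprocessor never looks inside.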
This allows make tinderbox to pass when built with source and object
directories inside ~/path-with-unix, which would otherwise expand to
~/path-with-1 and break.
PR: 272744
Fixes: ec41a96daa ("sys: Switch the kernel's C standard from C99 to GNU99.")
Explicitly set ipcss/ipcse/ipcso for IPv6 per the Intel SDM as indicated in
inline comments.
Fix and consolidate 82543/82547 hwcsum exemption.
While here rearrange and expand some commentary.
During a driver reload stress test, after 50-300 reloads a panic occurs.
After adding sleeps in between loading and unloading the driver, the
issue does not occur. It's possible that loading/unloading too fast may
cause the gt_taskqueue pointer to be freed earlier than expected;
checking for a null pointer first fixes it.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: erj@
Tested by: jeffrey.e.pieper@intel.com
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D39457
As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html
move to the modern uintXX_t. While here also migrate u_char to uint8_t.
Where other kernel interfaces allow, migrate u_long to uint64_t.
No functional changes intended.
MFC-after: 1 week
Sponsored-by: The FreeBSD Foundation
Upon discovering a violation, kmsan_check_arg() passes a pointer to
function parameter shadow state to kmsan_report_hook().
kmsan_report_hook() uses that address to find the origin cells, assuming
that the passed address belongs to the kernel map. This has two
problems:
1) Function parameter origin state is also located in TLS, not in the
origin map, but kmsan_report_hook() doesn't know this.
2) KMSAN TLS for thread0 is statically allocated and thus isn't shadowed
(because the kernel itself is not shadowed).
These bugs could result in inaccuracies in KMSAN reports, or a page
fault when trying to report a KMSAN violation (which by default panics
the kernel anyway).
Fix the problem by making callers of kmsan_report_hook() provide a
pointer to origin cells.
Sponsored by: The FreeBSD Foundation
Copy operands to an aligned buffer before performing operations which
require alignment. Otherwise it's possible for this code to trigger an
alignment fault on armv7.
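The general technique looks like the sketch below: rather than casting a possibly unaligned byte pointer to a wider type, copy the bytes into an aligned local with memcpy(), operate on that, and copy the result back. xor_block() is a hypothetical example, not the code touched by this change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR len bytes of b into a, a word at a time, without assuming alignment. */
void
xor_block(uint8_t *a, const uint8_t *b, size_t len)
{
	uint32_t wa, wb;
	size_t i;

	for (i = 0; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
		/* Copy into aligned locals instead of casting to uint32_t *. */
		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		wa ^= wb;
		memcpy(a + i, &wa, sizeof(wa));
	}
	/* Remaining tail bytes. */
	for (; i < len; i++)
		a[i] ^= b[i];
}
```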
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D41211
If the unwinder somehow ends up with a stack pointer that lies outside
the stack, then dereferencing it can lead to a fault, which
causes the kernel to panic again and unwind the stack, which leads to a
fault...
Add kstack_contains() checks at points where we dereference the stack
pointer. This avoids the aforementioned infinite loop in one case I hit
where some OpenSSL assembly code apparently confuses the unwinder.
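The defensive pattern can be sketched as a frame-pointer walk that checks bounds before every dereference. stack_contains() is a hypothetical analogue of kstack_contains(), and the {saved frame pointer, return address} frame layout is an assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct stack_bounds {
	uintptr_t lo, hi;		/* [lo, hi) is the valid stack */
};

/* Does [va, va + len) lie entirely within the stack? */
static bool
stack_contains(const struct stack_bounds *sb, uintptr_t va, size_t len)
{
	return (va >= sb->lo && va < sb->hi && len <= sb->hi - va);
}

/*
 * Count frames in a chain of {saved fp, return address} pairs. Stop at a
 * frame pointer outside the stack instead of dereferencing it, and cap
 * the walk at 'max' frames in case the chain loops.
 */
static int
count_frames(const struct stack_bounds *sb, uintptr_t fp, int max)
{
	int n = 0;

	while (n < max && stack_contains(sb, fp, 2 * sizeof(uintptr_t))) {
		const uintptr_t *frame = (const uintptr_t *)fp;

		n++;
		fp = frame[0];		/* saved frame pointer */
	}
	return (n);
}
```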
Reviewed by: jhb
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D41210
Ringing the doorbell before making the BPF call can result in the
mbuf being freed before the BPF call.
Reviewed-by: markj
MFC-after: 3 days
Differential Revision: https://reviews.freebsd.org/D41189
Since f71cb9f748, a socket stays connected to its inpcb throughout the
latter's lifetime, so there is no reason to complicate things by copying
these flags.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D41198
This deals with TCP endpoints in the SYN-RCVD state coming from the
SYN-SENT state.
Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D41203
Patch base stack to correctly handle the RST bit independently
of other header flags, per the TCP RFC.
MFC after: 1 week
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40982
When we are sending a terminating signal to the group, killpg() needs
to guarantee that all group members are to be terminated (it does not
need to ensure that they are terminated on return from killpg()). The
pg_killsx change eliminates the largest window there, but still, if
a multithreaded process is signalled, the following could happen:
- thread 1 is selected for the signal delivery and gets descheduled
- thread 2 waits for pg_killsx lock, obtains it and forks
- thread 1 continues executing and terminates the process
This scenario still allows the child to escape.
Fix it by single-threading the forking parent if a conflict with pg_killsx
is noted. We try to lock pg_killsx without sleeping, and failure to
acquire it means that a parallel killpg(2) is executing. Then, stop
other threads from running and, in particular, from receiving signals, to
avoid the situation explained above.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
This should improve signal delivery latency and better expose the
process state to the executing threads.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
This reverts commits 81a37995c7 and
565a343ae3.
There is still a leak of p_killpg_cnt; some, but not all, of its sources
were identified.
Second, and more important, is that there is a fundamental issue with
blocked signals having the KSI_KILLPG flag set. Queueing such a signal
increments p_killpg_cnt, but it cannot be decremented until the signal
is delivered. If, for instance, a single-threaded process with a blocked
signal receives a killpg() kill and executes fork(2), the fork entry check
returns ERESTART. And since the signal is blocked, the condition
cannot be cleared.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41128
The removal of the sparc64 support in February 2020 obsoleted the
VTOC8 partitioning scheme as no other FreeBSD platform makes use
of it. Moreover, the code is bitrotting as nothing defines e. g.
LOADER_VTOC8_SUPPORT any more and, thus, should go now, too. With
this change, the following commits are reverted as far as VTOC8
is concerned and parts haven't already previously been deleted
along with prior sparc64 removals:
094fcb157d
a7d366e958
ba8d50d08b
The alignment example d9711c28ef
added to the VTOC8 section of gpart.8 is folded into the MBR one.
This should finally conclude the deorbit of sparc64-specific bits.
We had joy, we had fun
we ran Unix on a Sun.
But that source and the song
of FreeBSD have all gone.
Credits to Michael Bueker for the original "Unix on a Sun" and Rod
McKuen for the "Seasons in the Sun" lyrics.
The ofw_reg_to_paddr() in this file is the powerpc OF_decode_addr()
formerly added in 812403402e. However,
the latter function in turn was based on the sparc64 counterpart I
previously added in 2b2250b149.