OpenZFS release 2.2.7
Notable upstream pull request merges:
#15457 022bf8637 Increase L2ARC write rate and headroom
#15527 1ba5b2ef8 freebsd: remove __FBSDID macro use
#15792 9e7af55e8 Add custom debug printing for your asserts
#15793 a1ea40af8 libzfs: use zfs_strerror() in place of strerror()
#16181 -multiple zdb/ztest: improve and harmonise crash output
#16208 e5e4957a5 Allow block cloning to be interrupted by a signal
#16210 ba4e582a6 FreeBSD: Add const qualifier to members of struct
opensolaris_utsname
#16225 a6198f34b Simplify issig()
#16335 25ec9a903 zdb: fix BRT dump
#16364 cf80a803d zvol: ensure device minors are properly cleaned up
#16404 384b53be8 FreeBSD: Add missing memory reclamation accounting
#16492 -multiple Split "shared" userspace & kernel code into separate files
#16496 f1694496a zfs_file: rename zfs_file_fallocate to zfs_file_deallocate
#16511 bc0d89bfc Fix an uninitialized data access
#16529 2dc8529d9 Fix handling of DNS names with '-' in them for sharenfs
#16539 30ea44296 zfs_log: add flex array fields to log record structs
#16546 098789216 Evicting too many bytes from MFU metadata
#16551 54278533a Reduce and handle EAGAIN errors on AIO label reads
#16554 84ad1d536 FreeBSD: restore zfs_znode_update_vfs()
#16565 21c40e6d9 FreeBSD: Sync taskq_cancel_id() returns with Linux
#16567 48482bb2f Properly release key in spa_keystore_dsl_key_hold_dd()
#16584 e8f4592a1 Avoid computing strlen() inside loops
#16605 acc8a3186 ARC: Cache arc_c value during arc_evict()
#16650 fc60e0c6e freebsd: Use compiler.h from FreeBSD's base's linuxkpi
#16667 b32b35cea zdb: show bp in uberblock dump
#16684 1f5e1b919 Pack dmu_buf_impl_t by 16 bytes
#16688 73b3e8ace Fix gcc uninitialized warning in FreeBSD zio_crypt.c
#16690 727506c94 On the first vdev open ignore impossible ashift hints
#16692 d83cd5307 zdb: add extra -T flag to show histograms of BRT refcounts
#16693 82ab837a0 Fix gcc unused value warning in FreeBSD simd.h
#16740 2bba6e3c5 BRT: Don't call brt_pending_remove() on holes/embedded
#16801 299da6ace Fix race in libzfs_run_process_impl
Obtained from: OpenZFS
OpenZFS commit: e269af1b3c
OpenZFS tag: zfs-2.2.7
systrace.c fails to build if we're using a common compiler.h for both
openzfs and linuxkpi. The issue is easy enough to fix: don't redefine
lower_32_bits if it's already defined in linux.h, since it's the least
'standardized'. This will allow systrace.c to build using an equivalent
macro.
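
A minimal sketch of the guard described above. The expansion shown is the
conventional definition of such a macro and is an assumption here, not a
copy of what is in the tree:

```c
#include <stdint.h>

/*
 * Only provide lower_32_bits() when no other header (for example a
 * shared compiler.h) has defined it already. The expansion below is an
 * assumed, conventional definition used for illustration.
 */
#ifndef lower_32_bits
#define lower_32_bits(n)	((uint32_t)((n) & 0xffffffffULL))
#endif
```

With the guard in place, whichever header is included first wins and any
later definition is skipped instead of triggering a redefinition error.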
MFC After: 3 days
Sponsored by: Netflix
(cherry picked from commit 481d5a4891)
Previous versions of pw(8) wouldn't chmod the home directory if it
already existed prior to user creation, rendering adduser(8) -M
ineffective in some cases. Add a test to cover that situation.
PR: 280099
Reviewed by: kevans
(cherry picked from commit f7cf62cf72)
The adduser(8) prompt allows one to set the mode of a new home
directory, but pw(8) doesn't honor the -M mode if the home directory
already exists at creation time. It doesn't seem to make sense to
ignore the mode (which may lead to a security issue on the system being
configured) when we'll happily chown an existing directory, so fix the
inconsistency.
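
The intended behaviour can be sketched as follows. ensure_home_mode() is a
hypothetical helper written for illustration, not the actual pw(8) code:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

/*
 * Hypothetical helper: create the home directory with the requested
 * mode and, if it already exists, still apply that mode instead of
 * silently keeping the old one. Returns 0 on success, -1 with errno
 * set on failure.
 */
int
ensure_home_mode(const char *path, mode_t mode)
{
	struct stat sb;

	if (mkdir(path, mode) == 0) {
		/* mkdir() is subject to the umask, so set the mode explicitly. */
		return (chmod(path, mode));
	}
	if (errno != EEXIST)
		return (-1);
	if (stat(path, &sb) != 0)
		return (-1);
	if (!S_ISDIR(sb.st_mode)) {
		errno = ENOTDIR;
		return (-1);
	}
	/* Pre-existing directory: honor the requested mode anyway. */
	return (chmod(path, mode));
}
```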
PR: 280099
Reviewed by: des, jlduran (previous version)
(cherry picked from commit 6a7238fd7c)
We populate the kqueue with all of four kevents: three signal handlers and
one for read of the child pipe. Every time we start the child, we rebuild
this kqueue from scratch for the child and tear it down before we exit and
check if we need to restart the child. As a consequence, we effectively
drop any of the signals we're interested in between restarts.
Push the kqueue out into the daemon state to avoid losing any signal events
in the process, and reimplement the restart timer in terms of kqueue timers.
The pipe read event will be automatically deleted upon last close, which
leaves us with only the signal events that really get retained between
restarts of the child.
PR: 277959
Reviewed by: des, markj
(cherry picked from commit bc1dfc316a)
This is somewhat hard to test reliably, but we'll give it a shot. Start up
a sleep(1) daemon with a hefty restart delay. In refactoring of daemon(8),
we inadvertently started dropping SIGTERMs that came in while we were
waiting to restart the child, so we employ the following strategy:
- Pop the child sleep(1) first
- Wait for sleep(1) to exit (pid file truncated)
- Pop the daemon(8) with a SIGTERM
- Wait for daemon(8) to exit
The pidfile is specifically truncated outside of the event loop so that we
don't have a kqueue to catch it in the current model.
PR: 277959
Reviewed by: des, markj
(cherry picked from commit 9ab59e900c)
We need to be able to test some more restart behavior that depends on
knowing specifically where we're at (inside the event loop or outside of
the event loop). Truncate the pidfile until the process is restarted to
give the test a clean marker rather than having to add arbitrary delays
and hoping for the best.
Reviewed by: des, markj
(cherry picked from commit aa8722cc18)
It's possible to take a signal after pselect/ppoll have set their return
value, but before we actually return to userland. This results in
taking a signal without reflecting it in the return value, which weakens
the guarantees provided by these functions.
Switch both to restore the signal mask before we would deliver signals
on return to userland. If a signal was received after the wait was
over, then we'll just have the signal queued up for the next time it
comes unblocked. The modified signal mask is retained if we were
interrupted so that ast() actually handles the signal, at which point
the signal mask is restored.
des@ has a test case demonstrating the issue in D47738 which will
follow.
Reported by: des
Reviewed by: des (earlier version), kib
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
(cherry picked from commit ccb973da1f)
grep -q -v means "are there any lines that don't match", not "are there
no lines that match", and since the file has lines other than ones with
nvme_util.o when up-to-date this triggers on every build.
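
The difference between the two questions can be spelled out in C. Both
functions below are illustrative only and use a plain substring match in
place of a real pattern match:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* What "grep -q -v PAT" asks: is there at least one line without PAT? */
bool
any_line_lacks(const char *const *lines, size_t n, const char *pat)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(lines[i], pat) == NULL)
			return (true);
	return (false);
}

/* What the check meant to ask: does no line at all contain PAT? */
bool
no_line_contains(const char *const *lines, size_t n, const char *pat)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(lines[i], pat) != NULL)
			return (false);
	return (true);
}
```

For a file that mentions nvme_util.o alongside other objects, the first
function returns true and the second returns false, which is why the old
check fired on every build.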
Fixes: 26a09db3ad ("Fix incremental build with WITH_NVME newly enabled")
MFC after: 1 week
(cherry picked from commit e546c3950a)
If the test runner is under heavy load, the command we are testing may
succeed in printing to stdout before the dummy receiver has terminated.
Add a short delay to reduce the likelihood of this happening.
MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D47572
(cherry picked from commit b697835ce6)
D37419 corrupts VFP context store on signal delivery and D38696 corrupts PCB
because it performs a binary copy between structures with different layouts.
Revert the problematic parts of these commits to get signal delivery
working. Unfortunately, there are more problems with these revisions and
more fixes need to be developed.
Fixes: 6926e2699a
Fixes: 4d2427f2c4
MFC after: 4 weeks
(cherry picked from commit 3abef90c32)
Insert a direct assignment to the location counter to ensure that orphaned
sections cannot be emitted between the _exidx_start symbol and the .ARM.exidx
section.
Discussed with: jrtc27
MFC after: 1 week
(cherry picked from commit 1701dfae1b)
The BUSDMA buffer is treated as normal memory during compilation, so the
compiler is free to inline or optimize basic functions (e.g. memset, memcpy)
that access it, including by generating instructions that perform word
accesses to unaligned data. We support such accesses only if the buffer in
question is mapped as normal memory (cached or not), not if it is mapped as
strongly ordered or device memory.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D47485
(cherry picked from commit 248109448f)
Without this patch, an all upper case user domain name
(as specified by nfsuserd(8)) would not work.
I believe this was done so that Kerberos realms were
not confused with user domains.
Now, RFC8881 specifies that the user domain name is a
DNS name. As such, all upper case names should work.
This patch fixes this case so that it works. The custom
comparison function is no longer needed.
PR: 282620
(cherry picked from commit 0347ddf41f)
RFC8275 defines a new attribute as an extension to NFSv4.2
called MODE_UMASK. This patch adds support for this attribute
to the NFSv4.2 client and server.
Since FreeBSD applies the umask above the VFS/VOP layer,
this attribute does not actually have any effect on the
handling of ACL inheritance, which is what it is designed for.
However, future changes to NFSv4.2 require support of it,
so this patch does that, resulting in behaviour identical to
the mode attribute already supported.
(cherry picked from commit 2477e88b8d)
Some Fujitsu Lifebooks return an invalid _BIX object. The first element
of _BIX is a revision number, which indicates what elements will follow:
* ACPI 4.0 defined _BIX revision 0 with 20 elements.
* ACPI 6.0 introduced _BIX revision 1 with 21 elements.
The problem is that the offending Lifebooks have a non-zero _BIX
revision, but provide 20 fields only.
The ACPICA parser chokes on this [1], but that seems to be
inconsequential. More importantly, our own battery info handling code [2]
also verifies that for revision > 0, there are at least 21 fields - and
refuses to process the invalid _BIX. One workaround would be to
introduce special case / quirk handling for Fujitsu Lifebooks. A better
one is to relax the requirements check: If there are only 20 elements,
treat the _BIX as revision 0, no matter what revision number was
provided by the device.
Linux doesn't run into this problem by the way because it only supports
the 20 fields defined in the ACPI 4.0 spec [3]. It never looks at the
revision number or the 21st field added in ACPI 6.0.
[1] https://cgit.freebsd.org/src/tree/sys/contrib/dev/acpica/components/namespace/nsprepkg.c#n815
[2] https://cgit.freebsd.org/src/tree/sys/dev/acpica/acpi_cmbat.c#n371
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/battery.c#n418
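
The relaxed check can be sketched as below. bix_effective_revision() and the
two constants are hypothetical names chosen for illustration; the actual
code in acpi_cmbat.c is structured differently:

```c
#define BIX_REV0_ELEMS	20	/* ACPI 4.0: _BIX revision 0 */
#define BIX_REV1_ELEMS	21	/* ACPI 6.0: _BIX revision 1 */

/*
 * Hypothetical helper: decide which revision to parse a _BIX package
 * as, given the revision it reports and the number of elements it
 * actually contains. Returns -1 if the package is too short to be a
 * valid _BIX of any revision.
 */
int
bix_effective_revision(int reported_rev, int nelems)
{
	if (nelems < BIX_REV0_ELEMS)
		return (-1);
	if (nelems == BIX_REV0_ELEMS)
		return (0);	/* only 20 fields: ignore the reported revision */
	return (reported_rev);
}
```

Under this sketch, a Lifebook reporting revision 1 with 20 elements is
handled as revision 0 instead of being rejected.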
PR: 252030
Reviewed by: imp
MFC After: 2 weeks
(cherry picked from commit cd8c3af747)
A hardware IPv6 server needs 2 consecutive stids (server tids) starting
from a 2-aligned stid whereas an IPv4 server needs only 1 stid without
any constraint. The allocator used to grab the first free stid(s) for
both but this can fragment the stid space leaving nothing suitable for
IPv6 even when lots of stids are available. Change the allocator to
prefer stids for IPv4 from the ones that cannot be used for IPv6.
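
A toy version of this allocation policy, assuming a simple boolean table in
place of the driver's real data structures (struct stid_table, NSTIDS and
both function names are made up for this sketch):

```c
#include <stdbool.h>

#define NSTIDS	8	/* tiny table for illustration; must be even */

struct stid_table {
	bool used[NSTIDS];
};

/* IPv6: claim two consecutive free stids starting at an even index. */
int
stid_alloc_v6(struct stid_table *t)
{
	for (int i = 0; i + 1 < NSTIDS; i += 2) {
		if (!t->used[i] && !t->used[i + 1]) {
			t->used[i] = t->used[i + 1] = true;
			return (i);
		}
	}
	return (-1);
}

/*
 * IPv4: prefer a free stid whose pair partner is already in use, since
 * that stid can never serve IPv6. Fall back to the first free stid.
 */
int
stid_alloc_v4(struct stid_table *t)
{
	int fallback = -1;

	for (int i = 0; i < NSTIDS; i++) {
		if (t->used[i])
			continue;
		if (t->used[i ^ 1]) {
			t->used[i] = true;
			return (i);
		}
		if (fallback == -1)
			fallback = i;
	}
	if (fallback != -1)
		t->used[fallback] = true;
	return (fallback);
}
```

Two IPv4 allocations in a row land on the same aligned pair, so the
remaining pairs stay intact for IPv6.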
Reviewed by: jhb
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D47042
(cherry picked from commit f48fb131c2)
Reported by: Sony Arpita Das @ Chelsio
Fixes: 5c15094916 cxgbe(4): Update the board names of the T6 OCP cards.
Sponsored by: Chelsio Communications
(cherry picked from commit 878413d559)
We were previously allocating MAXCPU structures for several purposes,
but this is generally unnecessary and is quite excessive, especially
after MAXCPU was bumped to 1024 on amd64 and arm64. We already are
careful to allocate only as many per-CPU tracing buffers as are needed;
extend this to other allocations.
For example, in a 2-vCPU VM, the size of a consumer state structure
drops from 64KB to 128B. The per-consumer `dts_buffer` and
`dts_aggbuffer` arrays shrink similarly. Ditto for pre-allocations of
local and global D variable storage space.
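
The arithmetic behind the example, as a small sketch. The 64-byte record is
an assumption inferred from the figures above (64KB / 1024 CPUs), and the
names are illustrative, not taken from the DTrace sources:

```c
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_MAXCPU	1024	/* MAXCPU value cited for amd64 and arm64 */

/* Hypothetical 64-byte per-CPU record: 1024 of them total 64KB. */
struct percpu_state {
	unsigned char pad[64];
};

/* Bytes needed for an array sized for ncpus CPUs. */
size_t
percpu_bytes(unsigned int ncpus)
{
	return ((size_t)ncpus * sizeof(struct percpu_state));
}

/* Allocate only as many records as there are CPUs, not MAXCPU. */
struct percpu_state *
percpu_alloc(unsigned int ncpus)
{
	return (calloc(ncpus, sizeof(struct percpu_state)));
}
```

Here percpu_bytes(EXAMPLE_MAXCPU) is 65536 while percpu_bytes(2) is 128,
matching the 64KB to 128B drop described for a 2-vCPU VM.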
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47667
(cherry picked from commit 5d12db2daf)
- Use a fresh context when entering dtrace_invop() via a breakpoint
exception.
- Mark the #BP trapframe as initialized.
MFC after: 2 weeks
(cherry picked from commit cc3da1955c)
The loop doesn't check for overflow of the event buffer, which can
easily happen if other tests are running in parallel (the bectl tests in
particular trigger devd events).
When that overflow occurs, a funny thing can happen: the loop ends up
trying to read 0 bytes from the socket, succeeds, and then prints its
buffer to stdout. It does this as fast as possible, eventually timing
out. Then, because kyua wants to log the test's output, it slurps the
output file into memory so that it can insert it into the test db. This
output file is quite large, usually around 8GB when I see it happen, and
is large enough to trigger an OOM kill in my test suite runner VM.
Fix the test: use a larger buffer and fail the test if we fill it before
both events are observed. Also don't print the output buffer on every
loop iteration since, unlike in the seqpacket test, that will just print
the same output over and over.
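
A sketch of a receive loop with the overflow check in place.
wait_for_events() is a hypothetical stand-in for the test's loop, and it
uses read() on a stream descriptor for simplicity:

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical reader: accumulate data from fd into buf until both
 * strings a and b have been seen. Returns 0 on success and -1 if the
 * buffer fills up first, the peer closes, or read() fails.
 */
int
wait_for_events(int fd, char *buf, size_t cap, const char *a, const char *b)
{
	size_t len = 0;

	if (cap < 2)
		return (-1);
	/* Stop before the buffer is full, so we never issue a 0-byte read. */
	while (len < cap - 1) {
		ssize_t n = read(fd, buf + len, cap - 1 - len);

		if (n <= 0)
			return (-1);
		len += (size_t)n;
		buf[len] = '\0';
		if (strstr(buf, a) != NULL && strstr(buf, b) != NULL)
			return (0);
	}
	return (-1);	/* buffer filled before both events arrived */
}
```

The loop condition is what prevents the runaway case: once the buffer is
full the function fails instead of spinning on zero-length reads.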
Reviewed by: imp, asomers
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D47625
(cherry picked from commit 30cafaa961)
The variable b[] is on the stack, thus cannot overlap with ipov, which
points to the heap area, so prefer memcpy() over memmove(), aka bcopy().
No functional change intended.
Reviewed by: cc, rrs, cy, #transport, #network
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47713
(cherry picked from commit 949190c5af)
This was lost during the initial introduction of the pvscsi driver [1].
Later the driver was enabled on arm64 [2], so also install the man page
on arm64.
1. 052e12a508 Add the pvscsi driver to the tree
2. 375d797b81 Enable pvscsi and vmx in arm64 GENERIC
Reviewed by: emaste, Alexander Ziaee <concussious.bugzilla_runbox.com> (manpages)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47712
(cherry picked from commit 1811fa623d)
This fixes a serious performance problem with NFS handling of large
directories, as the new get_name code is much more efficient than the
default zfs_readdir. This is actually part of
20232ecfaa in 2.3. But I've taken only
the minimum code to implement get_name, and not the rest of the long
name changes.
Signed-off-by: Charles Hedrick <hedrick@rutgers.edu>
Co-authored-by: Charles L. Hedrick <hedrick@ncommunis.cs.rutgers.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
When replacing a disk, a child process is forked to run a script called
zfs_prepare_disk (which can be useful for disk firmware update or health
check). The parent then calls waitpid and checks the child error/status
code.
However, the _reap_children thread (created from zed_exec_process to
manage zedlets) also waits for all children with the same PGID and can
steal the signal, causing the replace operation to be aborted.
As waitpid returns -1, the parent incorrectly assumes that the child
process had an error or was killed. This, in turn, leaves the newly
added disk in REMOVED or UNAVAIL status rather than completing the
replace process.
This patch changes the PGID of the child process executing the
prepare script, shielding it from the _reap_children thread.
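
The technique can be sketched as follows, assuming the reaper collects
children by process group (for example with waitpid(0, ...), which only
matches children in the caller's own group). run_in_own_pgrp() is a
hypothetical helper, not the libzfs code:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a shell command in a child that first moves
 * itself into its own process group, then wait for that specific child.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int
run_in_own_pgrp(const char *cmd)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		/* New process group whose ID is the child's own PID. */
		(void)setpgid(0, 0);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) != pid)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Because the child no longer shares the parent's PGID, a group-wide wait
elsewhere in the process cannot consume its exit status first.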
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #16801
Adjust the m4 function to mimic the sentinel we use in spl-proc.c.
This fixes the detection on kernels compiled with CONFIG_RANDSTRUCT=y.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com>
Closes: #16620
Closes: #16805
It's good to reduce privilege as early as possible.
Suggested by: jlduran
Reviewed by: jlduran
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47869
(cherry picked from commit 91629228e3)
Moved from libsys to libc for stable/14.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47556
(cherry picked from commit 36887e0494)
Remove uses of cv_waiters in PCM_RELEASE and CHN_BROADCAST, and also use
a counter around cv_timedwait_sig() in chn_sleep(), which is checked in
pcm_killchans(), as opposed to reading cv_waiters directly, which is a
layering violation.
While here, move CHN_BROADCAST below the channel lock operations.
Reported by: avg, jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, avg
Differential Revision: https://reviews.freebsd.org/D47780
(cherry picked from commit 46a97b9cd6)
No functional change intended.
Sponsored by: The FreeBSD Foundation
MFC after: 2 days
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D47737
(cherry picked from commit 5a217a8d7d)