Commit graph

301136 commits

Author SHA1 Message Date
Jessica Clarke
2c444fdb0c libc,libthr: Remove __pthread_distribute_static_tls
This private API is no longer used by rtld-elf so can be removed.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D50921
2025-07-10 20:00:28 +01:00
Jessica Clarke
a1d971ad3f rtld-elf: Track allocated TCBs internally and use for distribute_static_tls
Currently rtld delegates to libc or libthr to initialise the TCBs for
all existing threads when dlopen is called for a library that is using
static TLS. This creates an odd split where rtld manages all of TLS for
dynamically-linked executables except for this specific case, and is
unnecessarily complex, including having to reason about the locking due
to dropping the bind lock so libthr can take the thread list lock
without deadlocking if any of the code run whilst that lock is held ends
up calling back into rtld (such as for lazy PLT resolution).

The only real reason we call out into libc / libthr is that we don't
have a list of threads in rtld and that's how we find the currently used
TCBs to initialise (and at the same time do the copy in the callee
rather than adding overhead with some kind of callback that provides the
TCB to rtld). If we instead keep a list of allocated TCBs in rtld itself
then we no longer need to do this, and can just copy the data in rtld.
How these TCBs are mapped to threads is irrelevant; rtld can just treat
all TCBs equally and ensure that each TCB's static TLS data block
remains in sync with the current set of loaded modules, just as
_rtld_allocate_tls creates a fresh TCB and associated data without any
embedded threading model assumptions.

As an implementation detail, to avoid a separate allocation for the list
entry and having to find that allocation from the TCB to remove and free
it on deallocation, we allocate a fake TLS offset for it and embed the
list entry there in each TLS block.
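
The embedded list entry described above can be illustrated with a small
userland sketch. This is not rtld's code: BLOCK_SIZE, ENTRY_OFFSET, and the
block_alloc/block_free/distribute helpers are hypothetical names chosen for
the example, and a plain calloc'ed buffer stands in for a TCB and its static
TLS block. It only shows the idea of keeping the TAILQ entry at a known
offset inside each block so that no separate allocation is needed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical layout constants; rtld computes real offsets at run time. */
#define BLOCK_SIZE   256    /* size of each simulated static TLS block */
#define ENTRY_OFFSET 64     /* "fake TLS offset" reserved for the list entry */

struct tcb_entry {
    TAILQ_ENTRY(tcb_entry) link;
};

TAILQ_HEAD(tcb_list, tcb_entry);
static struct tcb_list all_blocks = TAILQ_HEAD_INITIALIZER(all_blocks);

/* The list entry lives inside the block itself, at a fixed offset. */
static struct tcb_entry *
entry_of(void *block)
{
    return ((struct tcb_entry *)((char *)block + ENTRY_OFFSET));
}

/* Allocate a block and link it into the list of live blocks. */
void *
block_alloc(void)
{
    void *block = calloc(1, BLOCK_SIZE);

    if (block == NULL)
        return (NULL);
    TAILQ_INSERT_TAIL(&all_blocks, entry_of(block), link);
    return (block);
}

/* Unlink and free a block; the entry is found from the block pointer. */
void
block_free(void *block)
{
    TAILQ_REMOVE(&all_blocks, entry_of(block), link);
    free(block);
}

/*
 * Copy a newly loaded module's initial data into every live block at the
 * module's offset.  Returns the number of blocks updated.
 */
size_t
distribute(size_t off, const void *init, size_t len)
{
    struct tcb_entry *e;
    size_t n = 0;

    TAILQ_FOREACH(e, &all_blocks, link) {
        char *block = (char *)e - ENTRY_OFFSET;

        memcpy(block + off, init, len);
        n++;
    }
    return (n);
}
```

Because the entry sits at a fixed offset, removal on deallocation needs no
lookup: the entry address is derived directly from the block address.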

This will also make it easier to add a new TLS ABI downstream in
CheriBSD, especially in the presence of library compartmentalisation.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D50920
2025-07-10 20:00:28 +01:00
Jessica Clarke
4d2752925a rtld-elf: Extract part of allocate_tls_offset into allocate_tls_offset_common
This will be used to allocate additional space for a TAILQ_ENTRY by rtld
at a known offset from the TCB, as if it were TLS data.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D51068
2025-07-10 20:00:28 +01:00
Simon J. Gerraty
d1f0ee548c Allow net_cleanup for loader.efi
While netbooting with loader.efi on at least one arm64 platform
which uses u-boot emulating UEFI, the kernel gets corrupted; we
suspected the u-boot ethernet driver was still running.

Use netdev.dv_cleanup for efinet_dev to address this.

This in turn requires calling dev_cleanup() before bi_load() to avoid
a loader crash since bi_load() calls ExitBootServices.

Reviewed by:	imp
Sponsored by:   Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D51186
2025-07-10 11:14:38 -07:00
Kyle Evans
3fde39073c shar: remove from the tree well in advance of the 15.0 release
We have had a deprecation notice in the manpage for nearly six months, and
it is also present both in 13.5 and 14.3.  tar(1) can supply this
functionality for those that truly need it, and cy@ has also created a
sysutils/freebsd-shar port for this version of a frontend to live on in
ports -- this port has been available since December 18, 2024.

Reviewed by:	allanjude, cy, emaste, jrm
Differential Revision:	https://reviews.freebsd.org/D50925
2025-07-10 12:57:03 -05:00
Kyle Evans
4c8b54f765 lockf: minor cosmetic cleanups, no functional change
Switch various flags from int -> bool.

kill(getpid(), foo) and raise(foo) are equivalent in this context, so
switch to the latter as a minor preference for readability.

Use proper signal fences instead of volatile for our SIGALRM handler.
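
As a rough illustration of the two points above (raise(3) in place of
kill(getpid(), ...) and signal fences in place of volatile), here is a
minimal sketch. It is not the lockf(1) source: the timed_out flag and the
helper names are hypothetical. It relies on atomic_signal_fence() from C11
<stdatomic.h> acting as a compiler barrier between the handler and the code
it interrupts; a strictly conforming program would instead declare the flag
volatile sig_atomic_t or use a lock-free atomic.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag set by the SIGALRM handler. */
static sig_atomic_t timed_out;

static void
on_alarm(int sig)
{
    (void)sig;
    timed_out = 1;
    /* Keep the compiler from reordering or eliding the store. */
    atomic_signal_fence(memory_order_release);
}

/* Read the flag from normal (non-handler) context. */
bool
alarm_fired(void)
{
    /* Force a fresh load of the flag after this point. */
    atomic_signal_fence(memory_order_acquire);
    return (timed_out != 0);
}

/*
 * Install the handler and deliver SIGALRM to ourselves.  raise(sig) is
 * used here where kill(getpid(), sig) would also work.
 */
int
install_and_raise(void)
{
    if (signal(SIGALRM, on_alarm) == SIG_ERR)
        return (-1);
    return (raise(SIGALRM));
}
```

raise() does not return until the handler has run, so alarm_fired() reports
true immediately after a successful install_and_raise().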

Reviewed by:	kib (earlier version), des
Differential Revision:	https://reviews.freebsd.org/D51027
2025-07-10 12:54:20 -05:00
Kyle Evans
c82203e65d lockf: tests: add tests for the -p and -T features
Reviewed by:	des
Differential Revision:	https://reviews.freebsd.org/D51026
2025-07-10 12:54:20 -05:00
Kyle Evans
679f619495 lockf: add a -T option to terminate the child upon early abort
This is useful to avoid having the command running twice in the face of
the admin terminating the process.  Notably, if the -p option is not in
use (or can't be used, e.g., because we can't open the file for writing)
then this provides a nice alternative where one simply needs to send a
SIGTERM to the lockf(1) process associated with the lock file to clean
it all up.

Reviewed by:	des
Differential Revision:	https://reviews.freebsd.org/D51025
2025-07-10 12:54:20 -05:00
Kyle Evans
7e8afac0cb lockf: switch to a SIGCHLD model for reaping child
A future change will add a -T flag to forward SIGTERM along to the
child before we clean up and terminate ourselves.  Using a SIGCHLD
handler to do that with SIGTERM blocked only while the child is actively
being collected will enable us to safely do so without having to worry
that our pid is potentially invalid.
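
The reaping model described above can be sketched as follows. This is an
illustration under stated assumptions, not the lockf(1) implementation: the
handler names, the child_done/child_status variables, and the run_and_wait()
helper (which forks a child that simply exits with the given code) are all
hypothetical. The point it shows is that SIGTERM is included in the SIGCHLD
handler's signal mask, so a SIGTERM handler can never observe a pid that is
in the middle of being collected.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t child = -1;                    /* set while signals are blocked */
static volatile sig_atomic_t child_done;    /* child has been reaped */
static volatile sig_atomic_t child_status;  /* raw wait status */

static void
on_sigchld(int sig)
{
    int status, saved_errno = errno;

    (void)sig;
    /* SIGTERM is blocked here via sa_mask, so the pid cannot be signalled
     * by on_sigterm() between the reap and the child_done update. */
    if (waitpid(child, &status, WNOHANG) == child) {
        child_status = status;
        child_done = 1;
    }
    errno = saved_errno;
}

static void
on_sigterm(int sig)
{
    (void)sig;
    /* Forward the signal only while the pid is known to be valid. */
    if (!child_done)
        kill(child, SIGTERM);
}

/* Fork a child that exits with exit_code and return that code, or -1. */
int
run_and_wait(int exit_code)
{
    struct sigaction sa;
    sigset_t block, old, waitmask;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigaddset(&block, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return (-1);
    child_done = 0;

    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGTERM);    /* block SIGTERM while reaping */
    sa.sa_handler = on_sigchld;
    sigaction(SIGCHLD, &sa, NULL);

    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGCHLD);
    sa.sa_handler = on_sigterm;
    sigaction(SIGTERM, &sa, NULL);

    child = fork();
    if (child == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return (-1);
    }
    if (child == 0)
        _exit(exit_code);

    /* Wait with both signals unblocked only inside sigsuspend(). */
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);
    sigdelset(&waitmask, SIGTERM);
    while (!child_done)
        sigsuspend(&waitmask);
    sigprocmask(SIG_SETMASK, &old, NULL);

    return (WIFEXITED(child_status) ? WEXITSTATUS(child_status) : -1);
}
```

Returning the child's exit status from run_and_wait() mirrors the test
mentioned below, which checks that the child's error reaches the caller.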

Add a test that concisely checks that the child's error is properly
bubbled up to the caller.

Reviewed by:	des, kib
Differential Revision:	https://reviews.freebsd.org/D51024
2025-07-10 12:54:19 -05:00
Kyle Evans
df268d4b03 lockf: add a -p mode to write the child's pid
If we're going to hold the lock, it can be useful to scribble down the
pid that we spawned off to quickly associate the lock back to the
process that's keeping it open.

Reviewed by:	allanjude (previous version), des
Differential Revision:	https://reviews.freebsd.org/D51014
2025-07-10 12:54:19 -05:00
Warner Losh
d78d04b17c cam: Fail the disk if READ CAPACITY returns 4/2 asc/ascq
HGST disks that are sick are returning 44/0 for START UNIT (which we
ignore) and then 4/2 on READ CAPACITY. START UNIT should be enough for
READ CAPACITY to succeed or return UNIT ATTENTION. However, we get NOT READY +
4/2 back. I've seen this on several models of HGST drives. Invalidate
the peripheral when we detect this condition. This is likely the least
bad thing we can do: It removes access to daX, but leaves passY so logs
may be extracted (if awkwardly). Removing daX access removes the disk
device that causes problems to geom outlined below.

Although the timeout is 5s for READ_CAPACITY, we wait the full 30s for
READ_CAPACITY_16. This causes us to stall booting as we start to taste
as soon as we release the final hold... but the tasting means
g_wait_idle() now takes over 5 minutes to clear since we do this
for all the opens. Even using a timeout of 3s instead of 30s leads to
boot times of almost 5 minutes in these cases, so there are other,
downstream operations that are taking a while, so it's not just a matter
of adjusting the timeout. Failing the periph early solves the bulk of
this problem (the tasting related delays). What the HBA does is HBA
specific and some have firmwares that are also confused by this when
they enumerate or discover the drive, leading to long (but still shorter
than 5 minute) delays. This patch won't solve that aspect of startup
delays with sick disks.

Perhaps we should fail the periph when START UNIT fails with the same
codes we check in the read capacity path. I'm reluctant to do such a
global change since it's in cam_periph, and there seems no good way to
flag that we want this behavior. It's also a bit magical when it runs
(some drives report 44/0 always, and some just report it on START UNIT,
and these HGST drives fall into the latter category).

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D51218
2025-07-10 10:17:01 -06:00
Warner Losh
5a656ef632 cam: Add xpt_gdev_type() and use it instead of many copies of same
Add a convenience wrapper to XPT_GDEV_TYPE in the same way we have one
for XPT_PATH_INQ. The changes from PRIORITY_NORMAL to PRIORITY_NONE are
intentional because this isn't a queued CCB. Please note: we have
several places still that construct an XPT_GDEV_TYPE message by
overwriting a CCB that happens to be lying around. I've not used this
method, by and large, in those places since I didn't want to risk
upsetting allocation flags that might be present (since we use a special
allocator for some CCB types in *_da.c).

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D51216
2025-07-10 10:17:00 -06:00
Warner Losh
cad5cfe750 cam: Use less stack space
Use less stack space by using the specific type of ccb to do the
callback.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D51215
2025-07-10 10:17:00 -06:00
Dag-Erling Smørgrav
83e351780f nfsv4: No need to check va_flags twice.
Fixes:		c5d72d29fe
Reviewed by:	rmacklem
2025-07-10 16:54:41 +02:00
Ariel Ehrenberg
b6b3743fa2 mlx5en: add driver tls status string method for rx sessions
Upon collecting TLS information, the kernel calls the driver to get the
driver/hw TLS state. The driver queries the hw for its tracking and
authentication states and dumps them into the driver state buffer. This
requires a sleep to wait for the hw response.

Reviewed by:	kib
Sponsored by:	NVidia networking
2025-07-10 17:42:27 +03:00
Konstantin Belousov
cdd8129216 mlx5_en: wait_for_completion_timeout() takes jiffies
Sponsored by:	Nvidia networking
2025-07-10 17:42:27 +03:00
Konstantin Belousov
d7c807aa88 sysctl net.inet.tcp.ktlslist: properly fill driver status length field
Also ignore errors from drivers. If the driver's snd_tag status method
returned an error, silently ignore the returned string, and do not
advance the position of the filled buffer.

Sponsored by:	Nvidia networking
2025-07-10 17:42:27 +03:00
Konstantin Belousov
18905fc31b sysctl net.inet.tcp.ktlslist: try to handle EDEADLK
If EDEADLK is returned from the locked handler, restart it.  Do it a
limited number of times.  Catch signals between tries.
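
The restart policy above amounts to a bounded retry loop. The following is a
userland sketch of that pattern, not the kernel code: MAX_TRIES, the callback
signatures, and the flaky_handler() stand-in are hypothetical, and a
caller-supplied sig_pending() callback takes the place of the kernel's signal
check.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_TRIES 10    /* hypothetical retry limit */

/*
 * Call handler(arg) until it returns something other than EDEADLK, at most
 * MAX_TRIES times.  Between tries, give up with EINTR if sig_pending()
 * reports a pending signal.
 */
int
retry_on_edeadlk(int (*handler)(void *), void *arg, int (*sig_pending)(void))
{
    int error = EDEADLK;
    int tries;

    for (tries = 0; tries < MAX_TRIES; tries++) {
        error = handler(arg);
        if (error != EDEADLK)
            return (error);
        if (sig_pending != NULL && sig_pending())
            return (EINTR);
    }
    return (error);
}

/* Stand-in handler that fails with EDEADLK a fixed number of times. */
struct flaky {
    int failures_left;
    int calls;
};

int
flaky_handler(void *arg)
{
    struct flaky *f = arg;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return (EDEADLK);
    }
    return (0);
}
```

With a handler that fails three times, the loop succeeds on the fourth call;
with one that always fails, the caller sees EDEADLK after MAX_TRIES attempts.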

Reviewed by:	glebius, markj
Sponsored by:	Nvidia networking
Differential revision:	https://reviews.freebsd.org/D51143
2025-07-10 17:42:27 +03:00
Konstantin Belousov
b435452e6b sysctl net.inet.tcp.ktlslist: allow snd_tag_status_str() to sleep
For this, unlock inp around the calls, taking a reference on it.  If
the inp appears to be freed or unlinked after the relock, return
EDEADLK.

Reviewed by:	glebius, markj
Sponsored by:	Nvidia networking
Differential revision:	https://reviews.freebsd.org/D51143
2025-07-10 17:42:27 +03:00
Konstantin Belousov
1b7d0c2ee9 in_pcb: add in_pcbrele_rlock()
The helper that derefs and rlocks the provided inp.  Returns false if inp
is still usable.

Reviewed by:	glebius, markj
Sponsored by:	Nvidia networking
Differential revision:	https://reviews.freebsd.org/D51143
2025-07-10 17:42:27 +03:00
Konstantin Belousov
63389aea24 inotify: do not call into namei() with a locked vnode
PR:	288127
Reviewed by:	markj
Fixes:	f1f230439f
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D51233
2025-07-10 17:34:45 +03:00
Alexander Ziaee
42df4faf70 builtin.1: streamlined rewrite + document keybinds
+ clean title: one line with keywords, builtin's mlinked
+ additional search terms: FreeBSD, index
+ synopsis: builtins are documented in your shell's manual
+ clear/scoped introduction: increase straightforwardness
+ table alignment fixed, now renders nicely on MANWIDTH=59
+ keybindings: fundamentals now documented, more please?
+ improved structure and flow, and a spdx tag
+ remove info(1) from SEE ALSO

MFC after:	3 days
Discussed with:	imp,jlduran
Reviewed by:	imp
Closes:		https://github.com/freebsd/freebsd-src/pull/1481
2025-07-10 08:45:10 -04:00
Doug Moore
8adb3acb63 pctrie: leave iter at root after ge_lookup failure
If pctrie_lookup_iter_ge fails to return a node, the iterator is left
with NULL as the current node. Instead, make the pctrie_root the
current node when the pctrie has an internal node. Do the same thing
for lookup_iter_le. If an iterator is reused after a ge/le lookup
fails, this will skip the step in _pctrie_lookup_node where a NULL is
replaced by the node at the top of the trie.

Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D51232
2025-07-10 03:14:07 -05:00
Maxim Konovalov
78935fa40e named_attribute.7: typo fix
2025-07-09 23:21:03 +00:00
Rick Macklem
c130914d29 named_attribute.7: Document the named attribute mechanism
The named attribute interface is an alternate way
to manipulate extended attributes, based on the
interface provided by Solaris.

This man page documents this interface.

Reviewed by:	kib, ziaee (manpages), pauamma_gundo.com
Differential Revision:	https://reviews.freebsd.org/D49851
Fixes:	2ec2ba7e23 ("vfs: Add VFS/syscall support for Solaris style extended attributes")
2025-07-09 15:55:52 -07:00
Dag-Erling Smørgrav
4982db387f pfctl: Fix 32-bit build.
Fixes:		1997370109
Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D51230
2025-07-09 22:42:10 +02:00
Dag-Erling Smørgrav
42e613018d opendir, readdir, telldir: Use the correct types.
Use either size_t or off_t (as appropriate) instead of long.

Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D51210
2025-07-09 22:34:22 +02:00
Dag-Erling Smørgrav
dd81cc2bc5 getdirentries: Return ENOTDIR if not a directory.
This is both more logical and more useful than EINVAL.

While here, also check for VBAD and return EBADF in that case.  This can
happen if the underlying filesystem got forcibly unmounted after the
directory was opened.  Previously, this would also have returned EINVAL,
which wasn't right but wasn't wrong either; however, ENOTDIR would not
be appropriate.

MFC after:	never
Sponsored by:	Klara, Inc.
Reviewed by:	kevans, kib
Differential Revision:	https://reviews.freebsd.org/D51209
2025-07-09 22:34:18 +02:00
Dag-Erling Smørgrav
9bf14f2a47 kyua: Try harder to delete directories.
When recursing into a directory to delete it, start by chmod'ing it to
0700.  This fixes an issue where kyua is able to run, but not debug, a
test case that creates unwriteable directories, because when debugging
it tries (and fails) to delete the directory after the test completes.
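
A minimal sketch of the approach described above, written in C for
illustration. It is not kyua's code, and remove_tree() and
demo_remove_unwritable() are hypothetical names. The key step is the
chmod() to 0700 before opening a directory, so that entries inside a
directory created without write permission can still be unlinked.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Recursively delete path; returns 0 on success, -1 on failure. */
int
remove_tree(const char *path)
{
    char entry[PATH_MAX];
    struct dirent *de;
    struct stat sb;
    DIR *dir;
    int rv = 0;

    if (lstat(path, &sb) != 0)
        return (-1);
    if (!S_ISDIR(sb.st_mode))
        return (unlink(path));

    /* Make the directory readable, writable, and searchable first. */
    if (chmod(path, 0700) != 0)
        return (-1);
    if ((dir = opendir(path)) == NULL)
        return (-1);
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (snprintf(entry, sizeof(entry), "%s/%s", path, de->d_name) >=
            (int)sizeof(entry) || remove_tree(entry) != 0) {
            rv = -1;
            break;
        }
    }
    closedir(dir);
    return (rv == 0 ? rmdir(path) : -1);
}

/* Build a tree containing an unwriteable directory, then delete it. */
int
demo_remove_unwritable(void)
{
    char top[] = "/tmp/rmtree.XXXXXX";
    char sub[PATH_MAX], file[PATH_MAX];
    struct stat sb;
    int fd;

    if (mkdtemp(top) == NULL)
        return (-1);
    snprintf(sub, sizeof(sub), "%s/locked", top);
    snprintf(file, sizeof(file), "%s/locked/data", top);
    if (mkdir(sub, 0700) != 0)
        return (-1);
    if ((fd = open(file, O_CREAT | O_WRONLY, 0600)) < 0)
        return (-1);
    close(fd);
    if (chmod(sub, 0500) != 0)      /* drop write permission */
        return (-1);
    if (remove_tree(top) != 0)
        return (-1);
    return (lstat(top, &sb) == -1 ? 0 : -1);
}
```

Without the chmod() call, unlinking the file inside the 0500 directory would
fail for an unprivileged user and the final rmdir() would never be reached.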

MFC after:	1 week
Sponsored by:	Klara, Inc.
Reviewed by:	igoro
Differential Revision:	https://reviews.freebsd.org/D51229
2025-07-09 22:28:47 +02:00
Dag-Erling Smørgrav
ccf937320a libc: Finish removing fscandir{,_b}().
These only existed for a few days before being renamed, so there's no
reason to continue to carry compatibility shims for them.

Fixes:		deeebfdeca
Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D50981
2025-07-09 21:17:06 +02:00
Alexander Ziaee
91ad27bcfc truncate.1: Polish title and examples
+ describe better
+ switch examples to human readable sizes, with consistent spacing
+ `Downsize ... in 5 Megabytes` >> `Downsize ... by 5 Megabytes`
+ remove prompt from lone example with prompt for consistency
+ remove x permissions from kernel example
+ examples now fit on standard console without wrapping

While here:
+ fold a line to eliminate linter warning + tag spdx
+ add -nosplit to AUTHORS to eliminate a rendering glitch

MFC after:	3 days
Discussed with:	asomers, jhb, maxim
Reviewed by:	imp (previous version)
Closes:		https://github.com/freebsd/freebsd-src/pull/1568
2025-07-09 13:52:01 -04:00
Konstantin Belousov
6280d03bb5 nfsserver rename: lock mnt_renamelock as required
Fixes:	ef6ea91593
Reported by:	des
Reviewed by:	markj, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D51211
2025-07-09 20:38:11 +03:00
Konstantin Belousov
44e9c9f8f3 nfsvno_rename(): do not use -1 as special error indicator
it clashes with ERESTART.  Use EJUSTRETURN for the case, as it is often
done in other places in the kernel.

Reviewed by:	markj, rmacklem
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D51211
2025-07-09 20:38:11 +03:00
Dag-Erling Smørgrav
89990e28e6 cp: Add descriptions to all test cases.
While here, touch a few test cases up.

Sponsored by:	Klara, Inc.
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D51124
2025-07-09 19:10:59 +02:00
Dag-Erling Smørgrav
2d6b33f801 cp: Add an option to visit sources in order.
This adds a --sort option which makes cp pass a comparison function to
FTS, ensuring that sources are visited and traversed in a predictable
order.  This will help make certain test cases more reliable.

Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D51214
2025-07-09 19:07:13 +02:00
Dag-Erling Smørgrav
c3efa16dc9 cp: Add GNU-compatible long options.
While here, fully switch boolean variables from int to bool, and clean
up the manual page a little.

Sponsored by:	Klara, Inc.
Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D51213
2025-07-09 19:07:13 +02:00
Ruslan Bukin
bc2e336010 hwt(4): Add initial man page.
Reviewed by:	ziaee
Sponsored by:	UKRI
Differential Revision:	https://reviews.freebsd.org/D51192
2025-07-09 16:57:07 +01:00
Siva Mahadevan
68fe0d9cc0 pfctl tests: use require.kmods instead of manual check for pf
Reviewed by:	kp
Signed-off-by:	Siva Mahadevan <me@svmhdvn.name>
Sponsored by:	The FreeBSD Foundation
Pull Request:	https://github.com/freebsd/freebsd-src/pull/1762
2025-07-09 17:38:09 +02:00
Kristof Provost
939aacb600 pfctl tests: macro test requires pf to be loaded
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 17:38:08 +02:00
Ronald Klop
9c95fcb7cd tests: Get the MAC from the epairs.
This removes knowledge of the implementation of if_epair.
Makes it easier to modify if_epair in future commits.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D51205
2025-07-09 17:38:08 +02:00
Kristof Provost
4f822ad285 pfsync: count failed state insertions
If we fail to import a state, for whatever reason, count this as a bad action.
We should not drop states without at least incrementing an error count.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 17:38:08 +02:00
Kristof Provost
a4757fbd8c pfsync: log a bad version as a bad version, not a bad action
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 17:38:08 +02:00
Kristof Provost
f651e87be2 pf tests: sync a state with an rtableid that doesn't exist
Create a state with an rtableid (i.e. fib) that doesn't exist on the receiving
side. This used to not be handled, and could provoke panics. Create such a
situation to ensure we still don't panic.

PR:		287981
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 17:38:08 +02:00
Kristof Provost
4af4fefedd pf: ignore state update with invalid rtableid
It's possible for a peer to send us a state update with an rtableid we don't
support (i.e. >= net.fibs).
Drop these updates rather than potentially crashing later by setting an invalid
fib number.

PR:		287981
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 17:38:07 +02:00
John Baldwin
5c59cec2d5 nvmf: Auto-reconnect periodically after a disconnect
Use a timer in the nvmf(4) driver to periodically trigger a devctl
"RECONNECT" notification.  A trigger in the /etc/devd/nvmf.conf file
invokes "nvmecontrol reconnect nvmeX" upon each notification.  This
differs from iSCSI which uses a dedicated daemon (iscsid(8)) to wait
inside a custom ioctl for an iSCSI initiator event to occur, but I
think this design might be simpler.

Similar to nvme-cli, the interval between reconnection attempts is
specified in seconds by the --reconnect-delay argument to the connect
and reconnect commands.  Note that nvme-cli uses -c as the short option
for this argument, but that was already taken so nvmecontrol uses -r.
The default is 10 seconds to match Linux.

In addition, a second timeout can be used to force a full detach of a
disconnected nvmeX device after the controller loss timeout
expires.  The timeout for this is specified in seconds by the
--ctrl-loss-tmo/-l options (identical to nvme-cli).  The default is
600 seconds.

Either of these timers can be disabled by setting the timer to 0.  In
that case, the associated action (devctl notifications or full detach)
will not occur after a disconnect.

Note that this adds a dedicated taskqueue for nvmf tasks instead of
using taskqueue_thread as the controller loss task could deadlock
waiting for the completion of other tasks queued to taskqueue_thread.
(Specifically, tearing down the CAM SIM can trigger
destroy_dev_sched_cb() and waits for the callback to run, but the
callback is scheduled to run in a task on taskqueue_thread.  Possibly,
destroy_dev_sched should be using a dedicated taskqueue.)

Reviewed by:	imp (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D50222
2025-07-09 10:19:45 -04:00
Dag-Erling Smørgrav
2187622436 fts: Fix option list indentation.
Fixes:		da2025a0e8
Sponsored by:	Klara, Inc.
Reviewed by:	bcr
Differential Revision:	https://reviews.freebsd.org/D51208
2025-07-09 15:20:08 +02:00
Kristof Provost
3b6bcad340 pfctl.8: Further document recursive flush behaviour
OK sashan

Obtained from:	OpenBSD, kn <kn@openbsd.org>, 5bd1c2906f
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 10:57:51 +02:00
Kristof Provost
fb0d388e5d pfctl: Print the main ruleset/anchor as "/" not "<root>" for consistency
OK sashan

Obtained from:	OpenBSD, kn <kn@openbsd.org>, baa66dbe09
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 10:57:51 +02:00
Kristof Provost
05c33e5acb pfctl tests: recursive flush test case
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 10:57:50 +02:00
Kristof Provost
041ce1d690 pfctl: recursively flush rules and tables
The recursive operation ("pfctl -a '*' ...") already works for the '-s' option.
This change enables the same thing for the '-F' option, so "pfctl -a '*' -Fa"
will flush everything from the PF driver.

The idea was discussed with many on tech@ in spring 2019.

OK kn@

Obtained from:	OpenBSD, sashan <sashan@openbsd.org>, ae711728d4
Obtained from:	OpenBSD, sashan <sashan@openbsd.org>, 7abd52e24a
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2025-07-09 10:57:50 +02:00