Commit graph

405 commits

Author SHA1 Message Date
Olivier Certner
de6a4418ee
sysctl(9): Ease exporting struct sizes; Discourage doing that
Introduce two helpers, the more general SYSCTL_SIZEOF() and
a struct-specific one SYSCTL_SIZEOF_STRUCT() which prepends 'struct' in
the description and in the use of sizeof() but uses the raw structure
name as the knob's name.  The size of the object/structure is exported
under 'debug.sizeof'.

Existing knobs under 'debug.sizeof' were all converted to use the
helpers.

Add a note before the helpers discouraging the introduction of new
leaves for ad-hoc reasons.  List alternative means for developers to
obtain the size of arbitrary kernel structures easily (thanks to markj@
for providing these).

No functional change (intended).

Reviewed by:    kib, markj
MFC after:      3 days
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D50121

(cherry picked from commit 713abc9880aabe0ff924ff644bceb6ff404ed3cd)
(cherry picked from commit efce9f8a510b60736994e50288b78fc7b32b5d90)

Approved by:    re (cperciva)
2025-05-13 14:41:33 +02:00
Mark Johnston
48fddba637 devfs: Return early from sysctl_devname() if no input is given
Otherwise we end up searching for a device using an uninitialized key,
tripping up KMSAN.

MFC after:	2 weeks

(cherry picked from commit 48d6b52add36cc09e7fb1fbec44ab66c0742f320)
2025-04-06 13:54:03 +00:00
Mark Johnston
48155c983c kern: Make fileops and filterops tables const where possible
No functional change intended.

MFC after:	1 week

(cherry picked from commit ef9ffb8594eee294334ced627755bf5b46b48f9f)
2024-12-03 01:03:42 +00:00
Konstantin Belousov
9ccd539412 devfs_allocv(): style
(cherry picked from commit 6d79564fe341c8dbf09405cae1a0a76460aaf8aa)
2024-05-19 03:57:54 +03:00
Doug Rabson
a1e55af8d1 Fix MNT_IGNORE for devfs, fdescfs and nullfs
The MNT_IGNORE flag can be used to mark certain filesystem mounts so
that utilities such as df(1) and mount(8) can filter out those mounts by
default. This can be used, for instance, to reduce the noise from
running container workloads inside jails which often have at least three
and sometimes as many as ten mounts per container.

The flag is supplied by the nmount(2) system call and is recorded so
that it can be reported by statfs(2). Unfortunately several filesystems
override the default behaviour and mask out the flag, defeating its
purpose. This change preserves the MNT_IGNORE flag for those filesystems
so that it can be reported correctly.

MFC after:	1 week

(cherry picked from commit b5c4616582cebdcf4dee909a3c2f5b113c4ae59e)
2024-04-22 15:48:15 +02:00
Konstantin Belousov
cde0347ca7 cdevpriv(9): add iterator
(cherry picked from commit d3efbe0132b24e8660df836905cda7662f85a154)
2024-03-30 10:31:13 +02:00
Konstantin Belousov
7dff3d1cdf kcmp(2): implement for devfs files
(cherry picked from commit 5c41d888de1aba0e82531fb6df4cc3b6989d37bd)
2024-01-30 22:24:43 +02:00
Gordon Bergling
d0f5ac0a72 devfs(5): Fix a typo in a source code comment
- s/interpeted/interpreted/

(cherry picked from commit 7cf293536ebacc92150be12e0be928500e670610)
2024-01-23 07:43:09 +01:00
Jason A. Harmening
23332e34e6 devfs: add integrity asserts for cdevp_list
It's possible for misuse of cdev KPIs or for bugs in devfs itself to
result in e.g. a cdev object's container being freed while still on the
global list used to populate each devfs mount; see PR 273418 for a
recent example.

Since a node may be marked inactive well before it is reaped from the
list, add a new flag solely to track list membership, and employ it in
some basic list integrity assertions to catch bad actors.

Discussed with:	kib, mjg

(cherry picked from commit 67864268da53b792836f13be10299de8cd62997e)
2023-09-27 19:46:38 -05:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Stefan Eßer
88a795e80c sys/fs: do not report blocks allocated for synthetic file systems
The pseudo file systems (devfs, fdescfs, procfs, etc.) report total
and available blocks and inodes despite being synthetic with no
underlying storage device to which those values could be applied.

The current code of these file systems tends to report a fixed number
of total blocks but no free blocks, and in the case of procfs,
libprocfs, linsysfs also no free inodes.

This can be irritating in e.g. the "df" output, since 100% of the
resources seem to be in use, but it can also create warnings in
monitoring tools used for capacity management.

This patch makes these file systems return the same value for the
total and free parameters, leading to 0% in use being displayed by
"df". Since there is no resource that can be exhausted, this appears
to be a sensible result.

Reviewed by:	mckusick
Differential Revision:	https://reviews.freebsd.org/D39442
2023-04-25 09:59:15 +02:00
Mateusz Guzik
829f0bcb5f vfs: add the concept of vnode state transitions
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>

As is there is no mechanism which allows filesystems to denote that a
vnode is fully initialized, consequently problems like the above are
only found the hard way(tm).

Add rudimentary support for state transitions, which in particular allow
to assert the vnode is not legally unlocked until its fate is decided
(either construction finishes or vgone is called to abort it).

The new field lands in a 1-byte hole, thus it does not grow the struct.

Bump __FreeBSD_version to 1400077

Reviewed by:	kib (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D37759
2022-12-26 17:35:12 +00:00
Mateusz Guzik
5b5b7e2ca2 vfs: always retain path buffer after lookup
This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D36542
2022-09-17 09:10:38 +00:00
Mateusz Guzik
ad5e1f9c2d devfs: stop taking the interlock in devfs_delete
It buys nothing now that vhold does not require it.
2022-09-14 22:51:42 +00:00
Mateusz Guzik
a1c555f48b devfs: retire the unused DEVFS_DEL_VNLOCKED flag 2022-09-14 22:47:53 +00:00
Mateusz Guzik
497240def8 Retire clone_drain_lock
It is only ever xlocked in drain_dev_clone_events and the only consumer of
that routine does not need it -- eventhandler code already makes sure the
relevant callback is no longer running.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36268
2022-08-20 09:44:05 +00:00
Gordon Bergling
8ea3ceda76 fs: fix a few common typos in source code comments
- s/quadradically/quadratically/
- s/persistant/persistent/

Obtained from:	NetBSD
MFC after:	3 days
2022-02-06 13:48:31 +01:00
Konstantin Belousov
66c5fbca77 insmntque1(): remove useless arguments
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D34071
2022-01-31 16:49:08 +02:00
Mateusz Guzik
2a7e4cf843 Revert b58ca5df0b ("vfs: remove the now unused insmntque1")
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.

Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.

Noted by:	kib
Reported by:	pho
2022-01-27 16:32:22 +00:00
Mateusz Guzik
3af3e99ce4 devfs: stop using insmntque1
It adds nothing of value over insmntque.
2022-01-27 00:54:30 +01:00
Mateusz Guzik
3ffcfa599e vfs: add vop_stdadd_writecount_nomsync
This avoids needing to inspect the mount point every time.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D33125
2021-11-26 12:06:08 +00:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Mateusz Guzik
b4a58fbf64 vfs: remove cn_thread
It is always curthread.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D32453
2021-10-11 13:21:47 +00:00
Mark Johnston
243b324f96 devfs: Avoid comparison with an uninitialized var in devfs_fp_check()
devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data.  Other devvn_refthread()
callers are careful to do this.

Reported by:	KMSAN
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30068
2021-05-03 13:24:30 -04:00
Konstantin Belousov
d485c77f20 Remove #define _KERNEL hacks from libprocstat
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition.  Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Mateusz Guzik
3bc17248d3 devfs: fix use count leak when using TIOCSCTTY
by matching devfs_ctty_ref

Fixes: 3b44443626 ("devfs: rework si_usecount to track opens")
2021-02-09 01:54:21 +00:00
Edward Tomasz Napierala
4ddb3cc597 devfs(4): defer freeing until we drop devmtx ("cdev")
Before r332974 the old code would sometimes cause a rare lock order
reversal against pagequeue, which looked roughly like this:

witness_checkorder()
__mtx_lock-flags()
vm_page_alloc()
uma_small_alloc()
keg_alloc_slab()
keg_fetch-slab()
zone_fetch-slab()
zone_import()
zone_alloc_bucket()
uma_zalloc_arg()
bucket_alloc()
uma_zfree_arg()
free()
devfs_metoo()
devfs_populate_loop()
devfs_populate()
devfs_rioctl()
VOP_IOCTL_APV()
VOP_IOCTL()
vn_ioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()

Since r332974 the original problem no longer exists, but it still
makes sense to move things out of the - often congested - lock.

Reviewed By:	kib, markj
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27334
2020-12-29 13:47:36 +00:00
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
ca423b858b devfs: bool -> int
Fixes buildworld after r364069
2020-08-10 11:46:39 +00:00
Mateusz Guzik
7b19bddac8 devfs: save on spurious relocking for devfs_populate
Tested by:	pho
2020-08-10 10:36:43 +00:00
Mateusz Guzik
f8935a96d1 devfs: use cheaper lockmgr entry points
Tested by:	pho
2020-08-10 10:36:10 +00:00
Mateusz Guzik
f9c13ab856 devfs: use vget_prep/vget_finish
Tested by:	pho
2020-08-10 10:35:47 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
11c345b18f devfs: fix a vnode use-after-free in devfs_ioctl
The vnode to be replaced was read with a shared lock, meaning 2 racing threads
can find the same one.

While here clean it up a little bit.
2020-07-04 06:27:28 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Mateusz Guzik
f1fa1ba3d0 Fix up various vnode-related asserts which did not dump the used vnode 2020-02-03 14:25:32 +00:00
Kyle Evans
6a5abb1ee5 Provide O_SEARCH
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23247
2020-02-02 16:34:57 +00:00
Mateusz Guzik
45757984f8 vfs: consistently use size_t for buflen around VOP_VPTOCNP 2020-02-01 20:34:43 +00:00
Mateusz Guzik
b249ce48ea vfs: drop the mostly unused flags argument from VOP_UNLOCK
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D21427
2020-01-03 22:29:58 +00:00
Mateusz Guzik
6fa079fc3f vfs: flatten vop vectors
This eliminates the following loop from all VOP calls:

while(vop != NULL && \
    vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;

Reviewed by:	jeff
Tesetd by:	pho
Differential Revision:	https://reviews.freebsd.org/D22738
2019-12-16 00:06:22 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Kyle Evans
1b50b999f9 tty: implement TIOCNOTTY
Generally, it's preferred that an application fork/setsid if it doesn't want
to keep its controlling TTY, but it could be that a debugger is trying to
steal it instead -- so it would hook in, drop the controlling TTY, then do
some magic to set things up again. In this case, TIOCNOTTY is quite handy
and still respected by at least OpenBSD, NetBSD, and Linux as far as I can
tell.

I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as
long as it doesn't impose a major burden.

Reviewed by:	bcr (manpages), kib
Differential Revision:	https://reviews.freebsd.org/D22572
2019-11-30 20:10:50 +00:00
Mateusz Guzik
a02cab334c devfs: introduce a per-dev lock to protect ->si_devsw
This allows bumping threadcount without taking the global devmtx lock.

In particular this eliminates contention on said lock while using bhyve
with multiple vms.

Reviewed by:	kib
Tested by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22548
2019-11-30 16:46:19 +00:00
Mateusz Guzik
1fccb43c39 vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.

There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).

Reviewed by:	kib, jeff
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D22202
2019-11-20 12:05:59 +00:00
Mateusz Guzik
4c9ba39aea devfs: use MNTK_NOMSYNC
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22009
2019-10-13 15:41:47 +00:00
Konstantin Belousov
e3fdd051f9 devfs_vptocnp(): correct the component name when node is not at top.
Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested.  Use the dirent entry instead.

With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.

Reported by:	Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-10-11 18:41:24 +00:00