Commit graph

17679 commits

Author SHA1 Message Date
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt 2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn 2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
2d0631dd08 vfs: stricter validation for flags passed to namei in cn_flags
namei de facto expects that the nameidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this part by explicitly clearing some of them.

Tighten the interface instead.

While here renumber the flags and denote the gap between the 2 variants.

Try to piggyback the renumber on the just bumped __FreeBSD_version.
2020-08-11 01:34:40 +00:00
Mateusz Guzik
25e42ee217 vfs: drop the hello world stat probes from the vfs provider
Interested parties can get the same information by hooking on vop_stat.
2020-08-10 18:11:00 +00:00
Mateusz Guzik
5e79447d60 cache: let SAVESTART passthrough
The flag is only passed for non-LOOKUP ops and those fallback to the slowpath.
2020-08-10 12:28:56 +00:00
Mateusz Guzik
bb48255cf5 cache: resize struct namecache to a multiple of alignment
For example struct namecache on amd64 is 100 bytes, but it has to occupy
104. Use the extra bytes to support longer names.
2020-08-10 12:05:55 +00:00
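The padding arithmetic in the commit above can be sketched as follows. This is an illustration only, not code from the FreeBSD tree: round_up_to_alignment and padding_bytes are hypothetical helpers, and the 8-byte alignment is an assumption inferred from the 100 -> 104 figures quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): round size up to the
 * next multiple of align, where align must be a power of two.
 */
static size_t
round_up_to_alignment(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((size + align - 1) & ~(align - 1));
}

/* Bytes left unused at the end of an object once it is padded to align. */
static size_t
padding_bytes(size_t size, size_t align)
{
	return (round_up_to_alignment(size, align) - size);
}
```

Under that assumption, a 100-byte struct occupies 104 bytes, so 4 bytes per entry are available for other use.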
Mateusz Guzik
8b62cebea7 cache: remove unused variables from cache_fplookup_parse 2020-08-10 11:51:56 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
Mateusz Guzik
c571b99545 cache: strlcpy -> memcpy 2020-08-10 10:40:14 +00:00
Mateusz Guzik
3ba0e51703 vfs: partially support file create/delete/rename in lockless lookup
Perform the lookup until the last 2 elements and fallback to slowpath.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:35:18 +00:00
Mateusz Guzik
21d5af2b30 vfs: drop the thread argument from vfs_fplookup_vexec
It is guaranteed curthread.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:34:22 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
51ea7bea91 vfs: add VOP_STAT
The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910
2020-08-07 23:06:40 +00:00
Mateusz Guzik
1ff80a3400 vfs: release the interlock after failing to set VHOLD_NO_SMR
While here add more comments.

Diagnosed by:	markj
Reported by:	pho
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-07 19:36:08 +00:00
Warner Losh
f7bb4f88c5 Remove obsolete part of comment. It was cut and pasted from the old version of
this function, and was never relevant to the new version.
2020-08-07 18:21:48 +00:00
Hans Petter Selasky
826c079373 Add full support for dynamic allocation and freeing of epochs.
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.

While at it, move the 64 supported epoch control structures to the
static memory domain. This overall simplifies the management and
debugging of system epochs.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D25960
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-07 15:32:42 +00:00
Mark Johnston
0ffec1b03d Clean up reassignbuf() and buf_vlist_remove() a bit.
- Convert panic() calls to INVARIANTS-only assertions.  The PCTRIE code
  provides some of the same protection since it will panic upon an
  attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:43:15 +00:00
Mark Johnston
7013797e34 Remove the vfs.reassignbufcalls counter and sysctl.
As the 20-year old comment above it suggests, the counter is of dubious
value.  Moreover, the (global) counter was not updated precisely and
hurts scalability.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:42:59 +00:00
Mateusz Guzik
e910c93eea cache: add more predicts for failing conditions 2020-08-06 04:20:14 +00:00
Mateusz Guzik
95888901f7 cache: plug uninitialized variable use
CID:	1431128
2020-08-06 04:19:47 +00:00
Mateusz Guzik
bb62c418fd vfs hash: annotate the lock with __exclusive_cache_line
Note the code does not scale in the current form.
2020-08-05 19:34:13 +00:00
Mateusz Guzik
4f00177887 pipe: reduce atime precision
The routine is called on successful write and read, which on pipes happens a
lot and for small sizes.

Precision provided by default seems way bigger than necessary and it causes
problems in VMs on amd64 (it rdtscp's which vmexits). getnanotime seems to
provide a level roughly in line with Linux so we should be good here.

Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
before: 426464
after: 3247421

Note that atime handling for named pipes is broken with and without the
patch. The filesystem code is never used for updating atime and never looks
at the updated field. Consequently, while there are no provisions added to
handle named pipes separately, the change is a nop for that case.

Differential Revision:	 https://reviews.freebsd.org/D23964
2020-08-05 19:15:59 +00:00
Andrey V. Elsukov
edde7a538b Add m__getjcl SDT probe.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-08-05 11:39:09 +00:00
Mateusz Guzik
e1b1971c05 cache: don't ignore size passed to nchinittbl 2020-08-05 09:38:02 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
2b86f9d6d0 cache: convert the hash from LIST to SLIST
This reduces struct namecache by sizeof(void *).

Negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.

Note this adds cache_get_hash calls which can be eliminated.
2020-08-05 09:25:59 +00:00
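The trade-off described in the commit above (one pointer saved per entry, at the cost of a walk on removal) can be sketched like this. It is a minimal illustration with a hypothetical struct entry and chain_remove, not the namecache code and not the sys/queue.h macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry type; the real struct namecache has more fields. */
struct entry {
	struct entry *next;	/* single forward link, no back pointer */
	int key;
};

/*
 * Unlink target from the chain starting at *headp.  With only forward
 * links, the chain is walked until the link pointing at target is found.
 * Returns 1 if the entry was unlinked, 0 if it was not on the chain.
 */
static int
chain_remove(struct entry **headp, struct entry *target)
{
	struct entry **linkp;

	assert(headp != NULL && target != NULL);
	for (linkp = headp; *linkp != NULL; linkp = &(*linkp)->next) {
		if (*linkp == target) {
			*linkp = target->next;
			target->next = NULL;
			return (1);
		}
	}
	return (0);
}
```

When chains are short, as expected for a hash table with few collisions, the walk usually visits only one or two links.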
Mateusz Guzik
cf8ac0de81 cache: reduce zone alignment to 8 bytes
It used to be sizeof of the given struct to accommodate 32 bit mips
doing 64 bit loads, but the same can be achieved by requiring just
64 bit alignment.

While here reorder struct namecache so that most commonly used fields
are closer.
2020-08-05 09:24:38 +00:00
Mateusz Guzik
d61ce7ef50 cache: convert ncneghash into a macro
It is a read-only var with value known at compilation time.
2020-08-05 09:24:00 +00:00
Mateusz Guzik
158ab70c24 vfs: tidy up namei entry point
- predict for string copy errors
- reshuffle initialisation of vars which are not needed
2020-08-05 07:33:39 +00:00
Mateusz Guzik
2840f07d4f cache: cleanup lockless entry point
- remove spurious bzero
- assert ni_lcf, it has to be set by namei by this point
2020-08-05 07:32:26 +00:00
Mateusz Guzik
8ccf01e0e2 cache: stop messing with cn_lkflags
See r363882.
2020-08-05 07:30:57 +00:00
Mateusz Guzik
27c4618df5 cache: stop messing with cn_flags
This removes flag setting/unsetting carried over from regular lookup.
Flags still get set for compatibility when falling back.

Note .. and . handling can get partially folded together.
2020-08-05 07:30:17 +00:00
Mateusz Guzik
db99ec5656 vfs: support lockless dotdot lookup
Tested by:	pho
2020-08-04 23:07:42 +00:00
Mateusz Guzik
b403aa126e cache: add NCF_WIP flag
This allows making half-constructed entries visible to the lockless lookup,
which now can check for either the "not yet fully constructed" or the
"no longer valid" state.

This will be used for .. lookup.
2020-08-04 23:07:00 +00:00
Mateusz Guzik
6e10434c02 cache: add cache_purge_vgone
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.

Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.

Paired with the fact that doomed vnodes cannot receive entries this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.

Reported by:	pho
2020-08-04 23:04:29 +00:00
Mateusz Guzik
bd66a0750f mtx: add mtx_wait_unlocked 2020-08-04 23:00:00 +00:00
Mateusz Guzik
8541ae04b4 rms: fix typo: bitmamp -> bitmap
Reported by:	kib
2020-08-04 20:31:03 +00:00
Mateusz Guzik
1164f7a566 cache: factor away failed vexec handling 2020-08-04 19:55:26 +00:00
Mateusz Guzik
0439b00ea8 cache: assorted tidy ups 2020-08-04 19:55:00 +00:00
Mateusz Guzik
18bd02e2ce cache: factor away lockless dot lookup and add missing stat + sdt probe 2020-08-04 19:54:37 +00:00
Mateusz Guzik
17a66c7087 vfs: add vfs_op_thread_enter/exit _crit variants
and employ them in the namecache. Eliminates all spurious checks for preemption.
2020-08-04 19:54:10 +00:00
Mateusz Guzik
0311b05fec cache: add missing numcache decrement on insertion failure 2020-08-04 19:52:52 +00:00
Mateusz Guzik
3211e783e3 rms: add a comment explaining performance deficiencies of write locking 2020-08-04 19:52:16 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Konstantin Belousov
6e0c8e1ae2 Add SOL_LOCAL symbolic constant for unix socket option level.
The constant seems to exist on MacOS X >= 10.8.

Requested by:	swills
Reviewed by:	allanjude, kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25933
2020-08-03 22:13:02 +00:00
Warner Losh
e67c55c998 Some functions had the blank lines, others didn't. Most of the ones that didn't
were newer, so remove this now-optional blank line everywhere.
2020-08-03 22:12:18 +00:00
Konstantin Belousov
ca9a39acb3 Provide more correct description for sysctl kern.smp.cores.
Reported by:	dewayne@heuristicsystems.com.au
PR:	248454
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-08-03 17:17:17 +00:00
Mateusz Guzik
7ad2f1105e vfs: store precomputed namecache hash in the vnode
This significantly speeds up path lookup, Cascade Lake doing access(2) on ufs
on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c, ops/s:
before: 2535298
after: 2797621

Over +10%.

The reversed order of computation here does not seem to matter for hash
distribution.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25921
2020-08-02 20:02:06 +00:00
Mateusz Guzik
838984de32 vfs: move namecache initialisation into cache_vnode_init 2020-08-02 19:42:06 +00:00
Conrad Meyer
9da903e5d3 Unlocked getblk: Fix new false-positive assertion
A free buf's lock may be held (temporarily) due to unlocked lookup, so
buf_alloc() must acquire it without LK_NOWAIT.  The unlocked getblk path
should unlock it promptly once it realizes the identity does not match
the buffer it was searching for.

Reported by:	gallatin
Reviewed by:	kib
Tested by:	pho
X-MFC-With:	r363482
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25914
2020-08-02 16:34:27 +00:00
Mateusz Guzik
936c24faba cred: add more asserts for td_realucred == td_ucred 2020-08-01 16:02:32 +00:00
Mateusz Guzik
8a7ec17095 cache: reshuffle struct cache_fpl and nameidata_saved
Shaves 16 bytes.
2020-08-01 06:35:18 +00:00
Mateusz Guzik
5a3944334c cache: mark climb_mount as __noinline 2020-08-01 06:34:18 +00:00
Mateusz Guzik
85cf316172 vfs: inline NDINIT_ALL
The routine takes more than 6 arguments, which on amd64 means some of
them have to be passed through the stack.
2020-08-01 06:33:38 +00:00
Mateusz Guzik
14576629bb vfs: convert ni_rightsneeded to a pointer
Shaves 8 bytes of struct nameidata on 64-bit platforms.
2020-08-01 06:33:11 +00:00
Mateusz Guzik
21c162605b vfs: make rights mandatory for NDINIT_ALL 2020-08-01 06:32:25 +00:00
Conrad Meyer
d6a75d39e9 getblk: Remove a non-sensical LK_NOWAIT | LK_SLEEPFAIL
No functional change.

LK_SLEEPFAIL implies a behavior that is only possible if the lock operation can
sleep.  LK_NOWAIT prevents the lock operation from sleeping.

Discussed with:	kib
2020-07-31 00:13:40 +00:00
Conrad Meyer
59d13f6154 getblk: Avoid sleeping on wrong buf in lockless path
If the buffer identity changed during lookup, sleeping could introduce a
lock order reversal.  Since we do not know if the identity changed until we
get the lock, we must try-lock (LK_NOWAIT) only.  EINTR and ERESTART error
handling becomes irrelevant, as we no longer sleep.

Reported by:	kib
Reviewed by:	kib
X-MFC-With:	r363482
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25898
2020-07-31 00:07:01 +00:00
Mateusz Guzik
cb90ef2875 cache: drop the useless numchecks counter 2020-07-30 22:52:18 +00:00
Mateusz Guzik
952759111e Further depessimize priv_check_cred_vfs_generation 2020-07-30 22:14:04 +00:00
Mateusz Guzik
848f8effdd vfs: inline vops if there are no pre/post associated calls
This removes a level of indirection from frequently used methods, most notably
VOP_LOCK1 and VOP_UNLOCK1.

Tested by:	pho
2020-07-30 15:50:51 +00:00
Mateusz Guzik
2e4f8220e8 vfs: fold poll_no_poll into vop_nopoll
The logic was almost completely present in vop_stdpoll anyway.
2020-07-30 15:48:56 +00:00
Mateusz Guzik
b1f910e02c vfs: short-circuit the common case NDFREE calls
Almost all consumers use the NDF_ONLY_PNBUF macro, making them avoidably branch
a lot in the NDFREE routine. Also note most of them should not need to call
any cleanup anyway as they don't request HASBUF.
2020-07-30 15:47:41 +00:00
Mateusz Guzik
404927357d vfs: add support for WANTPARENT and LOCKPARENT to lockless lookup
This makes the realpath syscall operational with the new lookup. Note that the
walk to obtain the full path name still takes locks.

Tested by:      pho
Differential Revision:	https://reviews.freebsd.org/D23917
2020-07-30 15:45:11 +00:00
Mateusz Guzik
8230d29357 vfs: support negative entry promotion in lockless lookup
Tested by:	pho
2020-07-30 15:44:10 +00:00
Mateusz Guzik
4057e3eaaa vfs: add NOMACCHECK and AUDITVNODE2 to lockless lookup
They are both nops since lookup does not progress with either mac or audit enabled.

Tested by:	pho
2020-07-30 15:43:16 +00:00
Mateusz Guzik
d3e63e8eb2 vfs: make sure startdir_used is always assigned to before use
CID:	1431070
2020-07-30 07:11:08 +00:00
Mark Johnston
1b778ba260 Fix a logic error in uipc_ready_scan().
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.

Reported by:	syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes:		r359778
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-30 00:52:37 +00:00
John Baldwin
0f70a1489d Properly handle a closed TLS socket with pending receive data.
If the remote end closes a TLS socket and the socket buffer still
contains not-yet-decrypted TLS records but no decrypted TLS records,
soreceive needs to block or fail with EWOULDBLOCK.  Previously it was
trying to return data and dereferencing a NULL pointer.

Reviewed by:	np
Sponsored by:	Chelsio
Differential Revision:	https://reviews.freebsd.org/D25838
2020-07-29 23:24:32 +00:00
Mateusz Guzik
fad6dd772d vfs: elide MAC-induced locking on rename if there are no relevant hooks 2020-07-29 17:05:31 +00:00
Mateusz Guzik
fd8c6a48ab vfs: honor error code returned by mac_vnode_check_rename_from
MFC after:	3 days
2020-07-29 17:04:33 +00:00
Yoshihiro Takahashi
8f11c99715 - Cleanups related to sparc64 removal.
- Remove remains of sparc64 files.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D25831
2020-07-28 10:58:37 +00:00
Kyle Evans
fd35bfaecf makesyscalls.sh: improve the 'this is going away' message
Reported by:	Ronald Klop, rgrimes
2020-07-28 01:05:40 +00:00
Kyle Evans
bb97350f28 makesyscalls.sh: spit out a deprecation notice to stderr
This has for a while been replaced by makesyscalls.lua in the stock FreeBSD
build.  Ensure downstreams get some notice that it's going away if they're
reliant on it, maybe.
2020-07-27 03:13:23 +00:00
Doug Moore
00fd73d2da Fix an overflow bug in the blist allocator that needlessly capped max
swap size by dividing a value, which was always a multiple of 64, by
64.  Remove the code that reduced max swap size down to that cap.

Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX.  Call them both BLIST_RADIX.

Make improvements to the blist self-test code to silence compiler
warnings and to test larger blists.

Reported by:	jmallett
Reviewed by:	alc
Discussed with:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25736
2020-07-25 18:29:10 +00:00
Mateusz Guzik
e914224af1 fd: put back FILEDESC_SUNLOCK to pwd_hold lost during rebase
Reported by:	pho
2020-07-25 15:34:29 +00:00
Alexander Motin
aba10e131f Allow swi_sched() to be called from NMI context.
For purposes of handling hardware errors reported via NMIs I need a way to
escape NMI context, which is too restrictive to do anything significant.

To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs.  On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY.  To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25754
2020-07-25 15:19:38 +00:00
Mateusz Guzik
9dbd12fb52 vfs: add support for !LOCKLEAF to lockless lookup
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D23916
2020-07-25 10:40:38 +00:00
Mateusz Guzik
c42b77e694 vfs: lockless lookup
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.

Inner workings are explained in the comment above cache_fplookup.

Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.

Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.

Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total

Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before:   2165816
after:  151216530

Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25578
2020-07-25 10:37:15 +00:00
Mateusz Guzik
07d2145a17 vfs: add the infrastructure for lockless lookup
Reviewed by:    kib
Tested by:      pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25577
2020-07-25 10:32:45 +00:00
Mateusz Guzik
0379ff6ae3 vfs: introduce vnode sequence counters
Modified on each permission change and link/unlink.

Reviewed by:	kib
Tested by:	pho (in a patchset)
Differential Revision:	https://reviews.freebsd.org/D25573
2020-07-25 10:31:52 +00:00
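The general idea of a sequence counter, as used by the commit above to let readers detect concurrent changes, can be sketched with C11 atomics. This is a generic illustration under stated assumptions, not the kernel's implementation: struct guarded and the guarded_* functions are hypothetical names, writers are assumed to be serialized by an external lock, and the protected state is reduced to a single integer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object guarded by a sequence counter. */
struct guarded {
	atomic_uint seq;	/* odd while a modification is in progress */
	atomic_int mode;	/* stand-in for the protected state */
};

/* Writers are assumed to be serialized by some external lock. */
static void
guarded_write(struct guarded *g, int mode)
{
	assert((atomic_load(&g->seq) & 1) == 0);
	atomic_fetch_add(&g->seq, 1);		/* now odd: in progress */
	atomic_store(&g->mode, mode);
	atomic_fetch_add(&g->seq, 1);		/* even again: stable */
}

/* Take a snapshot of the counter before reading the protected state. */
static unsigned
guarded_read_begin(struct guarded *g)
{
	return (atomic_load(&g->seq));
}

/* True if the snapshot was stable and nothing has changed since. */
static bool
guarded_read_consistent(struct guarded *g, unsigned snapshot)
{
	return ((snapshot & 1) == 0 && atomic_load(&g->seq) == snapshot);
}
```

A reader takes a snapshot, reads the state without any lock, and then checks consistency; on a mismatch it discards what it read and retries or falls back to a locked path.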
Mateusz Guzik
d1385ab26e Guard sbcompress_ktls_rx with KERN_TLS
Fixes a compilation warning after r363464
2020-07-25 07:15:23 +00:00
Mateusz Guzik
bf71b96c69 Do a lockless check in kthread_suspend_check
Otherwise an idle system running lockstat sleep 10 reports contention on
process lock coming from bufdaemon.

While here fix a style nit.
2020-07-25 07:14:33 +00:00
Conrad Meyer
81dc6c2c61 Use gbincore_unlocked for unprotected incore()
Reviewed by:	markj
Sponsored by:	Isilon
Differential Revision:	https://reviews.freebsd.org/D25790
2020-07-24 17:34:44 +00:00
Conrad Meyer
68ee1dda06 Add unlocked/SMR fast path to getblk()
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality.  Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.

Reported by:	Attilio
Reviewed by:	kib, markj
Testing:	pho (in progress)
Sponsored by:	Isilon
Differential Revision:	https://reviews.freebsd.org/D25782
2020-07-24 17:34:04 +00:00
Conrad Meyer
3c30b23519 Use SMR to provide safe unlocked lookup for pctries from SMR zones
Adapt r358130, for the almost identical vm_radix, to the pctrie subsystem.
Like that change, the tree is kept correct for readers with store barriers
and careful ordering.  Existing locks serialize writers.

Add a PCTRIE_DEFINE_SMR() wrapper that takes an additional smr_t parameter
and instantiates a FOO_PCTRIE_LOOKUP_UNLOCKED() function, in addition to the
usual definitions created by PCTRIE_DEFINE().

Interface consumers will be introduced in later commits.

As future work, it might be nice to add vm_radix algorithms missing from
generic pctrie to the pctrie interface, and then adapt vm_radix to use
pctrie.

Reported by:	Attilio
Reviewed by:	markj
Sponsored by:	Isilon
Differential Revision:	https://reviews.freebsd.org/D25781
2020-07-24 17:32:10 +00:00
Mateusz Guzik
138698898f lockmgr: add missing 'continue' to account for spuriously failed fcmpset
PR:		248245
Reported by:	gbe
Noted by:	markj
Fixes:		r363415 ("lockmgr: add adaptive spinning")
2020-07-24 17:28:24 +00:00
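Why a retry is needed after a spuriously failed compare-and-set can be sketched with the C11 analogue, atomic_compare_exchange_weak, which may also fail even though the value matched. This is a generic pattern and an assumption about the class of bug, not the lockmgr code; try_acquire and LOCK_UNOWNED are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_UNOWNED ((uintptr_t)0)	/* hypothetical "free" lock value */

/*
 * Try to take ownership of a free lock word.  A weak compare-exchange
 * stores the observed value in seen when it fails, so a failure where
 * seen still equals LOCK_UNOWNED is spurious and must be retried
 * instead of being treated as contention.
 */
static bool
try_acquire(_Atomic uintptr_t *lock, uintptr_t owner)
{
	uintptr_t seen = LOCK_UNOWNED;

	assert(owner != LOCK_UNOWNED);
	for (;;) {
		if (atomic_compare_exchange_weak(lock, &seen, owner))
			return (true);
		if (seen == LOCK_UNOWNED)
			continue;	/* spurious failure: try again */
		return (false);		/* genuinely owned by someone else */
	}
}
```

Without the continue, a spurious failure would fall through to the contended path even though the lock was free.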
John Baldwin
3c0e568505 Add support for KTLS RX via software decryption.
Allow TLS records to be decrypted in the kernel after being received
by a NIC.  At a high level this is somewhat similar to software KTLS
for the transmit path except in reverse.  Protocols enqueue mbufs
containing encrypted TLS records (or portions of records) into the
tail of a socket buffer and the KTLS layer decrypts those records
before returning them to userland applications.  However, there is an
important difference:

- In the transmit case, the socket buffer is always a single "record"
  holding a chain of mbufs.  Not-yet-encrypted mbufs are marked not
  ready (M_NOTREADY) and released to protocols for transmit by marking
  mbufs ready once their data is encrypted.

- In the receive case, incoming (encrypted) data appended to the
  socket buffer is still a single stream of data from the protocol,
  but decrypted TLS records are stored as separate records in the
  socket buffer and read individually via recvmsg().

Initially I tried to make this work by marking incoming mbufs as
M_NOTREADY, but there didn't seem to be a non-gross way to deal with
picking a portion of the mbuf chain and turning it into a new record
in the socket buffer after decrypting the TLS record it contained
(along with prepending a control message).  Such mbufs would also
need to be "pinned" in some way while they are being decrypted
such that a concurrent sbcut() wouldn't free them out from under the
thread performing decryption.

As such, I settled on the following solution:

- Socket buffers now contain an additional chain of mbufs (sb_mtls,
  sb_mtlstail, and sb_tlscc) containing encrypted mbufs appended by
  the protocol layer.  These mbufs are still marked M_NOTREADY, but
  soreceive*() generally don't know about them (except that they will
  block waiting for data to be decrypted for a blocking read).

- Each time a new mbuf is appended to this TLS mbuf chain, the socket
  buffer peeks at the TLS record header at the head of the chain to
  determine the encrypted record's length.  If enough data is queued
  for the TLS record, the socket is placed on a per-CPU TLS workqueue
  (reusing the existing KTLS workqueues and worker threads).

- The worker thread loops over the TLS mbuf chain decrypting records
  until it runs out of data.  Each record is detached from the TLS
  mbuf chain while it is being decrypted to keep the mbufs "pinned".
  However, a new sb_dtlscc field tracks the character count of the
  detached record and sbcut()/sbdrop() is updated to account for the
  detached record.  After the record is decrypted, the worker thread
  first checks to see if sbcut() dropped the record.  If so, it is
  freed (can happen when a socket is closed with pending data).
  Otherwise, the header and trailer are stripped from the original
  mbufs, a control message is created holding the decrypted TLS
  header, and the decrypted TLS record is appended to the "normal"
  socket buffer chain.

(Side note: the SBCHECK() infrastructure was very useful as I was
 able to add assertions there about the TLS chain that caught several
 bugs during development.)

Tested by:	rmacklem (various versions)
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24628
2020-07-23 23:48:18 +00:00
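The step in the commit above where the socket buffer peeks at the TLS record header to decide whether a full record is queued can be sketched as follows. The 5-byte header layout (1-byte type, 2-byte version, 2-byte big-endian length) is the standard TLS record format; the helper names and the flat byte buffer are assumptions for illustration, since the kernel works on mbuf chains rather than a contiguous array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HEADER_LEN 5	/* type (1) + version (2) + length (2) */

/*
 * Hypothetical helper: given queued bytes starting at a record boundary,
 * return the total size of the first record (header plus payload), or 0
 * if the header itself is not fully queued yet.
 */
static size_t
tls_record_size(const uint8_t *buf, size_t queued)
{
	assert(buf != NULL);
	if (queued < TLS_HEADER_LEN)
		return (0);
	/* The payload length is a 16-bit big-endian field at offset 3. */
	return (TLS_HEADER_LEN + (((size_t)buf[3] << 8) | buf[4]));
}

/* True once enough bytes are queued to decrypt the first record. */
static bool
tls_record_complete(const uint8_t *buf, size_t queued)
{
	size_t need = tls_record_size(buf, queued);

	return (need != 0 && queued >= need);
}
```

In this sketch a caller would schedule decryption only when tls_record_complete returns true, and otherwise wait for more data to be appended.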
Mateusz Guzik
c795344ff7 locks: fix a long standing bug for primitives with kdtrace but without spinning
In such a case the second argument to lock_delay_arg_init was NULL which was
immediately causing a null pointer deref.

Since the structure is only used for spin count, provide a dedicated routine
initializing it.

Reported by:	andrew
2020-07-23 17:26:53 +00:00
Brooks Davis
5a01eca698 Use SI_ORDER_(FOURTH|FIFTH) rather than bespoke versions.
No functional change.

When these SYSINITs were added these macros didn't exist.

Reviewed by:	imp
Obtained from:	CheriBSD
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25758
2020-07-22 23:35:41 +00:00
Mateusz Guzik
31ad4050fe lockmgr: add adaptive spinning
It is very conservative: it only spins when LK_ADAPTIVE is passed, only on
exclusive locks, and never when any waiters are present. The buffer cache
remains non-spinning.

This reduces total sleep times during buildworld etc., but it does not shorten
total real time (culprits are contention in the vm subsystem along with slock +
upgrade which is not covered).

For microbenchmarks: open3_processes -t 52 (open/close of the same file for
writing) ops/s:
before: 258845
after: 801638

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D25753
2020-07-22 12:30:31 +00:00
Mitchell Horne
dc42509049 INTRNG: only shuffle for !EARLY_AP_STARTUP
During device attachment, all interrupt sources will bind to the BSP,
as it is the only processor online. This means interrupts must be
redistributed ("shuffled") later, during SI_SUB_SMP.

For the EARLY_AP_STARTUP case, this is no longer true. SI_SUB_SMP will
execute much earlier, meaning APs will be online and available before
devices begin attachment, and there will therefore be nothing to
shuffle.

All PIC-conforming interrupt controllers will handle this early
distribution properly, except for RISC-V's PLIC. Make the necessary
tweak to the PLIC driver.

While here, convert irq_assign_cpu from a boolean_t to a bool.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D25693
2020-07-21 22:47:02 +00:00
Mateusz Guzik
4aff9f5d99 lockmgr: denote recursion with a bit in lock value
This reduces excessive reads from the lock.

Tested by:	pho
2020-07-21 14:42:22 +00:00
Mateusz Guzik
f6b091fbbd lockmgr: rewrite upgrade to stop always dropping the lock
This matches rw and sx locks.
2020-07-21 14:41:25 +00:00
Mateusz Guzik
bdb6d824f4 lockmgr: add a helper for reading the lock value 2020-07-21 14:39:20 +00:00
Adrian Chadd
f7d38a13a8 [net80211] Add new privileges; restrict what can be done in a jail.
Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.

+ VAP_MANAGE is everything but setting the MAC and creating a VAP.
+ VAP_SETMAC is setting the MAC address of the VAP.
  Typically you wouldn't want the jail to be able to modify this.
+ CREATE_VAP is to create a new VAP. Again, you don't want to be doing
  this in a jail, but this DOES stop being able to run some corner
  cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this
  bit out later.

This allows me to run wpa_supplicant in a jail after transferring
a STA VAP into it. I unfortunately can't currently set the wlan
debugging inside the jail; that would be super useful!

Reviewed by:	bz
Differential Revision:	https://reviews.freebsd.org/D25630
2020-07-19 15:16:27 +00:00
Mateusz Guzik
7cd4443fb1 Short-circuit tdfind when looking for the calling thread.
Common occurrence with cpuset and other places.
2020-07-18 00:14:43 +00:00
Mateusz Guzik
3ea3fbe685 vfs: fix vn_poll performance with either MAC or AUDIT
The code would unconditionally lock the vnode to audit or call the
mac hook, even if neither wants to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.

Note this code should not be normally executed anyway as vnodes
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is supposed to do almost nothing.

This in particular fixes poll2 which passes 128 fds.

$ ./poll2_processes -s 10
before: 134411
after:  271572
2020-07-16 14:09:18 +00:00
Mateusz Guzik
ab06a30517 vfs: fix MAC/AUDIT mismatch in vn_poll
Auditing would not be performed without MAC compiled in.
2020-07-16 14:04:28 +00:00
Mateusz Guzik
b1607c8727 poll: factor fd lookup out of scan and rescan 2020-07-15 10:24:39 +00:00
Mateusz Guzik
d8bc2a17a5 fd: remove fd_lastfile
It keeps getting recalculated way more often than is needed.

Provide a routine (fdlastfile) to get it if necessary.

Consumers may be better off with a bitmap iterator instead.
2020-07-15 10:24:04 +00:00
Mateusz Guzik
7177149a4d fd: add obvious branch predictions to fdalloc 2020-07-15 10:14:00 +00:00
Mateusz Guzik
29f3e5ea41 cache: make negative shrinker round robin on all lists every time
Previously it would check 4, 3, 2, 1 lists. In practice by the time
it is getting called all lists have some elements and consequently
this does not result in new evictions.

Nonetheless, the code is clearer.

Tested by:	pho
2020-07-14 21:19:33 +00:00
Mateusz Guzik
a110fa2ee1 cache: remove numcalls
The counter is not very useful and if necessary the value can be
found by summing up other counters.
2020-07-14 21:17:46 +00:00
Mateusz Guzik
4516c7eed9 cache: count dropped entries 2020-07-14 21:17:08 +00:00
Mateusz Guzik
654e644e80 cache: remove neg_locked argument from cache_zap_locked
Tested by:	pho
2020-07-14 21:16:48 +00:00
Mateusz Guzik
ffb0abddf1 cache: remove a useless argument from cache_negative_insert 2020-07-14 21:16:07 +00:00
Mateusz Guzik
9f8d452173 cache: create a dedicated struct for negative entries
.. and stuff it into the unused target vnode field

This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued..

Reported by:	zeising
Tested by:	pho, zeising
Fixes:	r362828 ("cache: lockless forward lookup with smr")
2020-07-14 21:14:59 +00:00
Mateusz Guzik
373278a7f6 fd: stop looping in pwd_hold
We don't expect to fail acquiring the reference unless running into a corner
case. Just in case ensure forward progress by taking the lock.

Reviewed by:	kib, markj
Differential Revision: https://reviews.freebsd.org/D25616
2020-07-11 21:57:03 +00:00
Mateusz Guzik
74f61caed5 vfs: fix early termination of kern_getfsstat
The kernel would unlock an already unlocked mutex if the buffer got filled up
before the mount list ended.

Reported by:	pho
Fixes:	r363069 ("vfs: depessimize getfsstat when only the count is requested")
2020-07-10 09:24:27 +00:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interfere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
6c69e69724 vfs: depessimize getfsstat when only the count is requested
This avoids relocking mountlist_mtx for each entry.
2020-07-10 06:47:58 +00:00
Mateusz Guzik
8c1f410c19 vfs: avoid spurious memcpy in vfs_statfs
It is quite often called for the very same buffer.
2020-07-10 06:46:42 +00:00
Kyle Evans
3f07b9d9f8 shm_open2: Implement SHM_GROW_ON_WRITE
Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests,
so go ahead and implement it. A future change will make memfd_create always
set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests
on -CURRENT.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:43:45 +00:00
Mark Johnston
fe59cb6ba2 Apply the logic from r363051 to semctl(2) and __sem_base field.
Reported by:	Jeffball <jeffball@grimm-co.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 18:34:54 +00:00
Mark Johnston
f4f16af1d3 Avoid copying out kernel pointers from msgctl(IPC_STAT).
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace.  Thus, NULL them out before
copying out, and remove references to them from the manual page.

Reported by:	Jeffball <jeffball@grimm-co.com>
Reviewed by:	emaste, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 17:26:49 +00:00
Mark Johnston
866a5d1298 Regenerate.
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:49 +00:00
Mark Johnston
bdfe61e05e Permit cpuset_(get|set)domain() in capability mode.
These system calls already perform validation of their parameters when
called in capability mode, identical to cpuset_(get|set)affinity().

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:29 +00:00
Pawel Biernacki
e94fdc3833 kern.tty_info_kstacks: set compact format as default 2020-07-06 16:34:15 +00:00
Mark Johnston
69b565d7c0 Allow accesses of the caller's CPU and domain sets in capability mode.
cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling
thread or process' CPU and domain set in capability mode, but only when
the thread or process ID is specified as -1.  Extend this to cover the
case where the ID actually matches the caller's TID or PID, since some
code, such as our pthread_attr_get_np() implementation, always provides
an explicit ID.

It was not and still is not permitted to access CPU and domain sets for
other threads in the same process when the process is in capability
mode.  This might change in the future.

Submitted by:	Greg V <greg@unrelenting.technology> (original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25552
2020-07-06 16:34:09 +00:00
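The rule described in the commit above can be sketched as a small predicate: in capability mode an ID is accepted if it is -1 or if it names the caller's own thread or process. All names here (`enum which`, `struct caller`, `cap_cpuset_id_ok`) are hypothetical and chosen for illustration; the kernel's actual check is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the "which" selector and the caller's IDs. */
enum which { WHICH_TID, WHICH_PID };

struct caller {
	int64_t tid;
	int64_t pid;
};

/*
 * Return true if a capability-mode caller may access the CPU or domain
 * set named by (w, id): either the implicit "self" ID of -1, or an
 * explicit ID that matches the caller's own TID or PID.
 */
bool
cap_cpuset_id_ok(enum which w, int64_t id, const struct caller *c)
{
	if (id == -1)
		return (true);
	switch (w) {
	case WHICH_TID:
		return (id == c->tid);
	case WHICH_PID:
		return (id == c->pid);
	}
	return (false);
}
```

An explicit TID belonging to another thread of the same process is still rejected, matching the restriction the message says remains in place.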
Pawel Biernacki
cd1c083d80 kern.tty_info_kstacks: add a compact format
Add a more compact display format for kern.tty_info_kstacks inspired by
procstat -kk. Set it as the default.

# sysctl kern.tty_info_kstacks=1
kern.tty_info_kstacks: 0 -> 1
# sleep 2
^T
load: 0.17  cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
#0 0xffffffff80c4443e at mi_switch+0xbe
#1 0xffffffff80c98044 at sleepq_catch_signals+0x494
#2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
#3 0xffffffff80c43af3 at _sleep+0x193
#4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
#5 0xffffffff80c5119b at sys_nanosleep+0x3b
#6 0xffffffff810ffc69 at amd64_syscall+0x119
#7 0xffffffff810d5520 at fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
# sysctl kern.tty_info_kstacks=2
kern.tty_info_kstacks: 1 -> 2
# sleep 2
^T
load: 0.24  cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12
_sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b
amd64_syscall+0x119 fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C

Suggested by:	avg
Reviewed by:	mjg
Relnotes:	yes
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25487
2020-07-06 16:33:28 +00:00
Mark Johnston
9eb997cb48 Lift cpuset Capsicum checks into a subroutine.
Otherwise the same checks are duplicated across four different system
call implementations, cpuset_(get|set)(affinity|domain)().  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:33:21 +00:00
Mateusz Guzik
9b0c2e5909 vfs: expand on vhold_smr comment 2020-07-06 02:00:35 +00:00
Mateusz Guzik
d363fa4127 lockf: elide avoidable locking in lf_advlockasync
While here assert on ls_threads state.
2020-07-05 23:07:54 +00:00
Konstantin Belousov
4543c1c329 Fix typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-07-05 20:54:01 +00:00
Andrew Turner
fcf7a48191 Rerun kernel ifunc resolvers after all CPUs have started
On architectures that use RELA relocations it is safe to rerun the ifunc
resolvers after all CPUs have started, but while they are still parked.

On arm64 with big.LITTLE this is needed as some SoCs have shipped with
different ID register values on the big and little clusters, meaning we were
unable to rely on the register values from the boot CPU.

Add support for rerunning the resolvers on arm64 and amd64 as these are
both RELA using architectures.

Reviewed by:	kib
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25455
2020-07-05 14:38:22 +00:00
Mateusz Guzik
dc3c991598 Add char and short types to kcsan 2020-07-04 06:22:05 +00:00
Mateusz Guzik
58199a7052 ifdef out pg_jobc assertions added in r361967
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
2020-07-03 09:23:11 +00:00
Mateusz Guzik
a2de789ebb cred: add a prediction to crfree for td->td_realucred == cr
This matches crhold and eliminates an assembly maze in the common case.
2020-07-02 12:58:07 +00:00
Mateusz Guzik
d23850207b cache: add missing call to cache_ncp_invalid for negative hits
Note the dtrace probe can fire even if the entry is gone, but I don't think that's
worth fixing.
2020-07-02 12:56:20 +00:00
Mateusz Guzik
d129e0eba0 cache: fix misplaced fence in cache_ncp_invalidate
The intent was to mark the entry as invalid before cache_zap starts messing
with it.

While here add some comments.
2020-07-02 12:54:50 +00:00
Konstantin Belousov
4bc5ce2c74 Use tdfind() in pget().
Reviewed by:	jhb, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25532
2020-07-02 10:40:47 +00:00
Andrew Turner
ecc8ccb441 Simplify the flow when getting/setting an isrc
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.

Sponsored by:	Innovate UK
2020-07-01 12:07:28 +00:00
Mateusz Guzik
5d1c042d32 cache: lockless forward lookup with smr
This eliminates the need to take bucket locks in the common case.

Concurrent lookup utilizing the same vnodes is still bottlenecked on referencing
and locking path components; this will be taken care of separately.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:59:08 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Andrew Gallatin
46cac10b3b Fix a panic when unloading firmware
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.

This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.

Reviewed by:	jhb, markj
Sponsored by:	Netflix
2020-06-29 21:35:50 +00:00
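A minimal sketch of the restart-the-scan pattern the firmware fix above describes. The list type, the `busy` flag, and `purge_idle` are hypothetical, and the lock operations appear only as comments; the point is that no saved "next" pointer is trusted across the window where the lock would be dropped.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list entry; busy entries must be left on the list. */
struct entry {
	struct entry *next;
	int busy;
};

/*
 * Remove and free every idle entry.  After each removal the scan starts
 * again from the head, because in the real scenario the lock is released
 * during teardown and other threads may have changed the list.
 * Returns the number of entries removed.
 */
int
purge_idle(struct entry **head)
{
	struct entry **prevp, *e;
	int removed = 0;

restart:
	for (prevp = head; (e = *prevp) != NULL; prevp = &e->next) {
		if (e->busy)
			continue;
		*prevp = e->next;	/* unlink while the lock is held */
		/* unlock(): teardown may sleep, list may change meanwhile */
		free(e);
		/* lock() again before touching the list */
		removed++;
		goto restart;
	}
	return (removed);
}
```

Restarting makes the scan quadratic in the worst case, which the commit message accepts because the path is not performance critical.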
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Mark Johnston
84242cf68a Call swap_pager_freespace() from vm_object_page_remove().
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object.  The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.

Reviewed by:	alc, kib
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25329
2020-06-25 15:21:21 +00:00
Enji Cooper
d6701b6c8c Add kern.features.witness
Adding `kern.features.witness` helps expose whether or not the kernel has
`options WITNESS` enabled, so the `feature_present(3)` API can be used
to query whether or not witness(9) is built into the kernel.

This support is helpful with userspace applications (generally speaking,
tests), as it can be queried to determine whether or not tests related
to WITNESS should be run.

MFC after:	1 week
Reviewed by: cem, darrick.freebsd_gmail.com
Differential Revision: https://reviews.freebsd.org/D25302
Sponsored by:	DellEMC Isilon
2020-06-24 18:51:01 +00:00
Thomas Munro
f270658873 vfs: track sequential reads and writes separately
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.

Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.

Reviewed by:	mjg, kib, tmunro
Approved by:	mjg (mentor)
Obtained from:	Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision:	https://reviews.freebsd.org/D25024
2020-06-21 08:51:24 +00:00
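A minimal sketch of why separate tracking helps in the interleaved read/write case described above. The structures and `seq_update` are hypothetical and far simpler than the kernel's heuristic: each tracker only remembers the offset it expects next and how long the current sequential run is.

```c
#include <stdint.h>

/* Hypothetical per-direction sequential access tracker. */
struct seq_state {
	int64_t next_off;	/* offset expected if access is sequential */
	int count;		/* length of the current sequential run */
};

/* One tracker per direction, so reads and writes do not disturb each other. */
struct file_seq {
	struct seq_state rd;
	struct seq_state wr;
};

/*
 * Record an access of len bytes at off and return the updated run length.
 * An access that does not start where the previous one ended resets it.
 */
int
seq_update(struct seq_state *s, int64_t off, int64_t len)
{
	if (off == s->next_off)
		s->count++;
	else
		s->count = 0;
	s->next_off = off + len;
	return (s->count);
}
```

With a single shared tracker, a reader at one offset and a writer trailing behind it would each reset the other's run on every syscall; with `rd` and `wr` kept apart, both streams are recognized as sequential.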
Jeff Roberson
03270b59ee Use zone nomenclature that is consistent with UMA. 2020-06-21 04:59:02 +00:00
Brandon Bergren
40b664f64b [PowerPC] More relocation fixes
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.

This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.

Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.

This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)

 * Remove the rest of the stoffs hack.
 * Remove my half-baked displace_symbol_table() function.
 * Extend ddb initialization to cope with having a relocation offset on the
   kernel symbol table.
 * Fix my kernel-as-initrd hack to work with booke64 by using a temporary
   mapping to access the data.
 * Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
 * Change the behavior of X_db_symbol_values to apply the relocation base
   when updating valp, to match link_elf_symbol_values() behavior.

Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D25223
2020-06-21 03:39:26 +00:00
Pawel Biernacki
049264c5cc hw.bus.info: rework handler
hw.bus.info was added in r68522 as a node, but there was never anything
connected "behind" it.  Its only purpose is to return a struct u_businfo.
The only in-base consumers are devinfo(3)/devinfo(8).
Rewrite the handler as SYSCTL_PROC and mark it as MPSAFE and read-only
as there never was a writable path.

Reviewed by:	kib
Approved by:	kib (mentor)
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25321
2020-06-18 21:42:54 +00:00
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Ryan Moeller
33b39b6615 Apply default security flavor in vfs_export
There may be some version of mountd out there that does not supply a default
security flavor when none is given for an export.

Set the default security flavor in vfs_export if none is given, and remove the
workaround for oexport compat.

Reported by:	npn
Reviewed by:	rmacklem
Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25300
2020-06-16 21:30:30 +00:00
Simon J. Gerraty
73845fdbd3 Make KENV_MVALLEN tunable
When doing secure boot, loader wants to export loader.ve.hashed
the value of which typically exceeds KENV_MVALLEN.

Replace use of KENV_MVALLEN with tunable kenv_mvallen.

Add getenv_string_buffer() for the case where a stack buffer cannot be
created and use uma_zone_t kenv_zone for suitably sized buffers.

Reviewed by:	stevek, kevans
Obtained from:	Abhishek Kulkarni <abkulkarni@juniper.net>
MFC after:	1 week
Sponsored by:	Juniper Networks
Differential Revision: https://reviews.freebsd.org//D25259
2020-06-16 17:02:56 +00:00
Rick Macklem
1f7104d720 Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags.
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it holds a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size).
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deletes the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by:	kib, freqlabs
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25088
2020-06-14 00:10:18 +00:00