The conversion was largely mechanical: sed(1) with:
-e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
-e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'
The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.
No functional change.
This removes a lot of special casing from the VFS layer.
Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D25612
namei de facto expects that the naimeidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this part by explicitly clearing some of them.
Tighten the interface instead.
While here renumber the flags and denote the gap between the 2 variants.
Try to piggy back th renumber on the just bumped __FreeBSD_version.
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.
Tested by: pho
Sponsored by: The FreeBSD Foundation
The current scheme of calling VOP_GETATTR adds avoidable overhead.
An example with tmpfs doing fstat (ops/s):
before: 7488958
after: 7913833
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D25910
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.
While at it, move the 64 supported epoch control structures to the
static memory domain. This overall simplifies the management and
debugging of system epoch's.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D25960
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Convert panic() calls to INVARIANTS-only assertions. The PCTRIE code
provides some of the same protection since it will panic upon an
attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
As the 20-year old comment above it suggests, the counter is of dubious
value. Moreover, the (global) counter was not updated precisely and
hurts scalability.
Reviewed by: cem, kib, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25965
The routine is called on successful write and read, which on pipes happens a
lot and for small sizes.
Precision provided by default seems way bigger than necessary and it causes
problems in vms on amd64 (it rdtscp's which vmexits). getnanotime seems to
provide the level roughly in lines of Linux so we should be good here.
Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
before: 426464
after: 3247421
Note the that atime handling for named pipes is broken with and without the
patch. The filesystem code is never used for updating atime and never looks
at the updated field. Consequently, while there are no provisions added to
handle named pipes separately, the change is a nop for that case.
Differential Revision: https://reviews.freebsd.org/D23964
This reduces struct namecache by sizeof(void *).
Negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.
Note this adds cache_get_hash calls which can be eliminated.
It used to be sizeof of the given struct to accomodate for 32 bit mips
doing 64 bit loads, but the same can be achieved with requireing just
64 bit alignment.
While here reorder struct namecache so that most commonly used fields
are closer.
This removes flag setting/unsetting carried over from regular lookup.
Flags still get for compatibility when falling back.
Note .. and . handling can get partially folded together.
This allows making half-constructed entries visible to the lockless lookup,
which now can check for either "not yet fully constructed" and "no longer valid"
state.
This will be used for .. lookup.
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.
Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.
Paired with the fact that doomed vnodes cannnot receive entries this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.
Reported by: pho
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches. They no longer serve any
purpose since UMA now provides their functionality by default. Remove
them to simplyify the kernel memory allocator interfaces a bit.
Reviewed by: cem, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25937
The constant seems to exists on MacOS X >= 10.8.
Requested by: swills
Reviewed by: allanjude, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25933
This significantly speeds up path lookup, Cascade Lake doing access(2) on ufs
on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c, ops/s:
before: 2535298
after: 2797621
Over +10%.
The reversed order of computation here does not seem to matter for hash
distribution.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25921
A free buf's lock may be held (temporarily) due to unlocked lookup, so
buf_alloc() must acquire it without LK_NOWAIT. The unlocked getblk path
should unlock it promptly once it realizes the identity does not match
the buffer it was searching for.
Reported by: gallatin
Reviewed by: kib
Tested by: pho
X-MFC-With: r363482
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25914
No functional change.
LK_SLEEPFAIL implies a behavior that is only possible if the lock operation can
sleep. LK_NOWAIT prevents the lock operation from sleeping.
Discussed with: kib
If the buffer identity changed during lookup, sleeping could introduce a
lock order reversal. Since we do not know if the identity changed until we
get the lock, we must try-lock (LK_NOWAIT) only. EINTR and ERESTART error
handling becomes irrelevant, as we no longer sleep.
Reported by: kib
Reviewed by: kib
X-MFC-With: r363482
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25898
Almost all consumers use the NDF_ONLY_PNBUF macro, making them avoidably branch
a lot in the NDFREE routine. Also note most of them should not need to call
any cleanup anyway as they don't request HASBUF.
This makes the realpath syscall operational with the new lookup. Note that the
walk to obtain the full path name still takes locks.
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23917
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.
Reported by: syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes: r359778
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
If the remote end closes a TLS socket and the socket buffer still
contains not-yet-decrypted TLS records but no decrypted TLS records,
soreceive needs to block or fail with EWOULDBLOCK. Previously it was
trying to return data and dereferencing a NULL pointer.
Reviewed by: np
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D25838
This has for a while been replaced by makesyscalls.lua in the stock FreeBSD
build. Ensure downstreams get some notice that it'a going away if they're
reliant on it, maybe.
swap size by dividing a value, which was always a multiple of 64, by
64. Remove the code that reduced max swap size down to that cap.
Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX. Call them both BLIST_RADIX.
Make improvments to the blist self-test code to silence compiler
warnings and to test larger blists.
Reported by: jmallett
Reviewed by: alc
Discussed with: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25736
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.
Inner workings are explained in the comment above cache_fplookup.
Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.
Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.
Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total
Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before: 2165816
after: 151216530
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25578
Modified on each permission change and link/unlink.
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25573
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.
Reported by: Attilio
Reviewed by: kib, markj
Testing: pho (in progress)
Sponsored by: Isilon
Differential Revision: https://reviews.freebsd.org/D25782
Adapt r358130, for the almost identical vm_radix, to the pctrie subsystem.
Like that change, the tree is kept correct for readers with store barriers
and careful ordering. Existing locks serialize writers.
Add a PCTRIE_DEFINE_SMR() wrapper that takes an additional smr_t parameter
and instantiates a FOO_PCTRIE_LOOKUP_UNLOCKED() function, in addition to the
usual definitions created by PCTRIE_DEFINE().
Interface consumers will be introduced in later commits.
As future work, it might be nice to add vm_radix algorithms missing from
generic pctrie to the pctrie interface, and then adapt vm_radix to use
pctrie.
Reported by: Attilio
Reviewed by: markj
Sponsored by: Isilon
Differential Revision: https://reviews.freebsd.org/D25781
Allow TLS records to be decrypted in the kernel after being received
by a NIC. At a high level this is somewhat similar to software KTLS
for the transmit path except in reverse. Protocols enqueue mbufs
containing encrypted TLS records (or portions of records) into the
tail of a socket buffer and the KTLS layer decrypts those records
before returning them to userland applications. However, there is an
important difference:
- In the transmit case, the socket buffer is always a single "record"
holding a chain of mbufs. Not-yet-encrypted mbufs are marked not
ready (M_NOTREADY) and released to protocols for transmit by marking
mbufs ready once their data is encrypted.
- In the receive case, incoming (encrypted) data appended to the
socket buffer is still a single stream of data from the protocol,
but decrypted TLS records are stored as separate records in the
socket buffer and read individually via recvmsg().
Initially I tried to make this work by marking incoming mbufs as
M_NOTREADY, but there didn't seemed to be a non-gross way to deal with
picking a portion of the mbuf chain and turning it into a new record
in the socket buffer after decrypting the TLS record it contained
(along with prepending a control message). Also, such mbufs would
also need to be "pinned" in some way while they are being decrypted
such that a concurrent sbcut() wouldn't free them out from under the
thread performing decryption.
As such, I settled on the following solution:
- Socket buffers now contain an additional chain of mbufs (sb_mtls,
sb_mtlstail, and sb_tlscc) containing encrypted mbufs appended by
the protocol layer. These mbufs are still marked M_NOTREADY, but
soreceive*() generally don't know about them (except that they will
block waiting for data to be decrypted for a blocking read).
- Each time a new mbuf is appended to this TLS mbuf chain, the socket
buffer peeks at the TLS record header at the head of the chain to
determine the encrypted record's length. If enough data is queued
for the TLS record, the socket is placed on a per-CPU TLS workqueue
(reusing the existing KTLS workqueues and worker threads).
- The worker thread loops over the TLS mbuf chain decrypting records
until it runs out of data. Each record is detached from the TLS
mbuf chain while it is being decrypted to keep the mbufs "pinned".
However, a new sb_dtlscc field tracks the character count of the
detached record and sbcut()/sbdrop() is updated to account for the
detached record. After the record is decrypted, the worker thread
first checks to see if sbcut() dropped the record. If so, it is
freed (can happen when a socket is closed with pending data).
Otherwise, the header and trailer are stripped from the original
mbufs, a control message is created holding the decrypted TLS
header, and the decrypted TLS record is appended to the "normal"
socket buffer chain.
(Side note: the SBCHECK() infrastucture was very useful as I was
able to add assertions there about the TLS chain that caught several
bugs during development.)
Tested by: rmacklem (various versions)
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24628
In such a case the second argument to lock_delay_arg_init was NULL which was
immediately causing a null pointer deref.
Since the sructure is only used for spin count, provide a dedicate routine
initializing it.
Reported by: andrew
It is very conservative. Only spinning when LK_ADAPTIVE is passed, only on
exclusive lock and never when any waiters are present. buffer cache is remains
not spinning.
This reduces total sleep times during buildworld etc., but it does not shorten
total real time (culprits are contention in the vm subsystem along with slock +
upgrade which is not covered).
For microbenchmarks: open3_processes -t 52 (open/close of the same file for
writing) ops/s:
before: 258845
after: 801638
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25753
During device attachment, all interrupt sources will bind to the BSP,
as it is the only processor online. This means interrupts must be
redistributed ("shuffled") later, during SI_SUB_SMP.
For the EARLY_AP_STARTUP case, this is no longer true. SI_SUB_SMP will
execute much earlier, meaning APs will be online and available before
devices begin attachment, and there will therefore be nothing to
shuffle.
All PIC-conforming interrupt controllers will handle this early
distribution properly, except for RISC-V's PLIC. Make the necessary
tweak to the PLIC driver.
While here, convert irq_assign_cpu from a boolean_t to a bool.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25693
Split the MANAGE privilege into MANAGE, SETMAC and CREATE_VAP.
+ VAP_MANAGE is everything but setting the MAC and creating a VAP.
+ VAP_SETMAC is setting the MAC address of the VAP.
Typically you wouldn't want the jail to be able to modify this.
+ CREATE_VAP is to create a new VAP. Again, you don't want to be doing
this in a jail, but this DOES stop being able to run some corner
cases like Dynamic WDS (DWDS) AP in a jail/vnet. We can figure this
bit out later.
This allows me to run wpa_supplicant in a jail after transferring
a STA VAP into it. I unfortunately can't currently set the wlan
debugging inside the jail; that would be super useful!
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D25630
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.
Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.
This in particular fixes poll2 which passes 128 fds.
$ ./poll2_processes -s 10
before: 134411
after: 271572
It keeps recalculated way more often than it is needed.
Provide a routine (fdlastfile) to get it if necessary.
Consumers may be better off with a bitmap iterator instead.
Previously it would check 4, 3, 2, 1 lists. In practice by the time
it is getting called all lists have some elements and consequently
this does not result in new evictions.
Nonetheless, the code is clearer.
Tested by: pho
.. and stuff if into the unused target vnode field
This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued..
Reported by: zeising
Tested by: pho, zeising
Fixes: r362828 ("cache: lockless forward lookup with smr")
We don't expect to fail acquiring the reference unless running into a corner
case. Just in case ensure forward progress by taking the lock.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D25616
The kernel would unlock already unlocked mutex if the buffer got filled up
before the mount list ended.
Reported by: pho
Fixes: r363069 ("vfs: depessimize getfsstat when only the count is requested")
Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests,
so go ahead and implement it. A future change will make memfd_create always
set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests
on -CURRENT.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25502
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace. Thus, NULL them out before
copying out, and remove references to them from the manual page.
Reported by: Jeffball <jeffball@grimm-co.com>
Reviewed by: emaste, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25600
These system calls already perform validation of their parameters when
called in capability mode, identical to cpuset_(get|set)affinity().
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling
thread or process' CPU and domain set in capability mode, but only when
the thread or process ID is specified as -1. Extend this to cover the
case where the ID actually matches the caller's TID or PID, since some
code, such as our pthread_attr_get_np() implementation, always provides
an explicit ID.
It was not and still is not permitted to access CPU and domain sets for
other threads in the same process when the process is in capability
mode. This might change in the future.
Submitted by: Greg V <greg@unrelenting.technology> (original version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25552
Add a more compact display format for kern.tty_info_kstacks inspired by
procstat -kk. Set it as a default one.
# sysctl kern.tty_info_kstacks=1
kern.tty_info_kstacks: 0 -> 1
# sleep 2
^T
load: 0.17 cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
#0 0xffffffff80c4443e at mi_switch+0xbe
#1 0xffffffff80c98044 at sleepq_catch_signals+0x494
#2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
#3 0xffffffff80c43af3 at _sleep+0x193
#4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
#5 0xffffffff80c5119b at sys_nanosleep+0x3b
#6 0xffffffff810ffc69 at amd64_syscall+0x119
#7 0xffffffff810d5520 at fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
# sysctl kern.tty_info_kstacks=2
kern.tty_info_kstacks: 1 -> 2
# sleep 2
^T
load: 0.24 cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12
sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b
amd64_syscall+0x119 fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
Suggested by: avg
Reviewed by: mjg
Relnotes: yes
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25487
Otherwise the same checks are duplicated across four different system
call implementations, cpuset_(get|set)(affinity|domain)(). No
functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
On architectures that use RELA relocations it is safe to rerun the ifunc
resolvers on after all CPUs have started, but while they are sill parked.
On arm64 with big.LITTLE this is needed as some SoCs have shipped with
different ID register values the big and little clusters meaning we were
unable to rely on the register values from the boot CPU.
Add support for rerunning the resolvers on arm64 and amd64 as these are
both RELA using architectures.
Reviewed by: kib
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25455
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.
Sponsored by: Innovate UK
This eliminates the need to take bucket locks in the common case.
Concurrent lookup utilizng the same vnodes is still bottlenecked on referencing
and locking path components, this will be taken care of separately.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23913
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.
See vhold_smr and vdropl for comments explaining caveats.
Reviewed by: kib
Testec by: pho
Differential Revision: https://reviews.freebsd.org/D23913
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.
This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.
Reviewed by: jhb, markj
Sponsored by: Netflix
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().
Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329
Adding `kern.features.witness` helps expose whether or not the kernel has
`options WITNESS` enabled, so the `feature_present(3)` API can be used
to query whether or not witness(9) is built into the kernel.
This support is helpful with userspace applications (generally speaking,
tests), as it can be queried to determine whether or not tests related
to WITNESS should be run.
MFC after: 1 week
Reviewed by: cem, darrick.freebsd_gmail.com
Differential Revision: https://reviews.freebsd.org/D25302
Sponsored by: DellEMC Isilon
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.
Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.
Reviewed by: mjg, kib, tmunro
Approved by: mjg (mentor)
Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision: https://reviews.freebsd.org/D25024
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223
hw.bus.info was added in r68522 as a node, but there was never anything
connected "behind" it. Its only purpose is to return a struct u_businfo.
The only in-base consumer are devinfo(3)/devinfo(8).
Rewrite the handler as SYSCTL_PROC and mark it as MPSAFE and read-only
as there never was a writable path.
Reviewed by: kib
Approved by: kib (mentor)
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25321
This is in preparation for enabling a loadable SCTP stack. Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.
Discussed with: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
There may be some version of mountd out there that does not supply a default
security flavor when none is given for an export.
Set the default security flavor in vfs_export if none is given, and remove the
workaround for oexport compat.
Reported by: npn
Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25300
When doing secure boot, loader wants to export loader.ve.hashed
the value of which typically exceeds KENV_MVALLEN.
Replace use of KENV_MVALLEN with tunable kenv_mvallen.
Add getenv_string_buffer() for the case where a stack buffer cannot be
created and use uma_zone_t kenv_zone for suitably sized buffers.
Reviewed by: stevek, kevans
Obtained from: Abhishek Kulkarni <abkulkarni@juniper.net>
MFC after: 1 week
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org//D25259
Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it hold a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size.
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)
This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.
Reviewed by: kib, freqlabs
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25088