A future patch that will add a Linux compatible copy_file_range(2) syscall
needs to be able to lock the byte ranges of two files concurrently.
To do this without a risk of deadlock, a non-blocking variant of
vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed.
This patch adds this, along with vn_rangelock_trywlock(), in order to
do this.
The patch also adds a couple of comments, that I hope clarify how the
algorithm used in kern_rangelock.c works.
Reviewed by: kib, asomers (previous version)
Differential Revision: https://reviews.freebsd.org/D20645
r160875 added sbdestroy() as a wrapper around sbrelease_internal to be
called from sofree(), yet the comment added in the same revision to
sofree() still mentions sbrelease_internal().
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20488
We were leaking the fuse ticket if the original operation completed before
the daemon received the INTERRUPT operation. Fixing this was easier than I
expected.
Sponsored by: The FreeBSD Foundation
Previously fusefs would never recycle vnodes. After VOP_INACTIVE, they'd
linger around until unmount or the vnlru reclaimed them. This commit
essentially actives and inlines the old reclaim_revoked sysctl, and fixes
some issues dealing with the attribute cache and multiply linked files.
Sponsored by: The FreeBSD Foundation
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket. However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty. In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.
Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20539
The format to use depends on hardware configuration (synthesis-time),
so make it compile-time kernel option.
Extended format allows DMA engine to operate with 64-bit memory addresses.
Sponsored by: DARPA, AFRL
counter(9) is more performant than using atomic instructions to update
sysctls that just report statistics to userland.
Sponsored by: The FreeBSD Foundation
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
Reviewed by: mizhka
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20459
Fix memory leaks relating to FUSE_BMAP and FUSE_CREATE. There are still
leaks relating to FUSE_INTERRUPT, but they'll be harder to fix since the
server is legally allowed to never respond to a FUSE_INTERRUPT operation.
Sponsored by: The FreeBSD Foundation
copy the VFP registers.
arvm7 VFP uses 32 64bits fp registers (but those could be used in pairs to
make 16 128bits registers), while aarch64 uses 32 128bits fp registers, so
we have to copy the value of each register.
that replaced a pmap_invalidate_page() with a dsb(ishst) in
pmap_enter_quick_locked(). Even though this change is in principle
correct, I am seeing occasional, spurious bus errors that are only
reproducible without this pmap_invalidate_page(). (None of adding an
isb, "upgrading" the dsb to wait on loads as well as stores, or
disabling superpage mappings eliminates the bus errors.) Add an XXX
comment explaining why the pmap_invalidate_page() is being performed.
Discussed with: andrew, markj
This adds emulation for:
test r/m16, imm16
test r/m32, imm32
test r/m64, imm32 sign-extended to 64
OpenBSD guests compiled with clang 8.0.0 use TEST directly against a
Local APIC register instead of separate read via MOV followed by a
TEST against the register.
PR: 238794
Submitted by: jhb
Reported by: Jason Tubnor jason@tubnor.net
Tested by: Jason Tubnor jason@tubnor.net
Reviewed by: markj, Patrick Mooney patrick.mooney@joyent.com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20755
Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.
This is a step towards an implementation of an atomic reference counter
for each physical page structure.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20758
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis. If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache. If not, then
they'll get the writethrough cache. If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.
The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later. However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.
This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
bypassed.
Sponsored by: The FreeBSD Foundation
"fdt" is removed from the driver module name as the driver does not
require FDT and can work very well on hints based systems.
A module dependency is added for gpiobus. Without that owc cannot
resolve symbols in gpiobus if both are loaded as kernel modules.
Finally, a driver module module version is added.
Reviewed by: imp
MFC after: 11 days
When pmap_pkru_on_remove() is called, the sva argument value was
advanced. Clear PKRU earlier when sva still specifies the start of
the region.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
In set_regs32()/fill_regs32(), we have to get/set SP and LR from/to
tf_x[13] and tf_x[14].
set_regs() and fill_regs() may be called for a 32bits process, if the process
is ptrace'd from a 64bits debugger. So, in set_regs() and fill_regs(), get
or set PC and SPSR from where the debugger expects it, from tf_x[15] and
tf_x[16].
Assert that the per-mountpoint softdep mutex is held in modified
functions that do not already have this assertion. No functional
change intended.
Reviewed by: kib, mckusick (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20741
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
The fusefs kernel module allegedly supported no_attrcache, no_readahed,
no_datacache, no_namecache, and no_mmap mount options, but the mount_fusefs
binary never did. So there was no way to ever activate these options.
Delete them. Some of them have alternatives:
no_attrcache: set the attr_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
responses.
no_readahed: set max_readahead to 0 in the FUSE_INIT response.
no_datacache: set the vfs.fusefs.data_cache_mode sysctl to 0, or (coming
soon) set the attr_valid time to 0 and set FUSE_AUTO_INVAL_DATA in
the FUSE_INIT response.
no_namecache: set entry_valid time to 0 in FUSE_LOOKUP and FUSE_GETATTR
responses.
Sponsored by: The FreeBSD Foundation
If a server supports a timestamp granularity other than 1ns, it can tell the
client this as of protocol 7.23. The client will use that granularity when
updating its cached timestamps during write. This way the timestamps won't
appear to change following flush.
Sponsored by: The FreeBSD Foundation
Misaligned floating point loads and stores are already handled for AIM, but
use the DSISR to obtain the necessary data. Book-E does not have the DSISR,
so these fixups are not performed, leading to a SIGBUS on misaligned FP
loads or stores. Obtain the necessary data on the Book-E side, similar to
how is done for SPE.
MFC after: 1 week
by dropping TCP fragments with offset = 1.
In addition to dropping these fragments, add a DTrace probe to allow
for more detailed monitoring and diagnosis if required.
MFC after: 1 week
As of r349396 the kernel will internally update the mtime and ctime of files
on write. It will also flush the mtime should a SETATTR happen before the
data cache gets flushed. Now it will flush the ctime too, if the server is
using protocol 7.23 or higher.
This is the only case in which the kernel will explicitly set a file's
ctime, since neither utimensat(2) nor any other user interfaces allow it.
Sponsored by: The FreeBSD Foundation
Writing should implicitly update a file's mtime and ctime. For fuse, the
server is supposed to do that. But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE. When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.
Sponsored by: The FreeBSD Foundation
Since the only caller to vm_map_splay is vm_map_lookup_entry, move the
implementation of vm_map_splay into vm_map_lookup_helper, called by
vm_map_lookup_entry.
vm_map_lookup_entry returns the greatest entry less than or equal to a
given address, but in many cases the caller wants the least entry
greater than or equal to the address and uses the next pointer to get
to it. Provide an alternative interface to lookup,
vm_map_lookup_entry_ge, to provide the latter behavior, and let
callers use one or the other rather than having them use the next
pointer after a lookup miss to get what they really want.
In vm_map_growstack, the caller wants an entry that includes a given
address, and either the preceding or next entry depending on the value
of eflags in the first entry. Incorporate that behavior into
vm_map_lookup_helper, the function that implements all of these
lookups.
Eliminate some temporary variables used with vm_map_lookup_entry, but
inessential.
Reviewed by: markj (earlier version)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20664
Writes that extend a file should update the file's size. r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet. Now that it is, we should always update the cached file
size during write.
Sponsored by: The FreeBSD Foundation
DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes
and os_synced_dnodes. Since the number of sublists by default is equal
to number of CPUs, it will dispatch equal, potentially large, number of
tasks, waking up many CPUs to handle them, even if only one or few of
sublists actually have any work to do.
This change adds check for empty sublists to avoid this.
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache. This has
several corollaries:
* Change the way we handle short reads _again_. vfs_bio_getpages doesn't
provide any way to handle unexpected short reads. Plus, I found some more
lock-order problems. So now when the short read is detected we'll just
clear the vnode's attribute cache, forcing the file size to be requeried
the next time it's needed. VOP_GETPAGES doesn't have any way to indicate
a short read to the "caller", so we just bzero the rest of the page
whenever a short read happens.
* Change the way we decide when to set the FUSE_WRITE_CACHE bit. We now set
it for clustered writes even when the writeback cache is not in use.
Sponsored by: The FreeBSD Foundation
Fixes panic when loading ipfw.ko and if_epair.ko built with modern compiler.
Similar to arm64 and riscv, when using a modern compiler (!gcc4.2), code
generated tries to access data in the wrong location, causing kernel panic
(data storage interrupt trap) when loading if_epair and ipfw.
Issue was reproduced with kernel/module compiled using gcc8 and clang8. It
affects both ELFv1 and ELFv2 ABI environments.
PR: 232387
Submitted by: alfredo.junior_eldorado.org.br
Reported by: Mark Millard
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20461
instead of a linear array.
The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.
To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi
All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.
This patch has been tested by pho@ .
Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by: markj @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The final server unref should be done by the server thread to prevent
deadlock in the client cdevpriv destructor, which cannot destroy
itself.
MFC after: 1 week
Sponsored by: Mellanox Technologies