Commit graph

68 commits

Author SHA1 Message Date
Mateusz Guzik
586ee69f09 fs: clean up empty lines in .c and .h files 2020-09-01 21:18:40 +00:00
Mateusz Guzik
abd80ddb94 vfs: introduce v_irflag and make v_type smaller
The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by:	kib, jeff
Differential Revision:	https://reviews.freebsd.org/D22715
2019-12-08 21:30:04 +00:00
Alan Somers
6c0c362075 fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode
When communicating with a FUSE server that implements version 7.8 (or older)
of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter
than normal. The protocol version check wasn't applied universally, leading
to an extra 16 bytes being sent to such servers. The extra bytes were
allocated and bzero()d, so there was no information disclosure.

Reviewed by:	emaste
MFC after:	3 days
MFC-With:	r350665
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21557
2019-09-11 19:29:40 +00:00
Alan Somers
669a092af1 fusefs: fix panic when writing with O_DIRECT and using writeback cache
When a fusefs file system is mounted using the writeback cache, the cache
may still be bypassed by opening a file with O_DIRECT.  When writing with
O_DIRECT, the cache must be invalidated for the affected portion of the
file.  Fix some panics caused by inadvertently invalidating too much.

Sponsored by:	The FreeBSD Foundation
2019-07-28 15:17:32 +00:00
Alan Somers
8aafc8c389 [skip ci] update copyright headers in fusefs files
Sponsored by:	The FreeBSD Foundation
2019-06-28 04:18:10 +00:00
Alan Somers
f8ebf1cd7e fusefs: implement protocol 7.23's FUSE_WRITEBACK_CACHE option
As of protocol 7.23, fuse file systems can specify their cache behavior on a
per-mountpoint basis.  If they set FUSE_WRITEBACK_CACHE in
fuse_init_out.flags, then they'll get the writeback cache.  If not, then
they'll get the writethrough cache.  If they set FOPEN_DIRECT_IO in every
FUSE_OPEN response, then they'll get no cache at all.

The old vfs.fusefs.data_cache_mode sysctl is ignored for servers that use
protocol 7.23 or later.  However, it's retained for older servers,
especially for those running in jails that lack access to the new protocol.

This commit also fixes two other minor test bugs:
* WriteCluster:SetUp was using an uninitialized variable.
* Read.direct_io_pread wasn't verifying that the cache was actually
  bypassed.

Sponsored by:	The FreeBSD Foundation
2019-06-26 17:32:31 +00:00
Alan Somers
788af9538a fusefs: automatically update mtime and ctime on write
Writing should implicitly update a file's mtime and ctime.  For fuse, the
server is supposed to do that.  But the client needs to do it too, because
the FUSE_WRITE response does not include time attributes, and it's not
desirable to issue a GETATTR after every WRITE.  When using the writeback
cache, there's another hitch: the kernel should ignore the mtime and ctime
fields in any GETATTR response for files with a dirty write cache.

Sponsored by:	The FreeBSD Foundation
2019-06-25 23:40:18 +00:00
Alan Somers
0d3a88d76c fusefs: writes should update the file size, even when data_cache_mode=0
Writes that extend a file should update the file's size.  r344185 restricted
that behavior for fusefs to only happen when the data cache was enabled.
That probably made sense at the time because the attribute cache wasn't
fully baked yet.  Now that it is, we should always update the cached file
size during write.

Sponsored by:	The FreeBSD Foundation
2019-06-25 18:36:11 +00:00
Alan Somers
b9e2019755 fusefs: rewrite vop_getpages and vop_putpages
Use the standard facilities for getpages and putpages instead of bespoke
implementations that don't work well with the writeback cache.  This has
several corollaries:

* Change the way we handle short reads _again_.  vfs_bio_getpages doesn't
  provide any way to handle unexpected short reads.  Plus, I found some more
  lock-order problems.  So now when the short read is detected we'll just
  clear the vnode's attribute cache, forcing the file size to be requeried
  the next time it's needed.  VOP_GETPAGES doesn't have any way to indicate
  a short read to the "caller", so we just bzero the rest of the page
  whenever a short read happens.

* Change the way we decide when to set the FUSE_WRITE_CACHE bit.  We now set
  it for clustered writes even when the writeback cache is not in use.

Sponsored by:   The FreeBSD Foundation
2019-06-25 17:24:43 +00:00
Alan Somers
1734e205f3 fusefs: refine the short read fix from r349332
b_fsprivate1 needs to be initialized even for write operations, probably
because a buffer can be used to read, write, and read again with the final
read serviced by cache.

Sponsored by:	The FreeBSD Foundation
2019-06-24 20:08:28 +00:00
Alan Somers
17575bad85 fusefs: improve the short read fix from r349279
VOP_GETPAGES intentionally tries to read beyond EOF, so fuse_read_biobackend
can't rely on bp->b_resid > 0 indicating a short read.  And adjusting
bp->b_count after a short read seems to cause some sort of resource leak.
Instead, store the shortfall in the bp->b_fsprivate1 field.

Sponsored by:	The FreeBSD Foundation
2019-06-24 17:05:31 +00:00
Alan Somers
44f654fdc5 fusefs: fix corruption on short reads caused by r349279
Even if a short read is caused by EOF, it's still necessary to bzero the
remaining buffer, because that buffer could become valid as a result of a
future ftruncate or pwrite operation.

Reported by:	fsx
Sponsored by:	The FreeBSD Foundation
2019-06-21 23:29:29 +00:00
Alan Somers
aef22f2d75 fusefs: correctly handle short reads
A fuse server may return a short read for three reasons:

* The file is opened with FOPEN_DIRECT_IO.  In this case, the short read
  should be returned directly to userland.  We already handled this case
  correctly.

* The file was truncated server-side, and the read hit EOF.  In this case,
  the kernel should update the file size.  Fixed in the case of VOP_READ.
  Fixing this for VOP_GETPAGES is TODO.

* The file is opened in writeback mode, there are dirty buffers past what
  the server thinks is the file's EOF, and the read hit what the server
  thinks is the file's EOF.  In this case, the client is trying to read a
  hole, and should zero-fill it.  We already handled this case, and I added
  a test for it.

Sponsored by:	The FreeBSD Foundation
2019-06-21 21:44:31 +00:00
Alan Somers
a1c9f4ad0d fusefs: implement VOP_BMAP
If the fuse daemon supports FUSE_BMAP, then use that for the block mapping.
Otherwise, use the same technique used by vop_stdbmap.  Report large values
for runp and runb in order to maximize read clustering and minimize upcalls,
even if we don't know the true layout.

The major result of this change is that sequential reads to FUSE files will
now usually happen 128KB at a time instead of 64KB.

Sponsored by:	The FreeBSD Foundation
2019-06-20 17:08:21 +00:00
Alan Somers
84879e46c2 fusefs: multiple fixes related to the write cache
* Don't always write the last page synchronously.  That's not actually
  required.  It was probably just masking another bug that I fixed later,
  possibly in r349021.

* Enable the NotifyWriteback tests now that Writeback cache is working.

* Add a test to ensure that the write cache isn't flushed synchronously when
  in writeback mode.

Sponsored by:	The FreeBSD Foundation
2019-06-17 23:34:11 +00:00
Alan Somers
402b609c80 fusefs: use cluster_read for more readahead
fusefs will now use cluster_read.  This allows readahead of more than one
cache block.  However, it won't yet actually cluster the reads because that
requires VOP_BMAP, which fusefs does not yet implement.

Sponsored by:	The FreeBSD Foundation
2019-06-17 22:01:23 +00:00
Alan Somers
d569012f45 fusefs: implement non-clustered readahead
fusefs will now read ahead at most one cache block at a time (usually 64
KB).  Clustered reads are still TODO.  Individual file systems may disable
read ahead by setting fuse_init_out.max_readahead=0 during initialization.

Sponsored by:	The FreeBSD Foundation
2019-06-17 16:56:51 +00:00
Alan Somers
b5aaf286ea fusefs: fix the "write-through" of write-through cacheing
Our fusefs(5) module supports three cache modes: uncached, write-through,
and write-back.  However, the write-through mode (which is the default) has
never actually worked as its name suggests.  Rather, it's always been more
like "write-around".  It wrote directly, bypassing the cache.  The cache
would only be populated by a subsequent read of the same data.

This commit fixes that problem.  Now the write-through mode works as one
would expect: write(2) immediately adds data to the cache and then blocks
while the daemon processes the write operation.

A side effect of this change is that non-cache-block-aligned writes will now
incur a read-modify-write cycle of the cache block.  The old behavior
(bypassing write cache entirely) can still be achieved by opening a file
with O_DIRECT.

PR:		237588
Sponsored by:	The FreeBSD Foundation
2019-06-14 19:47:48 +00:00
Alan Somers
8eecd9ce05 fusefs: enable write clustering
Enable write clustering in fusefs whenever cache mode is set to writeback
and the "async" mount option is used.  With default values for MAXPHYS,
DFLTPHYS, and the fuse max_write mount parameter, that means sequential
writes will now be written 128KB at a time instead of 64KB.

Also, add a regression test for PR 238565, a panic during unmount that
probably affects UFS, ext2, and msdosfs as well as fusefs.

PR:		238565
Sponsored by:	The FreeBSD Foundation
2019-06-14 18:14:51 +00:00
Alan Somers
dff3a6b410 fusefs: fix a bug with WriteBack cacheing
An errant vfs_bio_clrbuf snuck in in r348931.  Surprisingly, it doesn't have
any effect most of the time.  But under some circumstances it cause the
buffer to behave in a write-only fashion.

Sponsored by:	The FreeBSD Foundation
2019-06-13 19:07:03 +00:00
Alan Somers
a87e0831ab fusefs: WIP fixing writeback cacheing
The current "writeback" cache mode, selected by the
vfs.fusefs.data_cache_mode sysctl, doesn't do writeback cacheing at all.  It
merely goes through the motions of using buf(9), but then writes every
buffer synchronously.  This commit:

* Enables delayed writes when the sysctl is set to writeback cacheing
* Fixes a cache-coherency problem when extending a file whose last page has
  just been written.
* Removes the "sync" mount option, which had been set unconditionally.
* Adjusts some SDT probes
* Adds several new tests that mimic what fsx does but with more control and
  without a real file system.  As I discover failures with fsx, I add
  regression tests to this file.
* Adds a test that ensures we can append to a file without reading any data
  from it.

This change is still incomplete.  Clustered writing is not yet supported,
and there are frequent "panic: vm_fault_hold: fault on nofault entry" panics
that I need to fix.

Sponsored by:	The FreeBSD Foundation
2019-06-11 16:32:33 +00:00
Alan Somers
ddc51e453e fusefs: remove some stuff that was copy/pasted from nfsclient
fusefs's I/O methods were originally copy/pasted from nfsclient.  This
commit removes some irrelevant parts, like stuff involving B_NEEDCOMMIT.

Sponsored by:	The FreeBSD Foundation
2019-06-06 20:35:41 +00:00
Alan Somers
0269ae4c19 MFHead @348740
Sponsored by:	The FreeBSD Foundation
2019-06-06 16:20:50 +00:00
Alan Somers
011bca9948 fusefs: simplify fuse_write_biobackend. No functional change.
Sponsored by:	The FreeBSD Foundation
2019-06-05 20:18:56 +00:00
Alan Somers
a639731ba9 fusefs: respect RLIMIT_FSIZE
Sponsored by:	The FreeBSD Foundation
2019-06-03 23:24:07 +00:00
Alan Somers
d4fd0c8148 fusefs: set the flags fields of fuse_write_in and fuse_read_in
These fields are supposed to contain the file descriptor flags as supplied
to open(2) or set by fcntl(2).  The feature is kindof useless on FreeBSD
since we don't supply all of these flags to fuse (because of the weak
relationship between struct file and struct vnode).  But we should at least
set the access mode flags (O_RDONLY, etc).

This is the last fusefs change needed to get full protocol 7.9 support.
There are still a few options we don't support for good reason (mandatory
file locking is dumb, flock support is broken in the protocol until 7.17,
etc), but there's nothing else to do at this protocol level.

Sponsored by:	The FreeBSD Foundation
2019-05-28 01:09:19 +00:00
Alan Somers
bda39894c5 fusefs: set FUSE_WRITE_CACHE when writing from cache
This bit tells the server that we're not sure which uid, gid, and/or pid
originated the write.  I don't know of a single file system that cares, but
it's part of the protocol.

Sponsored by:	The FreeBSD Foundation
2019-05-27 21:36:28 +00:00
Alan Somers
65417f5e27 Remove "struct ucred*" argument from vtruncbuf
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it.  Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20377
2019-05-24 20:27:50 +00:00
Alan Somers
e76986fde0 fusefs: fix exporting fuse filesystems with nfsd
A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:

* During read and write operations, implicitly do a FUSE_OPEN if there isn't
  already a valid file handle.  That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
  from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
  start from the beginning and read until it encounters the requested
  offset.

I've done only limited testing over NFS, so there are probably still some
more bugs.  Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.

Sponsored by:	The FreeBSD Foundation
2019-05-23 23:06:26 +00:00
Alan Somers
2013b723d3 fusefs: improve attribute cacheing
Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr.  There are still a few
calls elsewhere that happen as a result of a write.

Sponsored by:	The FreeBSD Foundation
2019-05-23 00:22:03 +00:00
Alan Somers
18a2264e27 fusefs: fix "recursing on non recursive lockmgr" panic
When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner.  When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf.  That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.

Sponsored by:	The FreeBSD Foundation
2019-05-22 23:30:51 +00:00
Alan Somers
b6b7fe7c7d fusefs: remove the vfs.fusefs.sync_resize syctl, correctly this time
In r347547 I intended to remove the vfs.fusefs.sync_resize sysctl, leaving
fusefs's behavior as though sync_resize had its default value.  But I forgot
that I had already turned off sync_resize in my development system's
/etc/sysctl.conf.

This commit complete removes the optional behavior that was formerly
controlled by sync_resize.  There's no need for explicitly calling
FUSE_SETATTR after every FUSE_WRITE that extends a file.  The daemon can
infer that the file is being extended.  If this sysctl was added as a
workaround for a buggy daemon, there's no clue as to what that daemon may
have been.

Sponsored by:	The FreeBSD Foundation
2019-05-22 19:49:25 +00:00
Alan Somers
16bd2d47c7 fusefs: Upgrade FUSE protocol to version 7.9.
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8.  It doesn't
implement any of 7.9's new features yet.

Sponsored by:	The FreeBSD Foundation
2019-05-16 17:24:11 +00:00
Alan Somers
3d15b234a4 fusefs: don't track a file's size in two places
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning.  It was very confusing.  This commit eliminates the former.  It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.

Sponsored by:	The FreeBSD Foundation
2019-05-15 00:38:52 +00:00
Alan Somers
4d09e76a73 fusefs: remove the vfs.fusefs.sync_resize syctl
This sysctl was added > 6.5 years ago for no clear purpose.  I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now.  Since there's no clear motivation for
this sysctl, it's best to remove it.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:47:31 +00:00
Alan Somers
bad4c94dc8 fusefs: remove the vfs.fusefs.fix_broken_io sysctl
This looks like it may have been a workaround for a specific buggy FUSE
filesystem.  However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.

Sponsored by:	The FreeBSD Foundation
2019-05-13 19:31:09 +00:00
Alan Somers
7648bc9fee MFHead @347527
Sponsored by:	The FreeBSD Foundation
2019-05-13 18:25:55 +00:00
Alan Somers
8350cbd8d6 [skip ci] fusefs: remove an obsolete comment
Sponsored by:	The FreeBSD Foundation
2019-05-13 15:39:54 +00:00
Alan Somers
a90e32de25 fusefs: clear SUID & SGID after a successful write by a non-owner
Reported by:	pjdfstest
Sponsored by:	The FreeBSD Foundation
2019-05-06 16:17:55 +00:00
Alan Somers
9c7ec33162 fusefs: fix a deadlock in VOP_PUTPAGES
As of r346162 fuse now invalidates the cache during writes.  But it can't do
that when writing from VOP_PUTPAGES, because the write is coming _from_ the
cache.  Trying to invalidate the cache in that situation causes a deadlock
in vm_object_page_remove, because the pages in question have already been
busied by the same thread.

PR:		235774
Sponsored by:	The FreeBSD Foundation
2019-04-26 19:47:43 +00:00
Alan Somers
21d4686c5c fusefs: diff reduction between fuse_read_biobackend and ext_read
The main difference is to replace some custom logic with bread.  No
functional change at this point, but this is one step towards adding
readahead.

Sponsored by:	The FreeBSD Foundation
2019-04-23 22:34:32 +00:00
Alan Somers
419e7ff674 fusefs: rename the SDT probes from "fuse" to "fusefs"
This matches the new name of the kld.

Sponsored by:	The FreeBSD Foundation
2019-04-20 00:04:31 +00:00
Alan Somers
723c776829 fusefs: WIP making FUSE operations interruptible
The fuse protocol includes a FUSE_INTERRUPT operation that the client can
send to the server to indicate that it wants to abort an in-progress
operation.  It's required to interrupt any syscall that is blocking on a
fuse operation.

This commit adds basic FUSE_INTERRUPT support.  If a process receives any
signal while it's blocking on a FUSE operation, it will send a
FUSE_INTERRUPT and wait for the original operation to complete.  But there
is still much to do:

* The current code will leak memory if the server ignores FUSE_INTERRUPT,
  which many do.  It will also leak memory if the server completes the
  original operation before it receives the FUSE_INTERRUPT.
* An interrupted read(2) will incorrectly appear to be successful.
* fusefs should return immediately for fatal signals.
* Operations that haven't been sent to the server yet should be aborted
  without sending FUSE_INTERRUPT.
* Test coverage should be better.
* It would be great if write operations could be made restartable.
  That would require delaying uiomove until the last possible moment, which
  would be sometime during fuse_device_read.

PR:		236530
Sponsored by:	The FreeBSD Foundation
2019-04-17 23:32:38 +00:00
Alan Somers
6af6fdcea7 fusefs: evict invalidated cache contents during write-through
fusefs's default cache mode is "writethrough", although it currently works
more like "write-around"; writes bypass the cache completely.  Since writes
bypass the cache, they were leaving stale previously-read data in the cache.
This commit invalidates that stale data.  It also adds a new global
v_inval_buf_range method, like vtruncbuf but for a range of a file.

PR:		235774
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-04-12 19:05:06 +00:00
Konstantin Belousov
ae90941431 Add vn_fsync_buf().
Provide a convenience function to avoid the hack with filling fake
struct vop_fsync_args and then calling vop_stdfsync().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-09 20:20:04 +00:00
Alan Somers
12292a99ac fusefs: correctly handle short writes
If a FUSE daemon returns FOPEN_DIRECT_IO when a file is opened, then it's
allowed to write less data than was requested during a FUSE_WRITE operation
on that file handle.  fusefs should simply return a short write to userland.

The old code attempted to resend the unsent data.  Not only was that
incorrect behavior, but it did it in an ineffective way, by attempting to
"rewind" the uio and uiomove the unsent data again.

This commit correctly handles short writes by returning directly to
userland if FOPEN_DIRECT_IO was set.  If it wasn't set (making the short
write technically a protocol violation), then we resend the unsent data.
But instead of rewinding the uio, just resend the data that's already in the
kernel.

That necessitated a few changes to fuse_ipc.c to reduce the amount of bzero
activity.  fusefs may be marginally faster as a result.

PR:		236381
Sponsored by:	The FreeBSD Foundation
2019-04-04 16:51:34 +00:00
Alan Somers
9f10f423a9 fusefs: send FUSE_FLUSH during VOP_CLOSE
The FUSE protocol says that FUSE_FLUSH should be send every time a file
descriptor is closed.  That's not quite possible in FreeBSD because multiple
file descriptors can share a single struct file, and closef doesn't call
fo_close until the last close.  However, we can still send FUSE_FLUSH on
every VOP_CLOSE, which is probably good enough.

There are two purposes for FUSE_FLUSH.  One is to allow file systems to
return EIO if they have an error when writing data that's cached
server-side.  The other is to release POSIX file locks (which fusefs(5) does
not yet support).

PR:		236405, 236327
Sponsored by:	The FreeBSD Foundation
2019-04-03 19:59:45 +00:00
Alan Somers
f8d4af104b fusefs: send FUSE_OPEN for every open(2) with unique credentials
By default, FUSE performs authorization in the server.  That means that it's
insecure for the client to reuse FUSE file handles between different users,
groups, or processes.  Linux handles this problem by creating a different
FUSE file handle for every file descriptor.  FreeBSD can't, due to
differences in our VFS design.

This commit adds credential information to each fuse_filehandle.  During
open(2), fusefs will now only reuse a file handle if it matches the exact
same access mode, pid, uid, and gid of the calling process.

PR:		236844
Sponsored by:	The FreeBSD Foundation
2019-04-01 20:42:15 +00:00
Alan Somers
5fccbf313a fusefs: don't force direct io for files opened O_WRONLY
Previously fusefs would treat any file opened O_WRONLY as though the
FOPEN_DIRECT_IO flag were set, in an attempt to avoid issuing reads as part
of a RMW write operation on a cached part of the file.  However, the FUSE
protocol explicitly allows reads of write-only files for precisely that
reason.

Sponsored by:	The FreeBSD Foundation
2019-03-30 00:57:07 +00:00
Alan Somers
98852a32af fusefs: fix error handling in fuse_vnop_strategy
Reported by:	cem
Sponsored by:	The FreeBSD Foundation
2019-03-28 21:57:42 +00:00