During recent testing related to the IETF NFSv4 Bakeathon, it was
discovered that Kerberized NFSv4.1/4.2 mounts to pNFS servers
(sec=krb5[ip],pnfs mount options) were broken.
The FreeBSD client was using the "service principal" for
the MDS to try to establish an rpcsec_gss credential for a DS,
which is incorrect. (A "service principal" looks like
"nfs@<fqdn-of-server>" and the <fqdn-of-server> for the DS is not
the same as the MDS for most pNFS servers.)
To fix this, the rpcsec_gss code needs to be able to do a
reverse DNS lookup of the DS's IP address. A new kgssapi upcall
to the gssd(8) daemon is added by this patch to do the reverse DNS
along with a new rpcsec_gss function to generate the "service
principal".
A separate patch to gssd(8) will be committed, so that this
patch will fix the problem. Without the gssd(8) patch, the new
upcall fails and the current, incorrect behaviour remains.
This bug only affects the rare case of a Kerberized (sec=krb5[ip],pnfs)
mount using pNFS.
This patch changes the internal KAPI between the kgssapi and
nfscl modules, but since I did a version bump a few days ago,
I will not do one this time.
(cherry picked from commit dd7d42a1fae5a4879b62689a165238082421f343)
Commit 6aded1e6b2e5 fixed a rare case when handling an NFSv4
Rename reply when delegations are in use. This patch fixes the
associated comment.
(cherry picked from commit 0a958aa16fed1978879d64e3b225f1d232cc5a98)
When delegations are enabled (they are not by default in
the FreeBSD NFSv4 server), rename will check for and return
delegations. If the second of these DelegReturn operations
were to fail (they rarely do), then the code would not retry
the rename with returning delegations, as it is intended to do.
The patch fixes the problem, since the DelegReturn reply status
is the second iteration of the loop and not the first iteration.
As noted, this bug would have rarely manifested a problem, since
DelegReturn operations do not normally fail.
(cherry picked from commit 6aded1e6b2e5549120031032e1c7f8b002882327)
Approved by: so
Security: SA-23:18.nfsclient
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6fa843f6e647a1a1e0e42af1e7abc9e903699f31)
Since newnfs_copycred() calls crsetgroups() which in turn calls
crextend() which might do a malloc(M_WAITOK), newnfs_copycred()
cannot be called with a mutex held. Fortunately, the malloc()
call is rarely done, since XU_GROUPS is 16 and the NFS client
uses a maximum of 17 (only 17 groups will cause the malloc() to
be called). Further, it is only a problem if the malloc() tries
to sleep(). As such, this bug does not seem to have caused
problems in practice.
This patch fixes the one place in the NFS client where
newnfs_copycred() is called while a mutex is held by moving the
call to after where the mutex is released.
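
The general pattern can be sketched in userland C with a pthread mutex
standing in for the kernel mutex. All names here (struct cred_src,
struct cred_dst, copy_groups(), snapshot_cred()) are hypothetical and
only mirror the shape of the fix: snapshot the data under the lock, then
do the possibly-allocating copy after unlocking.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_NGROUPS	16	/* inline capacity, playing the role of XU_GROUPS */
#define MAX_NGROUPS	17	/* maximum used by the client in this sketch */

struct cred_src {		/* hypothetical lock-protected state */
	pthread_mutex_t lock;
	int ngroups;
	int groups[MAX_NGROUPS];
};

struct cred_dst {
	int ngroups;
	int *groups;		/* points at small[] or at heap memory */
	int small[SMALL_NGROUPS];
};

/* May allocate (and so may block); must not run with the lock held. */
int
copy_groups(struct cred_dst *dst, const int *groups, int ngroups)
{
	dst->groups = dst->small;
	if (ngroups > SMALL_NGROUPS) {
		dst->groups = malloc(sizeof(int) * (size_t)ngroups);
		if (dst->groups == NULL)
			return (-1);
	}
	memcpy(dst->groups, groups, sizeof(int) * (size_t)ngroups);
	dst->ngroups = ngroups;
	return (0);
}

int
snapshot_cred(struct cred_src *src, struct cred_dst *dst)
{
	int tmp[MAX_NGROUPS], n;

	pthread_mutex_lock(&src->lock);
	n = src->ngroups;
	memcpy(tmp, src->groups, sizeof(int) * (size_t)n);
	pthread_mutex_unlock(&src->lock);

	/* The allocating copy happens only after the lock is dropped. */
	return (copy_groups(dst, tmp, n));
}
```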
Found by inspection while working on an experimental patch.
(cherry picked from commit 501bdf3001190686bf55d9d333cb533858c2cf2f)
During testing at a recent IETF NFSv4 Bakeathon, a non-FreeBSD
server was rebooted. After the reboot, the FreeBSD client sent
an Open/Claim_previous with a Getattr after the Open in the same
compound. The Open/Claim_previous was done to recover the Open
and a Delegation for a file. The Open succeeded, but the
Getattr after the Open failed with NFSERR_DELAY. This resulted
in the FreeBSD client retrying the entire RPC over and over again,
until the server's recovery grace period ended. Since the Open
succeeded, there was no need to retry the entire RPC.
This patch modifies the NFSv4 client side recovery Open/Claim_previous
RPC reply handling to deal with this case. With this patch, the
Getattr reply of NFSERR_DELAY is ignored and the successful Open
reply is processed.
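
A minimal sketch of that decision, assuming a reply reduced to two
per-operation status codes. struct reclaim_reply, enum reclaim_action
and reclaim_decide() are invented for illustration, and the handling of
cases other than "Open OK, Getattr NFSERR_DELAY" is an assumption, not
taken from the patch.

```c
#define NFS_OK		0
#define NFSERR_DELAY	10008	/* NFS4ERR_DELAY */

struct reclaim_reply {		/* hypothetical per-operation statuses */
	int open_status;
	int getattr_status;
};

enum reclaim_action { RECLAIM_RETRY, RECLAIM_USE_OPEN, RECLAIM_FAIL };

enum reclaim_action
reclaim_decide(const struct reclaim_reply *r)
{
	/* Assumption: a delayed Open itself is still retried. */
	if (r->open_status == NFSERR_DELAY)
		return (RECLAIM_RETRY);
	if (r->open_status != NFS_OK)
		return (RECLAIM_FAIL);
	/* Open succeeded: a delayed Getattr alone does not force a retry. */
	if (r->getattr_status == NFS_OK || r->getattr_status == NFSERR_DELAY)
		return (RECLAIM_USE_OPEN);
	/* Assumption: any other Getattr error is treated as a failure. */
	return (RECLAIM_FAIL);
}
```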
This bug will not normally affect users, since this non-FreeBSD
server is not widely used (it may not even have shipped to any
customers).
(cherry picked from commit 14bbf4fe5abb20f1126168e66b03127ae920f78e)
For NFSv4.1/4.2, there are two new options for the Open operation.
These two options use the file handle for the file instead of the
file handle for the directory plus a file name. By doing so, the
client code is simplified (it no longer needs the "nfsv4node" structure
attached to the NFS vnode). It also avoids problems caused by another
NFS client (or process running locally in the NFS server) doing a
rename or remove of the file name between the Lookup and Open.
Unfortunately, there was a bug (fixed recently by commit X)
in the NFS server which mis-parsed the Claim_Deleg_Cur_FH
arguments. To allow this patch to work with the broken FreeBSD
NFSv4.1/4.2 server, NFSMNTP_BUGGYFBSDSRV is defined and is set
when a correctly formatted Claim_Deleg_Cur_FH fails with NFSERR_EXPIRED.
(This is what the old, broken NFS server does, since it erroneously
uses the Getattr arguments as a stateID.) Once this flag is set,
the client fills in a stateID, to make the broken NFS server happy.
Tested at a recent IETF NFSv4 Bakeathon.
(cherry picked from commit 196787f79e67374527a1d528a42efa8b31acd9af)
Without this patch, an NFSv4 Readdir operation acquires the vnode for
each entry in the directory. If only the Type, Fileid, Mounted_on_fileid
and ReaddirError attributes are requested by a client, acquiring the vnode
is not necessary for non-directories. Directory vnodes must be acquired
to check for server file system mount points.
This patch avoids acquiring the vnode, as above, resulting in a 3-8%
improvement in Readdir RPC RTT for some simple tests I did.
Note that only non-rdirplus NFSv4 mounts will benefit from this change.
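
The condition can be pictured as a bitmask test. The attribute bits
below are made up for this sketch (the real server works on NFSv4
attribute bitmaps), and entry_needs_vnode() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits, for illustration only. */
#define ATTR_TYPE		(1u << 0)
#define ATTR_FILEID		(1u << 1)
#define ATTR_MOUNTED_ON_FILEID	(1u << 2)
#define ATTR_RDATTR_ERROR	(1u << 3)
#define ATTR_SIZE		(1u << 4)

/* Attributes that can be answered without acquiring the vnode. */
#define ATTR_NO_VNODE_NEEDED \
	(ATTR_TYPE | ATTR_FILEID | ATTR_MOUNTED_ON_FILEID | ATTR_RDATTR_ERROR)

bool
entry_needs_vnode(uint32_t requested, bool is_directory)
{
	/* Directories are always acquired to check for mount points. */
	if (is_directory)
		return (true);
	/* Any attribute outside the cheap set forces the vnode. */
	return ((requested & ~ATTR_NO_VNODE_NEEDED) != 0);
}
```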
Tested during a recent IETF NFSv4 Bakeathon testing event.
(cherry picked from commit cd5edc7db261fb228be4044e6fdd38850eb4e9c4)
In a recent email list discussion related to NFSv4 mount problems
against a non-FreeBSD NFSv4 server, the reporter of the issue noted
that the server had replied 10068 (NFSERR_RETRYUNCACHEDREP). This
did not seem related to the mount problem, but I had never seen this
error before. It indicates that an RPC retry after a new TCP
connection has been established failed because the server did not
cache the reply. Since this should only happen for idempotent
operations, redoing the RPC should be safe.
This patch modifies the NFSv4.1/4.2 client to redo the RPC instead
of considering the server error fatal. It should only affect the
unusual case where TCP connections to NFSv4 servers are breaking
without the NFSv4 server rebooting.
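
In outline, the change amounts to a redo loop around the RPC. The sketch
below is a simplified assumption of that shape: do_rpc_with_redo(), the
rpc_fn callback type, the max_redo bound and the flaky_rpc() stand-in
are all invented for the example and do not appear in the patch.

```c
#define NFSERR_RETRYUNCACHEDREP	10068

typedef int (*rpc_fn)(void *arg);

/*
 * Redo the RPC while the server reports that it could not replay a
 * cached reply.  The max_redo bound is an assumption of this sketch.
 */
int
do_rpc_with_redo(rpc_fn fn, void *arg, int max_redo)
{
	int error, tries = 0;

	do {
		error = fn(arg);
	} while (error == NFSERR_RETRYUNCACHEDREP && tries++ < max_redo);
	return (error);
}

/* Stand-in RPC for illustration: fails *arg times, then succeeds. */
int
flaky_rpc(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (NFSERR_RETRYUNCACHEDREP);
	}
	return (0);
}
```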
MFC after: 2 weeks
(cherry picked from commit c4e298251ab01665f5bb3edeb740a51331818a45)
PR#274346 reports a crash which appears to be caused by a NULL default session
being destroyed. This patch should avoid the crash.
PR: 274346
(cherry picked from commit db7257ef972ed75e33929d39fd791d3699b53c63)
When I implemented a test patch using Open Claim_Deleg_Cur_FH
I discovered that the NFSv4.1/4.2 server was broken for this
Open option. Fortunately it is never used by the FreeBSD
client and never used by other clients unless delegations
are enabled. (The FreeBSD NFSv4 server does not have delegations
enabled by default.)
Claim_Deleg_Cur_FH was broken because the code mistakenly
assumed a stateID argument, which is not the case.
This patch fixes the bug by changing the XDR parser to not
expect a stateID and to fill most of the stateID in from the
clientID. The clientID is the first two elements of the "other"
array for the stateID and is sufficient to identify which
client the delegation is issued to. Since there is only one
delegation issued to a client per file, this is sufficient to
locate the correct delegation.
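
A simplified sketch of filling the stateID from the clientID, assuming a
stateID made of a 32-bit seqid plus a three-element "other" array. The
struct layout and both function names are illustrative, not the server's
actual definitions.

```c
#include <stdint.h>
#include <string.h>

struct stateid {		/* simplified layout, for illustration */
	uint32_t seqid;
	uint32_t other[3];
};

/* Fill in the part of the stateID that the clientID determines. */
void
stateid_from_clientid(struct stateid *st, const uint32_t clientid[2])
{
	memset(st, 0, sizeof(*st));
	st->other[0] = clientid[0];
	st->other[1] = clientid[1];
	/* other[2] stays zero; the client is enough to find the delegation. */
}

/* Match on the clientID portion only. */
int
stateid_same_client(const struct stateid *a, const struct stateid *b)
{
	return (a->other[0] == b->other[0] && a->other[1] == b->other[1]);
}
```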
If you are running non-FreeBSD NFSv4.1/4.2 mounts against the
FreeBSD server, you need this patch if you have delegations enabled.
PR: 274574
(cherry picked from commit f300335d9aebf2e99862bf783978bd44ede23550)
If VOP_READLINK returns a path that contains a NUL, it will trigger an
assertion in vfs_lookup. Sanitize such paths in fusefs, rejecting any
and warning the user about the misbehaving server.
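
The check itself can be as small as a memchr(3) over the returned bytes.
This is a userland sketch; check_link_target() is a hypothetical name
and the EIO return value is an assumption of the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Reject a link target of len bytes if any byte is NUL.
 * Returns 0 if the target is acceptable, EIO (assumed) otherwise.
 */
int
check_link_target(const char *buf, size_t len)
{
	if (memchr(buf, '\0', len) != NULL)
		return (EIO);
	return (0);
}
```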
PR: 274268
Sponsored by: Axcient
Reviewed by: mjg, markj
Differential Revision: https://reviews.freebsd.org/D42081
(cherry picked from commit 662ec2f781521c36b76af748d74bb0a3c2e27a76)
The sysctl variable 'vfs.nfs.iodmin' is actually a loader tunable. Add
sysctl flag CTLFLAG_TUN to it so that `sysctl -T` will report it
correctly.
No functional change intended.
Reviewed by: kib, imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42113
(cherry picked from commit 95c01e9b329406699e89907167b5c3c9effbcbca)
When using cached attributes, we must update a file's atime during
close, if it has been read since the last attribute refresh. But,
* Don't update atime if we lack write permissions to the file or if the
file system is readonly.
* If the daemon fails our atime update request for any reason, don't
report this as a failure for VOP_CLOSE.
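
Expressed as code, the rules above might look like the following sketch.
struct close_ctx and both functions are hypothetical and only restate
the conditions listed in this message.

```c
#include <stdbool.h>

struct close_ctx {		/* hypothetical flags for illustration */
	bool attrs_cached;
	bool read_since_refresh;
	bool have_write_perm;
	bool fs_readonly;
};

/* Decide whether close should send an atime update to the daemon. */
bool
should_update_atime(const struct close_ctx *c)
{
	if (!c->attrs_cached || !c->read_since_refresh)
		return (false);
	return (c->have_write_perm && !c->fs_readonly);
}

/* A failed atime update never changes the result of the close. */
int
close_result(int close_error, int atime_error)
{
	(void)atime_error;
	return (close_error);
}
```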
PR: 270749
Reported by: Jamie Landeg-Jones <jamie@catflap.org>
Sponsored by: Axcient
Reviewed by: pfg
Differential Revision: https://reviews.freebsd.org/D41925
(cherry picked from commit fb619c94c679e939496fe0cf94b8d2cba95e6e63)
fusefs: fix unused variables from fb619c94c67
PR: 270749
Reported by: cy
Sponsored by: Axcient
(cherry picked from commit e5236d25f2c0709acf3547e6af45f4bb4eec4f02)
It's possible for misuse of cdev KPIs or for bugs in devfs itself to
result in e.g. a cdev object's container being freed while still on the
global list used to populate each devfs mount; see PR 273418 for a
recent example.
Since a node may be marked inactive well before it is reaped from the
list, add a new flag solely to track list membership, and employ it in
some basic list integrity assertions to catch bad actors.
Discussed with: kib, mjg
(cherry picked from commit 67864268da53b792836f13be10299de8cd62997e)
VOP_COPY_FILE_RANGE(9) is now called when the source and target vnodes
reside on the same filesystem type (not just on the same mountpoint).
The check whether the vnodes are on the same mountpoint must now be done
in the filesystem code. There are currently only three users: fusefs(5)
already has this check, ZFS can handle multiple mountpoints, and a check
has been added to the NFS client.
ZFS block cloning is now possible between all snapshots and datasets
of the same ZFS pool.
This is an early MFC due to release schedule.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D41721
Approved by: re (gjb)
(cherry picked from commit 969071be938ca9b96f8dff003c7b43d6308849f1)
ISO9660 permits specifying a logical block size that is any power of 2
greater than or equal to 512. The geom disk layer requires requests
to be aligned on sector boundaries of the provider. With a volume
that uses a logical block size smaller than the underlying disk sector
size (e.g. a logical block size of 512 or 1024 on a CD which uses 2048
byte sectors), the current cd9660 vfs can issue requests for partial
sectors, or on non-sector boundaries.
Fixing this properly would require wrapping all of the calls to
bread*/bwrite* in the cd9660 vfs to round requests up to sector
boundaries. That affects both the length and the starting
sector number (and thus requires use of an offset relative to b_data
in the resulting buf).
However, these images do not seem to be common, given that no one has
fixed this in cd9660's vfs in the past few decades, so just reject
them during mount with an error. If such images are found to be used
in the wild, then the larger fix can be applied.
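
A sketch of the kind of mount-time check described, assuming the sector
size of the provider is known. check_logical_block_size() is a
hypothetical name and EINVAL is an assumed error code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Accept only logical block sizes that are a power of 2, at least 512,
 * and not smaller than the underlying sector size.
 */
int
check_logical_block_size(uint32_t lbsize, uint32_t sectorsize)
{
	if (lbsize < 512 || (lbsize & (lbsize - 1)) != 0)
		return (EINVAL);
	if (lbsize < sectorsize)
		return (EINVAL);	/* would need partial-sector I/O */
	return (0);
}
```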
PR: 258063
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41228
- If the size is negative or if rounding it up to a multiple of
the block size overflows, fail the read request with ERANGE.
- While here, add a sanity check that the ICB length for the root
directory is at least as long as a minimum-sized file entry.
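
The size check from the first item can be sketched as follows.
rounded_read_size() is a hypothetical helper using 64-bit signed sizes,
which is an assumption of this example and not necessarily the type used
by the filesystem code.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Round size up to a multiple of bsize, failing with ERANGE if size is
 * negative or the rounding would overflow.
 */
int
rounded_read_size(int64_t size, int64_t bsize, int64_t *out)
{
	if (size < 0 || bsize <= 0)
		return (ERANGE);
	/* Check before adding, so the addition below cannot overflow. */
	if (size > INT64_MAX - (bsize - 1))
		return (ERANGE);
	*out = (size + bsize - 1) / bsize * bsize;
	return (0);
}
```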
PR: 257768
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 1 week
Sponsored by: FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41220
which changes the /dev/fd/N file types to symbolic links, with the
behavior of symbolic links.
PR: 272127
Reported by: Peter Eriksson <pen@lysator.liu.se>
Reviewed by: dchagin
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40969
This reverts commits 4a402dfe0b and
3bffa22623.
The fix will be implemented in a somewhat different manner. The semantic
adjustment is incompatible with linuxolator expectations.
Reported and reviewed by: dchagin
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40969
It contains arbitrary garbage, which is not cleared by vfs_bio_clrbuf()
which only zeroes invalid portions of the pages.
Reported by: Maxim Suhanov <dfirblog@gmail.com>
Discussed with: so
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Instead of using VV_READLINK vnode flag and checking it in one place,
just assign VLNK type to the Fdesc vnodes for linrdlnk mounts. Then all
places where symlinks need to be followed, e.g. lookup(), are handled.
PR: 272127
Reported by: Peter Eriksson <pen@lysator.liu.se>
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40700
pseudofs nodes store their name in a flexible array member, so the node
allocation is sized using the length of the name, including a nul
terminator. pfs_lookup() scans a directory of nodes, comparing names to
find a match. The comparison was incorrect and assumed that all node
names were at least as long as the name being looked up, which of course
isn't true.
I believe the bug is mostly harmless since it cannot result in false
positive or negative matches from the lookup, but it triggers a KASAN
check.
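
A userland sketch of a comparison that does not read past the end of a
shorter node name. name_matches() is a hypothetical helper; it assumes
node_name is NUL-terminated while name is a component of namelen bytes
that need not be terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool
name_matches(const char *node_name, const char *name, size_t namelen)
{
	/*
	 * strncmp() stops at node_name's terminator, so a short node
	 * name is never read beyond its allocation.  The terminator
	 * check runs only after namelen bytes have matched.
	 */
	return (strncmp(node_name, name, namelen) == 0 &&
	    node_name[namelen] == '\0');
}
```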
Reported by: pho
Reviewed by: kib, Olivier Certner <olce.freebsd@certner.fr>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40692
The zone has been dead ever since commit
b9e2019755 ("fusefs: rewrite vop_getpages and vop_putpages")
No functional change intended.
Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D40143
ext2fs does not support disks with a sector size larger than 512 bytes.
The main issue is in reading and writing the superblock, which is not
aligned to a 4k boundary. Reimplement the superblock reading logic to make
it independent of the disk's logical sector size. A logical sector size
larger than the page size is not supported, as is the case on Linux.
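
One way to picture sector-size-independent superblock access is to read
a sector-aligned window that covers the superblock and then index into
it. The sketch below relies on the ext2 superblock living at byte offset
1024 with a length of 1024 bytes; struct sb_read and
plan_superblock_read() are invented for illustration, the page-size
limit is not modelled, and this is not the code in the patch.

```c
#include <stdint.h>

#define SB_OFFSET	1024u	/* ext2 superblock byte offset on disk */
#define SB_SIZE		1024u	/* ext2 superblock size in bytes */

struct sb_read {		/* hypothetical read plan */
	uint64_t dev_offset;	/* sector-aligned start of the read */
	uint32_t length;	/* sector-aligned length of the read */
	uint32_t sb_skip;	/* superblock offset within the buffer */
};

int
plan_superblock_read(uint32_t sectorsize, struct sb_read *r)
{
	uint64_t start, end;

	if (sectorsize == 0 || (sectorsize & (sectorsize - 1)) != 0)
		return (-1);
	/* Round the start down and the end up to sector boundaries. */
	start = (uint64_t)(SB_OFFSET / sectorsize) * sectorsize;
	end = (uint64_t)((SB_OFFSET + SB_SIZE + sectorsize - 1) /
	    sectorsize) * sectorsize;
	r->dev_offset = start;
	r->length = (uint32_t)(end - start);
	r->sb_skip = (uint32_t)(SB_OFFSET - start);
	return (0);
}
```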
PR: 271105
Reported by: k(at)vodka.home.kg
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40047
The SPDX folks have obsoleted the BSD-2-Clause-NetBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
If unionfs_domount() fails, the mount path will not call VFS_UNMOUNT()
to clean up after it. If this failure happens during upper vnode
registration, the unionfs root vnode will already be allocated.
vflush() it in order to prevent the vnode from being leaked and the
subsequent vfs_mount_destroy() call from getting stuck waiting for
the mountpoint reference count to drain.
Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D39767