and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.
Reorganize the root filesystem selection code.
Add local rootvp variables as needed.
Remove checks for miniroot's in the swappartition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.
Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:
MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);
The code which takes vnodes off a mountpoint looks like this:
MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;
(Take a moment and try to spot the locking error before you read on.)
On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.
Fix:
Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.
Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.
Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
the vnode around calls to vinvalbuf()). Apparently no one has tested
ext2fs with DEBUG_VOP_LOCKS. Vnode locking for vinvalbuf() might not
be required in non-soft-updates cases, but it is now asserted.
MFffs (uncommitted related and nearby cleanups: don't unlock the vnode
after vinvalbuf() only to have to relock it almost immediately; don't
refer to devices classified by vn_isdisk() as block devices).
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.
Noticed by: bde
of ffs_reload()'s mountp parameter to mp in rev.1.28 of ffs_vnops.c
had not been merged here.
ext2fs_reload() is still missing locking from not merging other changes
to ffs_reload(), but none of these is related to recent locking changes.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.
Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.
Discussed with: jeff
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.
This fixes the panic on shutdown people have reported.
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
flag to the initial BUF_LOCK(). This will eventually be used in cases
were we want to use a buffer only if it is not currently in use.
- Convert all consumers of the getblk() api to use this extra parameter.
Reviwed by: arch
Not objected to by: mckusick
that use it. Specifically, vop_stdlock uses the lock pointed to by
vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to
reference vp->v_lock. Filesystems that wish to use the default
do not need to allocate a lock at the front of their node structure
(as some still did) or do a lockinit. They can simply start using
vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks,
but still use the vop_stdlock functions (such as nullfs) can simply
replace vp->v_vnlock with a pointer to the lock that they wish to
have used for the vnode. Such filesystems are responsible for
setting the vp->v_vnlock back to the default in their vop_reclaim
routine (e.g., vp->v_vnlock = &vp->v_lock).
In theory, this set of changes cleans up the existing filesystem
lock interface and should have no function change to the existing
locking scheme.
Sponsored by: DARPA & NAI Labs.
v_tag is now const char * and should only be used for debugging.
Additionally:
1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK
2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which
is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.
Suggested by: phk
Reviewed by: bde, rwatson (earlier version)
ext2fs, inode numbers start at 1, so the maximum valid inode number
is (s_inodes_per_group * s_groups_count), not one less. This is
just a minimal change to avoid unnecessary panics and errors; some
other related bugs that Bruce Evans mentioned to me are not addressed.
Reviewed by: bde (ages ago)
shared code and converting all ufs references. Originally it may
have made sense to share common features between the two filesystems,
but recently it has only caused problems, the UFS2 work being the
final straw.
All UFS_* indirect calls are now direct calls to ext2_* functions,
and ext2fs-specific mount and inode structures have been introduced.
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
provided the latter is nonzero. At this point, the former is a fairly
arbitrary default value (DFTPHYS), so changing it to any reasonable
value specified by the device driver is safe. Using the maximum of
these limits broke ffs clustered i/o for devices whose si_iosize_max
is < DFLTPHYS. Using the minimum would break device drivers' ability
to increase the active limit from DFTLPHYS up to MAXPHYS.
Copied the code for this and the associated (unnecessary?) fixup of
mp_iosize_max to all other filesystems that use clustering (ext2fs and
msdosfs). It was completely missing.
PR: 36309
MFC-after: 1 week
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems.
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.
structure changes now rather then piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather then twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
the number of references on the filesystem root vnode to be both
expected and released. Many filesystems hold an extra reference on
the filesystem root vnode, which must be accounted for when
determining if the filesystem is busy and then released if it isn't
busy. The old `skipvp' approach required individual filesystem
xxx_unmount functions to re-implement much of vflush()'s logic to
deal with the root vnode.
All 9 filesystems that hold an extra reference on the root vnode
got the logic wrong in the case of forced unmounts, so `umount -f'
would always fail if there were any extra root vnode references.
Fix this issue centrally in vflush(), now that we can.
This commit also fixes a vnode reference leak in devfs, which could
result in idle devfs filesystems that refuse to unmount.
Reviewed by: phk, bp
to struct mount.
This makes the "struct netexport *" paramter to the vfs_export
and vfs_checkexport interface unneeded.
Consequently that all non-stacking filesystems can use
vfs_stdcheckexp().
At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>
An initial tidyup of the mount() syscall and VFS mount code.
This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.
* the guts of the mount work has been moved into vfs_mount().
* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.
* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.
* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)
* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.
* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.
* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.