opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-29 18:32:49 -04:00

Author	SHA1	Message	Date
Mateusz Guzik	76583fa294	cache: use counter(9) API to maintain statistics Previously the code would just increment statistics while only holding a shared lock, in effect losing updates. Separate tracking for nchstats is removed as values can be obtained from existing counters. Note that some fields are updated by external consumers and are left unfixed. This should not be a serious issue as this structure looks quite obsolete. No strong objections: kib	2016-01-21 01:04:03 +00:00
Mateusz Guzik	6b53d1bc6f	cache: ansify functions and fix some style issues No functional changes.	2016-01-07 02:04:17 +00:00
Mark Johnston	3616095801	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
Andriy Gapon	2f2f522b5d	save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5. Perhaps SDT_PROBE should be made a private implementation detail. MFC after: 20 days	2015-09-28 12:14:16 +00:00
Kirk McKusick	17518b1a2b	Track changes to kern.maxvnodes and appropriately increase or decrease the size of the name cache hash table (mapping file names to vnodes) and the vnode hash table (mapping mount point and inode number to vnode). An appropriate locking strategy is the key to changing hash table sizes while they are in active use. Reviewed by: kib Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D2265 MFC after: 2 weeks	2015-09-06 05:50:51 +00:00
Mateusz Guzik	752fc07d33	vfs: implement v_holdcnt/v_usecount manipulation using atomic ops Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free list) of either counter are still guarded with vnode interlock. Reviewed by: kib (earlier version) Tested by: pho	2015-07-16 13:57:05 +00:00
Edward Tomasz Napierala	6289b482ec	Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024. Differential Revision: https://reviews.freebsd.org/D2335 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-04-21 13:55:24 +00:00
Kirk McKusick	f351915514	More accurately collect name-cache statistics in sysctl functions sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash(). These changes are in preparation for allowing changes in the size of the vnode hash tables driven by increases and decreases in the maximum number of vnodes in the system. Reviewed by: kib@ Phabric: D2265	2015-04-18 00:59:03 +00:00
Dmitry Chagin	9f7a06f27e	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Sergey Kandaurov	bcdd3bceb6	vn_path_to_global_path: update comment.	2014-08-03 07:59:19 +00:00
Konstantin Belousov	fe20047039	Fix accounting for the negative cache entries when reusing v_cache_dd. Having ncneg diverge with the actual length of the ncneg tailq causes NULL dereference. Add assertion that an entry taken from ncneg queue is indeed negative. Reported by and discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-27 17:09:59 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Andriy Gapon	4633a4c379	namecache sdt: freebsd doesn't support structured characters yet :-) MFC after: 7 days	2013-07-09 08:58:34 +00:00
Kirk McKusick	3289d5877a	When renaming a directory from one parent directory to another, we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks	2013-03-20 17:57:00 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Rick Macklem	5e99212d36	Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks	2012-03-03 01:06:54 +00:00
Maxim Konovalov	7dfdd83d56	o Reduce chances for integer overflow. o More verbose sysctl description added. MFC after: 2 weeks Sponsored by: Nginx, Inc.	2012-02-25 12:06:40 +00:00
John Baldwin	bf40d24a3f	Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().	2012-02-06 17:00:28 +00:00
Konstantin Belousov	d5210589b7	Fix remaining calls to cache_enter() in both NFS clients to provide appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for. Reviewed by: jhb (previous version) MFC after: 2 weeks	2012-01-25 20:48:20 +00:00
Konstantin Belousov	7a7e609a32	Apparently, both nfs clients do not use cache_enter_time() consistently, creating some namecache entries without NCF_TS flag. This causes panic due to failed assertion. As a temporal relief, remove the assert. Return epoch timestamp for the entries without timestamp if asked. While there, consolidate the code which returns timestamps, into a helper cache_out_ts(). Discussed with: jhb MFC after: 2 weeks	2012-01-23 17:09:23 +00:00
Konstantin Belousov	c2b396f294	Remove the nc_time and nc_ticks elements from struct namecache, and provide struct namecache_ts which is the old struct namecache. Only allocate struct namecache_ts if non-null struct timespec *tsp was passed to cache_enter_time, otherwise use struct namecache. Change struct namecache allocation and deallocation macros into static functions, since logic becomes somewhat twisty. Provide accessor for the nc_name member of struct namecache to hide difference between struct namecache and namecache_ts. The aim of the change is to not waste 20 bytes per small namecache entry. Reviewed by: jhb MFC after: 2 weeks X-MFC-note: after r230394	2012-01-22 01:11:06 +00:00
John Baldwin	5aefb4cbbf	Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks	2012-01-20 20:02:01 +00:00
Martin Matuska	9cbe30e1d5	Fix missing in r230129: kern_jail.c: initialize fullpath_disabled to zero vfs_cache.c: add missing dot in comment Reported by: kib MFC after: 1 month	2012-01-15 18:08:15 +00:00
Martin Matuska	f6e633a9e1	Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month	2012-01-15 12:08:20 +00:00
Andriy Gapon	7a7ce668ef	put sys/systm.h at its proper place or add it if missing Reported by: lstewart, tinderbox Pointyhat to: avg, attilio MFC after: 1 week MFC with: r228430	2011-12-12 10:05:13 +00:00
Konstantin Belousov	f82360acf2	Existing VOP_VPTOCNP() interface has a fatal flaw that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successful call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)	2011-11-19 07:50:49 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Rebecca Cran	8d065a3914	Fix some more style(9) issues.	2010-11-14 16:10:15 +00:00
Rebecca Cran	b389be97db	Fix style(9) issues from r215281 and r215282. MFC after: 1 week	2010-11-14 08:06:29 +00:00
Rebecca Cran	2baa5cddb6	Add some descriptions to sys/kern sysctls. PR: kern/148710 Tested by: Chip Camden <sterling at camdensoftware.com> MFC after: 1 week	2010-11-14 06:09:50 +00:00
Konstantin Belousov	3a40a00d56	Remove sysctl debug.ncnegfactor, it is renamed to vfs.ncnegfactor. MFC: do not	2010-10-30 14:08:26 +00:00
Konstantin Belousov	420cfbb460	Provide vfs.ncsizefactor instead of hard-coding namecache ratio. Move debug.ncnegfactor to vfs.ncnegfactor [1]. Provide some descriptions for the namecache related sysctls [1]. Based on the submission by: Rogier R. Mulhuijzen <drwilco drwilco net> [1] MFC after: 2 weeks X-MFC-note: remove debug.ncnegfactor in HEAD after MFC	2010-10-16 09:44:31 +00:00
Rui Paulo	79856499bd	Add an extra comment to the SDT probes definition. This allows us to use '-' in probe names, matching the probe names in Solaris.[1] Add userland SDT probes definitions to sys/sdt.h. Sponsored by: The FreeBSD Foundation Discussed with: rwatson [1]	2010-08-22 11:18:57 +00:00
Ed Schouten	60ae52f785	Use ISO C99 integer types in sys/kern where possible. There are only about 100 occurrences of the BSD-specific u_int*_t datatypes in sys/kern. The ISO C99 integer types are used here more often.	2010-06-21 09:55:56 +00:00
Konstantin Belousov	5673e3cb08	The cache_enter(9) function shall not be called for doomed dvp. Assert this. In the reported panic, vdestroy() fired the assertion "vp has namecache for ..", because pseudofs may end up doing cache_enter() with reclaimed dvp, after dotdot lookup temporary unlocked dvp. Similar problem exists in ufs_lookup() for "." lookup, when vnode lock needs to be upgraded. Verify that dvp is not reclaimed before calling cache_enter(). Reported and tested by: pho Reviewed by: kan MFC after: 2 weeks	2010-04-20 10:19:27 +00:00
Konstantin Belousov	3e22320c43	Fix typo. MFC after: 3 days	2010-04-15 17:17:02 +00:00
Konstantin Belousov	8f40845151	Correctly handle unlock for !MAKEENTRY case, after successful attempt of lock upgrade cache shall be unlocked from write. Reported by: Lucius Windschuh <lwindschuh googlemail com> Reviewed by: kan Approved by: re (rwatson)	2009-08-14 10:57:28 +00:00
Konstantin Belousov	c808c9632d	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson	2009-06-21 19:21:01 +00:00
Joe Marcus Clarke	8a4444049e	Unlock the cache lock before returning when we run out of buffer space trying to fill in the full path name. Reported by: David Naylor <naylor.b.david@gmail.com> Approved by: kib	2009-06-05 16:44:42 +00:00
Konstantin Belousov	1358a7957d	Unbreak the build. Add missed probes. Reviewed by: rwatson Pointy hat to: me	2009-05-31 20:16:06 +00:00
Konstantin Belousov	0449e6e1eb	Eliminate code duplication in vn_fullpath1() around the cache lookups and calls to vn_vptocnp() by moving more of the common code to vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that cache is locked around the call. Do not track buffer position by both the pointer and offset, use only buflen to record the start of the free space. Export vn_vptocnp() for external consumers as a wrapper around vn_vptocnp_locked() that locks the cache and handles hold counts. Tested by: pho	2009-05-31 14:57:43 +00:00
Alexander Kabaev	348496ad39	More fallout from negative dotdot caching. Negative entries should be removed from and reinserted to proper ncneg list. Reported by: pho Submitted by: kib	2009-04-17 18:11:11 +00:00
Alexander Kabaev	9cf6772211	Redo previous change using simpler patch that happens to be also more correct. Submitted by: tor	2009-04-14 23:56:48 +00:00
Alexander Kabaev	eed8a9edba	Fix yet another negative dotdot entry fallout. Reported by: pho	2009-04-14 23:46:57 +00:00
Alexander Kabaev	9d75482f99	Fix v_cache_dd handling for negative entries. v_cache_dd pointer was not populated in parent directory if negative entry was being created, yet entry itself was added to the nc_neg list. It was possible for parent vnode to get discarded later, leaving negative entry pointing to now unused memory block. Reported by: dho Reviewed by: kib	2009-04-11 20:23:08 +00:00
Konstantin Belousov	fd409594c6	When zapping v_cache_dd for !MAKEENTRY case in cache_lookup(), we shall lock cache as writer. Reviewed by: kan	2009-04-11 16:12:20 +00:00
Konstantin Belousov	3f54086eba	Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then. In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim. Reported and tested by: pho	2009-04-10 10:22:44 +00:00
Robert Watson	5d5c174869	Nul-terminate strings in the VFS name cache, which negligibly change the size and cost of name cache entries, but make adding debugging and tracing easier. Add SDT DTrace probes for various namecache events: vfs:namecache:enter:done - new entry in the name cache, passed parent directory vnode pointer, name added to the cache, and child vnode pointer. vfs:namecache:enter_negative:done - new negative entry in the name cache, passed parent vnode pointer, name added to the cache. vfs:namecache:fullpath:enter - call to vn_fullpath1() is made, passed the vnode to resolve to a name. vfs:namecache:fullpath:hit - vn_fullpath1() successfully resolved a search for the parent of an object using the namecache, passed the discovered parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:fullpath:miss - vn_fullpath1() failed to resolve a search for the parent of an object using the namecache, passed the child vnode pointer. vfs:namecache:fullpath:return - vn_fullpath1() has completed, passed the error number, and if that is zero, the vnode to resolve, and the returned path. vfs:namecache:lookup:hit - positive name cache entry hit, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:lookup:hit_negative - negative name cache entry hit, passed the parent directory vnode pointer and name. vfs:namecache:lookup:miss - name cache miss, passed the parent directory pointer and the full remaining component name (not terminated after the cache miss component). vfs:namecache:purge:done - name cache purge for a vnode, passed the vnode pointer to purge. vfs:namecache:purge_negative:done - name cache purge of negative entries for children of a vnode, passed the vnode pointer to purge. vfs:namecache:purgevfs - name cache purge for a mountpoint, passed the mount pointer. Separate probes will also be invoked for each cache entry zapped. vfs:namecache:zap:done - name cache entry zapped, passed the parent directory vnode pointer, name, and child vnode pointer. vfs:namecache:zap_negative:done - negative name cache entry zapped, passed the parent directory vnode pointer and name. For any probes involving an extant name cache entry (enter, hit, zap), we use the nul-terminated string for the name component. For misses, the remainder of the path, including later components, is provided as an argument instead since there is no handy nul-terminated version of the string around. This is arguably a bug. MFC after: 1 month Sponsored by: Google, Inc. Reviewed by: jhb, kan, kib (earlier version)	2009-04-07 20:58:56 +00:00
Alexander Kabaev	bb6418cbe3	Revert change 190655 temporarily. It breaks many setups where nullfs is used and needs to be revisited.	2009-04-04 17:48:38 +00:00
Peter Wemm	0e875ecafe	vn_vptocnp() unlocks the name cache and forgets to re-lock it before returning in one error case, and mistakenly unlocks it for the umount -f case.	2009-04-02 21:16:20 +00:00
Alexander Kabaev	607fc40b04	Replace v_dd vnode pointer with v_cache_dd pointer to struct namecache in directory vnodes. Allow namecache dotdot entry to be created pointing from child vnode to parent vnode if no existing links in opposite direction exist. Use direct link from parent to child for dotdot lookups otherwise. This restores more efficient dotdot caching in NFS filesystems which was lost when vnodes stopped being type stable. Reviewed by: kib	2009-03-29 21:25:40 +00:00
John Baldwin	049ce0934f	When a file lookup fails due to encountering a doomed vnode from a forced unmount, consistently return ENOENT rather than EBADF. Reviewed by: kib MFC after: 1 month	2009-03-24 18:16:42 +00:00
Konstantin Belousov	15fb32c07d	Do not underflow the buffer and then report the problem. Check for the condition before the buffer write. Also, since buflen is unsigned, previous check was ignored. Reviewed by: marcus Tested by: pho	2009-03-20 11:08:57 +00:00
Konstantin Belousov	83817ce3b1	Remove unneeded braces to reduce used vertical screen space. The location was missed in r190140.	2009-03-20 11:03:55 +00:00
Konstantin Belousov	9194007261	Do not forget to adjust buflen for the first resolution of the path from namecache. While there, compare pointers for equality. Reviewed by: marcus Tested by: pho	2009-03-20 11:00:39 +00:00
Konstantin Belousov	065fc451f8	The nc_nlen member of the struct namecache contains the length of the cached name, not the length + 1. PR: 132620, 132542 Reported by: bf2006a yahoo com Tested by: bf2006a, pho Reviewed by: marcus	2009-03-20 10:59:06 +00:00
Konstantin Belousov	c4a8c2ee24	When ktracing namei operations, log a result of the __getcwd(). MFC after: 1 week	2009-03-20 10:47:16 +00:00
Konstantin Belousov	bf5c835e1c	Remove unneeded braces to reduce used vertical screen space.	2009-03-20 10:04:00 +00:00
John Baldwin	4ab2a9a022	Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of the total runtime of 'sysctl -a'. It also performs lots of copyout's while holding the namecache lock (this does not attempt to fix that). MFC after: 2 weeks	2009-03-09 19:04:53 +00:00
John Baldwin	03964c8e09	Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans	2009-02-19 22:28:48 +00:00
John Baldwin	9078981ab1	Convert the global mutex protecting the directory lookup name cache from a mutex to a reader/writer lock. Lookup operations first grab a read lock and perform the lookup. If the operation results in a need to modify the cache, then it tries to do an upgrade. If that fails, it drops the read lock, obtains a write lock, and redoes the lookup.	2009-01-28 19:05:18 +00:00
John Baldwin	8a7ef10b71	- Mark all standalone INT/LONG/QUAD sysctl's MPSAFE. This is done inside the SYSCTL() macros and thus does not need to be done for all of the nodes scattered across the source tree. - Mark the name-cache related sysctl's (including debug.hashstat.*) MPSAFE. - Mark vm.loadavg MPSAFE. - Remove GIANT_REQUIRED from vmtotal() (everything in this routine already has sufficient locking) and mark vm.vmtotal MPSAFE. - Mark the vm.stats.(sys|vm).* sysctls MPSAFE.	2009-01-23 22:49:23 +00:00
Stephen McKay	58c1607e03	Add a limit on namecache entries. In normal operation, the number of cache entries is roughly equal to the number of active vnodes. However, when most of the recently accessed vnodes have many hard links, the number of cache entries can be 32000 times as large, exhausting kernel memory and provoking a panic in kmem_malloc(). MFC after: 2 weeks	2009-01-20 04:21:21 +00:00
Konstantin Belousov	83e73926ad	In r185557, the check for existing negative entry for the given name did not compare nc_dvp with supplied parent directory vnode pointer. Add the check and note that now branches for vp != NULL and vp == NULL are the same, thus can be merged. Reported and reviewed by: kan Tested by: pho MFC after: 2 weeks	2008-12-30 12:51:14 +00:00
Joe Marcus Clarke	4769218f4b	Do not KASSERT when vp->v_dd is NULL. Only directories which have had ".." looked up would have v_dd set to a non-NULL value. This fixes a panic seen when running installworld on a diskless system with a separate /usr file system. Submitted by: cracauer Approved by: kib	2008-12-23 20:43:42 +00:00
Konstantin Belousov	86dcb537c9	Keep the hold on the vnode during VOP_VPTOCNP() call, allowing the vop implementation to drop vnode lock, if needed. Reported and tested by: pho	2008-12-23 20:04:31 +00:00
Joe Marcus Clarke	b9022449b3	Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name on a best-effort basis. Teach vn_fullpath to use this new VOP if a regular VFS cache lookup fails. This VOP is designed to supplement the VFS cache to provide a better chance that a vnode-to-name lookup will succeed. Currently, an implementation for devfs is being committed. The default implementation is to return ENOENT. A big thanks to kib for the mentorship on this, and to pho for running it through his stress test suite. Reviewed by: arch Approved by: kib	2008-12-12 00:57:38 +00:00
Konstantin Belousov	d6568724e1	Shared lookup makes it possible to create several negative cache entries for one name. Then, creating inode with that name would remove one entry, leaving others dormant. Reclaiming the vnode would uncover negative entries, causing false return of ENOENT from the calls like stat, that do not create inode. Prevent creation of the duplicated negative entries. Reported and debugged with: pho Reviewed by: jhb X-MFC: after shared lookup changes	2008-12-02 11:14:16 +00:00
Joe Marcus Clarke	ef61995ebd	Move vn_fullpath1() outside of FILEDESC locking. This is being done in advance of teaching vn_fullpath1() how to query file systems for vnode-to-name mappings when cache lookups fail. Thanks to kib for guidance and patience on this process. Reviewed by: kib Approved by: kib	2008-11-25 15:36:15 +00:00
John Baldwin	d2722d704c	Part 1 of making shared lookups more resilient with respect to forced unmounts. When we upgrade a vnode lock from shared to exclusive during a name cache lookup, fail the lookup with EBADF if the vnode is invalidated while we are waiting for the exclusive lock. Also, for correctness (though I'm not sure it can occur in practice), downgrade an exclusively locked vnode if it should be share locked. Tested by: pho	2008-09-24 18:51:33 +00:00
John Baldwin	cbb598af66	Sort includes.	2008-09-18 20:04:22 +00:00
John Baldwin	969bf150df	Fix a race condition with concurrent LOOKUP namecache operations for a vnode not in the namecache when shared lookups are enabled (vfs.lookup_shared=1, it is currently off by default) and the filesystem supports shared lookups (e.g. NFS client). Specifically, if multiple concurrent LOOKUPs both miss in the name cache in parallel, each of the lookups may each end up adding an entry to the namecache resulting in duplicate entries in the namecache for the same pathname. A subsequent removal of the mapping of that pathname to that vnode (via remove or rename) would only evict one of the entries from the name cache. As a result, subseqent lookups for that pathname would still return the old vnode. This race was observed with shared lookups over NFS where a file was updated by writing a new file out to a temporary file name and then renaming that temporary file to the "real" file to effect atomic updates of a file. Other processes on the same client that were periodically reading the file would occasionally receive an ESTALE error from open(2) because the VOP_GETATTR() in nfs_open() would receive that error when given the stale vnode. The fix here is to check for duplicates in cache_enter() and just return if an entry for this same directory and leaf file name for this vnode is already in the cache. The check for duplicates is done by walking the per-vnode list of name cache entries. It is expected that this list should be very small in the common case (usually 0 or 1 entries during a cache_enter() since most files only have 1 "leaf" name). Reviewed by: ups, scottl MFC after: 2 months	2008-08-23 15:13:39 +00:00
Alfred Perlstein	cbd3ba3edf	Prevent crashes due to unlocked access to hash buckets in two sysctls. Use CACHE_LOCK to prevent crashes. Sysctls fixed: debug.hashstat.nchash and debug.hashstat.rawnchash. Obtained from: Juniper Networks MFC After: 1 week	2008-08-16 21:48:10 +00:00
Christian S.J. Peron	dfc714fba1	Currently, BSM audit pathname token generation for chrooted or jailed processes are not producing absolute pathname tokens. It is required that audited pathnames are generated relative to the global root mount point. This modification changes our implementation of audit_canon_path(9) and introduces a new function: vn_fullpath_global(9) which performs a vnode -> pathname translation relative to the global mount point based on the contents of the name cache. Much like vn_fullpath, vn_fullpath_global is a wrapper function which called vn_fullpath1. Further, the string parsing routines have been converted to use the sbuf(9) framework. This change also removes the conditional acquisition of Giant, since the vn_fullpath1 method will not dip into file system dependent code. The vnode locking was modified to use vhold()/vdrop() instead the vref() and vrele(). This will modify the hold count instead of modifying the user count. This makes more sense since it's the kernel that requires the reference to the vnode. This also makes sure that the vnode does not get recycled we hold the reference to it. [1] Discussed with: rwatson Reviewed by: kib [1] MFC after: 2 weeks	2008-07-31 16:57:41 +00:00
Pawel Jakub Dawidek	b03d720760	- Use LK_TYPE_MASK where needed. Actually after sys/sys/lockmgr.h:1.69 it is no longer needed, but for now we still want to be consistent with other similar checks in the tree. - Call ASSERT_VOP_ELOCKED() only when vget() returns 0. Reviewed by: jeff	2008-04-09 20:19:55 +00:00
Konstantin Belousov	0a3af16a75	Add the utility function vn_commname() to retrieve the command name from the vfs namecache, when available. Reviewed by: rwatson, rdivacky Tested by: pho	2008-03-31 11:53:03 +00:00
Robert Watson	237fdd787b	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink	2008-03-16 10:58:09 +00:00
Attilio Rao	81c794f998	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>	2008-02-25 18:45:57 +00:00
Attilio Rao	22db15c06f	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>	2008-01-13 14:44:15 +00:00
Attilio Rao	cb05b60a89	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>	2008-01-10 01:10:58 +00:00
Kris Kennaway	e6d64a0f15	Remove remaining Giant acquisition around vn_fullpath1. This was missed in r1.106 and has not been required for some years now. Reviewed by: jeff MFC After: 1 week	2007-11-22 21:26:25 +00:00
Pawel Jakub Dawidek	b4d7e2983c	Fix some locking cases where we ask for exclusively locked vnode, but we get shared locked vnode in instead when vfs.lookup_shared is set to 1. Discussed with: kib, kris Tested by: kris Approved by: re (kensmith)	2007-09-21 10:16:56 +00:00
Pawel Jakub Dawidek	dfe97ff4a5	We only flush entries related to the given file system. Currently there are no 'invalid' cache entires - file system is responsible for keeping it that way. The comment should have been updated in rev.1.25.	2007-06-18 09:28:24 +00:00
Pawel Jakub Dawidek	6e042171bd	To avoid a deadlock when handling .. directory during a lookup, we unlock parent vnode and relock it after locking child vnode. The problem was that we always relock it exclusively, even when it was share-locked. Discussed with: jeff	2007-05-25 22:23:38 +00:00
Pawel Jakub Dawidek	b4c85af977	We no longer need to put namecache entries onto temporary mplist. It was useful in revision 1.86, but should have been removed in 1.89.	2007-05-25 22:19:49 +00:00
Pawel Jakub Dawidek	950afe9972	The cache_leaf_test() function seems to be unused, so remove it.	2007-05-25 22:16:17 +00:00
Pawel Jakub Dawidek	f013ccb768	- Remove redundant initialization. - Compare pointer with NULL.	2007-05-22 23:05:48 +00:00
Robert Watson	5e3f7694b1	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
Robert Watson	873fbcd776	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.	2007-03-05 13:10:58 +00:00
Christian S.J. Peron	4f0840f348	Axe Giant from vn_fullpath(9). The vnode -> pathname lookup should be filesystem agnostic. We are not touching any file system specific functions in this code path. Since we have a cache lock, there is really no need to keep Giant around here. This eliminates Giant acquisitions for any syscall which is auditing pathnames. Discussed with: jeff	2006-06-16 05:09:28 +00:00
John-Mark Gurney	e98b5a89de	remove duplicate sizeof vnode entry (debug.sizeof.vnode already existed)... move ncsize into debug.sizeof and rename to namecache...	2006-04-16 18:38:30 +00:00
Jeff Roberson	2f0bca553a	- Don't check v_mount for NULL to determine if a vnode has been recycled. Use the more appropriate VI_DOOMED flag instead. Sponsored by: Isilon Systems, Inc. MFC After: 1 week	2006-02-06 10:15:27 +00:00
Jeff Roberson	32b6dcd8a4	- Fix a leaked reference to a vnode via v_dd. We rely on cache_purge() and cache_zap() to clear the v_dd pointers when a directory vnode is forcibly discarded. For this to work, all vnodes with v_dd pointers to a directory must also have name cache entries linked via v_cache_dst to that dvp otherwise we could not find them at cache_purge() time. The following code snipit could break this guarantee by unlinking a directory before fetching it's dotdot. The dotdot lookup would initialize the v_dd field of the unlinked directory which could never be cleared. To fix this we don't initialize v_dd for orphaned vnodes. printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */ printf("chdir: %d\n", chdir("..")); printf("%s\n", getwd(NULL)); Sponsored by: Isilon Systems, Inc. Discovered by: kkenn Approved by: re (blanket vfs)	2005-06-17 01:05:13 +00:00
Jeff Roberson	6bd8103d33	- Clear v_dd in cache_zap() instead of cache_purge() as cache_purge() may not be called in all cases where we free the cnp. Sponsored by: Isilon Systems, Inc.	2005-06-13 05:59:59 +00:00
Jeff Roberson	eff2d12635	- Add KTR_VFS messages for various name cache related events. Sponsored by: Isilon Systems, Inc.	2005-06-13 00:46:03 +00:00
Jeff Roberson	1b2da2d0fa	- Assert that we're not adding a doomed vnode to the name cache. Sponsored by: Isilon Systems, Inc.	2005-06-11 08:47:30 +00:00
Jeff Roberson	4585e3ac5a	- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.	2005-04-13 10:59:09 +00:00
David Schultz	7ce7f713ee	Eliminate v_id and v_ddid. The name cache now holds references to vnodes whose names it caches, so we no longer need a `generation number' to tell us if a referenced vnode is invalid. Replace the use of the parent's v_id in the hash function with the address of the parent vnode. Tested by: Peter Holm Glanced at by: jeff, phk	2005-03-30 03:01:36 +00:00
David Schultz	dd33f0d92f	Merge kern___cwd() and vn_fullpath(), which were virtually identical, except for places where people forget to update one of them. We now collect only one set of stats for both of these routines. Other changes in this commit include: - Start acquiring Giant again in vn_fullpath(), since it is required when crossing a mount point. - Expand the scope of the cache lock to avoid dropping it and picking it up again for every pathname component. This also makes it trivial to avoid races in stats collection. - Assert that nc_dvp == v_dd for directories instead of returning an error to userland when this is not true. AFAIK, it should always be true when v_dd is non-null. - For vn_fullpath(), handle the first (non-directory) vnode separately. Glanced at by: jeff, phk	2005-03-30 02:59:32 +00:00
Jeff Roberson	5280e61f2f	- Move the logic that locks and refs the new vnode from vfs_cache_lookup() to cache_lookup(). This allows us to acquire the vnode interlock before dropping the cache lock. This protects the vnodes identity until we have locked it. Sponsored by: Isilon Systems, Inc.	2005-03-29 12:59:06 +00:00
Jeff Roberson	571211c454	- Get rid of the old LOOKUP_SHARED code. namei() now supplies the proper lock flags via cn_lkflag. Sponsored by: Isilon Systems, Inc.	2005-03-29 10:08:23 +00:00
Jeff Roberson	b75719afea	- Invalidate the childrens v_dd pointers when we cache_purge() a directory. Otherwise the stale pointer may be accessed after a vnode is freed. Sponsored by: Isilon Systems, Inc.	2005-03-29 09:58:41 +00:00
Jeff Roberson	f7b404d88f	- Remove an unused variable. Sponsored by: Isilon Systems, Inc.	2005-03-28 13:29:48 +00:00
Jeff Roberson	ee5a0a2d7c	- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. Sponsored by: Isilon Systems, Inc.	2005-03-28 09:26:17 +00:00
Jeff Roberson	fdd6a3ff3c	- All of the bugs which lead to the complication of the LOOKUP_SHARED config option have now been fixed. All filesystems are properly locked and checked via DEBUG_VFS_LOCKS. Remove the workaround code. Sponsored by: Isilon Systems, Inc.	2005-03-24 06:00:45 +00:00
Poul-Henning Kamp	2adc2b87c7	Make a SYSCTL_NODE and a mutex static	2005-02-10 12:16:42 +00:00
Jeff Roberson	799cc2dcee	- Simplify the cache locking. The lock order relationship with the vnode lock is much simpler than I originally thought it would be. Now, the cache lock is always acquired before the vnode lock. - Provide some gotos in __getcwd() to simplify the unlocking a bit. - Move Giant acquisition down into __getcwd(). Sponsored By: Isilon Systems, Inc.	2005-01-24 10:24:12 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
Warner Losh	7f8a436ff2	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core	2004-04-05 21:03:37 +00:00
Jeff Roberson	98d7d155c1	- Apply a big giant lock around the namecache. This has been sitting in my tree since BSDcon.	2003-10-05 07:13:50 +00:00
Dag-Erling Smørgrav	c2935410f6	Make the VFS cache use zones instead of malloc(9). This results in a small but noticeable increase in performance for name lookup operations. The code uses two zones, one for short names (less than 32 characters) and one for long names (up to NAME_MAX). Since most file names are fairly short, this saves a considerable amount of space that would otherwise be wasted if we always allocated NAME_MAX bytes. The cutoff value of 32 characters was picked arbitrarily and may benefit from some tweaking; it could also be made into a tunable. Submitted by: hmp	2003-06-13 08:46:13 +00:00
Dag-Erling Smørgrav	ffe92432e3	Whitespace cleanup.	2003-06-11 07:35:56 +00:00
David E. O'Brien	677b542ea2	Use __FBSDID().	2003-06-11 00:56:59 +00:00
Poul-Henning Kamp	cc34e37e5b	Backout the getcwd changes, a more comprehensive effort will be needed.	2003-03-20 10:40:45 +00:00
Poul-Henning Kamp	9eaf5abceb	(This commit certainly increases the need for a wash&clean of vfs_cache.c, but I decided that it was important for this patch to not bit-rot, and since it is mainly moving code around, the total amount of entropy is epsilon /phk) This is a patch to move the common parts of linux_getcwd() back into kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's extended functionality. The linux syscall linux_getcwd() in compat/linux/linux_getcwd.c has been rewritten to use it too. It should be possible to simplify libc's getcwd() after this. No doubt this code needs some cleaning up, since I've left in the sysctl variables I used for debugging. PR: 48169 Submitted by: James Whitwell <abacau@yahoo.com.au>	2003-03-17 12:21:08 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Andrew R. Reiter	1f5a94d5f6	- Update a couple of comments to make sense with what today's code is doing (stale comments make arr something something ;)).	2003-02-15 23:25:12 +00:00
Andrew R. Reiter	da8f0c8429	- Remove old comment for PURGE() as it no longer exists and implied it was a comment to cache_zap(). - Add a comment to quickly state what cache_zap() does. Reviewed by: phk, mux	2003-02-15 18:58:06 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Ian Dowse	48b52b7a32	Split up __getcwd so that kernel callers of the internal version can specify whether the buffer is in user or system space.	2002-09-02 22:40:30 +00:00
Jeff Roberson	18c6acee26	- Move a VOP assert to the right place. Spotted by: i386 tinderbox	2002-08-05 08:55:53 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Alfred Perlstein	210a5a7169	nuke caddr_t.	2002-06-28 23:17:36 +00:00
Jeff Roberson	0e2d6cc899	Disable the shared locking namei() code for now. It breaks several stacking filesystems. This is on hold until the rest of VFS Locking is reviewed and deemed safe. It can be enabled with 'options LOOKUP_SHARED'.	2002-05-14 21:59:49 +00:00
Jeff Roberson	a59f8b9e6c	Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this behavior by default. Also, change the options line to reflect this. If there are no problems reported this will become the only behavior and the knob will be removed in a month or so. Demanded by: obrien	2002-04-09 05:14:17 +00:00
David Malone	cf4ce70bb3	Remove a comment which relates to the old name cache code, which was replaced in 1997. Approved by: phk	2002-04-07 08:58:31 +00:00
Alfred Perlstein	4d77a549fe	Remove __P.	2002-03-19 21:25:46 +00:00
Jeff Roberson	8de00f4a87	This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs. The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also, this reduces the number of exclusive locks that are floating around in the system, which helps reduce the number of deadlocks that occur. A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for testing, and should eventually go away once it is proven to be stable. I have personally been running this patch for over a year now, so it is believed to be fully stable. Reviewed by: jake, obrien Approved by: jake	2002-03-12 04:00:11 +00:00
Eivind Eklund	eb8e6d5276	Document all functions, global and static variables, and sysctls. Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.) Reviewed by: phk (but all errors are mine)	2002-03-05 15:38:49 +00:00
Poul-Henning Kamp	362912ebcc	Remove cache_purgeleafdirs(), it has been #if 0 for quite some time.	2002-02-17 20:40:29 +00:00
Alfred Perlstein	9e209b124a	Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution. Requested by: jhb	2002-01-13 21:37:49 +00:00
Alfred Perlstein	426da3bcfb	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file *fhold(struct file *fp); /* increments reference count on a file */ struct file *fhold_locked(struct file *fp); /* like fhold but expects file to locked */ struct file *ffind_hold(struct thread *, int fd); /* finds the struct file in thread, adds one reference and returns it unlocked */ struct file *ffind_lock(struct thread *, int fd); /* ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.	2002-01-13 11:58:06 +00:00
Dag-Erling Smørgrav	45fb069ac9	Convert textvp_fullpath() into the more generic vn_fullpath() which takes a struct thread * and a struct vnode * instead of a struct proc *. Temporarily add a textvp_fullpath macro for compatibility.	2001-10-21 15:52:51 +00:00
Matthew Dillon	b5810bab2d	After extensive testing it has been determined that adding complexity to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days	2001-10-01 04:33:35 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Ian Dowse	7476f7e87d	Fix a memory leak in __getcwd() that can occur after a filesystem has been forcibly unmounted. If the filesystem root vnode is reached and it has no associated mountpoint (vp->v_mount == NULL), __getcwd would return without freeing 'buf'. Add the missing free() call. PR: kern/30306 Submitted by: Mike Potanin <potanin@mccme.ru> MFC after: 1 week	2001-09-04 19:03:47 +00:00
Mark Murray	fb919e4d5a	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)	2001-05-01 08:13:21 +00:00
Greg Lehey	60fb0ce365	Revert consequences of changes to mount.h, part 2. Requested by: bde	2001-04-29 02:45:39 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Seigo Tanimura	759cb26335	Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk	2001-04-18 11:19:50 +00:00
Peter Wemm	9d10eb0c0c	Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.	2001-04-11 00:39:20 +00:00
Peter Wemm	439fea92c2	Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.	2001-03-20 02:10:18 +00:00
Peter Wemm	6eb39ac8fc	Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.	2001-03-17 09:31:06 +00:00
Poul-Henning Kamp	959b7375ed	Staticize some malloc M_ instances.	2000-12-08 20:09:00 +00:00
Peter Wemm	138e514cb5	Untangle vfsinit() a bit. Use seperate sysinit functions rather than having a super-function calling bits all over the place.	2000-12-06 07:09:08 +00:00
Robert Watson	aa5429970c	o Export nchstats ("VFS cache effectiveness statistics") using SYSCTL_OPAQUE. This removes a reason that systat requires setgid kmem. More to come.	2000-11-20 00:41:11 +00:00
Boris Popov	3ff1a2f43e	Add new flag PDIRUNLOCK to the component.cn_flags which should be set by filesystem lookup() routine if it unlocks parent directory. This flag should be carefully tracked by filesystems if they want to work properly with nullfs and other stacked filesystems. VFS takes advantage of this flag to perform symantically correct usage of vrele() instead of vput() if parent directory already unlocked. If filesystem fails to track this flag then previous codepath in VFS left unchanged. Convert UFS code to set PDIRUNLOCK flag if necessary. Other filesystmes will be changed after some period of testing. Reviewed in general by: mckusick, dillon, adrian Obtained from: NetBSD	2000-09-17 07:26:42 +00:00
Boris Popov	67b23794b1	Change variable naming to be consistent with the rest of VFS code. Reduce number of indirections by using already fetched values.	2000-09-10 03:46:12 +00:00
John Baldwin	9701cd40b4	Support for unsigned integer and long sysctl variables. Update the SYSCTL_LONG macro to be consistent with other integer sysctl variables and require an initial value instead of assuming 0. Update several sysctl variables to use the unsigned types. PR: 15251 Submitted by: Kelly Yancey <kbyanc@posi.net>	2000-07-05 07:46:41 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Brian Feldman	b7db19017b	Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath(). There's no excuse to have code in synthetic filestores that allows direct references to the textvp anymore. Feature requested by: msmith Feature agreed to by: warner Move requested by: phk Move agreed to by: bde	2000-04-26 11:57:45 +00:00
Brian Feldman	8a2852b12f	Move the declaration of "struct namecache" to vnode.h, as it can be useful elsewhere. Note, of course, that in an ideal world nothing should need to see our VFS implementation :-/	2000-04-22 03:44:00 +00:00
Peter Wemm	194a0b6c97	Avoid a panic in __getcwd(2) when combined with umount -f.	2000-02-14 06:09:01 +00:00
Poul-Henning Kamp	3b6fb88590	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.	1999-10-03 12:18:29 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	22f054e258	Fix a braino in the v_id wraparound code. Give more (current) details in comment. PR: 11307 Spotted by: Ville-Pertti Keinonen <will@iki.fi>	1999-04-24 17:58:14 +00:00
Bruce Evans	355a2610a7	Don't use CTL_VFS at the wrong level. This caused loops in the sysctl tree if CTL_VFS happened to get assigned as a type number to a vfs that has some vfs sysctls.	1998-09-09 07:41:41 +00:00
Bruce Evans	1aa9ea7cb9	Removed some bogus casts.	1997-12-19 23:18:37 +00:00
Poul-Henning Kamp	4a11ca4e29	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused	1997-11-07 08:53:44 +00:00
Poul-Henning Kamp	cec0f20ce7	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)	1997-10-16 10:50:27 +00:00
Poul-Henning Kamp	138ec1f71a	vnops megacommit 1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.	1997-10-15 13:24:07 +00:00
Poul-Henning Kamp	46c320bab8	Add one more counter so we can truly find out how good our name cache is. If we don't find something and don't what to have found something, it's actually a success.	1997-09-24 15:54:10 +00:00
Poul-Henning Kamp	0054419366	A couple of handles to tweak, more statistics.	1997-09-24 07:46:54 +00:00
Poul-Henning Kamp	4d1122bdd6	Revert to the previous hashing, double the hashtable size instead.	1997-09-04 08:24:44 +00:00
Poul-Henning Kamp	119b6f4cf2	Use 2^N hash sizes rather than primesize, this replaces a division with an and. (Submitted by davidg) Preemptively record ".." values. Reviewed by: phk	1997-09-03 09:20:17 +00:00
Bruce Evans	e4ba6a82b0	Removed unused #includes.	1997-09-02 20:06:59 +00:00
Poul-Henning Kamp	a051452ae2	Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measuable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b) Reviewed by: dyson	1997-08-31 07:32:39 +00:00
Poul-Henning Kamp	0fa2443f0e	Uncut&paste cache_lookup(). This unifies several times in theory indentical 50 lines of code. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.	1997-08-26 07:32:51 +00:00
Poul-Henning Kamp	2401f27c25	remove unused MAXVNODEUSE macro.	1997-08-04 07:31:36 +00:00
Poul-Henning Kamp	b15a966ec6	1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache enties, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggresively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. 
Bugs: The namecache statistics no longer include the hits for ".." and ".". Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be self-tuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There are a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.	1997-05-04 09:17:38 +00:00
Bruce Evans	d8d6519c63	Fixed the hash formula. Lite2 doesn't have phashinit(), so Lite2's hash formula uses `& nchash'. This is very broken when nchash is a prime number instead of 1 less than a power of 2, but the Lite2 formula was merged in. Merged some cosmetic changes from Lite2, rev.1.21 and Lite1. The merge was difficult because the Lite2 code is essentially ours (phk's) except where Lite2 improved or broke it. Summary of the Lite2 changes: - in the copyright, phk's rights have been transferred to the Regents. This change should be reviewed. - nchENOENT went away; the "no" vnode is now simply 0. - comments were improved. - style was "improved". - goto instead of Fanatism (sic) was considered bad :-). - there are some small changes to support whiteouts. - new cache entries are added in more cases. More work is required near here to change the hash table size if kern.desiredvnodes is changed using sysctl. - rescanning of the hash bucket in cache_purgevfs() was removed. This change should be reviewed.	1997-03-08 15:22:14 +00:00
Peter Wemm	6875d25465	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.	1997-02-22 09:48:43 +00:00
John Dyson	996c772f58	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>	1997-02-10 02:22:35 +00:00
Jordan K. Hubbard	1130b656e5	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.	1997-01-14 07:20:47 +00:00
John Dyson	bd7e5f992e	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.	1996-01-19 04:00:31 +00:00
Poul-Henning Kamp	79c0c4b77f	kern_conf.c: remove a now unused variable. vfs_cache.c: Fix a very rare problem in the vnode-cache. Submitted by: Terry Lambert <terry@lambert.org>	1995-12-22 15:56:35 +00:00
Poul-Henning Kamp	f708ef1b9e	Another mega commit to staticize things.	1995-12-14 09:55:16 +00:00
Poul-Henning Kamp	a98ca4699e	Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.	1995-10-29 15:33:36 +00:00
Bruce Evans	28f8db1403	Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.	1995-07-29 11:44:31 +00:00
Rodney W. Grimes	9b2e535452	Remove trailing whitespace.	1995-05-30 08:16:23 +00:00
David Greenman	cf8ad5100d	Fixed serious off by one bug I introduced that will likely cause the machine to panic whenever the name cache fills up. Submitted by: John Dyson	1995-04-15 00:49:35 +00:00
David Greenman	22e53424b2	kern_subr.c: Added a new type to uiomove - "UIO_NOCOPY" which causes it to update pointers and counts, but doesn't do any data copying. This is needed for upcoming changes to the way that the vnode pager does its page outs. Added a new hash init function call "phashinit" that allocates and initializes a prime number sized hash table. vfs_cache.c: Changed hashing algorithm to use the remainder of dividing by a prime number to improve the distribution characteristics. Uses new phashinit function in kern_subr.c.	1995-04-04 02:01:13 +00:00
David Greenman	d7e3d98a5e	Patch from Kirk McKusick to fix a bug introduced in the Poul's vfs_cache rewrite.	1995-03-19 09:33:51 +00:00
Poul-Henning Kamp	47f196941e	Update a couple of counters.	1995-03-12 02:01:20 +00:00
David Greenman	914e6eb70d	Whoops, back out that last change - I misread what Poul had done there.	1995-03-10 20:29:51 +00:00
David Greenman	dbd90d413f	Don't thrash the name cache while trying to fill up the object cache. (Make a new cache entry until desiredvnodes is reached).	1995-03-10 20:26:29 +00:00
Poul-Henning Kamp	b2e10d6d6f	Clean up and improve the namecache. 1. We always keep one 16th of the vnodes on the freelist, so that the namecache doesn't get trashed. It used to be that it wasn't a problem, but the only vnodes getting released these days are directories and things which get forced out of the VM/cache. The latter is not numerous enough to keep the pool of vnodes needed for the namecache sufficiently big. 2. Purge invalid entries in the namecache as soon as we notice them. This avoids a stale entry pushing out a valid entry on the LRU list. 3. Speed up the lookup in the namecache by avoiding a special case branch. 4. Make the cache purge routines do the thing they're supposed to, and in a decently efficient manner. 5. Make the size of the namecache follow the number of vnodes, so that we can always point to all the vnodes we have in core. 6. Readability has gone way up. 7. Added a "options NCH_STATISTICS" feature that will gather more detailed statistics on the performance of the namecache. Reviewed by: davidg	1995-03-09 20:23:45 +00:00
Poul-Henning Kamp	a0e8a1e29b	Another little optimization to the namei-cache. If an entry is stale, ditch it.	1995-03-08 01:40:44 +00:00
Poul-Henning Kamp	2425396b27	Improve the quality of the hash used in the namei-cache.	1995-03-08 01:08:03 +00:00
Poul-Henning Kamp	30f467d84a	Update vfs_cache.c to use the <sys/queue.h> macros. This makes it easier to read, but doesn't change the speed. Reviewed by: phk Obtained from: via NetBSD	1995-03-06 06:45:52 +00:00
Poul-Henning Kamp	797f2d22f0	All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.	1994-10-02 17:35:40 +00:00
David Greenman	3c4dd3568f	Added $Id$	1994-08-02 07:55:43 +00:00
Rodney W. Grimes	26f9a76710	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman	1994-05-25 09:21:21 +00:00
Rodney W. Grimes	df8bae1de4	BSD 4.4 Lite Kernel Sources	1994-05-24 10:09:53 +00:00


348 commits