17679 commits

Author SHA1 Message Date
Mateusz Guzik
e35406c8f7 cache: lockless reverse lookup
This enables fully scalable operation for getcwd and significantly improves
realpath.

For example:
PATH_CUSTOM=/usr/src ./getcwd_processes -t 104
before:  1550851
after: 380135380

Tested by:	pho
2020-08-24 09:00:57 +00:00
Mateusz Guzik
feabaaf995 cache: drop the always curthread argument from reverse lookup routines
Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs.

Tested by:	pho
2020-08-24 08:57:02 +00:00
Mateusz Guzik
f0696c5e4b cache: perform reverse lookup using v_cache_dd if possible
Tested by:	pho
2020-08-24 08:55:55 +00:00
Mateusz Guzik
ce575cd0e2 cache: populate v_cache_dd for non-VDIR entries
It makes v_cache_dd into a little bit of a misnomer and it may be addressed later.

Tested by:	pho
2020-08-24 08:55:04 +00:00
Mateusz Guzik
f0d9c77e52 vfs: validate ndp state after the lookup
The intent is to remove known-to-be-nop NDFREE calls after many lookups.
2020-08-23 21:06:41 +00:00
Mateusz Guzik
4b5001196a vfs: convert nameiop into an enum
While here change the field size from long to int and move it into the
gap next to cn_flags.

Shrinks struct componentname from 64 to 56 bytes on amd64.
2020-08-23 21:05:39 +00:00
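
The saving described above comes from alignment padding. Below is a minimal sketch using Python's ctypes, assuming a 64-bit platform where 8-byte integers are 8-byte aligned. The three-field structs are hypothetical stand-ins, not the real struct componentname layout; they only show how narrowing a field and moving it into an existing gap removes 8 bytes.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical layout: an 8-byte op field, a 4-byte flags field,
    # then an 8-byte member that forces 4 bytes of padding after flags.
    _fields_ = [
        ("op", ctypes.c_int64),
        ("flags", ctypes.c_uint32),
        ("ptr", ctypes.c_uint64),  # stand-in for a pointer-sized member
    ]


class After(ctypes.Structure):
    # op narrowed to 4 bytes and placed in the gap next to flags.
    _fields_ = [
        ("flags", ctypes.c_uint32),
        ("op", ctypes.c_int32),
        ("ptr", ctypes.c_uint64),
    ]


def saved_bytes():
    """Bytes saved by the reordering on this platform."""
    return ctypes.sizeof(Before) - ctypes.sizeof(After)
```

On such a platform the difference matches the 8 bytes reported in the commit message (64 to 56).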
Mateusz Guzik
9ce9158b53 vfs: support denying access in vaccess_vexec_smr 2020-08-23 21:05:06 +00:00
Mateusz Guzik
ba3b099198 vfs: factor away doomed vnode handling into vdropl_final 2020-08-23 21:04:35 +00:00
Warner Losh
9b1f6cfc63 Fix another minor style glitch.
Pull { to the end of the struct line rather than having them on their
own line.
2020-08-23 20:38:10 +00:00
Konstantin Belousov
0cad2aa2dd Pass pointers to info parsed from notes, to brandinfo->header_supported filter.
Currently, we parse notes for the values of ELF FreeBSD feature flags
and osrel.  Knowing these values, or knowing that the image does not carry
the note if pointers are NULL, is useful to decide which ABI variant
(brand) we want to activate for the image.

Right now this is only a plumbing change.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 20:06:55 +00:00
Konstantin Belousov
2b313da3bd kern_sharedpage.c: Add exec_sysvec_init_secondary() helper.
It allows a sysent to share existing usermode data in shared page with
other sysent, assuming ABI differences are not in the layout of the
page.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25273
2020-08-23 19:43:47 +00:00
Mateusz Guzik
2ca83b5c27 vfs: mark freevnode as noinline 2020-08-23 11:05:26 +00:00
Konstantin Belousov
c5bc28b273 Fix several issues with process group orphanage.
The attempt to add assertions that pgrp->pg_jobc counters do not
underflow in r361967, reverted in r362910, pointed out bugs in the
handling of job control.  Peter Holm was able to narrow down the
problem to a very easy reproduction with timeout(1), which uses reaping.

The following list of problems with the calculation of pg_jobc, which
directs SIGHUP/SIGCONT delivery for orphaned process groups, was
identified:
- Re-calculation of the orphaned status for children of exiting parent
  was wrong, but mostly unnoticed when all children were reparented to
  init(8).  When a child can be reparented to a different process which
  could affect the child's job control state, it was not properly
  accounted for in pg_jobc.
- Lockless check for exiting process' parent process group is racy
  because nothing prevents the parent from changing its group
  membership.
- Exited process is left in the process group, until waited. This
  affects other calculations of pg_jobc.

Split handling of job control status on process changing its process
group, and process exiting.  Calculate increments and decrements for
pg_jobc by exactly checking the orphanage instead of assuming process
group membership for children and parent.  Move the call to killjobc()
later under the proctree_lock.  Mark exiting process in killjobc()
with a new flag P_TREE_GRPEXITED and skip it for all pg_jobc
calculations after the flag is set.

Add a checker that independently recalculates the pg_jobc value and compares
it with the memoized process group state. This is enabled under INVARIANTS.

Reviewed by:	jilles
Discussed with:	kevans
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26116
2020-08-22 21:32:11 +00:00
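
The INVARIANTS checker described above recomputes the counter from scratch and compares it with the stored value. Here is a rough sketch in Python, assuming the traditional BSD rule that a process counts toward its group's job-control count when its parent is in a different process group of the same session. The `Proc` record and both function names are hypothetical, and P_TREE_GRPEXITED is reduced to a boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proc:
    pid: int
    pgid: int
    sid: int
    parent: Optional["Proc"] = None
    grp_exited: bool = False  # stand-in for P_TREE_GRPEXITED


def recompute_jobc(procs, pgid):
    """Count members of pgid that qualify the group for job control."""
    count = 0
    for p in procs:
        # Skip other groups, exited processes and parentless processes.
        if p.pgid != pgid or p.grp_exited or p.parent is None:
            continue
        # Qualifies if the parent is in another group of the same session.
        if p.parent.pgid != p.pgid and p.parent.sid == p.sid:
            count += 1
    return count


def check_jobc(procs, pgid, memoized):
    """Mimic the assertion: the cached value must match a fresh recount."""
    return recompute_jobc(procs, pgid) == memoized
```

In this model a group whose recount reaches zero is orphaned, which is the condition that directs SIGHUP/SIGCONT delivery.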
Warner Losh
46809e695f Whitespace change to line up dev_softc definition. 2020-08-22 19:18:31 +00:00
Warner Losh
b14f13459e Retire obsolete sysctl hw.bus.devctl_disable
hw.bus.devctl_disable has been tagged obsolete for a decade. Remove it. Also
remove some long obsolete comments. This was done and backed out once in 2014,
but we've had enough releases with the 'new' method of setting queue length that
we can just remove this sysctl now (stable/11, stable/12 and current all don't
reference it).
2020-08-22 19:02:15 +00:00
Mateusz Guzik
de0fcd3a44 vfs: assert that HASBUF is only set with SAVENAME or SAVESTART
as requested by the caller. The intent is to eradicate the mostly
spurious NDFREE_PNBUF calls.
2020-08-22 16:58:59 +00:00
Mateusz Guzik
1e448a1558 cache: stronger vnode asserts in cache_enter_time 2020-08-22 16:58:34 +00:00
Mateusz Guzik
cd4a1797b0 fd: pwd_drop after releasing filedesc lock
Fixes a potential LOR against vnode lock.
2020-08-22 16:57:45 +00:00
Mateusz Guzik
760a430bb3 vfs: add a workaround for the vp_crossmp bug to realpath
The actual bug is not yet addressed as it will get much easier after other
problems are addressed (most notably rename contract).

The only affected in-tree consumer is realpath. Everyone else happens to be
performing lookups within a mount point, having a side effect of ni_dvp being
set to mount point's root vnode in the worst case.

Reported by:	pho
2020-08-22 06:56:04 +00:00
Mateusz Guzik
19337211f8 vfs: fix freevnode accounting
Most notably add the missing decrement to vhold_smr.

    .-'---`-.
  ,'          `.
  |             \
  |              \
  \           _  \
  ,\  _    ,'-,/-)\
  ( * \ \,' ,' ,'-)
   `._,)     -',-')
     \/         ''/
      )        / /
     /       ,'-'

Reported by:	Dan Nelson <dnelson_1901@yahoo.com>
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-21 21:24:14 +00:00
Warner Losh
773e541e8d Use devctl.h instead of bus.h to reduce newbus pollution.
There's no need for these parts of the kernel to know about newbus,
so narrow what is included to devctl.h for device_notify_*.

Suggested by: kib@
2020-08-21 00:03:24 +00:00
Warner Losh
553f053bfa Move from TAILQ to STAILQ because the nodes are a bit smaller. 2020-08-20 17:14:44 +00:00
Warner Losh
c9133e6c04 Make devctl_queue_data_f and devctl_queue_data private.
I thought we'd need them, but nobody is using them. Narrow the interface. This
will facilitate changes in the future.
2020-08-20 17:14:33 +00:00
Warner Losh
0f2c2c1c58 Use names suggested by kib@ in review D25969, move call for unmount to not call
with vnode locked, use NOWAIT alloc and only report when we don't overflow.

These changes were accidentally omitted from r364402, except for the not
reporting on overflow. They were lumped in with a debugging commit in my tree
that I omitted w/o realizing this.

Other issues from the review are pending some other changes I need to do first.
2020-08-20 16:52:48 +00:00
Mateusz Guzik
17838b5869 cache: don't use cache_purge_negative when renaming
It avoidably scans (and evicts) unrelated entries. Instead take
advantage of passed componentname and perform a hash lookup
for the exact one.

Sample data from buildworld probed on cache_purge_negative extended
to count both scanned and evicted entries on each call are below.
At most it has to evict 1.

  evicted
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@                          19506
               1 |@@@@@                                    5820
               2 |@@@@@@                                   7751
               4 |@@@@@                                    6506
               8 |@@@@@                                    5996
              16 |@@@                                      4029
              32 |@                                        1489
              64 |                                         193
             128 |                                         109
             256 |                                         56
             512 |                                         16
            1024 |                                         7
            2048 |                                         3
            4096 |                                         1
            8192 |                                         1
           16384 |                                         0

  scanned
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@                                       2456
               1 |@                                        1496
               2 |@@                                       2728
               4 |@@@                                      4171
               8 |@@@@                                     5122
              16 |@@@@                                     5335
              32 |@@@@@                                    6279
              64 |@@@@                                     5671
             128 |@@@@                                     4558
             256 |@@                                       3123
             512 |@@                                       2790
            1024 |@@                                       2449
            2048 |@@                                       3021
            4096 |@                                        1398
            8192 |@                                        886
           16384 |                                         0
2020-08-20 10:06:50 +00:00
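
The two histograms above are power-of-two distributions in the style of DTrace's quantize(): each row counts values that are at least the row's value and below the next row's value. A small Python sketch of that bucketing for non-negative integers follows (the function name is borrowed for illustration; this is not DTrace code):

```python
from collections import Counter


def quantize(values):
    """Bucket non-negative integers by the largest power of two <= value."""
    buckets = Counter()
    for v in values:
        if v < 0:
            raise ValueError("sketch handles non-negative values only")
        # 0 gets its own row; otherwise keep only the highest set bit.
        key = 0 if v == 0 else 1 << (v.bit_length() - 1)
        buckets[key] += 1
    return dict(sorted(buckets.items()))
```

Under this scheme the "evicted" rows for 2 and above are calls that evicted more than the single entry a rename actually needs to remove.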
Mateusz Guzik
39f8815070 cache: add cache_rename, a dedicated helper to use for renames
While here make both tmpfs and ufs use it.

No functional changes.
2020-08-20 10:05:46 +00:00
Mateusz Guzik
16be9f9956 cache: reimplement cache_lookup_nomakeentry as cache_remove_cnp
This in particular removes unused arguments.
2020-08-20 10:05:19 +00:00
Rick Macklem
102829aa92 Add the MSG_TLSAPPDATA flag to indicate "return ENXIO" for non-application TLS
data records.

The kernel RPC cannot process non-application data records when
using TLS. It must do an upcall to a userspace daemon that will
call SSL_read() to process them.

This patch adds a new flag called MSG_TLSAPPDATA that the kernel
RPC can use to tell soreceive() to return ENXIO instead of a non-application
data record, when that is what is at the top of the receive queue.
I put the code in #ifdef KERN_TLS/#endif, although it will build without
that, so that it is recognized as only useful when KERN_TLS is enabled.
The alternative to doing this is to have the kernel RPC re-queue the
non-application data message after receiving it, but that seems more
complicated and might introduce message ordering issues when there
are multiple non-application data records one after another.

I do not know what, if any, changes will be required to support TLS1.3.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D25923
2020-08-19 23:42:33 +00:00
Warner Losh
8ef773d1b4 Add VFS FS events for mount and unmount to devctl/devd
Report when a filesystem is mounted, remounted or unmounted via devd, along with
details about the mount point and mount options.

Discussed with:	kib@
Reviewed by: kirk@ (prior version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25969
2020-08-19 17:10:04 +00:00
Mateusz Guzik
6c55d6e030 cache: when adding an already existing entry assert on a complete match 2020-08-19 15:08:14 +00:00
Mateusz Guzik
7c75f14f5b cache: tidy up the comment above cache_prehash 2020-08-19 15:07:28 +00:00
Andrew Turner
fd8f4f3beb Mark COVERAGE and KCOV as part of KCSAN
While not strictly true this stops them from trying to use the KCSAN atomic
hooks and allows these to be compiled into the same kernel.

Sponsored by:	Innovate UK
2020-08-19 14:11:25 +00:00
Mateusz Guzik
8f226f4c23 vfs: remove the always-curthread td argument from VOP_RECLAIM 2020-08-19 07:28:01 +00:00
Mateusz Guzik
7ad2a82da2 vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error
Most consumers pass NULL.
2020-08-19 02:51:17 +00:00
Mateusz Guzik
4b3208a97b vfs: sanity check mount counters in vfs_op_enter 2020-08-19 02:50:09 +00:00
Mark Johnston
b21b022a81 Revert r364310.
Some of the resulting fallout in CAM does not appear straightforward to
fix, so simply revert the commit for now in the absence of a better
solution.

Discussed with:	mjg
Reported by:	dhw
2020-08-18 14:09:49 +00:00
Gleb Smirnoff
1921bb7b68 With INVARIANTS panic immediately if M_WAITOK is requested in a
non-sleepable context.  Previously only _sleep() would panic.
This will catch misuse of M_WAITOK at development stage rather
than at stress load stage.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D26027
2020-08-17 15:37:08 +00:00
Konstantin Belousov
beb27033aa Fix powerpc build.
Sponsored by:	The FreeBSD Foundation
2020-08-16 22:50:59 +00:00
Konstantin Belousov
fbca789fc3 VMIO read
If possible, i.e. if the requested range is resident valid in the vm
object queue, and some secondary conditions hold, copy data for
read(2) directly from the valid cached pages, avoiding vnode lock and
instantiating buffers.  I intentionally do not start read-ahead, nor
handle the advises on the cached range.

Filesystems indicate support for VMIO reads by setting VIRF_PGREAD
flag, which must not be cleared until vnode reclamation.

Currently only filesystems that use vnode pager for v_objects can
enable it, due to reliance on vnp_size.  There is a WIP to handle it
for tmpfs.

Reviewed by:	markj
Discussed with:	jeff
Tested by:	pho
Benchmarked by:	mjg
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D25968
2020-08-16 21:02:45 +00:00
Mateusz Guzik
6041408826 vfs: retire vrefl as a symbol
vrefl calls vref and there is only one in-tree consumer.

Keep it as a macro for assertion purposes.
2020-08-16 18:51:12 +00:00
Mateusz Guzik
5faf134cce vfs: assert that VI_TEXT_REF is not already set 2020-08-16 18:45:31 +00:00
Mateusz Guzik
3c5d2ed71f cache: add NOCAPCHECK to the list of supported flags for lockless lookup
It is de facto supported in that lockless lookup does not do any capability
checks.
2020-08-16 18:33:24 +00:00
Mateusz Guzik
8ab4becab0 vfs: use namei_zone for getcwd allocations
instead of malloc.

Note that this should probably be wrapped with a dedicated API and other
vn_getcwd callers did not get converted.
2020-08-16 18:21:21 +00:00
Mateusz Guzik
494c0f2a83 vfs: mark HASBUF as an internal flag
There is no setter for cn_pnbuf.
2020-08-16 17:55:20 +00:00
Mateusz Guzik
a92a971bbb vfs: remove the thread argument from vget
It was already asserted to be curthread.

Semantic patch:

@@

expression arg1, arg2, arg3;

@@

- vget(arg1, arg2, arg3)
+ vget(arg1, arg2)
2020-08-16 17:18:54 +00:00
Conrad Meyer
b2d52e5c43 witness(4): Print stack of prior observed lock order on reversal
The first time Witness observes a lock order between two locks, it records
the caller's stack.  On detected reversal, print out that previous observed
stack.  It is quite possible that the reported "LOR" is the correct
ordering, and the violation was the observed earlier ordering.

Reviewed by:	mjg
Differential Revision:	https://reviews.freebsd.org/D26070
2020-08-15 19:45:50 +00:00
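
The idea in the witness(4) change above can be sketched in a few lines of Python: remember the stack the first time an ordering between two locks is seen, and hand that stack back when the opposite ordering shows up. The class and method names are hypothetical and the model ignores everything else witness does.

```python
import traceback


class LockOrderTracker:
    def __init__(self):
        # (first, second) -> stack captured when the order was first seen
        self.orders = {}

    def acquire(self, held, new):
        """Record held -> new; return the earlier stack on a reversal."""
        reverse = self.orders.get((new, held))
        if reverse is not None:
            # The opposite order was observed before: report where.
            return reverse
        self.orders.setdefault((held, new), traceback.format_stack())
        return None
```

Returning the earlier stack lets the reader judge which of the two orderings is the real violation, as the commit message notes.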
Jason A. Harmening
f3ba85ccc8 kenv: avoid sleepable alloc for integer tunables
Avoid performing a potentially-blocking malloc for kenv lookups that will only
perform non-destructive integer conversions on the returned buffer. Instead,
perform the strtoq() in-place with the kenv lock held.

While here, factor the logic around kenv_lock acquire and release into
kenv_acquire() and kenv_release(), and use these functions for some light
cleanup. Collapse getenv_string_buffer() into kern_getenv(), as the former
no longer has any other callers and the only additional task performed by
the latter is a WITNESS check that hasn't been useful since r362231.

PR:		248250
Reported by:	gbe
Reviewed by:	mjg
Tested by:	gbe
Differential Revision:	https://reviews.freebsd.org/D26010
2020-08-14 21:37:38 +00:00
Mark Johnston
85232c2ff1 Rename the pipe_map field of struct pipe.
This is to avoid conflicts with an upcoming macro.  pipe_pages is a
more accurate name since the field tracks pages wired into the kernel as
part of a process-to-process copy operation.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-14 14:50:41 +00:00
Conrad Meyer
8a0edc914f Add prng(9) API
Add prng(9) as a replacement for random(9) in the kernel.

There are two major differences from random(9) and random(3):

- General prng(9) APIs (prng32(9), etc) do not guarantee an
  implementation or particular sequence; they should not be used for
  repeatable simulations.

- However, specific named API families are also exposed (for now: PCG),
  and those are expected to be repeatable (when so-guaranteed by the named
  algorithm).

Some minor differences from random(3) and earlier random(9):

- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
  contention on PRNG state in SMP workloads.  Each PCPU generator in an
  SMP system produces a unique sequence.

- Better statistical properties than the Park-Miller ("minstd") PRNG
  (longer period, uniform distribution in all bits, passes
  BigCrush/PractRand analysis).

- Faster than Park-Miller ("minstd") PRNG -- no division is required to
  step PCG-family PRNGs.

For now, random(9) becomes a thin shim around prng32().  Eventually I
would like to mechanically switch consumers over to the explicit API.

Reviewed by:	kib, markj (previous version both)
Discussed with:	markm
Differential Revision:	https://reviews.freebsd.org/D25916
2020-08-13 20:48:14 +00:00
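
To make the "no division" and per-CPU points above concrete, here is a sketch of one widely published PCG variant (PCG-XSH-RR 64/32) in Python. The commit message does not say which PCG variant prng(9) uses, so the choice of variant, the class name, and the use of the stream number as a per-CPU selector are all assumptions for illustration.

```python
MASK64 = (1 << 64) - 1
PCG_MULT = 6364136223846793005  # LCG multiplier from the PCG reference code


class Pcg32:
    def __init__(self, seed, stream=0):
        # The increment must be odd; distinct streams give distinct sequences.
        self.inc = ((stream << 1) | 1) & MASK64
        self.state = 0
        self.next32()
        self.state = (self.state + seed) & MASK64
        self.next32()

    def next32(self):
        old = self.state
        # Step: one multiply and one add modulo 2**64, no division.
        self.state = (old * PCG_MULT + self.inc) & MASK64
        # Output: xorshift the old state, then rotate by its top 5 bits.
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        rotated = (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))
        return rotated & 0xFFFFFFFF


def per_cpu_generators(seed, ncpu):
    """One independent generator per CPU, so no shared state to contend on."""
    return [Pcg32(seed, stream=cpu) for cpu in range(ncpu)]
```

By contrast, a Park-Miller step needs a reduction modulo 2^31 - 1 on every call.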
Mateusz Guzik
b38ad2683a vfs: add missing pwd_drop on error in namei_setup
Reported by:	pho
2020-08-13 10:24:45 +00:00
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt 2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn 2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Mateusz Guzik
3b44443626 devfs: rework si_usecount to track opens
This removes a lot of special casing from the VFS layer.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D25612
2020-08-11 14:27:57 +00:00
Mateusz Guzik
2d0631dd08 vfs: stricter validation for flags passed to namei in cn_flags
namei de facto expects that the nameidata object is properly initialized,
but at the same time it mixes consumer-passable and internal flags, while
tolerating this part by explicitly clearing some of them.

Tighten the interface instead.

While here renumber the flags and denote the gap between the 2 variants.

Try to piggyback the renumber on the just bumped __FreeBSD_version.
2020-08-11 01:34:40 +00:00
Mateusz Guzik
25e42ee217 vfs: drop the hello world stat probes from the vfs provider
Interested parties can get the same information by hooking on vop_stat.
2020-08-10 18:11:00 +00:00
Mateusz Guzik
5e79447d60 cache: let SAVESTART passthrough
The flag is only passed for non-LOOKUP ops and those fall back to the slowpath.
2020-08-10 12:28:56 +00:00
Mateusz Guzik
bb48255cf5 cache: resize struct namecache to a multiple of alignment
For example struct namecache on amd64 is 100 bytes, but it has to occupy
104. Use the extra bytes to support longer names.
2020-08-10 12:05:55 +00:00
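
The arithmetic behind the 100 versus 104 figure is a round-up to the alignment. A short Python sketch, assuming 8-byte alignment (the helper names are hypothetical):

```python
def roundup(size, align):
    """Round size up to the next multiple of align (a power of two)."""
    return (size + align - 1) & ~(align - 1)


def spare_bytes(size, align):
    """Bytes lost to padding, which could hold extra name characters."""
    return roundup(size, align) - size
```

With these assumptions, a 100-byte struct leaves 4 bytes that would otherwise be wasted.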
Mateusz Guzik
8b62cebea7 cache: remove unused variables from cache_fplookup_parse 2020-08-10 11:51:56 +00:00
Mateusz Guzik
03337743db vfs: clean MNTK_FPLOOKUP if MNT_UNION is set
Elides checking it during lookup.
2020-08-10 11:51:21 +00:00
Mateusz Guzik
c571b99545 cache: strlcpy -> memcpy 2020-08-10 10:40:14 +00:00
Mateusz Guzik
3ba0e51703 vfs: partially support file create/delete/rename in lockless lookup
Perform the lookup until the last 2 elements and fall back to the slowpath.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:35:18 +00:00
Mateusz Guzik
21d5af2b30 vfs: drop the thread argument from vfs_fplookup_vexec
It is guaranteed curthread.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:34:22 +00:00
Mateusz Guzik
7f70080150 vfs: disallow NOCACHE with LOOKUP
This means there is no expectation lookup will purge the terminal entry,
which simplifies lockless lookup.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2020-08-10 10:33:40 +00:00
Mateusz Guzik
51ea7bea91 vfs: add VOP_STAT
The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after:  7913833

Reviewed by:	kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D25910
2020-08-07 23:06:40 +00:00
Mateusz Guzik
1ff80a3400 vfs: release the interlock after failing to set VHOLD_NO_SMR
While here add more comments.

Diagnosed by:	markj
Reported by:	pho
Fixes:	r362827 ("vfs: protect vnodes with smr")
2020-08-07 19:36:08 +00:00
Warner Losh
f7bb4f88c5 Remove obsolete part of comment. It was cut and pasted from the old version of
this function, and was never relevant to the new version.
2020-08-07 18:21:48 +00:00
Hans Petter Selasky
826c079373 Add full support for dynamic allocation and freeing of epochs.
Make sure to reclaim epoch structures when they are freed to support
dynamic allocation and freeing of epoch structures.

While at it, move the 64 supported epoch control structures to the
static memory domain. This overall simplifies the management and
debugging of system epochs.

Reviewed by:		kib, markj
Differential Revision:	https://reviews.freebsd.org/D25960
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-07 15:32:42 +00:00
Mark Johnston
0ffec1b03d Clean up reassignbuf() and buf_vlist_remove() a bit.
- Convert panic() calls to INVARIANTS-only assertions.  The PCTRIE code
  provides some of the same protection since it will panic upon an
  attempt to remove a non-resident buffer.
- Update the comment above reassignbuf() to reflect reality.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:43:15 +00:00
Mark Johnston
7013797e34 Remove the vfs.reassignbufcalls counter and sysctl.
As the 20-year old comment above it suggests, the counter is of dubious
value.  Moreover, the (global) counter was not updated precisely and
hurts scalability.

Reviewed by:	cem, kib, mjg
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25965
2020-08-06 15:42:59 +00:00
Mateusz Guzik
e910c93eea cache: add more predicts for failing conditions 2020-08-06 04:20:14 +00:00
Mateusz Guzik
95888901f7 cache: plug uninitialized variable use
CID:	1431128
2020-08-06 04:19:47 +00:00
Mateusz Guzik
bb62c418fd vfs hash: annotate the lock with __exclusive_cache_line
Note the code does not scale in the current form.
2020-08-05 19:34:13 +00:00
Mateusz Guzik
4f00177887 pipe: reduce atime precision
The routine is called on successful write and read, which on pipes happens a
lot and for small sizes.

Precision provided by default seems way bigger than necessary and it causes
problems in VMs on amd64 (it executes rdtscp, which causes a vmexit). getnanotime
seems to provide precision roughly in line with Linux, so we should be good here.

Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
before: 426464
after: 3247421

Note that the atime handling for named pipes is broken with and without the
patch. The filesystem code is never used for updating atime and never looks
at the updated field. Consequently, while there are no provisions added to
handle named pipes separately, the change is a nop for that case.

Differential Revision:	 https://reviews.freebsd.org/D23964
2020-08-05 19:15:59 +00:00
Andrey V. Elsukov
edde7a538b Add m__getjcl SDT probe.
Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2020-08-05 11:39:09 +00:00
Mateusz Guzik
e1b1971c05 cache: don't ignore size passed to nchinittbl 2020-08-05 09:38:02 +00:00
Mateusz Guzik
d292b1940c vfs: remove the obsolete privused argument from vaccess
This brings argument count down to 6, which is passable without the
stack on amd64.
2020-08-05 09:27:03 +00:00
Mateusz Guzik
2b86f9d6d0 cache: convert the hash from LIST to SLIST
This reduces struct namecache by sizeof(void *).

Negative side is that we have to find the previous element (if any) when
removing an entry, but since we normally don't expect collisions it should be
fine.

Note this adds cache_get_hash calls which can be eliminated.
2020-08-05 09:25:59 +00:00
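
The trade-off described above is the usual one between doubly and singly linked chains: without a back pointer each entry is one pointer smaller, but removal has to walk the chain to find the predecessor. A minimal Python sketch of one hash chain follows (the class names are hypothetical; this is not the queue(3) SLIST macros):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None


class Bucket:
    """One hash chain stored as a singly linked list."""

    def __init__(self):
        self.head = None

    def insert_head(self, node):
        node.next = self.head
        self.head = node

    def remove(self, node):
        # No back pointer: walk from the head to find the predecessor.
        if self.head is node:
            self.head = node.next
            return True
        prev = self.head
        while prev is not None and prev.next is not node:
            prev = prev.next
        if prev is None:
            return False  # node was not on this chain
        prev.next = node.next
        return True

    def keys(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out
```

When chains are short, as expected for a hash with few collisions, the walk is usually zero or one step.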
Mateusz Guzik
cf8ac0de81 cache: reduce zone alignment to 8 bytes
It used to be the sizeof of the given struct to accommodate 32 bit mips
doing 64 bit loads, but the same can be achieved by requiring just
64 bit alignment.

While here reorder struct namecache so that most commonly used fields
are closer.
2020-08-05 09:24:38 +00:00
Mateusz Guzik
d61ce7ef50 cache: convert ncneghash into a macro
It is a read-only var with value known at compilation time.
2020-08-05 09:24:00 +00:00
Mateusz Guzik
158ab70c24 vfs: tidy up namei entry point
- predict for string copy errors
- reshuffle initialization of vars which are not needed
2020-08-05 07:33:39 +00:00
Mateusz Guzik
2840f07d4f cache: cleanup lockless entry point
- remove spurious bzero
- assert ni_lcf, it has to be set by namei by this point
2020-08-05 07:32:26 +00:00
Mateusz Guzik
8ccf01e0e2 cache: stop messing with cn_lkflags
See r363882.
2020-08-05 07:30:57 +00:00
Mateusz Guzik
27c4618df5 cache: stop messing with cn_flags
This removes flag setting/unsetting carried over from regular lookup.
Flags still get set for compatibility when falling back.

Note .. and . handling can get partially folded together.
2020-08-05 07:30:17 +00:00
Mateusz Guzik
db99ec5656 vfs: support lockless dotdot lookup
Tested by:	pho
2020-08-04 23:07:42 +00:00
Mateusz Guzik
b403aa126e cache: add NCF_WIP flag
This allows making half-constructed entries visible to the lockless lookup,
which now can check for either the "not yet fully constructed" or the "no longer valid"
state.

This will be used for .. lookup.
2020-08-04 23:07:00 +00:00
Mateusz Guzik
6e10434c02 cache: add cache_purge_vgone
cache_purge locklessly checks whether the vnode at hand has any namecache
entries. This can race with a concurrent purge which managed to remove
the last entry, but may not be done touching the vnode.

Make sure we observe the relevant vnode lock as not taken before proceeding
with vgone.

Paired with the fact that doomed vnodes cannot receive entries this restores
the invariant that there are no namecache-related writing users past cache_purge
in vgone.

Reported by:	pho
2020-08-04 23:04:29 +00:00
Mateusz Guzik
bd66a0750f mtx: add mtx_wait_unlocked 2020-08-04 23:00:00 +00:00
Mateusz Guzik
8541ae04b4 rms: fix typo: bitmamp -> bitmap
Reported by:	kib
2020-08-04 20:31:03 +00:00
Mateusz Guzik
1164f7a566 cache: factor away failed vexec handling 2020-08-04 19:55:26 +00:00
Mateusz Guzik
0439b00ea8 cache: assorted tidy ups 2020-08-04 19:55:00 +00:00
Mateusz Guzik
18bd02e2ce cache: factor away lockless dot lookup and add missing stat + sdt probe 2020-08-04 19:54:37 +00:00
Mateusz Guzik
17a66c7087 vfs: add vfs_op_thread_enter/exit _crit variants
and employ them in the namecache. Eliminates all spurious checks for preemption.
2020-08-04 19:54:10 +00:00
Mateusz Guzik
0311b05fec cache: add missing numcache decrement on insertion failure 2020-08-04 19:52:52 +00:00
Mateusz Guzik
3211e783e3 rms: add a comment explaining performance deficiencies of write locking 2020-08-04 19:52:16 +00:00
Mark Johnston
96ad26eefb Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed
memory gets placed in domain-local caches.  They no longer serve any
purpose since UMA now provides their functionality by default.  Remove
them to simplify the kernel memory allocator interfaces a bit.

Reviewed by:	cem, kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25937
2020-08-04 13:58:36 +00:00
Konstantin Belousov
6e0c8e1ae2 Add SOL_LOCAL symbolic constant for unix socket option level.
The constant seems to exist on MacOS X >= 10.8.

Requested by:	swills
Reviewed by:	allanjude, kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25933
2020-08-03 22:13:02 +00:00
Warner Losh
e67c55c998 Some functions had the blank lines, others didn't. Most of the ones that didn't
were newer, so remove this now-optional blank line everywhere.
2020-08-03 22:12:18 +00:00
Konstantin Belousov
ca9a39acb3 Provide more correct description for sysctl kern.smp.cores.
Reported by:	dewayne@heuristicsystems.com.au
PR:	248454
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-08-03 17:17:17 +00:00