118854 commits

Author SHA1 Message Date
Andriy Gapon
20e9cab5fa iscsi: do not hold the global lock while tearing down a session
It should be sufficient to hold the lock just for removing the session
from the session list.  Everything else should be covered by the session
specific lock.

On top of that, at present we can get a deadlock caused by waiting on
the CAM SIM reference count while holding the global lock.  A specific
scenario involving ZFS is this:
- concurrent termination of two sessions, S1 and S2
- session S1 completed all I/Os and sleeps in CAM waiting for device
  close by ZFS;
- session S2 is also dead now, but can not forcefully complete
  outstanding requests by calling iscsi_session_cleanup() from
  iscsi_maintenance_thread_terminate(), since it can't get the same
  global sc_lock;
- as soon as there are unfinished requests, ZFS can not do
  spa_config_enter() as writer, and so can not close the device for
  session S1;
- deadlock.

Reported by:	Ben RUBSON <ben.rubson@gmail.com>
Tested by:	Ben RUBSON <ben.rubson@gmail.com>
Reviewed by:	mav, trasz
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D12652
2017-10-17 15:39:38 +00:00
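
The locking change described above can be sketched in userspace as follows. This is a minimal illustration, not the iscsi(4) code: `toy_lock`, `softc`, `session`, and `session_terminate()` are hypothetical names, and the toy lock only records whether it is held. The point is that the global lock covers only the list removal, while the cleanup runs under the per-session lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (stand-in for a real mutex). */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct session {
    struct session *next;
    struct toy_lock lock;              /* per-session lock */
    bool cleaned_up;
    bool global_held_during_cleanup;   /* recorded for illustration only */
};

struct softc {
    struct toy_lock global;            /* protects only the session list */
    struct session *sessions;
};

/* Hypothetical cleanup step; records whether the global lock was held. */
static void session_cleanup(struct softc *sc, struct session *s)
{
    s->global_held_during_cleanup = sc->global.held;
    s->cleaned_up = true;
}

static void session_terminate(struct softc *sc, struct session *s)
{
    struct session **pp;

    /* Hold the global lock just long enough to unlink the session. */
    toy_lock_acquire(&sc->global);
    for (pp = &sc->sessions; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == s) {
            *pp = s->next;
            break;
        }
    }
    toy_lock_release(&sc->global);

    /* Everything else is covered by the session-specific lock. */
    toy_lock_acquire(&s->lock);
    session_cleanup(sc, s);
    toy_lock_release(&s->lock);
}
```

Because a second session's teardown never waits on the global lock while the first one sleeps, the cycle listed in the scenario cannot form in this sketch.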
Ryan Libby
8a7a65717a gdb kernel server: fixup Search:memory style
This is a NFC patch to move around the Search:memory implementation so
that it doesn't exceed the standard column width and doesn't take so
much vertical space in gdb_trap.

Submitted by:	Daniel O'Connor <darius@dons.net.au>
Reviewed by:	cem, jhb
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12684
2017-10-17 01:12:17 +00:00
Rick Macklem
f49c813c1d Use taskqueue(9) to do writes/commits to mirrored DSs concurrently.
When the NFSv4.1 pNFS client is using a Flexible File Layout specifying
mirrored Data Servers, it must do the writes and commits to all mirrors.
This patch modifies the client to use a taskqueue to perform these writes
and commits concurrently.
The number of threads can't be changed for taskqueue(9), so it is set
to 4 * mp_ncpus by default, but this can be overridden by setting the
sysctl vfs.nfs.pnfsiothreads.

Differential Revision:	https://reviews.freebsd.org/D12632
2017-10-16 23:28:12 +00:00
Andriy Voskoboinyk
6623429867 mbuf(9): unbreak m_fragment()
- Fix it by replacing m_cat() with m_prev->m_next = m_new
(m_cat() will try to append data - as a result, there will be no
fragmentation).
- Move some constants out of the loop.

Was previously tested with D4077.

Differential Revision:	https://reviews.freebsd.org/D4090
2017-10-16 21:46:11 +00:00
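
The difference between a coalescing append and a plain link can be shown with a toy buffer chain. This is a sketch under stated assumptions: `toy_buf`, `toy_cat()`, and `toy_link()` are hypothetical stand-ins, not the mbuf(9) API, and the capacity of 64 bytes is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 64   /* hypothetical per-buffer capacity */

struct toy_buf {
    struct toy_buf *next;
    size_t len;
    char data[TOY_BUF_SIZE];
};

/* Coalescing append: copies src into the tail when it fits, so no new link appears. */
static void toy_cat(struct toy_buf *dst, struct toy_buf *src)
{
    assert(dst != NULL && src != NULL);
    while (dst->next != NULL)
        dst = dst->next;
    if (dst->len + src->len <= TOY_BUF_SIZE) {
        memcpy(dst->data + dst->len, src->data, src->len);
        dst->len += src->len;
    } else {
        dst->next = src;
    }
}

/* Plain link: always keeps the fragment as a separate buffer in the chain. */
static void toy_link(struct toy_buf *prev, struct toy_buf *frag)
{
    prev->next = frag;
}

/* Counts the buffers in a chain. */
static size_t toy_chain_count(const struct toy_buf *b)
{
    size_t n = 0;

    for (; b != NULL; b = b->next)
        n++;
    return n;
}
```

With small fragments, `toy_cat()` leaves a single buffer while `toy_link()` leaves two, which is why a routine whose purpose is to fragment a chain needs the plain link.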
Andriy Voskoboinyk
c64c1f95a7 ifnet(9): split ifc_alloc_unit() (should simplify code flow)
Allocate smallest unit number from pool via ifc_alloc_unit_next()
and exact unit number (if available) via ifc_alloc_unit_specific().

While here, address possible deadlock (mentioned in PR).

PR:		217401
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D12551
2017-10-16 21:21:31 +00:00
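
The two allocation paths named above can be sketched with a simple in-memory pool. The names `toy_unit_pool`, `toy_alloc_unit_next()`, and `toy_alloc_unit_specific()` are hypothetical, and the real ifnet(9) code uses a different allocator; only the split into "smallest free" and "exact, if available" is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_UNITS 32   /* hypothetical pool size */

struct toy_unit_pool {
    bool used[TOY_MAX_UNITS];
};

/* Returns the smallest free unit number, or -1 if the pool is exhausted. */
static int toy_alloc_unit_next(struct toy_unit_pool *p)
{
    assert(p != NULL);
    for (int u = 0; u < TOY_MAX_UNITS; u++) {
        if (!p->used[u]) {
            p->used[u] = true;
            return u;
        }
    }
    return -1;
}

/* Returns the requested unit if it is in range and free, -1 otherwise. */
static int toy_alloc_unit_specific(struct toy_unit_pool *p, int unit)
{
    assert(p != NULL);
    if (unit < 0 || unit >= TOY_MAX_UNITS || p->used[unit])
        return -1;
    p->used[unit] = true;
    return unit;
}
```

Keeping the two cases in separate functions means each has a single exit condition, which is the code-flow simplification the commit title refers to.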
Konstantin Belousov
e9445808a8 Re-evaluate thread's signal mask after ptracestop().
The stop drops process lock, which allows the signal mask to be
changed and our selected signal might become blocked, i.e. should be
returned to the process queue instead of delivery.

Also, for the existing check of the process no longer having an
attached debugger, we should not lose the signal, but requeue it.

Reported and tested by:	bdrewery
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 20:21:51 +00:00
Konstantin Belousov
cd735d8f5a Improve assertion that an ignored or blocked signal is not delivered.
Split two conditions into separate asserts.  Print additional details,
like the signal number and action value.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 20:15:19 +00:00
Konstantin Belousov
0167b33b81 Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 20:11:29 +00:00
Matt Joras
0d8e04054e Properly reset the fields in clean_unrhdr.
In r324542 I neglected to reset the first and last fields of struct
unrhdr. This causes a tmpfs to fail the unr(9) consistency checks with
DIAGNOSTIC on. Fix this by resetting the fields by calling init_unrhdr.
While here, change a loop to use TAILQ_FOREACH_SAFE since it is more
readable and equally fast.

Reported by:	David Wolfskill <david@catwhisker.org>
Approved by:	rstone (mentor)
Sponsored by:	Dell EMC Isilon
2017-10-16 16:14:50 +00:00
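
For readers unfamiliar with the macro, here is a small userspace sketch of `TAILQ_FOREACH_SAFE`, which keeps a saved next pointer so the current element can be removed during iteration. The `item` type and `remove_odd()` are hypothetical; the fallback definition matches the form used in FreeBSD's `<sys/queue.h>` and is only there because some non-BSD versions of the header omit the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Fallback for systems whose <sys/queue.h> lacks the _SAFE variant. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = TAILQ_FIRST((head));                           \
        (var) && ((tvar) = TAILQ_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

struct item {
    int value;
    TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

/* Removes every element with an odd value; returns how many were removed. */
static int remove_odd(struct item_list *head)
{
    struct item *it, *tmp;
    int removed = 0;

    assert(head != NULL);
    TAILQ_FOREACH_SAFE(it, head, link, tmp) {
        if (it->value % 2 != 0) {
            TAILQ_REMOVE(head, it, link);   /* safe: tmp already holds the next element */
            removed++;
        }
    }
    return removed;
}
```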
Konstantin Belousov
ca1f624517 Fix the pv_chunks pc_lru tailq handling in reclaim_pv_chunk().
For processing, reclaim_pv_chunk() removes the pv_chunk from the lru
list, which makes pc_lru linkage invalid.  Then the pmap lock is
released, which allows another thread to free the last pv entry
allocated from the chunk and call free_pv_chunk(), which tries to
modify the invalid linkage.

Similarly, the chunk is inserted into the private tailq new_tail
temporarily.  Again, free_pv_chunk() might be run and corrupt the
linkage for the new_tail after the pmap lock is dropped.

This is a consequence of r299788 elimination of pvh_global_lock, which
allowed for reclaim to run in parallel with other pmap calls which
free pv chunks.

As a fix, do not remove the chunk from pc_lru queue, use a marker to
remember the position in the queue iteration.  We can safely operate
on the chunks after the chunk's pmap is locked, we fetched the chunk
after the marker, and we checked that chunk pmap is same as we have
locked, because chunk removal from pc_lru requires both pv_chunk_mutex
and the pmap mutex owned.

Note that the fix lost an optimization which was present in the
previous algorithm.  Namely, new_tail requeueing rotated the pv chunks
list so that reclaim didn't scan the same pv chunks that couldn't be
freed (because they contained a wired and/or superpage mapping) on
every invocation.  An additional change is planned which would improve
this.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-10-16 15:16:24 +00:00
Alexander Motin
eeec68eae7 Add Creative vendor ID.
MFC after:	1 week
2017-10-16 12:54:53 +00:00
Michal Meloun
a86d798210 Save VFP state in getcontext(3) on ARM.
This is a last followup of r315974, which fixes userland part
of VFP save/restore problems described in PR 217611.

PR:		217611
MFC after:	2 weeks
2017-10-16 12:53:54 +00:00
Warner Losh
e4e7fb2337 Explicitly include SYSDIR in the include path -- need machine path too? 2017-10-16 03:59:58 +00:00
Warner Losh
a7fa2fb669 LOADER_foo_SUPPORTED 2017-10-16 03:59:52 +00:00
Warner Losh
6b9f688352 Move all the ficl common code into ficl.mk
There's a number of copies of basically identical code to enable
building forth in /boot/loader. Move it all into ficl.mk.
2017-10-16 03:59:44 +00:00
Warner Losh
8ed8e50775 create defs.mk for common definitions 2017-10-16 03:59:38 +00:00
Warner Losh
bcbe0c006e tweak style 2017-10-16 03:59:33 +00:00
Warner Losh
7f20726e4b Move common/Makefile.inc to sys/boot/loader.mk.
Makefile.inc has a specific meaning in the tree, and
common/Makefile.inc doesn't quite fit into that. Rename it to
loader.mk and it will be a place to collect common things to all
/boot/loader programs there.

Sponsored by: Netflix
2017-10-16 03:59:28 +00:00
Warner Losh
7e705f54f8 Rename top level Makefile.ficl to ficl.mk. 2017-10-16 03:59:22 +00:00
Warner Losh
6c4b856dbe Move orphaned man pages into new man directory from common. This helps
keep it clearer that common is just for the MI files for /boot/loader
programs.

Sponsored by: Netflix
2017-10-16 03:59:17 +00:00
Warner Losh
cb0ba37ec5 Unify boot1 with loader
Refactor boot1 to use the same I/O code as /boot/loader uses. Refactor
to use the common efi_main.c.

Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D10447
2017-10-16 03:59:11 +00:00
Warner Losh
1f88be2d3a Zero out the ccb's allocated on the stack for the dump routines to more
closely match a ccb returned from xpt_get_ccb().

Sponsored by: Netflix
2017-10-15 23:54:04 +00:00
Warner Losh
fa271a5d09 Closer examination shows that nvme and CAM both normally zero-fill
allocations (for req and ccb, which ultimately contain the
nvme_cmd). As such, we can micro-optimize these routines. Add a
comment to this effect, and bzero the ccb used to make the requests
for the nda dump routine so it more closely matches a ccb allocated
with xpt_get_ccb().

Sponsored by: Netflix
2017-10-15 23:53:55 +00:00
Rick Macklem
57ef3db3a9 Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1.
The client IP address was not being reported for some NFSv4 mounts by
nfsdumpstate. Upon investigation, two problems were found for mounts
using IPv4. One was that the code (originally written and tested on i386)
assumed that a "u_long" was a "uint32_t" and would exactly store an
IPv4 host address. Not correct for 64bit arches.
Also, for NFSv4.1 mounts, the field was not being filled in. This was
basically correct, because NFSv4.1 does not use a callback address.
However, it meant that nfsdumpstate could not report the client IP addr.
This patch should fix both of these issues.
For IPv6, the address will still not be reported. The original NFSv4 RFC
only specified IPv4 callback addresses. I think this has changed and, if so,
a future commit to fix reporting of IPv6 addresses will be needed.

Reported by:	manu
PR:		223036
MFC after:	2 weeks
2017-10-15 22:22:27 +00:00
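
The type-width issue is easy to demonstrate in userspace: on LP64 platforms such as amd64, `unsigned long` is 8 bytes, so it does not "exactly store" a 4-byte IPv4 address, while `uint32_t` does on every platform. The helper below is a hypothetical sketch, not the nfsdumpstate code; it formats a host-byte-order address held in a `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* An IPv4 address is exactly 32 bits; uint32_t guarantees that width. */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");

/*
 * Formats a host-byte-order IPv4 address as a dotted quad.
 * Returns the snprintf() result (number of characters that would be written).
 */
static int format_ipv4(uint32_t addr, char *buf, size_t buflen)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%u.%u.%u.%u",
        (unsigned)((addr >> 24) & 0xff),
        (unsigned)((addr >> 16) & 0xff),
        (unsigned)((addr >> 8) & 0xff),
        (unsigned)(addr & 0xff));
}
```

Using a fixed-width type for the stored field keeps the layout identical on 32-bit and 64-bit architectures, which matters when a structure is copied between kernel and userland.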
Michael Tuexen
80a2d1406f Fix the handling of partial and too short chunks.
Ensure that the current behaviour is consistent: stop processing
of the chunk, but finish the processing of the previous chunks.

This behaviour might be changed in a later commit to ABORT the
association due to a protocol violation, but changing this
is a separate issue.

MFC after:	3 days
2017-10-15 19:33:30 +00:00
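
The behaviour described above (stop at the first bad chunk, keep what was already processed) can be sketched as a length-checked walk over a packet. This is an illustration only, not the SCTP stack's code: `process_chunks()` is a hypothetical name, and the sketch assumes the standard SCTP chunk layout of a 4-byte header with a 16-bit big-endian length that includes the header, and chunks padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HDR_LEN 4u   /* type (1), flags (1), length (2, big-endian) */

/*
 * Walks the chunks in pkt and returns how many complete chunks were seen.
 * Processing stops at the first too-short or partial chunk; chunks before
 * it still count as processed.
 */
static size_t process_chunks(const uint8_t *pkt, size_t len)
{
    size_t off = 0, processed = 0;

    assert(pkt != NULL);
    while (len - off >= CHUNK_HDR_LEN) {
        size_t chunk_len = ((size_t)pkt[off + 2] << 8) | pkt[off + 3];

        if (chunk_len < CHUNK_HDR_LEN)   /* too short: length smaller than header */
            break;
        if (chunk_len > len - off)       /* partial: chunk runs past the packet */
            break;
        processed++;
        off += (chunk_len + 3) & ~(size_t)3;   /* skip padding to 4 bytes */
        if (off > len)                   /* padding missing at the very end */
            break;
    }
    return processed;
}
```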
Tijl Coosemans
03cea61b3a Add information needed by Linux libdrm 2.4.74 (shipped with CentOS 7.4).
Create a config file for PCI devices that exposes their configuration
space.  Only fields needed by libdrm are filled in (vendor, device,
revision, subvendor and subdevice).

Link /sys/class/drm/card%d/device to the PCI device directory.
2017-10-15 19:28:14 +00:00
Tijl Coosemans
df4c975275 Set DEVNAME to dri/card%d. This works with both in-tree drm and drm-next
and is also the value used on Linux.

Tested by:	Greg V <greg@unrelenting.technology>
2017-10-15 19:21:15 +00:00
Tijl Coosemans
11ce4d9f39 When a Linux program tries to access a /path the kernel tries
/compat/linux/path before /path.  Stop following symbolic links when
looking up /compat/linux/path so dead symbolic links aren't ignored.
This allows syscalls like readlink(2) and lstat(2) to work on such links.
And open(2) will return an error now instead of trying /path.
2017-10-15 18:53:21 +00:00
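
A userspace approximation of the lookup order may help. The sketch below is not the kernel code: `compat_resolve()` is a hypothetical helper that uses lstat(2), which does not follow a final symbolic link, so a dangling link under the prefix still counts as "found" and the fallback to the plain path is skipped.

```c
#define _POSIX_C_SOURCE 200809L   /* for lstat() under strict C modes */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Chooses between "<prefix><path>" and "<path>" and writes the result to out.
 * The prefixed path wins whenever an entry exists there, even a dead symlink.
 * Returns 0 on success, -1 if out is too small.
 */
static int compat_resolve(const char *prefix, const char *path,
    char *out, size_t outlen)
{
    struct stat sb;
    int n;

    assert(prefix != NULL && path != NULL && out != NULL);
    n = snprintf(out, outlen, "%s%s", prefix, path);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    if (lstat(out, &sb) == 0)     /* does not follow the final symlink */
        return 0;
    n = snprintf(out, outlen, "%s", path);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Had the check used stat(2) instead, a dead link under the prefix would look like a missing file and the lookup would silently fall through to the plain path, which is the behaviour the commit removes.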
Warner Losh
29431e54b9 Use nvme_ctrlr_poll instead of nvme_ctrlr_intx_handler since it is
more general and doesn't try to access registers that may be undefined
when the card is in MSIX mode.

This change, along with r324630, r324631, r324632, makes nda crash
dumps work again. Previously, they only worked on CPU 0 when the stack
garbage was just so.

Sponsored by: Netflix
Suggested by: scottl@ (who provided earlier version of the patch)
2017-10-15 16:19:09 +00:00
Warner Losh
bb1c7be429 Create general polling function for the nvme controller. Use it when
we're doing the various pin-based interrupt modes. Adjust
nvme_ctrlr_intx_handler to use nvme_ctrlr_poll.

Sponsored by: Netflix
Suggested by: scottl@
2017-10-15 16:18:08 +00:00
Warner Losh
c4231018d0 Be nicer on the dump stack by allocating only a ccb_nvmeio rather than
a full ccb. This saves a few hundred bytes, which might be important
during a crash dump...

Sponsored by: Netflix
Suggested by: scottl@
2017-10-15 16:18:03 +00:00
Warner Losh
fbed8df259 Explicitly set reserved fields and 'fuse' to 0. This prevents us from
accidentally sending bogus values in these fields, which some drives
may reject with an error or worse (undefined behavior).

This is especially needed for the ndadump routine which allocates the
cmd from stack garbage....

Sponsored by: Netflix
2017-10-15 16:17:59 +00:00
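
The hazard with stack-allocated command structures can be sketched as follows. The `toy_cmd` layout and `toy_cmd_init_write()` are hypothetical and much smaller than a real NVMe command; the sketch only shows the pattern of zeroing the whole structure before setting the fields that matter, so reserved fields and `fuse` never carry stack garbage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified command layout (not the real struct nvme_command). */
struct toy_cmd {
    uint8_t  opc;     /* opcode */
    uint8_t  fuse;    /* fused-operation bits, normally 0 */
    uint16_t cid;     /* command identifier */
    uint32_t nsid;    /* namespace identifier */
    uint32_t rsvd2;   /* reserved */
    uint32_t rsvd3;   /* reserved */
};

/* Initializes a write command; every field not set explicitly ends up 0. */
static void toy_cmd_init_write(struct toy_cmd *cmd, uint32_t nsid)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof(*cmd));   /* clears fuse and reserved fields */
    cmd->opc = 0x01;                /* NVM command set: Write */
    cmd->nsid = nsid;
}
```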
Warner Losh
717bff5d85 Update comment to reflect actual default timeout.
Sponsored by: Netflix
2017-10-15 16:17:55 +00:00
Tijl Coosemans
834804f3fe Add special handling for current in-tree drm devices, like r323692 added
for drm-next.
2017-10-15 16:08:22 +00:00
Tijl Coosemans
f3792e07f6 Use sizeof instead of strlen on string constants. The compiler doesn't
optimise the strlen calls away with -ffreestanding.
2017-10-15 16:03:45 +00:00
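
A short sketch of the idiom: for a string literal or a `char` array, `sizeof` yields the length including the terminating NUL at compile time, so `sizeof(s) - 1` replaces `strlen(s)` without relying on the optimizer. `CONST_STRLEN` and `has_prefix()` are hypothetical names, and the `"dri/card"` string is used purely as sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compile-time length of a string literal or char array (NOT a char pointer). */
#define CONST_STRLEN(s) (sizeof(s) - 1)

/* Returns nonzero if name starts with the constant prefix. */
static int has_prefix(const char *name)
{
    static const char prefix[] = "dri/card";

    assert(name != NULL);
    return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}
```

The macro must only be applied to arrays: on a `char *` it would return the pointer size minus one rather than the string length.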
Edward Tomasz Napierala
4ffeccf1e8 Replace some magic numbers in usb_template(4) code with #defines.
There should be no functional changes.

Reviewed by:	hselasky
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D12670
2017-10-15 11:46:11 +00:00
Ryan Libby
70da35b745 mlx4: use enum constants instead of const vars for case exprs
Follow up from r324201 to fix compilation with gcc, which complains
about non-ICE case expressions.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D12675
2017-10-14 23:25:44 +00:00
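
In C, a `const`-qualified variable is not an integer constant expression (ICE), and a `case` label requires one, which is why enumeration constants are the portable choice. The sketch below uses hypothetical names (`TOY_MODE_A`, `toy_mode_name()`) and is unrelated to the actual mlx4 identifiers.

```c
#include <assert.h>

/*
 * Not portable C:
 *     static const int toy_mode_a = 1;
 *     ... case toy_mode_a: ...      (toy_mode_a is not an ICE)
 * Enumeration constants are ICEs and are valid case labels everywhere.
 */
enum {
    TOY_MODE_A = 1,
    TOY_MODE_B = 2
};

/* Maps a mode value to a short name. */
static const char *toy_mode_name(int mode)
{
    switch (mode) {
    case TOY_MODE_A:
        return "a";
    case TOY_MODE_B:
        return "b";
    default:
        return "unknown";
    }
}
```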
Fedor Uporov
04660064b8 Add extended attributes support to fuse kernel module.
Author:         kem
Reviewed by:    cem, pfg (mentor)
Approved by:    pfg (mentor)
MFC after:      2 weeks

Differential Revision: https://reviews.freebsd.org/D12485
2017-10-14 19:02:52 +00:00
Michael Tuexen
8c8e10b763 Code cleanup, no functional change.
This avoids taking a pointer of a packed structure which allows simpler
compilation of the userland stack.

MFC after:	1 week
2017-10-14 10:02:59 +00:00
Mateusz Guzik
bf3341233e Fix wrong v_free_count annotation - (f) instead of (a)
Reported by:	alc
2017-10-14 04:27:58 +00:00
Mateusz Guzik
e280ce465b mtx: fix up owner_mtx after r324609
Now that MTX_UNOWNED is 0 the test was always false.
2017-10-14 00:47:30 +00:00
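
Why a test can become always false when a constant changes to 0 is shown below. This is a hypothetical illustration: the macro names, the flag mask value, and both test functions are assumptions and do not reproduce the kernel's owner_mtx() code. A bitwise AND against 0 can never be nonzero, so any flag-style test must be replaced by a value comparison.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_UNOWNED  0x0u   /* hypothetical: the "unowned" value after the change */
#define TOY_FLAGMASK 0x7u   /* hypothetical: low bits reserved for flags */

/* Flag-style test: with TOY_UNOWNED == 0 this is 0 for every lock word. */
static int toy_unowned_flag_test(uintptr_t word)
{
    return (word & TOY_UNOWNED) != 0;
}

/* Value-style test: strip the flag bits, then compare against the constant. */
static int toy_unowned_value_test(uintptr_t word)
{
    return (word & ~(uintptr_t)TOY_FLAGMASK) == TOY_UNOWNED;
}
```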
Mateusz Guzik
1dbf52e7d9 Reduce traffic on vm_cnt.v_free_count
The variable is modified with the highly contended page free queue lock.
It unnecessarily shares a cacheline with purely read-only fields and is
re-read after the lock is dropped in the page allocation code making the
hold time longer.

Pad the variable just like the others and store the value as found with
the lock held instead of re-reading.

Provides a modest 1%-ish speed up in concurrent page faults.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D12665
2017-10-13 21:54:34 +00:00
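
Both halves of this change can be sketched in C11. The structure and function below are hypothetical, the 64-byte cache line is an assumption, and the locking is indicated only in comments; the sketch shows a hot counter on its own cache line and a caller using the value observed while the lock was held instead of re-reading it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64   /* assumed cache line size */

struct toy_counters {
    /* Hot field, modified under a contended lock: gets its own cache line. */
    alignas(TOY_CACHE_LINE) unsigned int free_count;
    /* Read-mostly fields start on the next cache line. */
    alignas(TOY_CACHE_LINE) unsigned int free_target;
    unsigned int free_min;
};

/*
 * Takes one page from the pool and returns the count seen while the
 * (notional) lock was held, so callers need not re-read free_count later.
 */
static unsigned int toy_alloc_page(struct toy_counters *c)
{
    unsigned int seen;

    assert(c != NULL && c->free_count > 0);
    /* lock would be acquired here */
    seen = --c->free_count;
    /* lock would be released here */
    return seen;
}
```

Separating the written field from the read-only ones avoids invalidating the readers' cache line on every update, and returning the snapshot removes one more access to the contended line.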
Mateusz Guzik
30a33cefae mtx: change MTX_UNOWNED from 4 to 0
The value is spread all over the kernel and zeroing a register is
cheaper/shorter than setting it up to an arbitrary value.

Reduces amd64 GENERIC-NODEBUG .text size by 0.4%.

MFC after:	1 week
2017-10-13 20:31:56 +00:00
Kristof Provost
3b07bb2a64 Support the D-Link DWM-222 LTE Dongle
Submitted by:	Daniel Hänschke <jailedemon@googlemail.com>
2017-10-13 19:41:35 +00:00
Mark Johnston
9db0f8e76f Make the PHOLD in linux_wait_event_common() unconditional.
After some in-progress work is committed, this would otherwise be the only
instance of #if(n)def NO_SWAPPING in the tree. Moreover, the requisite
opt_vm.h include was missing, so the PHOLD/PRELE calls were always being
compiled in anyway.

MFC after:	1 week
2017-10-13 19:27:33 +00:00
Alan Cox
41bf90bb78 Address two problems with sendfile(..., SF_NOCACHE) and apply one
"optimization".  First, sendfile(..., SF_NOCACHE) frees pages without
checking whether those pages are mapped.  This can leave the system
with mappings to free or repurposed pages.  Second, a page can be
busied between the time of the current busy test and acquiring the
object lock.  Essentially, the test performed before the object lock
is acquired can only be regarded as an optimization to short-circuit
further work on the page.  It cannot, however, be relied upon to prove
that it is safe to free the page.  Third, when sendfile(..., SF_NOCACHE)
was originally implemented, vm_page_deactivate_noreuse() did not yet
exist.  Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(),
because it comes closer to freeing the page.

In collaboration with:	glebius
Discussed with:	gallatin, kib, markj
X-MFC after:	r324448
2017-10-13 16:31:50 +00:00
Konstantin Belousov
53faf5a7d4 Evaluate the real size of the sblk_zone.
Submitted by:	ota@j.email.ne.jp
PR:	221356
Reviewed by:	alc, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D12660
2017-10-13 16:23:05 +00:00
Ruslan Bukin
07ff05c2ae o Support for Kabylake CPU PMCs (fall down to PMC_CPU_INTEL_SKYLAKE).
o Fix bugs in events descriptions for Skylake, Skylake Xeon and Haswell.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D12654
2017-10-13 15:02:29 +00:00
Hans Petter Selasky
627ac5b4e3 Don't call selrecord() outside the select system call in the LinuxKPI, because
then td->td_sel is NULL and this will result in a segfault inside selrecord().
This happens when only using kqueue() to poll for read and write events.
If select() and kqueue() are mixed there won't be a segfault.

Reported by:	Johannes Lundberg
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-10-13 14:14:46 +00:00
Ed Maste
6e309d75d2 ANSIfy vm_kern.c
PR:		222673
Submitted by:	ota@j.email.ne.jp
MFC after:	1 week
2017-10-13 13:53:19 +00:00