Commit graph

117637 commits

Author SHA1 Message Date
John Baldwin
51645e836d Store a 32-bit PT_LWPINFO struct for 32-bit process core dumps.
Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D11407
2017-06-29 21:31:13 +00:00
Navdeep Parhar
98c9236978 Adjust sowakeup post-r319685 so that it continues to make upcalls but
still avoids calling soconnected during sodisconnected.

Discussed with:	glebius@
Sponsored by:	Chelsio Communications
2017-06-29 19:43:27 +00:00
Andrey V. Elsukov
785c0d4d97 Fix IPv6 extension header parsing. The length field doesn't include the
first 8 octets.

Obtained from:	Yandex LLC
MFC after:	3 days
2017-06-29 19:06:43 +00:00
Konstantin Belousov
d137278838 Do not cast struct kevent_args or struct freebsd11_kevent_args to
struct g_kevent_args.

On some architectures, e.g. PowerPC, there is additional padding in uap.

Reported and tested by:	andreast
Sponsored by:	The FreeBSD Foundation
2017-06-29 14:40:33 +00:00
Toomas Soome
ee059e6369 loader: chain load relocate data declaration is bad
The implementation is using fixed size array allocated in asm module,
need to use proper array declaration for C source.

CID:		1376405
Reported by:	Coverity, cem
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D11321
2017-06-29 04:33:55 +00:00
Adrian Chadd
86a656deb0 [ath_hal] if building with ALQ, ensure we actually depend upon ALQ. 2017-06-29 03:59:02 +00:00
Adrian Chadd
b6c4248397 [mips] [ar71xx] Since the wlan/ath drivers use ALQ, ensure we build the module
here otherwise we can't load said module.
2017-06-29 03:58:01 +00:00
Adrian Chadd
05e89dd3a1 [mips] make this compile again after all of the config changes. 2017-06-29 03:57:22 +00:00
Ian Lepore
75ac55b81f Add iic_recover_bus.c. Should have been part of r320461. 2017-06-29 02:19:30 +00:00
Ian Lepore
8928c2e4f5 Add bus recovery handling to the imx5/imx6 i2c driver. 2017-06-29 01:59:39 +00:00
Ian Lepore
930b312319 Add iic_recover_bus(), a helper function that can be used by any i2c driver
which is able to manipulate the clock and data lines directly.

When an i2c bus is hung by a slave device stuck in the middle of a
transaction that didn't complete properly, this function manipulates the
clock and data lines in a sequence known to reliably reset slave devices.
The most common cause of a hung i2c bus is a system reboot in the middle of
an i2c transfer (so it doesnt' happen often, but now there is a way other
than power cycling to recover from it).
2017-06-29 01:50:58 +00:00
Ian Lepore
f03b277259 If an i2c transfer ends due to error, issue a stop on the bus even if the
nostop option is set, if a start was issued.

The nostop option doesn't mean "never issue a stop" it means "only issue
a stop after the last in a series of transfers".  If the transfer ends
due to error, then that was the last transfer in the series, and a stop
is required.

Before this change, any error during a transfer when nostop is set would
effectively hang the bus, because sc->started would never get cleared,
and that caused all future calls to iicbus_start() to return an error
because it looked like the bus was already active.  (Unrelated errors in
handling the nostop option, to be addressed separately, could lead to
this bus hang condition even on busses that don't set the nostop option.)
2017-06-29 00:29:15 +00:00
Rick Macklem
ad6eb97601 Fix an NFSv3 client case that probably never happens.
If an NFSv3 server were to reply with weak cache consistency attributes,
but not post operation attributes, the client would use garbage attributes
from memory. This was spotted during work on the code for the NFSv4.1 client.
I have never seen evidence that this happens and it wouldn't make sense
for an NFSv3 server to do this, so this patch is basically "theoretical",
but does fix the problem if a server were to do the above.

PR:		219552
MFC after:	2 weeks
2017-06-28 21:37:08 +00:00
Ian Lepore
c592231080 Implement gpio input by reading the pad state register, not the data register.
When a pin is set for input the value in the DR will be the same as the PSR.

When a pin is set for output the value in the DR is the value output to the
pad, and the value in the PSR is the actual electrical level sensed on the
pad, and they can be different if the pad is configured for open-drain mode
and some other entity on the board is driving the line low.
2017-06-28 20:28:47 +00:00
Kirk McKusick
9c4f551e98 Create a new function ffs_getcg() to read in and verify a cylinder
group. Change all code points that open-coded this functionality
to use the new function. This commit is a refactoring with no
change in functionality.

In the future this change allows more robust checking of cylinder
group reads along the lines discussed in the hardening UFS session
at BSDCan (retry I/O, add checksums, etc). For more detail see the
session notes at https://wiki.freebsd.org/DevSummit/201706/HardeningUFS

Reviewed by: kib
2017-06-28 17:32:09 +00:00
Andriy Gapon
1db5f1724b fix an architectural problem introduced in r320156, ZFS ABD import
The implementation of ZFS refcount_t uses the emulated illumos mutex
(the sx lock) and the waiting memory allocation when ZFS_DEBUG is
enabled.  This makes refcount_t unsuitable for use in GEOM g_up
thread where sleeping is prohibited.

When importing the ABD change I modified vdev_geom using illumos
vdev_disk as an example.  As a result, I added a call to abd_return_buf
in vdev_geom_io_intr.  The latter is called on g_up thread while the
former uses refcount_t.

This change fixes the problem by deferring the abd_return_buf call to
the previously unused vdev_geom_io_done that is called on a ZFS zio
taskqueue thread where sleeping is allowed.

A side bonus of this change is that now a vdev zio has a pointer
to its corresponding bio while the zio is active.

Reported by:	Shawn Webb <shawn.webb@hardenedbsd.org>
Tested by:	Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after:	1 week
X-MFC with:	r320156
2017-06-28 13:59:20 +00:00
Conrad Meyer
bb751fbbc7 Complete support for IO_APPEND flag in fuse
This finishes what r245164 started and makes open(..., O_APPEND) work again
after r299753.

- Pass ioflags, incl. IO_APPEND, down to the direct write backend (r245164
  added it to only the bio backend).
- (r299753 changed the WRONLY backend from bio to direct.)

PR:		220185
Reported by:	Ben RUBSON <ben.rubson at gmail.com>
Reviewed by:	bapt@, rmacklem@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11348
2017-06-28 13:56:15 +00:00
Konstantin Belousov
6a97a3f756 Treat the addr argument for mmap(2) request without MAP_FIXED flag as
a hint.

Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted
as the absolute minimal address of the range where the mapping is
created.  The VA allocator only allocates in the range [addr,
VM_MAXUSER_ADDRESS].  This is too restrictive, the mmap(2) call might
unduly fail if there is no free addresses above addr but a lot of
usable space below it.

Lift this implementation limitation by allocating VA in two passes.
First, try to allocate above addr, as before.  If that fails, do the
second pass with less restrictive constraints for the start of
allocation by specifying minimal allocation address at the max bss
end, if this limit is less than addr.

One important case where this change makes a difference is the
allocation of the stacks for new threads in libthr.  Under some
configuration conditions, libthr tries to hint kernel to reuse the
main thread stack grow area for the new stacks.  This cannot work by
design now after grow area is converted to stack, and there is no
unallocated VA above the main stack.  Interpreting requested stack
base address as the hint provides compatibility with old libthr and
with (mis-)configured current libthr.

Reviewed by:	alc
Tested by:	dim (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-28 04:02:36 +00:00
Navdeep Parhar
3a5426dd3c cxgbe/t4_tom: Do not include space taken by the TCP timestamp option in
the "effective MSS" for the connection.  The chip expects it this way.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	3 days
Sponsored by:	Chelsio Communications
2017-06-27 22:05:06 +00:00
Warner Losh
594ffc03cd Add new definitions for namespaces.
Sponsored by: Netflix
Submitted by: Matt Williams (via D11330)
2017-06-27 20:24:39 +00:00
Konstantin Belousov
34d3e89f33 Do not ignore an error from vm_mmap_object().
Found and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-27 20:12:13 +00:00
Kenneth D. Merry
59fe76647c Fix a panic in camperiphfree().
If a peripheral driver (e.g. da, sa, cd) is added or removed from the
peripheral driver list while an unrelated peripheral driver instance (e.g.
da0, sa5, cd2) is going away and is inside camperiphfree(), we could
dereference an invalid pointer.

When peripheral drivers are added or removed (see periphdriver_register()
and periphdriver_unregister()), the peripheral driver array is resized
and existing entries are moved.

Although we hold the topology lock while we traverse the peripheral driver
list, we retain a pointer to the location of the peripheral driver pointer
and then drop the topology lock.  So we are still vulnerable to the list
getting moved around while the lock is dropped.

To solve the problem, cache a copy of the peripheral driver pointer.  If
its storage location in the list changes while we have the lock dropped, it
won't have any effect.

This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
to individual instances like "da0", "cd0") are not generally part of a
reference counting scheme to guard against deregistering them while there
are instances active.  The caller (generally the person unloading a module)
has to be aware of active drivers and not unload something that is in use.

sys/cam/cam_periph.c:
	In camperiphfree(), cache a pointer to the peripheral driver
	instance to avoid holding a pointer to an invalid memory location
	in the event that the peripheral driver list changes while we have
	the topology lock dropped.

PR:		kern/219701
Submitted by:	avg
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-27 19:26:02 +00:00
Kenneth D. Merry
89763b3f8e In scsi_zbc_in(), fill in the length in the ZBC IN CDB.
Without the allocation length set, the target will either reject
the command or complete it without transferring any data.

This fixes the REPORT ZONES command for SCSI ZBC protocol devices,
as well as ATA ZAC protocol devices that are behind a SCSI to ATA
translation layer.  (LSI/Broadcom's 12Gb SAS adapters translate ZBC
commands to ZAC commands.)  Those are Host Aware and Host Managed SMR
drives.

This will fix REPORT ZONE commands sent to the da(4) driver via the
GEOM bio interface and zonectl, and REPORT ZONE commands sent from
camcontrol(8).

Note that in the case of camcontrol(8), we currently only send
SCSI ZBC commands to native SCSI protocol devices, not ATA devices
behind a SAT layer.

sys/cam/scsi/scsi_da.c:
	Fill in the length field in scsi_zbc_in().

MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-27 17:55:25 +00:00
Navdeep Parhar
ea168fbc64 cxgbe/iw_cxgbe: Disable debug output by default. The help text for the sysctl
already says that the default is 0.

Sponsored by:	Chelsio Communications
2017-06-27 17:48:11 +00:00
Navdeep Parhar
edc9c9cd54 cxgbe/iw_cxgbe: Catch up with r319722. The socket lock is not the same as the
lock for the receive buffer any more.

Sponsored by:	Chelsio Communications
2017-06-27 17:45:47 +00:00
Alan Cox
d37837249b Address the remaining integer overflow issues with the "skip" parameters
and "next_skip" variables.  The "skip" value in struct blist has long been
a 64-bit quantity but various functions have implicitly truncated this
value to 32 bits.  Now, all arithmetic involving the "skip" value is 64
bits wide.  (This should allow us to relax the size limit on a swap device
in the swap pager.)

Maintain the ability to test this allocator as a user-space application by
including <stdbool.h>.

Remove an unused variable from blst_radix_print().

Reviewed by:	kib, markj
MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D11358
2017-06-27 17:45:26 +00:00
Navdeep Parhar
98d391a0d3 cxgbe/t4_tom: sbspace on listening sockets is no longer supported (as of
r319722), use sol_sbrcv_hiwat instead.

Sponsored by:	Chelsio Communications
2017-06-27 17:43:28 +00:00
Conrad Meyer
f5b7359a00 Fix one more place uio_resid is truncated to int
A follow-up to r231949 and r194990.

Reported by:	pho@
Reviewed by:	kib@, markj@
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11373
2017-06-27 17:23:20 +00:00
Enji Cooper
6bfe453238 Fix LINT, broken by a -Wformat warning in r320329 with PFS_DELEN being
changed from %d to a long-width type.

Use uintmax_t casting and %ju to futureproof the format string against
potential changes with either the #define or the implementation-specific
definition for offsetof(..).
2017-06-27 17:01:46 +00:00
Warner Losh
1d6e811063 Namespace is 32-bits, don't cast it to 16 here 2017-06-27 16:48:05 +00:00
Andrew Turner
355ffcc842 Add parentheses missed in r320388
Sponsored by:	DARPA, AFRL
2017-06-27 16:30:01 +00:00
Edward Tomasz Napierala
3c264086aa Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2017-06-27 15:14:06 +00:00
Pedro F. Giffuni
26f36b55b6 ext2fs: Support e2di_uid_high and e2di_gid_high.
The fields exist on all versions of the filesystem and using them is a mount
option on linux. For FreeBSD, the corresponding i_uid and i_gid are always
long enough so use them by default.

Reviewed by:	Fedor Uporov
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D11354
2017-06-27 15:07:19 +00:00
Andrew Turner
c90baf6817 Some of the atomic_clear_* functions were incorrectly defined to be an
atomic add. Correct these, fixing a NULL-pointer dereference in netgraph.

PR:		220273
MFC after:	3 days
Sponsored by:	DARPA, AFRL
2017-06-27 10:45:13 +00:00
Josh Paetzel
a4d3d010b8 ioctl METEORGBRIG in bktr_core.c forgets to add 128 to value
PR:	59289
Submitted by:	Danovitsch@Vitsch.net
2017-06-27 03:57:31 +00:00
Josh Paetzel
f7c7e94da3 driver incorrectly handles the setting of frame rates
PR:	36415
Submitted by:	brandt@fokus.gmd.de
2017-06-27 03:45:09 +00:00
Justin Hibbits
3d1357108a Disable interrupts when updating the TLB
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.

MFC after:	2 weeks
2017-06-27 01:57:22 +00:00
Justin Hibbits
b436609213 Update comments and simplify conditionals for compat32
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t.  Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.

Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
2017-06-27 01:29:10 +00:00
Marcelo Araujo
4323355e76 With r318394 seems it breaks gpart(8) in some embedded systems such like PCEngines,
RPI1-B, Alix and APU2 boards as well as NanoBSD with the following message:

vnode_pager_generic_getpages_done: I/O read error 5

Seems the breakage was because it was missed to include acr in glabel update.

Reported by:	Peter Blok <pblok@bsd4all.org>,
		madpilot, imp and trasz.
Reviewed by:	trasz
Tested by:	Peter Blok and madpilot.
MFC after:	3 days.
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D11365
2017-06-27 01:22:27 +00:00
Andrew Turner
8e26c6b62f In _bswap16 and _bswap32 cast constant values to the appropriate type. This is
similar to what is done in the x86 code.

Sponsored by:	DARPA, AFRL
2017-06-26 22:32:52 +00:00
Oleksandr Tymoshenko
7db267b3f6 [arm] Use correct index value when checking range validity
Reviewed by:	andrew
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D9145
2017-06-26 21:45:33 +00:00
Mark Johnston
a00230f61a Sort SRCS.
MFC after:	1 week
2017-06-26 21:14:33 +00:00
Mark Johnston
b3db6c0140 Fix a memory leak in ses_get_elm_devnames().
After r307132 the sbuf buffer is malloc()ed, but corresponding
sbuf_delete() call was missing.

Fix a nearby whitespace bug.

MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2017-06-26 19:41:14 +00:00
Kurt Lidl
58deaaf128 Add IPSEC support to mips ERL kernel config file 2017-06-26 18:28:00 +00:00
Mark Johnston
9ea3e14182 Implement parts of the hrtimer API in the LinuxKPI.
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11359
2017-06-26 16:28:46 +00:00
Edward Tomasz Napierala
6a3450e178 Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.

The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.

Reviewed by:	rmacklem (earlier version)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D11326
2017-06-26 13:11:21 +00:00
Andriy Gapon
16454bee3a linux_getdents, linux_readdir: fix mismatch between malloc and free tags
MFC after:	3 days
2017-06-26 09:13:25 +00:00
Andriy Gapon
c20b00c6af zfs: port vdev_file part of illumos change 3306
3306 zdb should be able to issue reads in parallel
illumos/illumos-gate/31d7e8fa33fae995f558673adb22641b5aa8b6e1
https://www.illumos.org/issues/3306

The upstream change was made before we started to import upstream commits
individually.  It was imported into the illumos vendor area as r242733.
That commit was MFV-ed in r260138, but as the commit message says
vdev_file.c was left intact.

This commit actually implements the parallel I/O for vdev_file using a
taskqueue with multiple thread.  This implementation does not depend on
the illumos or FreeBSD bio interface at all, but uses zio_t to pass
around all the relevent data.  So, the code looks a bit different from
the upstream.

This commit also incorporates ZoL commit
zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed
https://github.com/zfsonlinux/zfs/issues/2270
We need to use a dedicated taskqueue for exactly the same reason as ZoL
as we do not implement TASKQ_DYNAMIC.

Obtained from:	illumos, ZFS on Linux
MFC after:	2 weeks
2017-06-26 09:10:09 +00:00
Justin Hibbits
fbcf7bcdf4 Solve the y2038 problem for powerpc
AKA Make time_t 64 bits on powerpc(32).

PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386).  This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.

Tested by:	andreast, others
MFC after:	Never
Relnotes:	Yes
2017-06-26 02:25:19 +00:00
Rick Macklem
81b07aac10 Add support to the NFSv4.1/pNFS client for commits through the DS.
A NFSv4.1/pNFS server using File Layout can specify that Commit operations
are to be done against the DS instead of MDS. Since no extant pNFS
server did this, the code was untested and "#ifdef notyet".
The FreeBSD pNFS server I am developing does specify that Commits be done
through the DS, so the code has been enabled/tested.
This patch should only affect the case of a pNFS server that specfies
Commits through the DS.

PR:		219551
MFC after:	2 weeks
2017-06-26 00:43:04 +00:00
Konstantin Belousov
8a89ca9425 For now, allow mprotect(2) over the guards to succeed regardless of
the requested protection.

The syscall returns success without changing the protection of the
guard.  This is consistent with the current mprotect(2) behaviour on
the unmapped ranges.  More important, the calls performed by libc and
libthr to allow execution of stacks, if requested by the loaded ELF
objects, do the expected change instead of failing on the grow space
guard.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-25 23:16:37 +00:00
Konstantin Belousov
19f49ad30f Correctly handle small MAP_STACK requests.
If mmap(2) is called with the MAP_STACK flag and the size which is
less or equal to the initial stack mapping size plus guard,
calculation of the mapping layout created zero-sized guard.  Attempt
to create such entry failed in vm_map_insert(), causing the whole
mmap(2) call to fail.

Fix it by adjusting the initial mapping size to have space for
non-empty guard.  Reject MAP_STACK requests which are shorter or equal
to the configured guard pages size.

Reported and tested by:	Manfred Antar <null@pozo.com>
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-25 20:06:05 +00:00
Konstantin Belousov
ae5bb0cac8 Remove stale part of the comment.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-25 19:59:39 +00:00
Mark Johnston
ee7c3198cd Add u64_to_user_ptr() to the LinuxKPI.
MFC after:	1 week
2017-06-25 19:30:20 +00:00
Mark Johnston
1fde37964d Add ns_to_ktime() to the LinuxKPI.
MFC after:	1 week
2017-06-25 19:28:01 +00:00
Mark Johnston
934277c59c Add a couple of macros to lockdep.h in the LinuxKPI.
MFC after:	1 week
2017-06-25 19:23:14 +00:00
Mark Johnston
0bfde0a7c7 Add the thaw_early method to struct dev_pm_ops in the LinuxKPI.
MFC after:	1 week
2017-06-25 19:21:59 +00:00
Mark Johnston
4eb1bcfc62 Add noop_lseek() to the LinuxKPI.
MFC after:	1 week
2017-06-25 19:20:12 +00:00
Konstantin Belousov
f141ed73a2 Style.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-25 18:40:59 +00:00
Dmitry Chagin
0a7c8d302a PFS_DELEN is the sum of the permanent part of the struct dirent and
fixed size for the name buffer PFS_NAMELEN.
As r318736 was commited (ino64 project) the size of the permanent part
of the struct dirent was changed, so calulate PFS_DELEN properly.
2017-06-25 15:21:51 +00:00
Andrew Turner
e899a0575b Stop calling cpu_dcache_wb_range from PTE_SYNC.
We set the shareability attributes in TCR_EL1 on boot. These tell the
hardware the pagetables are in cached memory so there is no need to flush
the entries from the cache to memory.

This has about 4.2% improvement in system time and 2.7% improvement in
user time for a buildkernel -j48 on a ThunderX.

Keep the old code for now to allow for further comparisons.
2017-06-25 13:22:49 +00:00
Emmanuel Vadot
ac485d3178 Remove ALLWINNER kernel config file, all release image for SMP Allwinner
board uses GENERIC and it's not updated for newer SoC.
2017-06-25 11:31:39 +00:00
Gleb Smirnoff
64290befc1 Provide sbsetopt() that handles socket buffer related socket options.
It distinguishes between data flow sockets and listening sockets, and
in case of the latter doesn't change resource limits, since listening
sockets don't hold any buffers, they only carry values to be inherited
by their children.
2017-06-25 01:41:07 +00:00
Rick Macklem
a351e99ce6 Add two new compound RPCs to the NFSv4.1/pNFS client.
When the NFSv4.1 client is doing pNFS, it needs to get an Open and
a Layout for every file it will be doing I/O on. The current code
does two separate RPCs to get these. This patch adds two new compounds
that do the both the Open and LayoutGet in the same RPC, reducing the
RPC count.
It also factors out the code that sets up and parses the LayoutGet operation
into separate functions, so that the code doesn't get duplicated for
these new RPCs.
This patch is fairly large, but should only affect the NFSv4.1 client
when the "pnfs" option is specified.

PR:		219550
MFC after:	2 weeks
2017-06-24 20:01:21 +00:00
Alan Cox
e22415906d Increase the pageout cluster size to 32 pages.
Decouple the pageout cluster size from the size of the hash table entry
used by the swap pager for mapping (object, pindex) to a block on the
swap device(s), and keep the size of a hash table entry at its current
size.

Eliminate a pointless macro.

Reviewed by:	kib, markj (an earlier version)
MFC after:	4 weeks
Differential Revision:	https://reviews.freebsd.org/D11305
2017-06-24 17:10:33 +00:00
Konstantin Belousov
19bd0d9c85 Implement address space guards.
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range.  It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).

Use guards to reimplement stack grow code.  Explicitely track stack
grow area with the guard, including the stack guard page.  On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion.  Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().

As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.

Enable stack guard page by default.

Reviewed by:	alc, markj
Man page update reviewed by:	alc, bjk, emaste, markj, pho
Tested by:	pho, Qualys
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11306 (man pages)
2017-06-24 17:01:11 +00:00
Konstantin Belousov
546bb2d7f0 Do not try to unmark MAP_ENTRY_IN_TRANSITION marked by other thread.
The issue is catched by "vm_map_wire: alien wire" KASSERT at the end
of the vm_map_wire().  We currently check for MAP_ENTRY_WIRE_SKIPPED
flag before ensuring that the wiring_thread is curthread. For HOLESOK
wiring, this means that we might see WIRE_SKIPPED entry from different
wiring.

The fix it by only checking WIRE_SKIPPED if the entry is put
IN_TRANSITION by us.  Also fixed a typo in the comment explaining the
situation.

Reported and tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-24 16:47:41 +00:00
Emmanuel Vadot
7f61394200 Allwinner: Add support for H2 Plus SoC
H2+ SoC is a stripped down version of H3 without gigabit ethernet and 4K HDMI.
Also add sun8i-h2-plus-orangepi-zero.dts to the build as we run on this board.
2017-06-24 16:41:26 +00:00
Konstantin Belousov
c377ff617b Translate between abridged and full x87 tags for compat32
ptrace(PT_GETFPREGS).

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-24 11:38:31 +00:00
Konstantin Belousov
f7df80f4ed Fix indent.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-24 10:19:06 +00:00
Emmanuel Vadot
644e7b2956 loader.efi: Disable smbios for arm
The smbios code does a lot of unaligned access, since we don't really
care about smbios info on ARM (not all board expose information and those
who does don't expose useful ones) disable smbios for this arch (at least
for now).
2017-06-24 09:33:25 +00:00
Michael Tuexen
f4358911bf Handle sctp_get_next_param() in a consistent way.
This addresses an issue found by Felix Weinrank using libfuzz.
While there, use also consistent nameing.

MFC after:	3 days
2017-06-23 21:01:57 +00:00
Ed Maste
22398b764e Allow Clang's integrated assembler to assemble boot0
dim@ compared clang IAS-built and GNU as-built boot0 and found them
equivalent.  IAS encoded one instruction using two bytes where GNU as
used three, and another instruction using three bytes where GNU as used
two.  The net result is equivalent and tested, so there is no need to
force IAS off for boot0.
2017-06-23 18:41:49 +00:00
Ed Maste
cdd89b9897 Introduce LINKER_FEATURES to avoid duplicating version logic
Submitted by:	bdrewery
Reported by:	kib
2017-06-23 17:21:37 +00:00
Ed Maste
53e0bebaca enable --build-id for the kernel link
A Build-ID is an identifier generated at link time to uniquely identify
ELF binaries.  It allows efficient confirmation that an executable or
shared library and a corresponding standalone debuginfo file match.
(Otherwise, a checksum of the debuginfo file must be calculated when
opening it in a debugger.)

The FreeBSD base system includes GNU bfd ld 2.17.50 as the linker for
architectures other than arm64.  Build-ID support was added to bfd ld
shortly after that version, so was not previously available to us.

We can now start making use of Build-ID as we migrate to using lld or
bfd ld from ports, conditionally enabled based on the LINKER_TYPE and
LINKER_VERSION make variables added in r320244 and subsequent commits.

Reviewed by:	dim
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11314
2017-06-23 15:57:58 +00:00
Andriy Gapon
103f3b6099 jedec_ts: add support for devices manufactured by IDT
Full manufacturer name is Integrated Device Technology, Inc.
Supported devices include TSE2002B3C and TS3000B3A.

MFC after:	1 week
2017-06-23 11:55:43 +00:00
Mahdi Mokhtari
4b36080668 Fix caveat in new implementation of linprocfs_docpuinfo():
Prevent kernel panic in case that extended-cpuid isn't supported by CPU

Reviewed by:	kib, ngie, trasz
Approved by:	trasz
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11294
2017-06-23 10:36:27 +00:00
Michael Tuexen
d44b45df2c Check the length of a COOKIE chunk before accessing fields in it.
Thanks to Felix Weinrank for reporting the issue he found by using
libFuzzer.

MFC after:	3 days
2017-06-23 10:09:49 +00:00
Michael Tuexen
1a7abbb3be Use a longer buffer for messages in ERROR chunks.
This allows them to be sent in a non truncated way and addresses a warning
given by newver versions of gcc.
Thanks to Anselm Jonas Scholl for reporting it and providing a patch.
2017-06-23 09:27:31 +00:00
Andriy Gapon
ee2d3c0a5b fix gcc-specific fallout from r320156, MFV of r318946, ZFS ABD
Reported by:	jhibbits
MFC after:	1 week
X-MFC with:	r320156
2017-06-23 08:42:53 +00:00
Michael Tuexen
94f66d603a Honor the backlog field. 2017-06-23 08:35:54 +00:00
Michael Tuexen
3017b21bb6 Improve compilation on platforms different from FreeBSD. 2017-06-23 08:34:01 +00:00
Andriy Gapon
9823ed182c jedec_ts: read device id from the correct register
Due to my braino / typo the driver was reading the Vendor ID register
twice.

MFC after:	3 days
2017-06-23 06:25:39 +00:00
Andriy Gapon
3385c74539 MFV r319950: 5220 L2ARC does not support devices that do not provide 512B access
FreeBSD note: the actual change has been in FreeBSD since r297848.  This
commit accounts for integration of that change with subsequent changes,
especially r320156 (MFV of r318946) and r314274.

illumos/illumos-gate@403a8da73c
403a8da73c

https://www.illumos.org/issues/5220
  There are disk devices that have logical sector size larger than 512B, for
  example 4KB. That is, their physical sector size is larger than 512B and they
  do not provide emulation for 512B sector sizes. For such devices both a data
  offset and a data size must be properly aligned. L2ARC should arrange that
  because it uses physical I/O.
  zio_vdev_io_start() performs a necessary transformation if io_size is not
  aligned to vdev_ashift, but that is done only for logical I/O. Something
  similar should be done in L2ARC code.
      * a temporary write buffer should be allocated if the original buffer is
        not going to be compressed and its size is not aligned
      * size of a temporary compression buffer should be ashift aligned
      * for the reads, if a size of a target buffer is not sufficiently large and
        it is not aligned then a temporary read buffer should be allocated

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>

MFC after:	3 weeks
2017-06-22 17:10:34 +00:00
Andriy Gapon
ae5ec64b88 MFV r319742: 8056 zfs send size estimate is inaccurate for some zvols
illumos/illumos-gate@0255edcc85
0255edcc85

https://www.illumos.org/issues/8056
  The send size estimate for a zvol can be too low, if the size of the record
  headers (dmu_replay_record_t's) is a significant portion of the size.
  This is typically the case when the data is highly compressible, especially
  with embedded blocks.
  The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
  blocks are the size of the "recordsize" property (128KB).
  However, for zvols, the blocks are the size of the "volblocksize" property
  (8KB). Therefore, we estimate that there will be 16x less record headers than
  there really will be.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>

MFC after:	3 weeks
2017-06-22 16:58:09 +00:00
Andriy Gapon
e70097b50f MFV r318947: 7578 Fix/improve some aspects of ZIL writing.
FreeBSD note: this commit removes small differences between what mav
committed to FreeBSD in r308782 and what ended up committed to illumos
after addressing all review comments.

illumos/illumos-gate@c5ee46810f
c5ee46810f

https://www.illumos.org/issues/7578
  After some ZIL changes 6 years ago zil_slog_limit got partially broken
  due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
  Actually because of other changes about that time zl_itx_list_sz is not
  really required to implement the functionality, so this patch removes
  some unneeded broken code and variables.
  Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
  single heavy logger, that increased latency for other (more latency critical)
  loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
  huge latency increase for heavy writers, this implementation caused double
  write of all data, since the log records were explicitly prepared for SLOG.
  Since we now have I/O scheduler, I've found it can be much more efficient
  to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
  to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
  Existing ZIL implementation had problem with space efficiency when it
  has to write large chunks of data into log blocks of limited size. In some
  cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
  spinning rust, that also reduced log write speed in half, since head had to
  uselessly fly over allocated but not written areas. This change improves
  the situation by offloading problematic operations from z*_log_write() to
  zil_lwb_commit(), which knows real situation of log blocks allocation and
  can split large requests into pieces much more efficiently. Also as side
  effect it removes one of two data copy operations done by ZIL code WR_COPIED
  case.
  While there, untangle and unify code of z*_log_write() functions.
  Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
  block boundary, that may also improve efficiency if ZPL is made to do that.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alexander Motin <mav@FreeBSD.org>

MFC after:	3 weeks
2017-06-22 16:52:22 +00:00
Conrad Meyer
dfa85e21c4 sglist.h: Fix sg_refs signedness to match refcount(9)
PR:		220122
Reported by:	Mark Millard <markmi at dsl-only.net>
Sponsored by:	Dell EMC Isilon
2017-06-22 15:52:18 +00:00
Ed Maste
3886399158 retire arm64 kernel module linker workaround
Relocatable linking in aarch64 ld from binutils 2.25.1 does not work.
The linker corrupts the references to the external symbols which are
defined by other object in the linking set and should therefore lose
the GOT entry.

The problem is fixed in later versions of GNU ld and does not exist in
the in-tree lld linker that we now use by default for arm64, so the
workaround can be removed.

Reviewed by:	kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11302
2017-06-22 15:09:42 +00:00
Ed Maste
4267fb758b Make structure padding explicit in EFI_MEMORY_DESCRIPTOR
The EFI memory descriptor 64-bit aligns PhysicalStart on both 32- and
64-bit platforms.  Make the padding explicit for i386 EFI.

Submitted by:	Siva Mahadevan <smahadevan@freebsdfoundation.org>
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11301
2017-06-22 14:30:09 +00:00
Bryan Drewery
210ecc0021 Rework logic for skipping .depend/.meta file read/stat/writes.
- Rename _SKIP_READ_DEPEND to _SKIP_DEPEND since it also avoids writing.
- This now uses .NOMETA to avoid reading any .meta files related to
  DEPENDOBJS.  Objects not in OBJS/DEPENDOBJS may still have their .meta
  files read in if they are in the dependency graph.
- This also avoids statting .meta and .depend files in the META_MODE +
  -DNO_FILEMON case.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-06-22 05:34:41 +00:00
Pedro F. Giffuni
a821bdcfd9 ext2fs: add dir_nlink feature support.
ext4 on linux has always supported more than 32000 directories through
the dir_nlink feature, but FreeBSD was unable to catch up on this feature.
As part of the 64 bit inode changes nlink_t has been extended and this
feature is now possible.

Submitted by:	Fedor Uporov
Differential Revision:	https://reviews.freebsd.org/D11210
2017-06-22 02:43:32 +00:00
Ed Maste
1f7d7cd76a msdosfs: reformat a comment to reduce NetBSD diffs 2017-06-22 01:11:20 +00:00
Rick Macklem
6d7963ecd4 Ensure that the credentials field of the NFSv4 client open structure is
initialized.

bdrewery@ has reported panics "newnfs_copycred: negative nfsc_ngroups".
The only way I can see that this occurs is that the credentials field of
the open structure gets used before being filled in.
I am not sure quite how this happens, but for the file create case, the
code is serialized via the vnode lock on the directory. If, somehow, a
link to the same file gets created just after file creation, this might
occur.

This patch ensures that the credentials field is initialized to a reasonable
set of credentials before the structure is linked into any list, so I
this should ensure it is initialized before use.
I am committing the patch now, since bdrewery@ notes that the panics
are intermittent and it may be months before he knows if the patch fixes
his problem.

Reported by:	bdrewery
MFC after:	2 weeks
2017-06-22 00:17:15 +00:00
Bryan Drewery
202d6f8c61 Fix various 'make *clean *all *install' combinations.
This follows commits like r320174 in share/mk/bsd.dep.mk.

MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2017-06-21 19:55:26 +00:00
Konstantin Belousov
0ec97ffc10 Call pmap_copy() only for map entries which have the backing object
instantiated.

Calling pmap_copy() on non-faulted anonymous memory entries is useless.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-21 18:54:28 +00:00
Konstantin Belousov
00de677313 Assert that the protection of a new map entry is a subset of the max
protection.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-21 18:51:30 +00:00
Zbigniew Bodek
3706b98788 Enable arm,io-coherent property of PL310 L2 cache on Armada 38x platforms
This patch disables outer cache sync in PL310 driver
by adding "arm,io-coherent" property. In addition to
the previous patches it was the last bit needed
for enabling proper operation of Armada 38x SoCs
with the IO cache coherency.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: mmel
Differential revision: https://reviews.freebsd.org/D11204
2017-06-21 18:28:37 +00:00
Zbigniew Bodek
3361fdc431 Create root DMA tag and fix MBUS windows on DMA coherent platforms
Armada 38x SoCs, in order to work properly in IO-coherent mode,
requires an update of the MBUS windows attributesd.

This patch also configures nexus coherent dma tag, because all
busses and children devices have to inherit this setting in runtime.
The latter has to be executed as a sysinit (SI_SUB_DRIVERS type),
so that bus_dma_tag_create() can be executed properly.

Submitted by: Michal Mazur <mkm@semihalf.com>
 	      Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11203
2017-06-21 18:27:05 +00:00
Zbigniew Bodek
474612c372 Enable setting the dma tag at the nexus level
Allow to set the dma tag for nexus in the platform init code,
so that all busses and devices would be able to inherit it.
This change is useful e.g. for setting coherent dma tag for
the platforms with hardware IO cache coherency.

Submitted by: ian
      	      Michal Mazur <mkm@semihalf.com>
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11202
2017-06-21 18:25:35 +00:00
Zbigniew Bodek
990b485c6f Introduce support for DMA coherent ARM platforms
- Inherit BUS_DMA_COHERENT flag from parent buses
- Use cacheable memory attributes on dma coherent platform
- Disable cache synchronization on coherent platform

Changes are based on ARMv8 busdma code and commit r299683.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11201
2017-06-21 18:23:28 +00:00
Mark Johnston
c73cdca2c4 Update io-mapping.h in the LinuxKPI.
Add io_mapping_init_wc() and add a third (unused) parameter to
io_mapping_map_wc().

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11286
2017-06-21 18:20:17 +00:00
Mark Johnston
47d8a7d4d1 Add missing lock destructor invocations to the LinuxKPI unload handler.
MFC after:	1 week
2017-06-21 18:17:32 +00:00
Mark Johnston
9b6197df69 Include kmod.h from the LinuxKPI's module.h.
MFC after:	1 week
2017-06-21 18:15:47 +00:00
Mark Johnston
33baed9452 Add a lockdep macro to the LinuxKPI.
Also fix some nearby style issues.

MFC after:	1 week
2017-06-21 18:08:36 +00:00
Hans Petter Selasky
cde3f930bc Allow the VM fault handler to be NULL in the LinuxKPI when handling a
memory map request. When the VM fault handler is NULL a return code of
VM_PAGER_BAD is returned from the character device's pager populate
handler. This fixes compatibility with Linux.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2017-06-21 14:38:52 +00:00
Andriy Gapon
a4a2976d8a fix several fallouts from r320156, ZFS ABD import
All of the problems were related to the FreeBSD-only features.
One was caused by a mismerge in the zfsbootcfg support code.
All others were in the TRIM support code.

MFC after:	1 week
X-MFC with:	r320156
2017-06-21 08:12:07 +00:00
Andriy Gapon
ebf3b53dac fix several fallouts from r320156, ZFS ABD import
All of the problems were related to the FreeBSD-only features.
One was caused by a mismerge in the zfsbootcfg support code.
All others were in the TRIM support code.

Reported by:	ken,
		O. Hartmann <ohartmann@walstatt.org>,
		Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
MFC after:	1 week
X-MFC with:	r320156
2017-06-21 08:10:45 +00:00
Sepherosa Ziehau
cdb5402c4c hyperv/storvsc: Reduce log verbosity
On some windows hosts TEST_UNIT_READY command will return
SRB_STATUS_ERROR and sense data "NOT READY asc:3a,1 (Medium
not present - tray closed)", this occurs periodically, and
not hurt anything else.  So, we prefer to ignore this kind
of errors.

PR:		219973
Submitted by:	Hongjiang Zhang <hongzhan microsoft com>
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11271
2017-06-21 06:44:56 +00:00
Alan Cox
3a5d839ebc Eliminate an unused macro.
MFC after:	3 days
2017-06-21 03:55:45 +00:00
Ed Maste
5964294736 add -znotext to kernel module link invocation
ARM kernel modules require .text relocations (DT_TEXTREL) in shared
object ouptut, which is not allowed by default by lld.  Add the -znotext
option to enable this.  For simplicity add it unconditionally: it is
already default and thus either redundant (GNU BFD ld and gold from
ports) or ignored as an unknown option (GNU BFD ld 2.17.50 in the base
system).

Reviewed by:	kib
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11250
2017-06-21 00:33:16 +00:00
Alexander Motin
eb76cfbcb7 Add some device IDs for Intel Denverton SoCs. 2017-06-21 00:30:57 +00:00
David C Somayajulu
d2b62c5897 Add pkts_cnt_oversized to stats. 2017-06-20 21:17:05 +00:00
Pedro F. Giffuni
aee33af1f1 Attempt to treat "metadata" as a collectively singular noun.
Or at least more consistent.

Input from:	matteo, ian
2017-06-20 20:22:34 +00:00
Luiz Otavio O Souza
aee2c3a86a Always ignore the START and STOP bits whenever the control register is
being overwritten, they are set only bits (cleared by hardware).

Disable the Acknowledge of the controller slave address.  The slave mode is
not supported.

Make sure the interrupt flag bit is being cleared as recommended, add a
delay() _after_ clear the interrupt bit.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-20 18:38:51 +00:00
Luiz Otavio O Souza
872625501b Make ofw_iicbus attach to twsi I2C controllers.
Add the ofw_bus_get_node() callback in mv_twsi, it is mandatory for the
ofw_iicbus usage.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-20 18:25:27 +00:00
Luiz Otavio O Souza
db79a5f9f6 Allow the use of extended media types with if_mvneta, so it can report 2.5G
speeds properly.

While here remove a couple of stray white spaces.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-20 18:14:56 +00:00
Luiz Otavio O Souza
091d140c99 Add support to 2.5G uplink for the MV88E6141 and MV88E6341 switches.
Force the switch port settings for fixed media types.

Tested with:	88E6176, 88E6141
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-20 18:11:23 +00:00
Andriy Gapon
f9cdbaba8d MFV r318946: 8021 ARC buf data scatter-ization
illumos/illumos-gate@770499e185
770499e185

https://www.illumos.org/issues/8021
  The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
  community) changes the way the ARC allocates `b_pdata` memory from using linear
  `void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
  improves ZFS's performance by helping to defragment the address space occupied
  by the ARC, in particular for cases where compressed ARC is enabled. It could
  also ease future work to allocate pages directly from `segkpm` for minimal-
  overhead memory allocations, bypassing the `kmem` subsystem.
  This is essentially the same change as the one which recently landed in ZFS on
  Linux, although they made some platform-specific changes while adapting this
  work to their codebase:
  1. Implemented the equivalent of the `segkpm` suggestion for future work
  mentioned above to bypass issues that they've had with the Linux kernel memory
  allocator.
  2. Changed the internal representation of the ABD's scatter/gather list so it
  could be used to pass I/O directly into Linux block device drivers. (This
  feature is not available in the illumos block device interface yet.)

FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
  mapped into the KVA unless explicitly requested, especially on
  platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
  linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
  in the original ABD

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>

MFC after:	3 weeks
2017-06-20 17:39:24 +00:00
Andriy Gapon
42ce346fcc revert r315852 which introduced zio_buf_alloc_nowait for use in vdev_queue_aggregate
I think that the change is still good, but reconciling it with a planned
merge of the ARC buf data scatter-ization is a bit more tedious
than I can handle.

MFC after:	17 days
2017-06-20 16:55:30 +00:00
Pedro F. Giffuni
af2ae31d93 Improve grammar concerning "metadata".
Remove unnecessary space while here.
2017-06-20 14:35:19 +00:00
Pedro F. Giffuni
d23db91ef4 ext2fs: Add uninit_bg feature support.
From the linux tune2fs(8) manpage:
"Allow the kernel to initialize bitmaps and inode tables and keep a high
watermark for the unused inodes in a filesystem, to reduce e2fsck(8) time.
This first e2fsck run after enabling this feature will take the full time,
but subsequent e2fsck runs will take only a fraction of the original time,
depending on how full the file system is."

Submitted by:	Fedor Uporov
Differential Revision:	https://reviews.freebsd.org/D11211
2017-06-20 14:28:51 +00:00
Zbigniew Bodek
8cbc8d3dd1 Disable PL310 outer cache sync for IO coherent platforms
When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
or Crypto controllers and the Cortex-A9.

To avoid this, this commit introduces a new Device Tree property
'arm,io-coherent' for the L2 cache controller node, valid only for the
PL310 cache. It identifies the usage of the PL310 cache in an I/O
coherent configuration. Internally, it makes the driver disable the
outer cache sync operation.

Note, that other outer-cache operations are not removed, as they may
be needed for certain situations, such as booting secondary CPUs.
Moreover, in order to enable IO coherent operation, the decision
whether to use L2 cache maintenance callbacks is done in busdma
layer, which was enabled in one of the previous commits.

Submitted by: Michal Mazur <mkm@semihalf.com>
	      Marcin Wojtas <mw@semihalf.com>
Reviewed by: mmel
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D11245
2017-06-20 11:11:42 +00:00
Zbigniew Bodek
b50f666958 Implement workaround for Armada 38X family HW issue between CPU and devices
There is a hardware problem between Cortex-A9 CPUs and on-chip devices
in Armada 38X SoCs that may cause hang on heavy load. This can be
however worked around by mapping all registers and PCI IO
as strongly ordered instead of device memory.

Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Reviewed by: mmel
Tested by: mw_semihalf.com
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D10218
2017-06-20 11:09:38 +00:00
Emmanuel Vadot
4dd0826b45 Add sun8i-h3-orangepi-one.dts to the build
We support the board and have a u-boot port for it.
2017-06-20 04:58:12 +00:00
Emmanuel Vadot
ef7b0c3b49 Remove some custom DTS files as we are using upstream ones. 2017-06-20 03:41:06 +00:00
Emmanuel Vadot
d1f30066f0 Update the GNU DTS file from Linux 4.11 2017-06-20 03:13:49 +00:00
Rick Macklem
ee791357a2 Add the definition of maxbcachebuf to sys/buf.h.
r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.

Suggested by:	kib
MFC after:	2 weeks
2017-06-19 22:07:53 +00:00
Konstantin Belousov
cf619a92d2 Fix batched unload for DMAR busdma in qi mode.
Do not queue dmar_map_entries with zeroed gseq to
dmar_qi_invalidate_locked().  Zero gseq stops the processing in the qi
task.  Do not assign possibly uninitialized on-stack gseq to map
entries when requeuing them on unit tlb_flush queue.  Random garbage
in gsec is interpreted as too high invalidation sequence number and
again stop the processing in the task.

Make the sequence numbers generation completely contained in
dmar_qi_invalidate_locked() and dmar_qi_emit_wait_seq().  Upper code
directly passes boolean requesting emiting wait command instead of
trying to provide hint to avoid it by passing NULL gseq pointer.

Microoptimize the requeueing to tlb_flush queue by doing it for the
whole queue.

Diagnosed and tested by:	Brett Gutstein <bgutstein@rice.edu>
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-19 21:48:52 +00:00
Mark Johnston
704cb42f2a Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.
Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.

MFC after:	3 days
Sponsored by:	Dell EMC Isilon
2017-06-19 21:09:50 +00:00
Kenneth D. Merry
6f579fdb17 Fix a potential sleep while holding a mutex in the sa(4) driver.
If the user issues a MTIOCEXTGET ioctl, and the tape drive in question has
a serial number that is longer than 80 characters, we malloc a buffer in
saextget() to hold the output of cam_strvis().

Since a mutex is held in that codepath, doing a M_WAITOK malloc could lead
to sleeping while holding a mutex.  Change it to a M_NOWAIT malloc and bail
out if we fail to allocate the memory.  Devices with serial numbers longer
than 80 bytes are very rare (I don't recall seeing one), so this
should be a very unusual case to hit.  But it is a bug that should be fixed.

sys/cam/scsi/scsi_sa.c:
	In saextget(), if we need to malloc a buffer to hold the output of
	cam_strvis(), don't wait for the memory.  Fail and return an error
	if we can't allocate the memory immediately.

PR:		kern/220094
Submitted by:	Jia-Ju Bai <baijiaju1990@163.com>
MFC after:	3 days
Sponsored by:	Spectra Logic
2017-06-19 20:48:00 +00:00
Bryan Drewery
c99b67a794 Utilize SYSROOT from r320119 in places where DESTDIR may be wanting WORLDTMP.
Since buildenv exports SYSROOT all of these uses will now look in
WORLDTMP by default.

sys/boot/efi/loader/Makefile
        A LIBSTAND hack is no longer required for buildenv.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2017-06-19 20:47:24 +00:00
Konstantin Belousov
212e02c836 Ignore the P_SYSTEM process flag, and do not request
VM_MAP_WIRE_SYSTEM mode when wiring the newly grown stack.

System maps do not create auto-grown stack.  Any stack we handled,
even for P_SYSTEM, must be for user address space.  P_SYSTEM processes
with mapped user space is either init(8) or an aio worker attached to
other user process with aio buffer pointing into stack area.  In either
case, VM_MAP_WIRE_USER mode should be used.

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-19 20:40:59 +00:00
Konstantin Belousov
711dba24d7 Allow negative aio_offset only for the read and write LIO ops on
device nodes.

Otherwise, the current check of aio_offset == -1LL makes it possible
to pass negative file offsets down to the filesystems. This trips
assertions and is even unsafe for e.g. FFS which keeps metadata at
negative offsets.

Reported and tested by:	pho
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D11266
2017-06-19 15:17:17 +00:00
Emmanuel Vadot
acd690d524 allwinner: Configure pins for DTS >= Linux 4.11
Starting with DTS from Linux 4.11, the pins list, function, drive and pull
are no longer prefixed with "allwinner,".
Allow the pinctrl driver to handle both case.
2017-06-19 06:30:04 +00:00
Rick Macklem
95ac7f1a74 Fix the NFS client/server so that it actually uses the 64bit ino_t filenos.
The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.

Reviewed by:	kib
2017-06-18 21:48:31 +00:00
Warner Losh
4558167c59 Put ARM_USE_V6_BUSDMA into the SAM9G20EK reference kernel to try to
track down the unaligned I/O issues we have with at least USB on that
platform.
2017-06-18 21:03:53 +00:00
Warner Losh
5274cd55d5 Create a new option ARM_USE_V6_BUSDMA to force an armv4/5 kernel to
use the armv6 busdma interface. This interface uses more memory than
the armv4 one, but bounces more data more often so may be more correct
than the armv4 one. It is intended for debugging purposes only at the
moment.
2017-06-18 21:03:48 +00:00
Warner Losh
b2dc7525d5 Include the generic cpu.h instead of the v4/v6 specific cpu.h. This
one change allows it to be compiled either for v4 or v6.
2017-06-18 21:03:43 +00:00
Warner Losh
9cbc5bd323 Load the transmit dma buffer at attach time as well. We don't need to
load and unload it all the time since the buffer never changes. In
addition, we were loading it with a hardware spin lock held, which
makes the sleepable lock in busdma (for the bounce pages) trigger a
witness warning, as well as ipend being called with it held by uart,
which made it impossible to unload.

These differences don't matter with the v4 busdma implementation, but
they do with the v6 implementation since the latter likes to bounce
transactions more, and will always do so for Atmel's driver.

It's more efficient as well as being more correct.
2017-06-18 21:03:35 +00:00
Pedro F. Giffuni
5342d6e0d2 ext2fs: Enable RO huge_file feature support.
We can have support for reading ext4 "huge" files but we can't write
(anything) on ext4. and some filesystem. Formally enable the feature so
that we can mount such filesystems.

Submitted by:	Fedor Uponov
Differential Revision:	https://reviews.freebsd.org/D11209
2017-06-18 20:55:46 +00:00
Mark Johnston
8504aa9852 Add kthread parking support to the LinuxKPI.
Submitted by:	kmacy (original version)
Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11264
2017-06-18 19:22:05 +00:00
Alan Cox
d4e3484bd9 Change blist_alloc()'s allocation policy from first-fit to next-fit so
that disk writes are more likely to be sequential.  This change is
beneficial on both the solid state and mechanical disks that I've
tested.  (A similar change in allocation policy was made by DragonFly
BSD in 2013 to speed up Poudriere with "stressful memory parameters".)

Increase the width of blst_meta_alloc()'s parameter "skip" and the local
variables whose values are derived from it to 64 bits.  (This matches the
width of the field "skip" that is stored in the structure "blist" and
passed to blst_meta_alloc().)

Eliminate a pointless check for a NULL blist_t.

Simplify blst_meta_alloc()'s handling of the ALL-FREE case.

Address nearby style errors.

Reviewed by:	kib, markj
MFC after:	5 weeks
Differential Revision:	https://reviews.freebsd.org/D11247
2017-06-18 18:23:39 +00:00
Ian Lepore
fc0dd0d307 Add a driver for the imx6 EPIT timer that can be used as the system
timecounter instead of the GPT timer, freeing up the more flexible GPT
hardware for other uses.  The EPIT driver is a standard (always in the
kernel) driver, and the existing GPT driver is now optional and included
only if you ask for device imx_gpt.
2017-06-18 18:22:52 +00:00
Ian Lepore
cb058296ca Only register as the platform DELAY() implementation if the setup of the
global timer was successful, since the implementation tries to read it.

Notably, if the platform has a variable-frequency global timer (because
of dynamic frequency scaling), it doesn't set up the global timer for use
as a system timecounter, and in that case it also can't use it for DELAY.
Such platforms use different timer hardware for both timecounter and DELAY.
2017-06-18 17:26:54 +00:00
Mark Johnston
4eb18346d1 Avoid including list.h in LinuxKPI headers.
list.h includes a number of FreeBSD headers as a workaround for the
LIST_HEAD name collision. To reduce pollution, avoid including list.h
in commonly used headers when it is not explicitly needed.

Reviewed by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11249
2017-06-18 16:43:57 +00:00
Rick Macklem
1d9f01b18e Take "extern int maxbcachebuf" out of sys/param.h, since it breaks the
arm build.

In the arm build, elf_note.S includes sys/param.h and then does an
elf macro called ELFNOTE(). Although the compile error doesn't make
sense to me, I believe it just means that an "extern ..." can't exist
in param.h for this inclusion case.
I suspect adding #if !defined(LOCORE) might fix the build, but this
commit just takes the definition out.
I will ask freebsd-current@ what is the best was to deal with this
and do a subsequent commit after that.

Reported by:	melounmichal@gmail.com
2017-06-18 12:28:43 +00:00
Ed Maste
dbaa9ebf1b Add ZFS to Linux statfs ftype
PR:		220086
Reviewed by:	cem
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D11252
2017-06-18 11:51:03 +00:00
Ed Maste
fa8d7c4417 arm: add .arch_extension sec for smc instruction
Clang 4.0 accepts the smc instruction with or without specifying
.arch_extension sec, but Clang 5.0 produces an error without it.

MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
2017-06-18 00:08:38 +00:00
Emmanuel Vadot
46375c65f6 make.conf: Add the possibility to use another DTC
Add a make.conf DTC variable that control which DTC (Device Tree Compiler)
to use.

Reviewed by:	bdrewery, imp
Differential Revision:	https://reviews.freebsd.org/D9577
2017-06-17 23:34:53 +00:00
Mark Johnston
8239734079 Remove prototypes for unimplemented LinuxKPI functions.
MFC after:	1 week
2017-06-17 22:52:23 +00:00
Rick Macklem
d1c5e240a8 Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf.
By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D10991
2017-06-17 22:24:19 +00:00
Sean Bruno
fa5416a819 Revert r319989 "bnxt(4) Enable LRO support"
This generates startup LORs and panics when adding elements to bridge
devices. I will document further in https://reviews.freebsd.org/D10681

PR:	220073
Submitted by:	dchagin
Reported by:	db
2017-06-17 17:42:52 +00:00
Ed Maste
ea2f16965e arm: set appropriate section flags for .init_pagetable
The arm kernel linker scripts place the .init_pagetable section in .bss,
but .init_pagetable had no section flags set, and so did not match the
expected flags for .bss.

GNU ld silently ignores this case, but lld reports an error:

ld: error: incompatible section flags for .bss
>>> locore.o:(.init_pagetable): 0x0
>>> output section .bss: 0x3

PR:		220055
Submitted by:	mmel, Rafael Espíndola
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
2017-06-17 14:46:14 +00:00
Kevin Lo
2cd325579c - Fix incorrect values in the computation of CCK and OFDM transmit power
for the rtl8188eu chipset
- Rename struct r92c_rom member names: s/channel_plan/reserved5/,
  s/xtal_calib/channel_plan to be compliant with definitions of the efuse
  in vendor hal_pg.h
2017-06-17 14:39:25 +00:00
Michal Meloun
c40a5f8a40 Manually load tunable CPU quirks.
These are needed too early, far before SYSINIT is processed.

Reported by:	zbb
Pointy hat to:	mmel
MFC after:	3 weeks
MFC with: 	r319896
2017-06-17 14:36:25 +00:00
Konstantin Belousov
746e20fdb1 Correct translations between abridged and full x87 tags.
Reported and tested by:	karnajit wangkhem <karnajitw@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-17 11:25:31 +00:00
Ruslan Bukin
5c118142b4 Undefine temporary macro.
This fixes world build.

Sponsored by:	DARPA, AFRL
2017-06-17 07:36:46 +00:00
Alan Cox
87b0ab69a9 Pages that are passed to swap_pager_putpages() should already be fully
dirty.  Assert that they are fully dirty rather than redundantly calling
vm_page_dirty() on them.

Reviewed by:	kib, markj
MFC after:	1 week
X-MFC after:	r319932
2017-06-17 03:05:25 +00:00
Konstantin Belousov
3b115db081 Bump __FreeBSD_version for r320043, struct event 64-bit data.
Sponsored by:	The FreeBSD Foundation
2017-06-17 01:06:48 +00:00
Konstantin Belousov
eb84ca643c Regen. 2017-06-17 00:58:19 +00:00
Konstantin Belousov
2b34e84335 Add abstime kqueue(2) timers and expand struct kevent members.
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit.  Using the opportunity, I also added ext members.  This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2).  Compat shims
are provided for both host native and compat32.

Requested by:	bapt
Reviewed by:	bapt, brooks, ngie (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D11025
2017-06-17 00:57:26 +00:00
Konstantin Belousov
f2eb97b2cd Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11025
2017-06-16 23:41:13 +00:00
Toomas Soome
769bad9f8a Add chain loader support for loader
Implement simple chain loader in loader; this update does add chain command,
taking device or file as argument to load and start new boot loader.

In case of BIOS, the chain will read the boot block to address 0000:7c00 and
jumps on it. In case of UEFI, the chain command is to be used with efi
application, typically stored in EFI System Partition.

The update also does add simple menu entry, if the variable chain_disk is set.
The value of the variable chain_disk is used as argument for chain loading.

Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D5992
2017-06-16 20:08:44 +00:00
Zbigniew Bodek
2d80caa28a Revert change to description introduced in r320002
Currently some ARM platforms implement their own platform_probe_and_attach()
function and other use common routine that calls platform's PLATFORM_ATTACH
method.
Keep the old description to match the preferred way of naming things.

Pointed out by: andrew
2017-06-16 17:31:56 +00:00
Zbigniew Bodek
11a6a330c8 Enhance Armada 38x SoC identification string
Add hw_clockrate and CPU frequency, basing on sample-at-reset
configuration.

Submitted by:	Arnaud Ysmal <arnaud.ysmal@stormshield.eu>
		Marcin Wojtas <mw@semihalf.com>
Obtained from: Stormshield, Semihalf
Sponsored by: Stormshield
Reviewed by: andrew
Differential revision: https://reviews.freebsd.org/D10899
2017-06-16 17:18:29 +00:00
Zbigniew Bodek
9aa2805d01 Minor style improvements to pmap_remap_vm_attr()
Use correct platform_ function name in the comment and remove
redundant tabs.
2017-06-16 13:53:02 +00:00
Zbigniew Bodek
131b07cdcf Fix typo in "Marvell" string
Change Marwell to Marvell

Pointed out by: Ravi Pokala <rpokala@mac.com>
2017-06-16 10:16:24 +00:00
Adrian Chadd
da150e22ac [ar71xx] migrate all of the duplicate configuration out into a shared config file.
This brings the default configurations (drivers, net80211 settings, etc) and some
of the shared configuration into std.AR_MIPS_BASE.  I haven't yet moved the
-current settings (witness, memguard, etc) into it.

This should simplify building a lot of the same test images for my MIPS AP board
development and testing.

This is a work in progress; it's not designed to be perfect!
2017-06-16 00:44:23 +00:00
Sean Bruno
c8f53ac685 bnxt(4): Implement temporary workaround in driver to report supported media
types that are currently unavailable from the firmware.  e.g. 10G, 25G, 50G
& 100G

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	venkatkumar.duvvuru@broadcom.com
Differential Revision:	https://reviews.freebsd.org/D10816
2017-06-15 21:14:48 +00:00
Sean Bruno
51a621f7c0 bnxt(4) Enable LRO support
iflib - Handle out of order packet delivery from hardware in support of LRO

Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed.  While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections.  Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices.  This patch also fixes a
few bugs in bnxt, related to the same feature.

Submitted by:	bhargava.marreddy@broadcom.com
Reviewed by:	shurd@
Differential Revision:	https://reviews.freebsd.org/D10681
2017-06-15 21:06:03 +00:00
Gleb Smirnoff
2b8e036bfc Plug read(2) and write(2) on listening sockets. 2017-06-15 20:11:29 +00:00
Navdeep Parhar
a8c4fcb9c7 cxgbe(4): Fix per-queue netmap operation.
Do not attempt to initialize netmap queues that are already initialized
or aren't supposed to be initialized.  Similarly, do not free queues
that are not initialized or aren't supposed to be freed.

PR:		217156
Sponsored by:	Chelsio Communications
2017-06-15 19:56:59 +00:00
Sean Bruno
3600bd1e38 Revert r319921 which seems to cause NFS booting assertion panics in
various configurations.

Reported by:	pho@
2017-06-15 17:46:20 +00:00
Konstantin Belousov
e6c44f65d4 Some minor improvements to vnode_pager_generic_putpages().
- Add asserts that the pages to write are dirty.  The last page, if
  partially written, is only required to be dirty, while completely
  written pages should have all dirty bit set.
- Use uintmax_t to print vm_page pindexes.
- Use NULL instead of casted zero.
- Remove if () test which duplicated the loop ending condition.
- Miscellaneous style fixes.

Reviewed by:	alc, markj (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-15 14:34:33 +00:00
Hans Petter Selasky
d4a7f90c0a Use static device numbering instead of dynamic one when creating
mlx4en network interfaces. This prevents infinite unit number growth
typically when the mlx4en driver is used inside virtual machines which
support runtime PCI attach and detach.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2017-06-15 11:56:40 +00:00
Ryan Libby
c74ae2ca93 ddb show socket debugging
Display the mbuf/cluster count for a sockbuf and fix a couple whitespace
issues in the output.

Reviewed by:	jhb, markj (both previous version)
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11062
2017-06-15 04:49:12 +00:00
David C Somayajulu
9efd0ba788 Upgrade STORMFW to 8.30.0.0 and ecore version to 8.30.0.0
Add support for pci deviceID 0x8070 for QLE41xxx product line which
supports 10GbE/25GbE/40GbE

MFC after:5 days
2017-06-15 02:45:43 +00:00
Andriy Gapon
602cf4e4a7 MFV r319951: 8311 ZFS_READONLY is a little too strict
illumos/illumos-gate@2889ec41c0
2889ec41c0

https://www.illumos.org/issues/8311
  Description:
  There was a misunderstanding about the enforcement details of the "Read-only"
  flag introduced for SMB/CIFS compatibility, way back in 2007 in the Sun PSARC
  2007/315 case.
  The original authors thought enforcement of the READONLY flag should work
  similarly as the IMMUTABLE flag. Unfortunately, that enforcement is
  incompatible with the expectations of Windows applications using this feature
  through the SMB service. Applications assume (and the MS File System Algorithms
  MS-FSA confirms they should) that an SMB client can:
  (a) Open an SMB handle on a file with read/write access,
  (b) Set the DOS attributes to include the READONLY flag,
  (c) continue to have write access via that handle.
  This access model is essentially the same as a Unix/POSIX application that
  creates a file (with read/write access), uses fchmod() to change the file mode
  to something not granting write access (i.e. 0444), and then continues to write
  that file using the open handle it got before the mode change.
  Currently, the SMB server works-around this problem in a way that will become
  difficult to maintain as we implement support for SMB3 persistent handles, so
  SMB depends on this fix.
  I've written a test program that can be used to demonstrate this problem, and
  added it to zfs-tests (tests/functional/acl/cifs/cifs_attr_004_pos).
  It currently fails, but will pass when this problem fixed.
  Steps to Reproduce:
    Run the test program on a ZFS file system.
  Expected Results:
    Pass
  Actual Results:
    Fail.

Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Prakash Surya <prakash.surya@delphix.com>
Author: Gordon Ross <gwr@nexenta.com>
MFC after:	2 weeks
2017-06-14 16:55:47 +00:00
Mark Johnston
e804572d2b Fix indentation.
MFC after:	1 week
2017-06-14 16:55:23 +00:00
Andriy Gapon
7d506d0d57 MFV r319948: 5428 provide fts(), reallocarray(), and strtonum()
illumos/illumos-gate@4585130b25
4585130b25

https://www.illumos.org/issues/5428

Most of the upstream change is not applicable to FreeBSD.
Only the renaming of strtonum to zfs_strtonum is relevant to us.
And we already had it partially done.

Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
MFC after:	1 week
2017-06-14 16:42:38 +00:00
Andriy Gapon
b8d341fe26 MFV r319945,r319946: 8264 want support for promoting datasets in libzfs_core
illumos/illumos-gate@a4b8c9aa65
a4b8c9aa65

https://www.illumos.org/issues/8264
  Oddly there is a lzc_clone function, but no lzc_promote function.

Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after:	1 week
2017-06-14 16:31:36 +00:00
Gleb Smirnoff
7737de9515 Check return value from soaccept().
Coverity:	1376209
2017-06-14 16:13:20 +00:00
John Baldwin
fecabb72e1 Don't try to assign interrupts to a CPU on single-CPU systems.
All interrupts are routed to the sole CPU in that case implicitly.
This is a regression in EARLY_AP_STARTUP.  Previously the 'assign_cpu'
variable was only set when a multi-CPU system finished booting, so
it's value both meant that interrupts could be assigned and that
there was more than one CPU.

PR:		219882
Reported by:	ota@j.email.ne.jp
MFC after:	3 days
2017-06-14 13:34:09 +00:00
Ryan Libby
76f2c27264 ddb show files: fix up file types and whitespace
This makes ddb show files more descriptive and also adjusts the
whitespace to align the columns for non-32-bit architectures.

Reviewed by:	cem (previous version), jhb
Approved by:	markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D11061
2017-06-14 07:46:52 +00:00
Justin Hibbits
37ea599bf7 Actually add the mpc85xx_get_platform_clock() function.
Follow up r319935 by actually committing the mpc85xx_get_platform_clock()
function.  This function was created to facilitate other development, and I
thought I had committed it earlier.

Some blocks depend on the platform clock rather than the system clock.
The System clock is derived from the platform clock as one-half the
platform clock.  Rewrite mpc85xx_get_system_clock() to use the new
function.

Pointy-hat to:	jhibbits
2017-06-14 04:26:37 +00:00
Justin Hibbits
3c804fef82 Use mpc85xx_get_platform_clock() instead of rolling our own.
Now that we have a single source for the platform clock, we don't need to
roll our own in every user.
2017-06-14 04:16:37 +00:00
Mark Johnston
0d48e7e839 Don't call vm_pager_page_unswapped() when writing or deleting a dirty page.
The swap space backing a clean page is released when it is first dirtied,
so there's no need to attempt to release swap space when the page is
already dirty.

Reviewed by:	alc
MFC after:	1 week
2017-06-14 03:55:11 +00:00
Mark Johnston
01bc16bb0e Free the request page if an I/O error occurs while reading from swap.
After such a failure, the page is invalid, so there's point in keeping it
around. Moreover, such pages were not being inserted into the active queue,
making them unreclaimable until a subsequent write or delete made them
valid.

Reported by:	alc
Reviewed by:	alc (previous revision)
MFC after:	1 week
2017-06-14 03:50:02 +00:00
Mark Johnston
cc2fe2b0fb Fix handling of subpage BIO_WRITE and BIO_DELETE requests on swap MDs.
Such requests would previously mark the entire page as valid, which was
incorrect since nothing guaranteed that the page's contents had been
initialized. This change also modifies subpage BIO_DELETEs so that the
entire page is marked dirty, rather than only a subrange. There is no
benefit to creating partially dirty swap pages.

Reviewed by:	alc, kib (previous version)
MFC after:	3 days
2017-06-14 03:45:26 +00:00
Enji Cooper
e6b01ed73a Use nitems(..) when computing max instead of the longhand version of
the same logic

MFC after:	1 month
2017-06-14 02:28:10 +00:00
Sean Bruno
c929e4b05c bnxt: In case of multi queues, have unique name for different IRQs.
Submitted by:	bhargava.marreddy@broadcom.com
Differential Revision:	https://reviews.freebsd.org/D11149
2017-06-13 23:49:49 +00:00
Sean Bruno
f7587db036 Add new sysctl to allow changing of timing of the txq timers.
Add new sysctl to override use of busdma in the driver.

Submitted by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 23:16:38 +00:00
Sean Bruno
aa8fa07cd8 Plug mbuf leak in the busdma path of iflib.
Submitted by:	Michael Tuexen <tuexen@freebsd.org>
Reported by:	Drew Gallitin <gallatin@netflix.com>
2017-06-13 19:32:23 +00:00
Konstantin Belousov
81c3737d95 Remove stray return.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-06-13 19:02:12 +00:00
Zbigniew Bodek
08d94c6eab Enable HWPMC overflow IRQ on both CPUs in MPIC
This commit enables usage of HWPMC interrupts for the
Marvell SoCs, which use MPIC (Armada38x and ArmadaXP).
Those interrupts require extra unmasking, comparing to
others. Also, in order to process counters per-CPU,
they are masked/unmasked using separate registers' sets
for each core.

Submitted by: Michal Mazur <mkm@semihalf.com>
    	      Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10913
2017-06-13 18:55:21 +00:00
Zbigniew Bodek
6cb40391c4 Fix INVARIANTS debug code in HWPMC
When HWPMC stops sampling, ps_pmc may be freed before samples
are processed. In such situation treat PMC as stopped.
Add "ifdef" to fix build without INVARIANTS code.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10912
2017-06-13 18:53:56 +00:00
Zbigniew Bodek
95ca4f5a0e Fix event table for Cortex A9.
Removed events 0x8 (INSTR_EXECUTED), 0xE (PC_PROC_RETURN) and
0x13-0x1d not supported on Cortex A9.
Add events 0x68 and 0x6E which replaced 0x8 and 0xE.

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10911
2017-06-13 18:52:39 +00:00
Zbigniew Bodek
ab632d9651 Fix HWPMC interrupt handling in Counting Mode
Additionally:
 - Fix support for Cycle Counter (evsel == 0xFF)
 - Stop and mask interrupts from all counters on init and finish

Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10910
2017-06-13 18:51:23 +00:00
Zbigniew Bodek
9b035ae174 Add detection of CPU class for ARMv6/v7
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: andrew
Differential revision: https://reviews.freebsd.org/D10909
2017-06-13 18:50:08 +00:00
Zbigniew Bodek
0f7028d7f8 Enable in-band link management on A388-Clearfog board
This patch adds in-band link management over SGMII of the
SFP transceiver on Armada-388-Clearfog board.

Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Netgate
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10708
2017-06-13 18:48:51 +00:00
Zbigniew Bodek
824e6d6bc4 Enable neta controller support in ARMADA38X
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D10707
2017-06-13 18:47:42 +00:00
Zbigniew Bodek
a8d7fc4ac1 Introduce Armada 38x/XP network controller support
This patch contains a new driver for the network unit of Marvell
Armada 38x/XP SoCs, called NETA. This support was thoroughly tested
and optimised in terms of stability and performance. Additional
hardware features, like Buffer Management (BM) or Parser and Classifier
(PnC) will be progressively supported as needed.

Submitted by: Fabien Thomas <fabien.thomas@stormshield.eu>
	      Arnaud Ysmal <arnaud.ysmal@stormshield.eu>
	      Zbigniew Bodek <zbb@semihalf.com>
	      Michal Mazur <mkm@semihalf.com>
	      Bartosz Szczepanek <bsz@semihalf.com>
	      Marcin Wojtas <mw@semihalf.com>

Obtained from:	Semihalf
Sponsored by:	Stormshield (main development)
		Netgate (cleanup and upstreaming)
Differential revision: https://reviews.freebsd.org/D10706
2017-06-13 18:46:29 +00:00
Zbigniew Bodek
eb3ffa577b Prevent multiple lock initialization in e6000sw probe
r319886 ("Add the initial support for the Marvell 88E6141
and 88E6341 switches.") unveiled a problem with possible
multiple lock creation. Move its initialization
to the driver attach and for obtaining the switch ID
create a temprorary one, which is immediately destroyed
after the check.

Submitted by: Zbigniew Bodek <zbb@semihalf.com>
	      Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
2017-06-13 18:35:14 +00:00
Alan Cox
4be4fd5d5f Reduce the frequency of hint updates on allocation without incurring
additional allocation overhead.  Previously, blst_meta_alloc() updated the
hint after every successful allocation.  However, these "eager" hint
updates are of no actual benefit if, instead, the "lazy" hint update at
the start of blst_meta_alloc() is generalized to handle all cases where
the number of available blocks is less than the requested allocation.
Previously, the lazy hint update at the start of blst_meta_alloc() only
handled the ALL-FULL case.  (I would also note that this change provides
consistency between blist_alloc() and blist_fill() in that their hint
maintenance is now entirely lazy.)

Eliminate unnecessary checks for terminators in blst_meta_alloc() and
blst_meta_fill() when handling ALL-FREE meta nodes.

Eliminate the field "bl_free" from struct blist.  It is redundant.  Unless
the entire radix tree is a single leaf, the count of free blocks is stored
in the root node.  Instead, provide a function blist_avail() for obtaining
the number of free blocks.

In blst_meta_alloc(), perform a sanity check on the allocation once rather
than repeating it in a loop over the meta node's children.

In blst_leaf_fill(), use the optimized bitcount*() function instead of a
loop to count the blocks being allocated.

Add or improve several comments.

Address some nearby style errors.

Reviewed by:	kib
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D11146
2017-06-13 17:49:49 +00:00
Mark Johnston
46514e7d76 Hint at the intended usage for the "ll" field of struct uuid_private.
Discussed with:	kib
MFC after:	1 week
2017-06-13 15:37:04 +00:00
Ian Lepore
ce97f69621 Add missing header dependencies (based on looking in the .depend file).
Reported by:	gjb
2017-06-13 14:07:13 +00:00
Michal Meloun
7bf5720a3f Implement tunable CPU quirks.
These quirks are intended for optimizing CPU performance, not for
applying errata workarounds. Nobody can expect that CPU with unfixed
errata is stable enough to execute the kernel until quirks are applied.

MFC after: 3 weeks
2017-06-13 12:07:18 +00:00
Andrey V. Elsukov
b83aa367a5 Resurrect RTF_RNH_LOCKED flag and restore ability to call rtalloc1_fib()
with acquired RIB lock.

This fixes a possible panic due to trying to acquire RIB rlock when it is
already exclusive locked.

PR:		215963, 215122
MFC after:	1 week
Sponsored by:	Yandex LLC
2017-06-13 10:52:31 +00:00
Ed Maste
993114dd2a Correct bitwise test in mac_bsdextended ugidfw_rule_valid()
PR:		218039
CID:		1008934
Reported by:	Coverity, PVS-Studio
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D10300
2017-06-13 01:17:58 +00:00
Luiz Otavio O Souza
ff2748ec10 Add the initial support for the Marvell 88E6141 and 88E6341 switches.
Right now the driver only supports port VLANs, so make sure
etherswitch_getinfo() return the proper switch capabilities.

Handle the cases where not all ports are in use (that will also require
etherswitch cooperation).

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-13 00:42:23 +00:00
Rick Macklem
6149361b1e Define NFS_MAXXDR as the upper bound on XDR overhead in an NFS RPC.
This definition is a part of the maxiotune2 patch that will be
committed soon. It is being committed separately to ease merging
with the pNFS projects subversion trees.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D10991
2017-06-12 23:41:20 +00:00
Luiz Otavio O Souza
c3e9b4db8c Update the current version of netmap to bring it in sync with the github
version.

This commit contains mostly refactoring, a few fixes and minor added
functionality.

Submitted by:	Vincenzo Maffione <v.maffione at gmail.com>
Requested by:	many
Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-12 22:53:18 +00:00
Konstantin Belousov
b43ce76c77 Add ptrace(PT_GET_SC_ARGS) command to return debuggee' current syscall
arguments.

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:15:43 +00:00
Konstantin Belousov
f5a077c390 Print unimplemented syscall number to the ctty on SIGSYS, if enabled
by the knob kern.lognosys.

Discussed with:	imp
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:11:11 +00:00
Konstantin Belousov
2d88da2f06 Move struct syscall_args syscall arguments parameters container into
struct thread.

For all architectures, the syscall trap handlers have to allocate the
structure on the stack.  The structure takes 88 bytes on 64bit arches
which is not negligible.  Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already.  The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.

The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.

This move will also allow several more uses shortly.

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 21:03:23 +00:00
Navdeep Parhar
7127f6f455 cxgbe(4): Do not request an FEC setting that the port does not support.
MFC after:	3 days.
Sponsored by:	Chelsio Communications
2017-06-12 20:55:20 +00:00
Konstantin Belousov
43f41dd393 Make struct syscall_args visible to userspace compilation environment
from machine/proc.h, consistently on all architectures.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
X-Differential revision:	https://reviews.freebsd.org/D11080
2017-06-12 20:53:44 +00:00
Ed Maste
db20c27d28 msdosfs: adjust #ifdefs to be similar to NetBSD
- Add header guards where missing
- Make parts available for use in makefs

Sponsored by:	The FreeBSD Foundation
2017-06-12 20:42:37 +00:00
Mark Johnston
56060a373e Add a helper function for comparing struct uuids.
Submitted by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D11138
2017-06-12 20:14:44 +00:00
Dmitry Chagin
12bbbbb254 Remove the outdated definition.
MFC after:	1 week
2017-06-12 07:48:51 +00:00
Dmitry Chagin
ac1082e590 Since r318735 (ino64 project) the size of the native struct dirent is
equal or greater than the size of Linux struct dirent or struct dirent64.
So, remove LINUX_RECLEN_RATIO magic as useless.
2017-06-12 07:35:59 +00:00
Pedro F. Giffuni
4d7faf1b26 Remove unnecessary, and mismatched, comment.
Submitted by:	Fedor Uporov
2017-06-11 19:09:10 +00:00
Pedro F. Giffuni
b5f0918be5 extfs: fix the build with no UFS_ACL.
Some people may want to drop UFS-style ACLs for slimmer kernels.
Let's just not assume everyone needs ACLs.

Reported by:	bde
Submitted by:	Fedor Uporov
Differential Revision:	https://reviews.freebsd.org/D11145
2017-06-11 19:05:45 +00:00
Konstantin Belousov
fc8929cb29 More accurately handle early EFER restoration on resume.
Do not try to set LMA bit while CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD's
(over-)react with #GP.

Reported and tested by:	danfe
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-06-11 14:39:08 +00:00
Ian Lepore
a8f3fd8253 Convert from local code and constants for mac<->phy connection type to new
common fdt helper code.
2017-06-11 00:44:19 +00:00
Ian Lepore
b68031718e Add a driver for the Vitesse/Microsemi VSC8501 PHY. 2017-06-11 00:38:16 +00:00
Ian Lepore
7f153db853 Add some utility functions to help a PHY driver on an FDT-configured
system retrieve its config data from the fdt data.

The properties that are common to all phys are decoded and returned in a
structure.  The fdt node handles for the mac and phy devices are also
returned in the config data struct, so a driver can easily obtain additional
hardware-specific config values from the fdt data.
2017-06-11 00:16:21 +00:00
Ian Lepore
0f75981ae9 Add a set of constants describing the ways a MAC and PHY can be connected.
While the initial need for this is to help support phy drivers which are
configured with FDT data, there is nothing devicetree-specific about the
concept or the names, so they are available for use even on non-FDT systems.

The initial list of connection types comes from the current devicetree
bindings documentation, but values not documented there can be added to
the list in the future as needed, the values could be sorted into a
different order without perturbing FDT code, etc.  The only invariant
is that MII_CONTYPE_UNKNOWN should be first (so it has a value of zero,
so that a con-type variable in a softc, for example, is initialized to
MII_CONTYPE_UNKNOWN by default).
2017-06-10 23:55:13 +00:00
Ian Lepore
f5c49e5c89 Allow building if_ffec as a module. 2017-06-10 23:45:26 +00:00
Ian Lepore
ab3ad5bc81 if_ffec bugfixes related to harvesting of hardware-maintained statistics...
After harvesting the hardware statistics counters and summing them into the
interface stats, properly clear the hardware counters back to zero.  On imx5
and earlier hardware it is necessary to disable collection of stats while
writing zeroes to all the registers.  On imx6 and newer it turns out it's
not even possible to write zeroes, instead you have to toggle a special
"zero everything" control bit in a register.

Count incoming packets with a bad start frame delim as input errors, and
incoming packets dropped due to no fifo space as input drops.

Remove all code related to harvesting the hardware stats less often than
once per second.  It turns out the 32-bit stats registers are backed by
16-bit counters under the hood, and they can easily roll over if you only
harvest them once every 3 seconds like the old code was doing.  Now we just
read all the regs once a second.

The combination of not properly zeroing the stats registers and 16-bit
counters sometimes wrapping between harvest calls resulted in basically
unusable statistics before these changes.
2017-06-10 23:26:25 +00:00
Edward Tomasz Napierala
4c7008824b Switch the example name for variables controlling loading memory images
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot".  It's arbitrary, but it's easier to find this way.

MFC after:	2 weeks
2017-06-10 19:05:45 +00:00
Eric Joyner
5b83a512c1 ixl(4)/ixlv(4): Fix some busdma tags and improper map NULL.
Description from Brett:

"The busdma tags used to create mappings for the tx and rx rings did not have
the device's tag as parents, meaning that they did not respect the device's
busdma properties. The other tags used in the driver had their parents set
appropriately.

Also, the dma maps for each buffer in ixl_txeof() were being NULLed after
being unloaded, which is an error because those maps are then reused without
being recreated (I believe this also leaked resources since the maps were not
destroyed). Simply removing the line that sets the maps to NULL gives the
desired behavior. There does not seem to be a similar problem with ixl_rxeof().
Functions to free the tx and rx rings also NULL out the dma maps for each
buffer, but this seems okay because the maps are destroyed and not reused in
this case.

With these fixes, my ixl card seems to be working with the IOMMU enabled."

Submitted by:	Brett Gutstein <bgutstein@rice.edu>
Reviewed by:	erj
Approved by:	Alan Cox <alc@rice.edu>
MFC after:	1 week
2017-06-10 18:56:30 +00:00
Alan Cox
015d7db6b6 Remove an unnecessary field from struct blist. (The comment describing
what this field represented was also inaccurate.)  Suggested by: kib

In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified.  However, blist_create() was not modified to handle the
possibility that a malloc() call failed.  Address this omission.

Increase the width of the local variable "radix" to 64 bits.  (This
matches the width of the corresponding field in struct blist.)

Reviewed by:	kib
MFC after:	6 weeks
2017-06-10 16:11:39 +00:00
Luiz Otavio O Souza
595d629c09 Remove an unnecessary variable from the switch softc structure and make the
functions that are used as booleans return real boolean values.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2017-06-09 20:38:18 +00:00
Justin Hibbits
880870b41a Follow up r313841 on powerpc
Close a potential race in reading the CPU dtrace flags, where a thread can
start on one CPU, and partway through retrieving the flags be swapped out,
while another thread traps and sets the CPU_DTRACE_NOFAULT.  This could
cause the first thread to return without handling the fault.

Discussed with:	markj@
2017-06-09 20:26:42 +00:00
Mark Johnston
f67b5de754 Implement pci_disable_device() in the LinuxKPI.
Submitted by:	kmacy
MFC after:	2 weeks
2017-06-09 19:57:27 +00:00
Mark Johnston
465659643b Augment wait queue support in the LinuxKPI.
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
  code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
  wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
  implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
  calling thread to sleep.

This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10986
2017-06-09 19:41:12 +00:00
Alan Cox
1bece9d562 Style and comment fixes only.
Reviewed by:	kib
MFC after:	6 weeks
2017-06-09 17:19:27 +00:00
Alan Cox
a7249a6c85 blist_fill()'s return type is too narrow. blist_fill() accepts a 64-bit
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range.  This
revision corrects that mismatch.  Currently, swaponsomething() limits
the size of a swap area to prevent arithmetic arithmetic overflow in
other parts of the blist allocator.  That limit has also prevented this
type mismatch from causing problems.

Reviewed by:	kib, markj
MFC after:	6 weeks
Differential Revision:	https://reviews.freebsd.org/D11096
2017-06-09 16:19:24 +00:00
Gleb Smirnoff
438902edbd Fix stat(2) on a listening socket. 2017-06-09 15:54:48 +00:00
Andrew Turner
52453d22f2 Allow the arm64 machine/vfp.h to be included without first including
machine/pcb.h. It he latter is only needed for struct pcb.
2017-06-09 15:47:14 +00:00
Andrew Turner
9a19869a5f Store the read-only thread pointer when scheduling a new thread. This is
not currently set, however we may wish to set it later.
2017-06-09 15:37:17 +00:00
Andriy Gapon
667002fa27 MFV r319741: 8156 dbuf_evict_notify() does not need dbuf_evict_lock
illumos/illumos-gate@dbfd9f9300
dbfd9f9300

https://www.illumos.org/issues/8156
  dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
  the eviction itself (because the evict thread is not able to keep up).
  This can result in massive lock contention.
  It isn't necessary to hold the lock, because if we make the wrong choice
  occasionally, nothing bad will happen.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	1 week
2017-06-09 15:28:57 +00:00
Andriy Gapon
5f9cf93878 MFV r319739: 8005 poor performance of 1MB writes on certain RAID-Z configurations
illumos/illumos-gate@5b06278253
5b06278253

https://www.illumos.org/issues/8005
  RAID-Z requires that space be allocated in multiples of P+1 sectors,
  because this is the minimum size block that can have the required amount
  of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
  sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
  is a unit of 2^ashift bytes, typically 512B or 4KB.
  To satisfy this constraint, the allocation size is rounded up to the
  proper multiple, resulting in up to 3 "pad sectors" at the end of some
  blocks. The contents of these pad sectors are not used, so we do not
  need to read or write these sectors. However, some storage hardware
  performs much worse (around 1/2 as fast) on mostly-contiguous writes
  when there are small gaps of non-overwritten data between the writes.
  Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
  include pad sectors. If writing a pad sector will fill the gap between
  two (required) writes, we will issue the optional zio, thus doubling
  performance. The gap-filling performance improvement was introduced in
  July 2009.
  Writing the optional zio is done by the io aggregation code in
  vdev_queue.c. The problem is that it is also subject to the limit on
  the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
  default 128KB. For a given block, if the amount of data plus padding
  written to a leaf device exceeds zfs_vdev_aggregation_limit, the
  optional zio will not be written, resulting in a ~2x performance
  degradation.
  The problem occurs only for certain values of ashift, compressed block
  size, and RAID-Z configuration (number of parity and data disks). It
  cannot occur with the default recordsize=128KB. If compression is
  enabled, all configurations with recordsize=1MB or larger will be
  impacted to some degree.
  The problem notably occurs with recordsize=1MB, compression=off, with 10
  disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore

Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	10 days
2017-06-09 15:27:22 +00:00
Andriy Gapon
9f141f8d71 MFV r319738: 8155 simplify dmu_write_policy handling of pre-compressed buffers
illumos/illumos-gate@adaec86ad2
adaec86ad2

https://www.illumos.org/issues/8155
  When writing pre-compressed buffers, arc_write() requires that the compression
  algorithm used to compress the buffer matches the compression algorithm
  requested by the zio_prop_t, which is set by dmu_write_policy().
  This makes dmu_write_policy() and its callers a bit more complicated.
  We can simplify this by making arc_write() trust the caller to supply the type
  of pre-compressed buffer that it wants to write, and override the compression
  setting in the zio_prop_t.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after:	10 days
2017-06-09 15:26:03 +00:00
Andriy Gapon
61722264cd remove an unrelated local change from r319746
MFC after:	1 day
X-MFC with:	r319746
2017-06-09 15:21:28 +00:00
Andriy Gapon
ad2b1a296f MFV r319744,r319745: 8269 dtrace stddev aggregation is normalized incorrectly
illumos/illumos-gate@79809f9cf4
79809f9cf4

https://www.illumos.org/issues/8269
  It seems that currently normalization of stddev aggregation is done
  incorrectly.
  We divide both the sum of values and the sum of their squares by the
  normalization factor. But we should divide the sum of squares by the
  normalization factor squared to scale the original values properly.

FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.

Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after:	1 week
Sponsored by:	Panzura
2017-06-09 15:16:39 +00:00
Konstantin Belousov
40373cf5b8 Remove msdosfs -o large support.
Its purpose was to translate the values for msdosfs inode numbers,
which is calculated from the msdosfs structures describing the file,
into the range representable by 32bit ino_t.  The translation acted
for filesystems larger than 128Gb, it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range.  It consumed
memory that could be only freed by unmount, and the translation was
not stable across remounts.

With ino_t type extended to 64 bit, there is no such issue and values
can be returned without compaction to 32bit.  That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity.  For compat ABIs which use
32bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode numbers truncation is not safe.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-09 12:06:22 +00:00
Konstantin Belousov
7abe0df223 Enhance vfs.ino64_trunc_error sysctl.
Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").

In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.

Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
2017-06-09 11:17:08 +00:00
Andriy Voskoboinyk
15eaaf082a rtwn: rename module (if_rtwn.ko -> rtwn.ko) to match module name + drop
manpage link.

Reported by:	mav, hselasky
2017-06-09 07:08:58 +00:00