37183 commits

Author SHA1 Message Date
Scott Long
bc451ea420 Revert r355021. In my haste to grep for Giant, I missed that it was in
conditional ifdefs for this driver.  We will consider removing those ifdefs
in the future.

Reported by:	imp
2019-11-26 17:25:49 +00:00
Alexander Motin
d93f6d3af3 Add some IDs of Intel Wildcat Point-LP.
MFC after:	1 week
2019-11-26 15:52:19 +00:00
Navdeep Parhar
e3338dee08 cxgbe(4): Allow the driver to specify multiple FECs that the firmware
should try in order to link up with the peer.

Various FEC variables within the driver can now have multiple bits set
instead of being powers of 2.  0 and -1 in the user knobs still mean no
FEC and auto (driver decides) respectively for backward compatibility,
but no-FEC and auto now have their own bits in the internal
representation.  There is a new bit that can be set to request the FEC
recommended by the cable/transceiver module.
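
A minimal sketch of the knob translation described above, for illustration
only. The bit names and values (FEC_RS, FEC_BASER, FEC_NONE, FEC_AUTO,
FEC_MODULE) and the helper fec_knob_to_internal() are hypothetical and are
not taken from the driver source; only the meaning of 0 and -1 comes from
the commit message.

```c
#include <stdint.h>

/* Hypothetical internal FEC bits; names and values are assumptions. */
#define FEC_RS      (1u << 0)   /* Reed-Solomon FEC */
#define FEC_BASER   (1u << 1)   /* BASE-R (firecode) FEC */
#define FEC_NONE    (1u << 2)   /* explicit "no FEC" bit */
#define FEC_AUTO    (1u << 3)   /* driver decides */
#define FEC_MODULE  (1u << 4)   /* FEC recommended by cable/transceiver */

/*
 * Translate a user knob into the internal multi-bit representation.
 * 0 and -1 keep their backward-compatible meanings; any other value is
 * treated as a mask that may have several bits set.
 */
static uint32_t
fec_knob_to_internal(int knob)
{
	if (knob == 0)
		return (FEC_NONE);
	if (knob == -1)
		return (FEC_AUTO);
	return ((uint32_t)knob & (FEC_RS | FEC_BASER | FEC_MODULE));
}
```

With this layout a knob value of 3 would request both RS and BASE-R, which
was not expressible when the variables had to be powers of 2.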

Add sysctls to display link related capabilities of the local side as
well as the link partner.

Note that all this needs a new firmware and the documentation for the
driver FEC knobs will be updated after that firmware is added to the
driver.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-11-26 05:54:25 +00:00
Ed Maste
98b49d8e1a cfi: #include <limits.h> for ULONG_MAX after r355101
Reported by:	rlibby
MFC with:	r355101
2019-11-26 02:26:34 +00:00
Ed Maste
985d08fe52 cfi: check for integer overflow in cfi_devioctl
Reported by:    Pietro Oliva
Reviewed by:	markj
MFC after:	3 days
Security:	Possible OOB read in root-only ioctl
Sponsored by:	The FreeBSD Foundation
2019-11-25 21:21:37 +00:00
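
The cfi_devioctl commit above does not show its diff here, so the following
is only a generic sketch of an overflow-safe range check of the kind that
prevents an out-of-bounds read. The helper name range_in_bounds() and its
parameters are hypothetical and not taken from the cfi(4) source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when [offset, offset + len) lies within a region of `size`
 * bytes.  The naive test `offset + len <= size` can wrap around; comparing
 * len against the remaining space avoids the addition entirely.
 */
static bool
range_in_bounds(size_t offset, size_t len, size_t size)
{
	return (offset <= size && len <= size - offset);
}
```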
Alexander Motin
62ba8e8469 Report XLAT0 register for completeness. 2019-11-25 01:00:51 +00:00
Navdeep Parhar
515a40d5d9 cxgbe(4): sysctl to reset the temperature/voltage sensor.
# sysctl dev.<nexus>.<inst>.reset_sensor=1
# sysctl dev.t6nex.0.reset_sensor=1

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-11-24 16:40:54 +00:00
Warner Losh
dfdbb32093 Don't need Giant for these drivers' dev nodes.
Also, Giant isn't required to busy / unbusy a device, so drop that too while I'm
here. It's not done elsewhere in the tree and in the future will likely be
handled by a node lock to ensure consistency. Leave Giant in place for attach
and removing children, as that's actually still needed, even if imperfect.

Remove stale comment about contigmalloc taking Giant and calling w/o the lock
held. Neither of these is still true.
2019-11-24 15:37:19 +00:00
Warner Losh
96b506a57c Hoist locking Giant back up into the ioctl handler
Move the locking back into the ioctl handler. This "fixes" the race where we have
a hot plug event just after the dropping of Giant in pci_find_dbsf, assuming the
driver doesn't then call anything that drops and picks up Giant again... It's a
little safer since I don't think it does, but we lack the tools to know for
sure.
2019-11-24 15:37:14 +00:00
Warner Losh
57aa9163fd Fix leak in state machine for commands.
When we get a device departed message from the firmware, we send a TARGET_RESET
to the device to let the firmware know we're done and as part of the recovery
process. This will abort all the commands. While the documentation says the IOC
is responsible for writing the completion message for all the commands pending
with an aborted status, we sometimes have queued commands for the target that
haven't been completed so are in the INQUEUE state. So, when we later complete
the pending CCB as aborted, these commands are freed and we hit the "state not
busy" panic.

Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do
that here as well. In talking to Ken, Scott and Justin, they recommended a
series of tests to see if this is 100% safe. Those tests are ongoing, but
preliminary tests suggest this is safe as we see no duplicate completions when
we hit this case at work. We have a machine that has a dodgy power supply which
usually doesn't apply power to a few drives, but sometimes does when the machine
is under heavy load so we get a rash of the connect / disconnect messages over
half an hour. Without this change, we'd see state not busy panic. With this
change, the drives just annoyingly come and go without affecting the rest of the
machine, but without a complete error injection test suite, it's hard to know if
all edge cases are now covered or not.

Discussed with: scottl, ken, gibbs
2019-11-24 15:24:05 +00:00
Navdeep Parhar
e56d731b7d cxgbe(4): Update the firmware interface header.
This allows the driver to be updated for the next firmware without
waiting for it to be released.

MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2019-11-24 05:37:28 +00:00
Warner Losh
dd615d09c4 Push Giant down one layer
The /dev/pci device doesn't need GIANT, per se. However, one routine
that it calls, pci_find_dbsf implicitly does. It walks a list that can
change when PCI scans a new bus. With hotplug, this means we could
have a race with that scanning. To prevent that, take out Giant around
scanning the list.

However, given that we have places in the tree that drop giant, if
held when we call into them, the whole use of Giant to protect newbus
may be less effective than we desire, so add a comment about why we're
taking it out, and we'll address the issue when we lock newbus with
something other than Giant.
2019-11-23 23:43:52 +00:00
Conrad Meyer
b6db1cc710 random(4): De-export random_sources list
The internal datastructures do not need to be visible outside of
random_harvestq, and this helps ensure they are not misused.

No functional change.

Approved by:	csprng(delphij, markm)
Differential Revision:	https://reviews.freebsd.org/D22485
2019-11-22 20:24:15 +00:00
Scott Long
02d4535d2d Mark hpt27xx for removal in 13.0; all CAM drivers will be Giant-free by then.
Relnotes:	yes
2019-11-22 20:23:22 +00:00
Conrad Meyer
d7a23f9f6b random(4): Use ordinary sysctl definitions
There's no need to dynamically populate them; the SYSCTL_ macros take care
of load/unload appropriately already (and random_harvestq is 'standard' and
cannot be unloaded anyway).

Approved by:	csprng(delphij, markm)
Differential Revision:	https://reviews.freebsd.org/D22484
2019-11-22 20:22:29 +00:00
Conrad Meyer
f19de0a945 random(4): Abstract loader entropy injection
Break random_harvestq_prime up into some logical subroutines.  The goal
is that it becomes easier to add other early entropy sources.

While here, drop pre-12.0 compatibility logic.  loader default configuration
should preload the file as expected since 12.0.

Approved by:	csprng(delphij, markm)
Differential Revision:	https://reviews.freebsd.org/D22482
2019-11-22 20:20:37 +00:00
Conrad Meyer
92ebf15da5 random(4): Remove unused definitions
Approved by:	csprng(gordon, markm)
Differential Revision:	https://reviews.freebsd.org/D22481
2019-11-22 20:18:07 +00:00
Conrad Meyer
cb285f7c7c random/ivy: Provide mechanism to read independent seed values from rdrand
On x86 platforms with the intrinsic, rdrand is a deterministic bit generator
(AES-CTR) seeded from an entropic source.  On x86 platforms with rdseed, it
is something closer to the upstream entropic source.  (There is more nuance;
a block diagram is provided in [1].)

On devices with rdrand and without rdseed, there is no good intrinsic for
accessing the good entropic source directly.  However, the DRBG is guaranteed
to reseed every 8 kB on these platforms.  As a conservative option, on such
hardware we can read an extra 7.99kB samples every time we want a sample
from an independent seed.

As one can imagine, this drastically slows the effective read rate of
RDRAND (a factor of 1024 on amd64 and 2048 on ia32).  Microbenchmarks on AMD
Zen (has RDSEED) show an RDRAND rate of 25 MB/s and Intel Haswell (no
RDSEED) show RDRAND of 170 MB/s.  This would reduce the read rate on Haswell
to ~170 kB/s (at 100% CPU).  random(4)'s harvestq thread periodically
"feeds" from pure sources in amounts of 128-1024 bytes.  On Haswell,
enabling this feature increases the CPU time of RDRAND in each "feed" from
approximately 0.7-6 µs to 0.7-6 ms.
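
The slowdown figures above follow from simple arithmetic, sketched here
under the assumption that "8 kB" means 8192 bytes and that one machine word
(8 bytes on amd64, 4 bytes on ia32) is kept per reseed interval. The
function names are illustrative and do not come from the random(4) source.

```c
#include <stdint.h>

/* RDRAND reads needed per independent sample: one reseed interval of words. */
static uint32_t
rdrand_reads_per_seed(uint32_t reseed_bytes, uint32_t word_bytes)
{
	return (reseed_bytes / word_bytes);
}

/* Effective throughput (bytes/s) when only one word per interval is kept. */
static double
rdrand_effective_rate(double raw_bytes_per_sec, uint32_t factor)
{
	return (raw_bytes_per_sec / factor);
}
```

For example, rdrand_reads_per_seed(8192, 8) gives 1024 and
rdrand_reads_per_seed(8192, 4) gives 2048, matching the factors quoted
above. Dividing Haswell's 170 MB/s by 1024 yields roughly 166 kB/s, which
the message rounds to ~170 kB/s; the same factor turns 0.7-6 µs per feed
into roughly 0.7-6 ms.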

Because there is some performance penalty to this more conservative option,
a knob is provided to enable the change.  The change does not affect
platforms with RDSEED.

[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2

Approved by:	csprng(delphij, markm)
Differential Revision:	https://reviews.freebsd.org/D22455
2019-11-22 19:30:31 +00:00
Scott Long
8823960b8d Schedule the trm(4) driver for removal. It relies on Giant and thus has
required compat shims in CAM for 12 years.

Relnotes:	yes
2019-11-22 18:50:53 +00:00
John Baldwin
bddf73433e NIC KTLS for Chelsio T6 adapters.
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.

NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine.  Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS.  This permits using the
TOE to segment the encrypted TLS records.  However, this approach does
have some limitations:

1) Regular TOE sockets cannot be used when the TOE is in this special
   mode.  One can use either TOE and TOE-based KTLS or NIC KTLS, but
   not both at the same time.

2) In NIC KTLS mode, the TOE is only able to accept a per-connection
   timestamp offset that varies in the upper 4 bits.  Put another way,
   only connections whose timestamp offset has the 28 lower bits
   cleared can use NIC KTLS and generate correct timestamps.  The
   driver will refuse to enable NIC KTLS on connections with a
   timestamp offset with any of the lower 28 bits set.  To use NIC
   KTLS, users can either disable TCP timestamps by setting the
   net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
   tcp_new_ts_offset() function to clear the lower 28 bits of the
   generated offset.

3) Because the TCP segmentation relies on fields mirrored in a TCB in
   the TOE, not all fields in a TCP packet can be sent in the TCP
   segments generated from a TLS record.  Specifically, for packets
   containing TCP options other than timestamps, the driver will
   inject an "empty" TCP packet holding the requested options (e.g. a
   SACK scoreboard) along with the segments from the TLS record.
   These empty TCP packets are counted by the
   dev.cc.N.txq.M.kern_tls_options sysctls.
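
The timestamp restriction in limitation 2 can be sketched as a mask test.
This is an illustration only: the helper names are hypothetical, and the
second function merely mirrors the "local patch" idea mentioned above
rather than reproducing any actual change to tcp_new_ts_offset().

```c
#include <stdbool.h>
#include <stdint.h>

#define TS_OFFSET_LOW28_MASK	0x0fffffffu

/* True when the offset varies only in the upper 4 bits (low 28 bits clear). */
static bool
nic_ktls_ts_offset_ok(uint32_t ts_offset)
{
	return ((ts_offset & TS_OFFSET_LOW28_MASK) == 0);
}

/* Clear the lower 28 bits so the offset satisfies the check above. */
static uint32_t
ts_offset_clear_low28(uint32_t ts_offset)
{
	return (ts_offset & ~TS_OFFSET_LOW28_MASK);
}
```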

Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer.  The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records.  In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.

TCP packets for TLS requests are broken down into the following
classes (with associated counters):

- Mbufs that send an entire TLS record in full do not have any waste
  bytes (dev.cc.N.txq.M.kern_tls_full).

- Mbufs that send a short TLS record that ends before the end of the
  trailer (dev.cc.N.txq.M.kern_tls_short).  For sockets using AES-CBC,
  the encryption must always start at the beginning, so if the mbuf
  starts at an offset into the TLS record, the offset bytes will be
  "waste" bytes.  For sockets using AES-GCM, the encryption can start
  at the 16 byte block before the starting offset capping the waste at
  15 bytes.

- Mbufs that send a partial TLS record that has a non-zero starting
  offset but ends at the end of the trailer
  (dev.cc.N.txq.M.kern_tls_partial).  In order to compute the
  authentication hash stored in the trailer, the entire TLS record
  must be sent as input to the crypto engine, so the bytes before the
  offset are always "waste" bytes.
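
A rough sketch of the waste-byte rule for short records, assuming the
offset is measured in bytes from the point where encryption of the record
begins (the real driver also has to account for the TLS header, which this
ignores). The function name and parameters are hypothetical.

```c
#include <stdbool.h>

#define AES_BLOCK_BYTES	16u

/*
 * Waste bytes for a short TLS record starting `offset` bytes in.
 * AES-CBC must start from the beginning, so the whole offset is waste.
 * AES-GCM can start at the enclosing 16-byte block, so at most 15 bytes
 * are wasted.
 */
static unsigned
tls_short_waste(unsigned offset, bool is_gcm)
{
	return (is_gcm ? offset % AES_BLOCK_BYTES : offset);
}
```

For a partial record that ends at the trailer, the text above implies the
CBC rule applies regardless of cipher, since the whole record is needed to
compute the authentication hash.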

In addition, other per-txq sysctls are provided:

- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
  using AES-CBC.

- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
  using AES-GCM.

- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
  compensate for the TOE engine not being able to set FIN on the last
  segment of a TLS record if the TLS record mbuf had FIN set.

- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
  txq including full, short, and partial records.

- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
  and payload) sent for TLS record requests.

- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
  record requests.

To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:

hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D21962
2019-11-21 19:30:31 +00:00
Ian Lepore
e3c42ad809 Rewrite iicdev_writeto() to use a single buffer and a single iic_msg, rather
than effectively doing scatter/gather IO with a pair of iic_msgs that direct
the controller to do a single transfer with no bus STOP/START between the
two buffers.  It turns out we have multiple i2c hardware drivers that don't
honor the NOSTOP and NOSTART flags; sometimes they just try to do the
transfers anyway, creating confusing failures or leading to corrupted data.
2019-11-21 19:13:05 +00:00
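
The single-buffer approach in the iicdev_writeto() commit above can be
illustrated in plain C: the register address and the data are copied into
one contiguous buffer that a single message can then describe. This is a
userland sketch with a hypothetical helper name, not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build one contiguous write payload: register address first, then data.
 * Returns the payload length, or 0 if `out` is too small.
 */
static size_t
build_write_payload(uint8_t regaddr, const void *data, size_t len,
    uint8_t *out, size_t outsize)
{
	if (len >= outsize)	/* need len + 1 bytes; avoids len + 1 overflow */
		return (0);
	out[0] = regaddr;
	memcpy(out + 1, data, len);
	return (len + 1);
}
```

Because the controller sees one buffer and one message, it no longer
matters whether the hardware driver honors NOSTOP and NOSTART.
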
Hans Petter Selasky
c4e11f2231 Add USB ID for Diamond Multimedia BVU195 Display Link device.
Submitted by:	darius@dons.net.au
PR:		242128
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-11-21 16:42:25 +00:00
Gleb Smirnoff
71f0077631 Remove sio(4).
It had been disconnected from build in r181233 in 2008.

Reviewed by:	imp
2019-11-21 01:24:49 +00:00
Conrad Meyer
c41faf5591 random/ivy: Trivial refactoring
It is clearer to me to return success/error (true/false) instead of some
retry count linked to the inline assembly implementation.

No functional change.

Approved by:	core(csprng) => csprng(markm)
Differential Revision:	https://reviews.freebsd.org/D22454
2019-11-20 19:55:43 +00:00
Andriy Gapon
97d8f008af hyperv/storvsc: stash a pointer to hv_storvsc_request in ccb
A SIM-private field is used for that.
The pointer can be useful when examining a state of a queued ccb.
E.g., a ccb on a da_softc.pending_ccbs.

MFC after:	2 weeks
2019-11-19 07:20:59 +00:00
Alexander Motin
7280125e81 Add ioat_get_domain() to ioat(4) KPI.
This allows NUMA-aware consumers to reduce inter-domain traffic.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-11-19 02:09:04 +00:00
Alexander Motin
f0dd6a1787 Call bus_dma_dmar_set_buswide(9) added in r354830.
PLX NTB sends translated DMA requests not only from itself, but from all
slots and functions of its bus.  By default DMAR blocks those additional
requests.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-11-19 02:03:10 +00:00
Konstantin Belousov
fa83f68917 Add x86 msr tweak KPI.
Use the KPI to tweak MSRs in mitigation code.

Reviewed by:	markj, scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22431
2019-11-18 20:53:57 +00:00
Scott Long
e372160177 TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
	    features.

machdep.mitigations.taa.state:
        inactive        - no mitigation is active/enabled
        TSX disable     - TSX is disabled in the bare metal CPU as well as
                          any virtualized CPUs
        VERW            - VERW instruction clears CPU buffers
	not vulnerable	- The CPU has identified itself as not being
			  vulnerable
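
For reference, the values of machdep.mitigations.taa.enable listed above
can be written as a small lookup. The function name is hypothetical and the
strings simply restate the list; this is not the kernel's implementation.

```c
#include <stddef.h>

/* Describe a machdep.mitigations.taa.enable value; NULL if out of range. */
static const char *
taa_enable_describe(int value)
{
	switch (value) {
	case 0:
		return ("no software mitigation");
	case 1:
		return ("attempt to disable TSX");
	case 2:
		return ("use the VERW mitigation");
	case 3:
		return ("select automatically from processor features");
	default:
		return (NULL);
	}
}
```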

Nothing in the base FreeBSD system uses TSX.  However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by:	emaste, imp, kib, scottph
Sponsored by:	Intel
Differential Revision:	22374
2019-11-16 00:26:42 +00:00
Alexander Motin
348efb140e Initialize *comp_update with valid value.
I've noticed that sometimes, with DMAR enabled, the initial write from the
device to this address is somehow delayed, triggering an assertion because
the zero default is invalid.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-15 23:01:09 +00:00
Alexander Motin
1f4a469d36 Cleanup address range checks in ioat(4).
 - Deduce allowed address range for bus_dma(9) from the hardware version.
Different versions (CPU generations) have different documented limits.
 - Remove difference between address ranges for src/dst and crc.  At least
docs for a few recent generations of CPUs do not mention anything like that,
while older ones are already limited by the above limits.
 - Remove address assertions from arguments.  While I do not think
addresses out of allowed ranges should realistically happen there due to
the platforms' physical address limitations, there is now bus_dma(9) to
make sure of that, preferably via IOMMU.
 - Since crc now has the same address range as src/dst, remove crc_dmamap,
reusing dst2_dmamap instead.

Discussed with:	cem
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-15 22:47:59 +00:00
Navdeep Parhar
5877e649f0 cxgbev(4): Catch up with the pciids in the PF driver.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2019-11-15 18:48:14 +00:00
Gleb Smirnoff
782b97cb80 Fix regression from r353841: ctx.rc needs to be initialized,
otherwise driver might silently fail to initialize.

Pointy hat to:	glebius
2019-11-15 18:02:37 +00:00
Josh Paetzel
4c6bf7c398 Fix build with GCC
Fix suggested by:	jhb, scottl
Sponsored by:	Panzura
2019-11-15 01:07:39 +00:00
Josh Paetzel
052e12a508 Add the pvscsi driver to the tree.
This driver allows the use of the paravirt SCSI controller
in VMware products like ESXi.  The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.

Error handling in this driver has not been extensively tested
yet.

Submitted by:	vbhakta@vmware.com
Relnotes:	yes
Sponsored by:	VMware, Panzura
Differential Revision:	D18613
2019-11-14 23:31:20 +00:00
Alexander Motin
3eb70a09f4 Pass more reasonable WAIT flags to bus_dma(9) calls.
MFC after:	2 weeks
2019-11-14 04:39:48 +00:00
Alexander Motin
7f215e071e Make ntb(4) send bus_get_dma_tag() requests to parent buses passing real
bus' child pointers instead of grandchildren.

DMAR does not like requests from devices not parented directly by PCI.

MFC after:	2 weeks
2019-11-14 04:34:58 +00:00
Scott Long
2058e7dbde Stop the VESA driver from whining loudly in the dmesg during boot on
systems that use EFI instead of BIOS.
2019-11-13 15:31:31 +00:00
John Baldwin
a1b2b6e184 Create a file to hold shared routines for dealing with T6 key contexts.
ccr(4) and TLS support in cxgbe(4) construct key contexts used by the
crypto engine in the T6.  This consolidates some duplicated code for
helper functions used to build key contexts.

Reviewed by:	np
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22156
2019-11-13 00:53:45 +00:00
Konstantin Belousov
c08973d09c Workaround for Intel SKL002/SKL012S errata.
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs.  For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs.  The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.

Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve use cases.

Reviewed by:	alc, emaste, markj, scottl
Security:	CVE-2018-12207
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21884
2019-11-12 18:01:33 +00:00
D Scott Phillips
178d6bc844 nvdimm(4): Fix various problems when using the second label index block
struct nvdimm_label_index is dynamically sized, with the `free`
bitfield expanding to hold `slot_cnt` entries. Fix a few places
where we were treating the struct as though it had a fixed size.
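
A sketch of why sizeof() is the wrong tool for such a structure. The struct
below is a simplified, hypothetical stand-in (the real struct
nvdimm_label_index has more header fields and its own layout rules); it
only shows how the size grows with `slot_cnt` when `free` holds one bit per
slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a label index with a trailing bitmap. */
struct label_index_sketch {
	uint32_t slot_cnt;	/* number of label slots */
	uint8_t	 free[];	/* one bit per slot, sized at run time */
};

/* Bytes occupied by the header plus a bitmap covering slot_cnt slots. */
static size_t
label_index_size(uint32_t slot_cnt)
{
	return (offsetof(struct label_index_sketch, free) +
	    ((size_t)slot_cnt + 7) / 8);
}
```

Code that copies or compares only sizeof(struct label_index_sketch) bytes
would silently ignore the bitmap, which is the class of mistake the commit
describes.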

Reviewed by:	cem
Approved by:	scottl (mentor)
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22253
2019-11-12 16:24:37 +00:00
D Scott Phillips
cf8b104f04 nvdimm(4): Only expose namespaces for accessible data SPAs
Apply the same user accessible filter to namespaces as is applied
to full-SPA devices. Also, explicitly filter out control region
SPAs which don't expose the nvdimm data area.

Reviewed by:	cem
Approved by:	scottl (mentor)
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D21987
2019-11-12 15:50:30 +00:00
Alexander Motin
028d96899b Add compact scratchpad protocol for ntb_transport(4).
Previously ntb_transport(4) required at least 6 scratchpad registers,
plus 2 more for each additional memory window.  That is too much for some
configurations, where several drivers have to share resources of the same
NTB hardware.  This patch introduces a new compact version of the protocol,
requiring only 3 scratchpad registers, plus one more for each additional
memory window.  The optimization is based on the fact that none of the version,
number of windows, or number of queue pairs really needs more than one byte
each, and window sizes of 4GB are not very useful now.  The new protocol
is activated automatically when the configuration is low on scratchpad
registers, or it can be activated explicitly with loader tunable.
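
The register counts and the one-byte-per-field idea can be sketched as
follows. The register counts come straight from the text above; the byte
layout inside the packed register and all function names are assumptions
made for illustration, not the actual ntb_transport(4) wire format.

```c
#include <stdint.h>

/* Scratchpad registers needed by each protocol for nwin >= 1 windows. */
static unsigned
spad_regs_classic(unsigned nwin)
{
	return (6 + 2 * (nwin - 1));
}

static unsigned
spad_regs_compact(unsigned nwin)
{
	return (3 + (nwin - 1));
}

/* Hypothetical layout: version, window count, queue pairs, one byte each. */
static uint32_t
spad_pack(uint8_t version, uint8_t nwin, uint8_t nqp)
{
	return (((uint32_t)version << 16) | ((uint32_t)nwin << 8) | nqp);
}

static uint8_t
spad_version(uint32_t reg)
{
	return ((reg >> 16) & 0xff);
}

static uint8_t
spad_nwin(uint32_t reg)
{
	return ((reg >> 8) & 0xff);
}

static uint8_t
spad_nqp(uint32_t reg)
{
	return (reg & 0xff);
}
```

With three memory windows the classic protocol needs 10 registers while the
compact one needs 5.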

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-10 03:37:45 +00:00
Alexander Motin
7aafa7c368 Allow splitting PLX NTB BAR2 into several memory windows.
When enabled, the Address Lookup Table (A-LUT) allows specifying a separate
translation for each 1/128th or 1/256th of the BAR2.  Previously it was
used only to limit effective window size by blocking access through some
of A-LUT elements.  This change allows A-LUT elements to also point to
different memory locations, providing upper layers with several (up to 128)
independent memory windows.  A-LUT hardware allows even more flexible
configurations than this, but the NTB KPI has no way to manage that now.
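
The slicing of BAR2 by the A-LUT can be sketched as simple integer
arithmetic, assuming `entries` is 128 or 256 and the BAR size is an exact
multiple of it. The function names are hypothetical and the sketch says
nothing about how the PLX registers are actually programmed.

```c
#include <stdint.h>

/* Size of the BAR2 slice translated by one A-LUT element. */
static uint64_t
alut_slice_size(uint64_t bar2_size, unsigned entries)
{
	return (bar2_size / entries);
}

/* Index of the A-LUT element that translates a given BAR2 offset. */
static unsigned
alut_entry_for_offset(uint64_t offset, uint64_t bar2_size, unsigned entries)
{
	return ((unsigned)(offset / alut_slice_size(bar2_size, entries)));
}
```

For a 1 GB BAR2 with 128 elements each slice is 8 MB, so giving every
element its own translation yields up to 128 independent windows.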

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-11-10 03:24:53 +00:00
Emmanuel Vadot
306e46eb1e generic_ehci_fdt: Fix compile when EXT_RESOURCES isn't present 2019-11-09 22:25:45 +00:00
Michal Meloun
124a91ac18 Implement support for (soft)linked clocks.
This kind of clock node represents a temporary placeholder for clocks
defined later in the boot process. Also, these are necessary to break
circular dependencies occasionally occurring in complex clock graphs.

MFC after: 3 weeks
2019-11-08 18:57:41 +00:00
Navdeep Parhar
43b5712444 cxgbe(4): Query Vdd from the firmware if its last known value is 0.
TVSENSE may not be ready by the time t4_fw_initialize returns and the
firmware returns 0 if the driver asks for the Vdd before the sensor is
ready.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-11-08 01:13:12 +00:00
Mark Johnston
1903c60041 iwm: Sync device initialization and reset code with iwlwifi.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-07 23:39:17 +00:00
Mark Johnston
666c8655f2 iwm: Implement support for scans with "adaptive" dwell time.
This is required by 9000-series firmware.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-07 23:39:04 +00:00
Mark Johnston
c513f15bf0 iwm: Use the default station for all transmits.
This is what iwlwifi seems to do, and the previous behaviour triggered
firmware panics during transmit on a 9560.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-11-07 23:38:49 +00:00