As part of my journey to make it easy to determine what's relying on tty
bits, remove a couple more. Some of these just outright didn't need it,
while others did rely on <sys/tty.h> pollution for mutex headers.
improvements, the ECN bits need to be exposed to the TCP SYNcache.
This change is a minimal modification to the function headers, without any
functional change intended.
Submitted by: Richard Scheffenegger
Reviewed by: rgrimes@, rrs@, tuexen@
Differential Revision: https://reviews.freebsd.org/D22436
in ofw_gpiobus_probe() return BUS_PROBE_DEFAULT rather than 0; we are not
the only possible driver to handle this device, we're just slightly better
than the base gpiobus (which probes at BUS_PROBE_GENERIC).
In the time since this code was first written, the gpio controller bindings
aquired the concept of a "hog" node which could be used to preset one or
more gpio pins as input or output at a specified level. This change doesn't
fully implement the hogging concept, it just filters out hog nodes when
instantiating child devices by scanning for child nodes in the fdt data.
The whole concept of having child nodes under the controller node is not
supported by the standard bindings, and appears to be a freebsd extension,
probably left over from the days when we had no support for cross-tree
phandle references in the fdt data.
I have no good explanation why it happens, but I found that in B2B mode
at least Xeon v4 NTB leaks accesses to its configuration memory at BAR0
originated from the link side to its host side. DMAR predictably blocks
those, making access to remote scratchpad registers in B2B mode impossible.
This change creates identity mapping in DMAR covering the BAR0 addresses,
making the NTB work fine with DMAR enabled. It seems like allowing single
4KB range at 32KB offset may be enough, but I don't see a reason to be so
specific.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
This was inherited from iwlwifi, which drives devices supported by both
iwn(4) and iwm(4) in FreeBSD. In iwm(4) _mvm is meaningless, so remove
it. OpenBSD made the same change a long time ago. No functional change
intended.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
should try in order to link up with the peer.
Various FEC variables within the driver can now have multiple bits set
instead of being powers of 2. 0 and -1 in the user knobs still mean no
FEC and auto (driver decides) respectively for backward compatibility,
but no-FEC and auto now have their own bits in the internal
representation. There is a new bit that can be set to request the FEC
recommended by the cable/transceiver module.
Add sysctls to display link related capabilities of the local side as
well as the link partner.
Note that all this needs a new firmware and the documentation for the
driver FEC knobs will be updated after that firmware is added to the
driver.
MFC after: 1 week
Sponsored by: Chelsio Communications
Also, Giant isn't required to busy / unbusy a device, so drop that too while I'm
here. It's not done elsewhere in the tree and in the future will likely be
handled by a node lock to ensure consistency. Leave Giant in place for attach
and removing childing, as that's actually still needed, even if imperfect.
Remove stale comment about contigmalloc taking Giant and calling w/o the lock
held. Neither of these is still true.
Move the locking back into the ioctl handler. This "fixes" the race where we hve
a hot plug event just after the dropping of Giant in pci_find_dbsf, assuming the
driver doesn't then call anything that drops and picks up Giant again... It's a
little safer since don't think it doesn't, but we lack the tools to know for
sure.
When we get a device departed message from the firmware, we send a TARGET_REST
to the device to let the firmware know we're done and as part of the recovery
process. This will abort all the commands. While the documentation says the IOC
is responsible for writing the completion message for all the commands pending
with an aborted status, we sometimes have queued commands for the target that
haven't been completed so are in the INQUEUE state. So, when we later complete
the pending CCB as aborted, these commands are freed and we hit the "state not
busy" panic.
Elsewhere where we dequeue commands, we move the state to BUSY from INQUEUE. Do
that here as well. In talking to Ken, Scott and Justin, they recommended a
series of tests to see if this is 100% safe. Those tests are ongoing, but
preliminary tests suggest this is safe as we see no duplicate completions when
we hit this case at work. We have a machine that has a dodgy powersupply which
usually doesn't apply power to a few drives, but sometimes does when the machine
is under heavy load so we get a rash of the connect / disconnect messages over
half an hour. Without this change, we'd see state not busy panic. With this
change, the drives just annoyingly come and go without affecting the rest of the
machine, but without a complete error injection test suite, it's hard to know if
all edge cases are now covered or not.
Discussed with: scottl, ken, gibbs
This allows the driver to be updated for the next firmware without
waiting for it to be released.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
The /dev/pci device doesn't need GIANT, per se. However, one routine
that it calls, pci_find_dbsf implicitly does. It walks a list that can
change when PCI scans a new bus. With hotplug, this means we could
have a race with that scanning. To prevent that, take out Giant around
scanning the list.
However, given that we have places in the tree that drop giant, if
held when we call into them, the whole use of Giant to protect newbus
may be less effective that we desire, so add a comment about why we're
talking it out, and we'll address the issue when we lock newbus with
something other than Giant.
The internal datastructures do not need to be visible outside of
random_harvestq, and this helps ensure they are not misused.
No functional change.
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22485
There's no need to dynamically populate them; the SYSCTL_ macros take care
of load/unload appropriately already (and random_harvestq is 'standard' and
cannot be unloaded anyway).
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22484
Break random_harvestq_prime up into some logical subroutines. The goal
is that it becomes easier to add other early entropy sources.
While here, drop pre-12.0 compatibility logic. loader default configuration
should preload the file as expeced since 12.0.
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22482
On x86 platforms with the intrinsic, rdrand is a deterministic bit generator
(AES-CTR) seeded from an entropic source. On x86 platforms with rdseed, it
is something closer to the upstream entropic source. (There is more nuance;
a block diagram is provided in [1].)
On devices with rdrand and without rdseed, there is no good intrinsic for
acecssing the good entropic soure directly. However, the DRBG is guaranteed
to reseed every 8 kB on these platforms. As a conservative option, on such
hardware we can read an extra 7.99kB samples every time we want a sample
from an independent seed.
As one can imagine, this drastically slows the effective read rate of
RDRAND (a factor of 1024 on amd64 and 2048 on ia32). Microbenchmarks on AMD
Zen (has RDSEED) show an RDRAND rate of 25 MB/s and Intel Haswell (no
RDSEED) show RDRAND of 170 MB/s. This would reduce the read rate on Haswell
to ~170 kB/s (at 100% CPU). random(4)'s harvestq thread periodically
"feeds" from pure sources in amounts of 128-1024 bytes. On Haswell,
enabling this feature increases the CPU time of RDRAND in each "feed" from
approximately 0.7-6 µs to 0.7-6 ms.
Because there is some performance penalty to this more conservative option,
a knob is provided to enable the change. The change does not affect
platforms with RDSEED.
[1]: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide#inpage-nav-4-2
Approved by: csprng(delphij, markm)
Differential Revision: https://reviews.freebsd.org/D22455
This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters.
Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE
connections.
NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment
the encrypted TLS frames output by the crypto engine. Instead, the
TOE is placed into a special setup to permit "dummy" connections to be
associated with regular sockets using KTLS. This permits using the
TOE to segment the encrypted TLS records. However, this approach does
have some limitations:
1) Regular TOE sockets cannot be used when the TOE is in this special
mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but
not both at the same time.
2) In NIC KTLS mode, the TOE is only able to accept a per-connection
timestamp offset that varies in the upper 4 bits. Put another way,
only connections whose timestamp offset has the 28 lower bits
cleared can use NIC KTLS and generate correct timestamps. The
driver will refuse to enable NIC KTLS on connections with a
timestamp offset with any of the lower 28 bits set. To use NIC
KTLS, users can either disable TCP timestamps by setting the
net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the
tcp_new_ts_offset() function to clear the lower 28 bits of the
generated offset.
3) Because the TCP segmentation relies on fields mirrored in a TCB in
the TOE, not all fields in a TCP packet can be sent in the TCP
segments generated from a TLS record. Specifically, for packets
containing TCP options other than timestamps, the driver will
inject an "empty" TCP packet holding the requested options (e.g. a
SACK scoreboard) along with the segments from the TLS record.
These empty TCP packets are counted by the
dev.cc.N.txq.M.kern_tls_options sysctls.
Unlike TOE TLS which is able to buffer encrypted TLS records in
on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS
records for retransmit requests as well as non-retransmit requests
that do not include the start of a TLS record but do include the
trailer. The T6 NIC KTLS code tries to optimize some of the cases for
requests to transmit partial TLS records. In particular it attempts
to minimize sending "waste" bytes that have to be given as input to
the crypto engine but are not needed on the wire to satisfy mbufs sent
from the TCP stack down to the driver.
TCP packets for TLS requests are broken down into the following
classes (with associated counters):
- Mbufs that send an entire TLS record in full do not have any waste
bytes (dev.cc.N.txq.M.kern_tls_full).
- Mbufs that send a short TLS record that ends before the end of the
trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC,
the encryption must always start at the beginning, so if the mbuf
starts at an offset into the TLS record, the offset bytes will be
"waste" bytes. For sockets using AES-GCM, the encryption can start
at the 16 byte block before the starting offset capping the waste at
15 bytes.
- Mbufs that send a partial TLS record that has a non-zero starting
offset but ends at the end of the trailer
(dev.cc.N.txq.M.kern_tls_partial). In order to compute the
authentication hash stored in the trailer, the entire TLS record
must be sent as input to the crypto engine, so the bytes before the
offset are always "waste" bytes.
In addition, other per-txq sysctls are provided:
- dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq
using AES-CBC.
- dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq
using AES-GCM.
- dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to
compensate for the TOE engine not being able to set FIN on the last
segment of a TLS record if the TLS record mbuf had FIN set.
- dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this
txq including full, short, and partial records.
- dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header
and payload) sent for TLS record requests.
- dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS
record requests.
To enable NIC KTLS with T6, set the following tunables prior to
loading the cxgbe(4) driver:
hw.cxgbe.config_file=kern_tls
hw.cxgbe.kern_tls=1
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21962
than effectively doing scatter/gather IO with a pair of iic_msgs that direct
the controller to do a single transfer with no bus STOP/START between the
two buffers. It turns out we have multiple i2c hardware drivers that don't
honor the NOSTOP and NOSTART flags; sometimes they just try to do the
transfers anyway, creating confusing failures or leading to corrupted data.
It is clearer to me to return success/error (true/false) instead of some
retry count linked to the inline assembly implementation.
No functional change.
Approved by: core(csprng) => csprng(markm)
Differential Revision: https://reviews.freebsd.org/D22454
A SIM-private field is used for that.
The pointer can be useful when examining a state of a queued ccb.
E.g., a ccb on a da_softc.pending_ccbs.
MFC after: 2 weeks
PLX NTB sends translated DMA requests not only from itsels, but from all
slots and functions of its bus. By default DMAR blocks those additional.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Use the KPI to tweak MSRs in mitigation code.
Reviewed by: markj, scottl
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22431
This CVE has already been announced in FreeBSD SA-19:26.mcu.
Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.
Control knobs are:
machdep.mitigations.taa.enable:
0 - no software mitigation is enabled
1 - attempt to disable TSX
2 - use the VERW mitigation
3 - automatically select the mitigation based on processor
features.
machdep.mitigations.taa.state:
inactive - no mitigation is active/enabled
TSX disable - TSX is disabled in the bare metal CPU as well as
- any virtualized CPUs
VERW - VERW instruction clears CPU buffers
not vulnerable - The CPU has identified itself as not being
vulnerable
Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.
Reviewed by: emaste, imp, kib, scottph
Sponsored by: Intel
Differential Revision: 22374
I've noticed that sometimes with enabled DMAR initial write from device
to this address is somehow getting delayed, triggering assertion due to
zero default being invalid.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
- Deduce allowed address range for bus_dma(9) from the hardware version.
Different versions (CPU generations) have different documented limits.
- Remove difference between address ranges for src/dst and crc. At least
docs for few recent generations of CPUs do not mention anything like that,
while older are already limited with above limits.
- Remove address assertions from arguments. While I do not think the
addresses out of allowed ranges should realistically happen there due to
the platforms physical address limitations, there is now bus_dma(9) to
make sure of that, preferably via IOMMU.
- Since crc now has the same address range as src/dst, remove crc_dmamap,
reusing dst2_dmamap instead.
Discussed with: cem
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
Submitted by: vbhakta@vmware.com
Relnotes: yes
Sponsored by: VMware, Panzura
Differential Revision: D18613