The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
MFC after: 1 month
Sponsored by: Chelsio Communications
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
3e7bae0821 turns the BUS_READ_IVAR() failure from a warning into a
KASSERT. For certain PCI audio devices such like snd_csa(4) and
snd_emu10kx(4), the ac97_create() keeps the device handler generated
by device_add_child(pci_dev, "pcm"), which is not really a PCI device
handler. This in turn causes the subsequent pci_get_subdevice()
inside ac97_initmixer() triggering a panic.
This patch tries to put a bandaid for the aforementioned pcm device
children such that they can use the correct PCI handler(from parent)
to avoid a KASSERT panic in the INVARIANTS kernel.
Tested with: snd_csa(4), snd_ich(4), snd_emu10kx(4)
Reviewed by: imp
MFC after: 1 month
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers. They can be called
from the 1Hz callout or if_get_counter. All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
While the PV SCSI SG list can handle 512k of SG entries, it can only do
so for I/O that's aligned to 4k or better. newfs_msdos does unaligned
I/O, so triggers too long for host errors in cam when a 512k I/O is
attempted. Prefer power of 2 256k to the absolute maximum 508k, though
that can be revisited should the latter show to give significant
performance improvement.
MFC After: 3 days
Tested by: darius on discord (508k version of patch)
Sponsored by: Netflix
Since then, the FreeBSD USB stack has got proper USB RNDIS support.
PR: 254345
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
- SNDSTAT_LABEL_* are renamed to SNDST_DSPS_*, and SNDSTAT_LABEL_DSPS
becomes SNDST_DSPS.
- Centralize channel number/rate/formats into a single nvlist
The above nvlist is named "info_play" and "info_rec"
- Expose only encoding format in pfmts/rfmts. Userland has no direct
access to AFMT_ENCODING/CHANNEL/EXTCHANNEL macros, thus it serves no
meaning to expose too much information through this pair of labels.
However pminrate/rminrate, pmaxrate/rmaxrate, pfmts/rfmts are
deprecated and will be removed in future.
This commit keeps ioctls ABI compatibility with __FreeBSD_version
1400006 for now. In future the compat ABI with 1400006 will be removed
once audio/virtual_oss is rebuilt.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29770
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29801
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
PR: 140647
Reported by: jtl
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29789
We don't need to set the bits here since the if/else if/else statements
fully cover setting these bit pairs.
Reported by: markj
Reviewed by: markj, erj
Approved by: #intel_networking
MFC aftter: 1 week
Differential Revision: https://reviews.freebsd.org/D29827
The NAV (network allocation vector) register reflects the current MAC
tracking of NAV - when it will stay quiet before transmitting.
Other devices transmit their frame durations in their 802.11 PHY headers
and all devices that hear a frame - even if it's one in an encoding
they don't understand - will understand the low bitrate PHY header that
includes the frame duration. So, they'll set NAV to this value so
they'll stay quiet until the transmit completes.
Anyway, sometimes the PHY NAV header is garbled and sometimes, notably
older broadcom devices, will fake a long NAV so they can get "cleaner" air
for local calibration. When this happens, the hardware will stay quiet
for quite some time and this can lead to missed/stuck beacons, or
(for Very Large Values) a MAC hang.
This code just adds the ability to get/set the NAV; the driver will
need to take care of using it during transmit hangs and beacon misses
to see if it's due to a trash looking NAV.
We must make sure that incoming packets will never overflow the netmap
buffers, even when the user is using the offset feature. In the typical
scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are
~500 bytes available for user offsets.
Unfortunately, some NICs accept incoming packets even when they are
larger then the MTU. This means that the only way to stop DMA from
overflowing the netmap buffers, when offsets are allowed, is to choose
a hardware buffer length which is smaller than the netmap buffer
length. For most NICs and for 2KiB netmap buffers, this means 1024
bytes, which is unconveniently small.
The current code will select the small hardware buf size even when
offsets are not in use. The main purpose of this change is to
fix this bug by returning to the normal behavior for the no-offsets
case.
At the same time, the patch pushes the handling of the offset case
to the lower level driver code, so that it can be made NIC-specific
(in future patches).
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.
MFC after: 1 week
This is the April update to vendor/wpa committed upstream
2021/04/07.
This is MFV efec822389.
Suggested by: philip
Reviewed by: philip
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D29744
Explicitly disable ring synchronization before calling
callbacks that may result in a hardware reset.
Before this patch we relied on capturing the down/up events which,
however, may not be issued by all drivers.
"It looks like it would be less confusing to rename 'count' to
something like 'idx', since that's what it's used for in this
function."
Reviewed by: erj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29798
There is a weird limit of AGTIAPI_MAX_DMA_SEGS (128) S/G segments per
I/O since the initial driver import. I don't know why it was added,
can only guess some hardware limitation, but in worst case it means
maximum I/O size of 508KB. Respect it to be safe, rounding to 256KB.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
It is a direct request for data corruptions, one report of which we
have received. I am very surprised that only one.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Restore 525e07418c after the iflib conversion of igb(4). This
reenables random MAC address generation when attaching to a VF with a
zeroed MAC.
PR: 253535
Reported by: Balaev PA <mail@void.so>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29785
The boundary differentiating "lem" vs "em" class devices was wrong
after the iflib conversion of lem(4).
The Packet Buffer size for 82547 class chips was not set correctly
after the iflib conversion of lem(4).
These changes restore functionality on an 82547 for the submitter.
PR: 236119
Reported by: Jeff Gibbons <jgibbons@protogate.com>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29766
This is a debugging tunable that shouldn't have retained this setting
after the initial iflib conversion of the driver
PR: 248934
Reported by: Franco Fichtner <franco@opnsense.org>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29768
A doomed VI does not have a valid ifnet.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29662
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works. The only known consumer of this interface was the engine
in OpenSSL < 1.1. Modern OpenSSL versions do not include support for
this interface as it was not well-documented.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29736
iscsid can be sleeping in iscsi_ioctl() causing the destroy_dev() to
sleep forever if iscsi.ko is unloaded while iscsid is running.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29688
Use .o files directly. Replace the .o.uu files that we uudecode with .o files.
Adjust the kernel and module build to cope.
Suggestions by: markj@, emaste@
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29636
uudecode the .o.uu files and commit directly to the tree. Adjust the build
infrastructure to cope with the new location, both for the kernel and modules.
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29635
Store the .o files directly in the tree. We no longer need to play uuencode
games like we did in the CVS days. Adjust the build infrastructure to match.
Reviewed by: markj@
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29634
We no longer need to use uuencode to uuencode files in our tree. Store the .o
file directly instead. Adjust the build to cope with the new arrangement.
Suggestions by: emaste, bz, donner
Reviewed by: markm
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29632
The driver needs to provide a LinuxKPI device structure to register
itself with the IB subsystem. It was erroneously using a copy of its
FreeBSD device structure for this purpose.
Use linux_pci_attach_device() instead, following the example of the
Chelsio iwarp driver. Also ensure that we don't leak the faked device
during detach.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29595
The mbuf allocated could be a chain and must be freed with m_freem.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29579
That fixes disabled keyboard input after Xorg server has been stopped.
Reviewed by: whu
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28171
Such type cannot be used in code that is in common between
FreeBSD and Linux. Use the FreeBSD type instead.
MFC after: 3 days
Reported by: markj
Differential Revision: https://reviews.freebsd.org/D29677
similarly to the Linux driver, by a tunable read only sysctl.
Submitted by: Oleg Sidorkin <osidorkin@gmail.com>
PR: 254884
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
The size of the ATU MEM/IO windows is implicitly casted to uint32_t.
Because of that some window sizes were silently demoted to 0 and ignored.
Check the size if its too large, trim it to 4GB and print a warning message.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: mw
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29625
The lower pmc event bits were masked off to find the PMC event ID.
The doesn't work when there are more events. Switch it to use the
offser relative to the first event while also checking the ID is
in the expected range.
Reviewed by: gnn, ray
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29600
This function will be used for exposing DMI info as sysctls in the
smbios module (in an upcoming review).
While here, add __packed to the structs.
Reviewed by: dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29270
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.
Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.
Reviewed by: rpokala, dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29276
Commit: f2f1ab39c0 ("pci_user: call bus_translate_resource before BAR mmap")
broke build for 32-bit platforms due to rman_res_t and vm_paddr_t
incompatible types. Fix that.
On some armv8 machines it is possible that the mapping between CPU
and PCI bus BAR base addresses is not 1:1. In case a BAR is allocated
in kernel using bus_alloc_resource_any this translation is handled in
ofw_pci_activate_resource.
Do the same in pci_user.c by calling bus_translate_resource devmethod.
This fixes mmaping BARs to userspace on Marvell SoCs (Armada 7k8k/CN913x)
and possibly many other platforms.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Marvell
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29604
Use viewport "2" instead of "0" and change window type from MEM to IO.
Without these changes the MEM ATU window can be overwritten with the IO one.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29516
Add flow_control to hw.ixl tunables tree to let override
initial flow control configuration for all interfaces.
Keep using configuration set by NVM by default.
Reviewed by: erj@, gallatin@
Tested by: gowtham.kumar.ks_intel.com
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D29338
Summary:
They're nearly identical, so don't use two copies. Merge the newer
driver into the older one, and move it to a common location.
Add the Semihalf and associated copyrights in addition to mine, since
it's a non-trivial amount of code merged.
Reviewed By: mw
Differential Revision: https://reviews.freebsd.org/D29520
Usually on boot the MPS is already configured by BIOS. But we've
found that on hot-plug it is not true at least for our Supermicro
X11 boards. As result, mismatch between root's configuration of
256 bytes and device's default of 128 bytes cause problems for some
devices, while others seem to work fine.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Rename the last remaining bits depending on ifnet from linux/netdevice.h
instead of using the compat macros. This helps clearing up
struct netdevice being struct ifnet from linux/netdevice.h.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, kib
X-D-R: D29366
Differential Revision: https://reviews.freebsd.org/D29497
Under geom(4) nvme_ns_bio_process() is on the path where sleep
is prohibited as g_io_shedule_down() calls THREAD_NO_SLEEPNG()
before geom->start().
Reviewed By: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29539
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29527
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29503
LLVM12 complains if you change the symbol binding:
error: mfs_root_end changed binding to STB_WEAK [-Werror,-Winline-asm]
error: mfs_root changed binding to STB_WEAK [-Werror,-Winline-asm]
The netmap monitor intercepts any TX/RX packets on the monitored
port. However, before this change there was no way to tell
whether an intercepted packet was being transmitted or received
on the monitored port.
A TXMON flag in the netmap slot has been added for this purpose.
This feature enables applications to ask netmap to transmit or
receive packets starting at a user-specified offset from the
beginning of the netmap buffer. This is meant to ease those
packet manipulation operations such as pushing or popping packet
headers, that may be useful to implement software switches,
routers and other packet processors.
To use the feature, drivers (e.g., iflib, vtnet, etc.) must have
explicit support. This change does not add support for any driver,
but introduces the necessary kernel changes. However, offsets support
is already included for VALE ports and pipes.
Use strncmp() instead of bcmp(), so that we don't have to find the
minimum of the string lengths before comparing.
Reviewed by: kib
Reported by: KASAN
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29463
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29383
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
Remove unused #includes of LinuxKPI headers noticed while trying to
solve LinuxKPI struct net_device and related functions.
Neither netdevice.h nor inetdevice.h nor notifier.h seem to be needed.
This takes cxgbe(4) out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: np
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29432
Remove unused #includes of a LinuxKPI header noticed while trying to
solve LinuxKPI struct net_device and related functions.
This takes qlnxr out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
X-D-R: D29366 (extracted as further cleanup)
Remove linux/inetdevice.h as neither of the two inline functions there
are used here.
Sposored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29428
When debugging leaked MSI/MSI-X vectors through LinuxKPI I found
the informational printf unhelpful. Rather than just stating we
leaked also tell how many MSI or MSI-X vectors we leak.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: jhb
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29394
net80211 changed a while back to support per-VAP config for things
rather than it being global. This is to support firmware NICs that
support per-VAP flags and configuration where the firmware will figure
out how to combine them.
However, it introduced a fun timing issue - those changes used to happen
to the shared ic state before newstate() was called, but now they're
also tasks and they can happen /after/.
This isn't a problem for ath(4), but it exposed some interesting
timing and config bugs here. Notably, I saw short slot NOT being
configured in 5GHz mode during some associations, so 5GHz stuff
would hang or behave poorly. Other times the follow-up auth has
the right config, so it didn't hang.
So for now, just flip this over to using the per-VAP flags which
are correct when newstate() is called. net80211 should also have
those flags synch'ed to the global ic state before newstate() runs
and that can come in a subsequent commit.
Whilst here also fix plcp to be consistently logged as a hex value.
Tested:
* iwn(4) Intel 6205, STA mode, both 2GHz and 5GHz
Differential Revision: https://reviews.freebsd.org/D29379
Reviewed by: bz
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload. With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps. hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.
* Enable nic_ktls_ofld in the default configuration file and use the
firmware instead of direct register manipulation to apply/rollback
NIC TLS configuration. This allows the driver to switch the hardware
between TOE and NIC TLS mode in a safe manner. Note that the
configuration is adapter-wide and not per-port.
* Remove the kern_tls config file as it works with 100G T6 cards only
and leads to firmware crashes with 25G cards. The configurations
included with the driver (with the exception of the FPGA configs) are
supposed to work with all adapters.
Reported by: Veeresh U.K. at Chelsio
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29291
upper_32_bits() and lower_32_bits() are defined twice in this file.
With the extra conditinal removed on LinuxKPI in 3b1ecc9fa1
they are also included from there already. Use the LinuxKPI version
and remove the two local ones.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29392
We are not aware of any out-of-tree consumers anymore
which would need KPI support for before Linux version 5.
Update the two in-tree consumers to use the new KPI.
This allows us to remove the extra version check and
will also give access to {lower,upper}_32_bits() unconditionally.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky, rlibby, rstone
MFC-after: 2 weeks
X-MFC: to 13 only
Differential Revision: https://reviews.freebsd.org/D29391
Only attempt to fetch the configuration data and connect the shared
ring once the frontend has switched to the 'Connected' state. This
seems to be inline with what Linux netback does, and is required to
make newer versions of NetBSD netfront work, since NetBSD only
publishes the required configuration before switching to the Connected
state.
MFC after: 1 week
Sponsored by: Citrix Systems R&D
This avoids mixing the use of two different enums which modern C
compilers warn about.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29301
For guests running under some kind of VMMs, configuration structure is
available in memory space but not I/O space.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: rpokala, bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28818
The MSI-X resource shouldn't be assumed to be always on BAR1.
The Virtio v1.1 Spec did not specify that MSI-X table and PBA BAR has to
be BAR1 either.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28817
While here, make sure only the PF driver attempts to program the global
RSS key (with options RSS). The VF driver doesn't have access to those
device registers.
MFC after: 1 week
Sponsored by: Chelsio Communications
A repeat call will recreate the memory windows in the hardware and move
them to their last-known positions without repeating any of the software
initialization.
MFC after: 1 week
Sponsored by: Chelsio Communications
The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criteria
was changed to checking if the CSUM_IP_TSO flag is set in the
csum-flags and then considering it to be TCP/IPv4.
However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
This fix will be MFC instantly as discussed with re(gjb).
MFC after: instantly
PR: 254366
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D29331
In some cases like broken hardware nvme(4) may wait minutes for
controller response before timeout. Doing so in a tight spin loop
made whole system unresponsive.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29309
Sponsored by: iXsystems, Inc.
After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the node they can
potentially use.
The exchange of device lists between user space and kernel is done on
nv(9). Some ioctl commands are added to /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.
This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
The netmap_ioctl() function has a reference counting bug in case of
NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which is
increased by netmap_mem_find(), causing a refcount leak.
Reported by: Xiyu Yang <sherllyyang00@gmail.com>
Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR: 254311
This file got resynced with OpenBSD to pick up fixes that had taken
place after the version initially ported to FreeBSD. KASSERT there is
more like MPASS here.
Reported by: David Wolfskill <david@catwhisker.org>
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However we need to
initialize it to zero in case older hosts which are not aware of this
field.
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; work going further will
be much smaller in nature.
MFC after: 1 month (maybe)
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)
If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf. So, do a cold reset and resync the beacon to
try and unstick the hardware.
It also happens when a bad AP decides to err, slew its TSF because they
themselves are resetting and they don't preserve the TSF "well."
This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.
For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware. But similar hangs in station mode didn't have a similar
recovery mechanism.
Tested:
* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
STA mode
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.
Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26581
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
Reviewed by: np
Reported by: Chelsio QA
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29175
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav
Differential Revision: https://reviews.freebsd.org/D29006
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.
None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.
Differential Revision: https://reviews.freebsd.org/D29137
MFC after: 2 weeks
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Reviewed by: imp, manu
Tested by: bz
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29130
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.
Reviewed by: grehan
MFC after: 3 days
Discussed with: decke
Differential Revision: https://reviews.freebsd.org/D29143
We have no use for the udphdr or this hlen local, just spell out the
addition inline.
MFC after: 3 days
Reviewed by: grehan, markj
Differential Revision: https://reviews.freebsd.org/D29142
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.
This fixes boot on PowerPC64 LE, when using ofwfb as the system console.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27475
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.
PR: 253790
MFC after: 3 days
Discussed with: decke
No sleeping allowed here, so avoid it. Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.
Reviewed by: grehan (earlier version), markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29124
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.
Fixes: df55485085
Pointy hat; kevans
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.
DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102
ath_hal is compiled into the kernel by default and so always prints a
message to dmesg even when the system has no ath hardware.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This updates the driver to align with the version included in
the "Intel Ethernet Adapter Complete Driver Pack", version 25.6.
There are no major functional changes; this mostly contains
bug fixes and changes to prepare for new features. This version
of the driver uses the previously committed ice_ddp package
1.3.19.0.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: jeffrey.e.pieper@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28640
netmap is compiled into the kernel by default so initialization was
always reported, and netmap uses a formatting convention not used in the
rest of the kernel.
Reviewed by: vmaffione
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29099
Firmware access from t4_attach takes place without any synchronization.
The driver should not panic (debug kernels) if something goes wrong in
early communication with the firmware. It should still load so that
it's possible to poke around with cxgbetool.
MFC after: 1 week
Sponsored by: Chelsio Communications
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
MFC after: 2 weeks
Sponsored by: Microsoft
It closes tiny race when the flag could be set between being cleared
and the space is checked, that would create us some more work. The
flag setting is protected by both locks, so we can clear it in either
place, but in between both locks are dropped.
MFC after: 1 week
I think it allowed to avoid some TX thread wakeups while the socket
buffer is full. But add there another options if ic_check_send_space
is set, which means socket just reported that new space appeared, so
it may have sense to pull more data from ic_to_send for better TX
coalescing.
MFC after: 1 week
Add sysctl link_active_on_if_down, which allows user to control
if interface is kept in active state when it is brought
down with ifconfig. Set it to enabled by default to preserve
backwards compatibility.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28028
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
There is no HW counter for frames with invalid L3/L4 checksums
so add a SW one.
Also add a "rx_errors" sysctl with a copy of netstat IERRORS
counter value to make it easier accessible from scripts.
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27639
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
Also, add new "rx_errs" sysctl in the dev.ix.N.mac_stats tree. This is
to provide an another way to display the sum of RX errors.
Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27191
This fixes mpr driver on big-endian devices.
Tested on powerpc64 and powerpc64le targets using a SAS9300-8i card
(LSISAS3008 pci vendor=0x1000 device=0x0097)
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25785
Merge the following fixes from https://github.com/pfsense/FreeBSD-src
1940e7d3 Save address of ingress packets to allow wg to work on HA
8f5531f1 Fix connection to IPv6 endpoint
825ed9ee Fix tcpdump for wg IPv6 rx tunnel traffic
2ec232d3 Fix issue with replying to INITIATION messages in server mode
ec77593a Return immediately in wg_init if in DETACH'd state
0f0dde6f Remove unnecessary wg debug printf on transmit
2766dc94 Detect and fix case in wg_init() where sockets weren't cleaned up
b62cc7ac Close the UDP tunnel sockets when the interface has been stopped
Reviewed by: kevans
Obtained from: pfSense 2.5
MFC after: 3 days
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28962
Mainly link errors interrupts should only be activated on fully linked port,
otherwise noise on lanes can cause livelock. But we don't have error
counters yet, so leave these interrupts disabled.
PAT_WRITE_BACK is x86-only, whereas sys/dev/xen could be shared
between multiple architectures.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28831
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.
While here fix the error handling code to clean up on failure.
Reviewed by: br
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28890
Make the pwm_backlight module depend on backlight, so it
has access to the backlight interface symbols. Otherwise you'll
get an error like:
link_elf: symbol backlight_get_info_desc undefined
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
MFC after: 3 days
PR: 253765
Add it to the x86 GENERIC and MINIMAL kernels
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D28738
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D28742
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
MFC after: 3 days
Instead of 2-4 socket reads per PDU this can do as low as one read
per megabyte, dramatically reducing TCP overhead and lock contention.
With this on iSCSI target I can write more than 4GB/s through a
single connection.
MFC after: 1 month
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
This is part of XSA-361.
Sponsored by: Citrix Systems R&D
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
MFC after: 1 month
T5 and above have extra bits for the optional filter fields. This is a
correctness issue and not just a waste because a filter mode valid on a
T4 (36b) may not be valid on a T5+ (40b).
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so. The requested mask
must be a subset of the filter mode already. The driver will not change
the mode or ingress config just to support a new mask.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
1. Query the firmware for filter mode, mask, and related ingress config
instead of trying to figure them out from hardware registers. Read
configuration from the registers only when the firmware does not
support this query.
2. Use the firmware to set the filter mode. This is the correct way to
do it and is more flexible as well. The filter mode (and associated
ingress config) can now be changed any time it is safe to do so.
The user can specify a subset of a valid mode and the driver will
enable enough bits to make sure that the mode is maxed out -- that
is, it is not possible to set another bit without exceeding the
total width for optional filter fields. This is a hardware
requirement that was not enforced by the driver previously.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
- Implements little-endian support (powerpc64le)
- Adds 'hw.ofwfb.physaddr' kernel parameter so user can manually
provide correct address if it's not detected correctly
- Adds 'hw.ofwfb.argb32_pixel' so user can set it manually if
colors are inverted due to incorrect pixel format (default = 1)
- Automatically selects RGBA32 pixel format if NVidia graphic adapter
is detected (sets hw.ofwfb.argb32_pixel=0)
Machines equipped with NVidia graphic adapters tend to use RGBA32
pixel format. By default ARGB32 pixel format is used, proved to work
on machines equipped with ATI graphic adapter and the onboard adapter
used on Talos II and Blackbird machines from Raptor Computing Systems.
Original patch developed by bdragon
Reviewed by: bdragon, luporl
MFC after: 3 days
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D28604
The ACPI NFIT specification defines a set of "NVDIMM State Flags". These
flags are already reported by `acpidump -t', but this change makes them
available on a per-device basis, in a format that is more easily parsed.
To simplify this, introduce acpi_nfit_get_memory_maps_by_dimm(), which
locates the (ACPI_NFIT_MEMORY_MAP)s associated with a given
(nfit_handle_t).
Reviewed by: mav, cem
Tested by: mav, rpokala (version for stable/12)
MFC after: 3 days
Sponsored by: Panasas
BUS_DMA_NOCACHE should only be used when one needs to guarantee the
created mapping has uncached memory attributes, usually as a result
of buggy hardware. Normal use cases should pass BUS_DMA_COHERENT, to
create an appropriate mapping based on the flags passed to
bus_dma_tag_create().
This should have no functional change, since the DMA tags in this driver
are created without the BUS_DMA_COHERENT flag.
Reported by: mmel
Reviewed by: mmel, Thomas Skibo <thomas-bsd@skibo.net>
MFC after: 3 days
Per the i2c spec, a slave device can stretch SCL idefinitely, so 25ms is
a bit arbitrary in general. smbus does specify an optional timeout
recovery mechanism to be done at about 25~35ms, but the IPMI SSIF spec
says that BMCs don't have any obligation to implement that.
The BMC on Altra seems to mostly respond within 25ms, but occasionally
will stretch SCL for ~300 msec.
Also, the count_us mechanism seems to actually timeout around 25%
earlier than it would claim (timeout really happening around 19ms
instead of 25ms).
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: manu, imp
Differential Revision: https://reviews.freebsd.org/D28747
In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.
As the FreeBSD isn't doing any resource allocation behind the scenes,
the driver is responsible to allocate them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) will be able to
access them.
To determine dynamically BAR on which the MSIx vector table is present
the pci_msix_table_bar() is being used and the new BAR is allocated if
needed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 3 days
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.
MFC after: 1 week
Sponsored by: Chelsio Communications
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
PR: 235016
Submitted by: David Cross
These errors do not clear so to NULL, so the existing check was
treating these failures as success. The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().
In addition, for negative return values, do not send a RST to the
peer.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28243
The fallback for __align_up() used by roundup2() uses __typeof__()
which doesn't work for bitfields. This fixes the build on GCC which
uses the fallback.
Reviewed by: arichardson, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28599
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones. But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx. It caused problems
with the later use of that cluster, including memory and/or data
corruption.
While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: np, jhb
Differential Revision: https://reviews.freebsd.org/D28716
FreeBSD when running as a dom0 under Xen is not supposed to access the
run time services directly, and instead should proxy the calls through
Xen using an hypercall interface that exposes access to selected run
time services.
Implement the efirt interface on top of the Xen provided hypercalls.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28621
Introduce a set of hooks for MI EFI public functions, so that a new
implementation can be done. This will be used to implement the Xen PV
EFI interface that's used when running FreeBSD as a Xen dom0 from UEFI
firmware. Also make the efi_status_to_errno non-static since it will
be used to evaluate status return values from the PV interface.
No functional change indented.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, imp
Differential revision: https://reviews.freebsd.org/D28620
Locking the second lock which causes the LOR, can be skipped because
the code updating the shared variables is always executing from the
same USB thread.
lock order reversal:
1st 0xfffff80005cc3840 pcm7:play:dsp7.p0 (pcm play channel, sleep mutex)
@ usb_transfer.c:2342
2nd 0xfffff80005cc3860 pcm7:record:dsp7.r0 (pcm record channel, sleep mutex)
@ uaudio.c:2317
lock order pcm record channel -> pcm play channel established at:
witness_checkorder+0x461
__mtx_lock_flags+0x98
dsp_mmap_single+0x151
vm_mmap_cdev+0x65
devfs_mmap_f+0x143
kern_mmap_req+0x594
sys_mmap+0x46
amd64_syscall+0x12e
fast_syscall_common+0xf8
lock order pcm play channel -> pcm record channel attempted at:
witness_checkorder+0xd82
__mtx_lock_flags+0x98
uaudio_chan_play_callback+0xeb
usbd_callback_wrapper+0x7ec
usb_command_wrapper+0x7e
usb_callback_proc+0x8e
usb_process+0xf3
fork_exit+0x80
fork_trampoline+0xe
Found by: Stefan Ehmann <shoesoft@gmx.net>
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
defined by hardware rather than cached one to match HIDIOCGRDESC ioctl.
This fixes errors reported by hid-tools being run against /dev/hidraw#
device node belonging to driver which overloads report descriptor.
MFC after: 1 week
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
Previously, iscsi_poll() just panicked. This meant if you got a panic
on a box when using the iSCSI initiator, the attempt to shutdown would
trigger a nested panic and never write out a core. Now, CCB's sent to
iSCSI devices (such as the sychronize-cache request in dashutdown())
just fail with a timeout during a panic shutdown.
Reviewed by: scottl, mav
MFC after: 2 weeks
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D28455
Since commit 8fa6abb6f4 ("Expose clang's alignment builtins and use
them for roundup2/rounddown2"), clang emits warnings for several
alignment operations in these drivers because the operation is a no-op.
The compiler is arguably being too strict here, but in the meantime
let's silence the warnings by conditionally compiling the alignment
operations.
Reviewed by: arichardson, hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28576
My YOGA requires a minimum of 7 to parse w/o an error. Since the memory savings
are trivial and the yoga a popular system, bump the default up to 8. There's no
API/ABI issues in doing this. This hid_item struct isn't exported to userland
and the one libusbhid has is different and only shares a name...
MFC After: 3 days
Reviewed by: wulf@
Differential Revision: https://reviews.freebsd.org/D28543
It appears that production versions of EPYC firmware get the _STA method right
for these nodes. In fact, this workaround breaks on production hardware by
including too many uart nodes. This work around was for pre-release hardware
that wound up not having a large deployment. Move this work around to a kernel
option since the machines that needed it have been powered off and are difficult
to resurrect. Should there be a more significant deployment than is understood,
we can restrict it based on smbios strings.
Discussed with: mmacy@, seanc@, jhb@
MFC After: 3 days
This adds a new sysctl to Wellspring Touchpad driver for controlling
Z-Axis (2-finger vertical scroll) direction "hw.usb.wsp.z_invert".
Submitted by: James Wright <james.wright_AT_digital-chaos_DOT_com>
Reviewed by: wulf
PR: 253321
Differential revision: https://reviews.freebsd.org/D28521
vt is using static buffers for on screen data, the buffer size is
calculated based on maximum supported screen size and 8x16 font.
When using hi-res graphics and very smaller than 8x16 font, we
need to be careful not to overflow static buffers in vt.
Testing: I did test by building smaller buffers than vt currently is using,
royger was testing on actual 4k capable hardware.
MFC after: 1 week
Tested by: royger
X700 family of controllers has limited number of available VLAN
HW filters. Driver did not handle properly a case when user
assigned more VLANs to the interface which had all filters
already in use. Fix that by disabling HW filtering when
it is impossible to create filters for all requested VLANs.
Keep track of registered VLANs using bitstring to be able
to re-enable HW filtering when number of requested VLANs
drops below the limit.
Also switch all allocations to use M_IXL malloc type
to ease detecting memory leaks in the driver.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28137
Add endiannes conversions in order to support big-endian platforms
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, kadesai (on email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26531
- limit maximum segment size to 2048 bytes. Although dwmmc supports a buffer
fragment with a maximum length of 4095 bytes, use the nearest lower power
of two as the maximum fragment size. Otherwise, busdma create excessive
buffer fragments.
- fix off by one error in computation of the maximum data transfer length.
- in addition, reserve two DMA descriptors that can be used by busdma
bouncing. The beginning or end of the buffer can be misaligned.
- Don’t ignore errors passed to bus_dmamap_load() callback function.
- In theory, a DMA engine may be running at time when next dma descriptor is
constructed. Create a full DMA descriptor before OWN bit is set.
MFC after: 2 weeks
This fixes an issue where a private key contained bits that should
have been cleared by the clamping process, but were passed through
to the scalar multiplication routine and resulted in an invalid
public key.
Issue diagnosed (and an initial fix proposed) by shamaz.mazum in
PR 252894.
This fix suggested by Jason Donenfeld.
PR: 252894
Reported by: shamaz.mazum
Reviewed by: dch
MFC after: 3 days
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2T per
command at once, if we decide, or may be relax locking.
Rename the second use of that field to io_referenced_task_tag.
MFC after: 1 month
As we get started with no memory allocator, we set up static font data
for font passed by loader (if there is any). At this time, we also must
set refcount 1, and refcount will get incremented in cnprobe() callback.
At some point the memory allocator will be available, and we will set up
properly allocated font data, but we should not disturb the refcount.
PR: 253147
Memory and PCI resources are freed with no particular order. This could
cause use-after-frees when detaching following a failed attach. For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27634
- The behavior implemented in r362905 resulted in delayed transmission
of packets in some cases, causing performance issues. Use a different
heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
entirely. It can be changed at any time. There is no change in
default behavior.
Investigation of iSCSI target data corruption reports brought me to
discovery that cxgb(4) expects mbufs to be physically contiguous, that
is not true after I've started using m_extaddref() in software iSCSI
for large zero-copy transmissions. In case of fragmented memory the
driver transmitted garbage from pages following the first one due to
simple use of pmap_kextract() for the first pointer instead of proper
bus_dmamap_load_mbuf_sg(). Seems like it was done as some optimization
many years ago, and at very least it is wrong in a world of IOMMUs.
This patch just removes that optimization, plus limits packet coalescing
for mbufs crossing page boundary, also depending on assumption of one
segment per packet.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: mmacy, np
Differential revision: https://reviews.freebsd.org/D28428
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address. Then, this was extended to array of
such pointers. Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change. The new name should be generic enough to avoid further
renaming.
newer controller have a sparce bus space that can be figured
out by probing the HW. This gives the starting bus number.
When reading the PCI config. space behind the VMD controller,
the offset of the starting bus needs to be subtracted from
the bus being read.
Fixed a bug in which in which not all of the devices
directly attached to the VMD controller would be probed.
On my initial test HW, a switch was found at bus 0, slot 0
and function 0. All of the NVME drives were behind that
switch. Now scan for all slots and functions attached to
bus 0. If a something was found then run attach after the
scan. On detach also go through all slots and functions
on bus 0.
Tested with device ID's: 0x201d & 0x9a0b
Tested by: nc@
MFC after: 7 days
PR: 252253
Move software iSCSI tunables/sysctls into kern.icl.soft subtree.
Replace several hardcoded length constants there with variables.
While there, stretch the limits to better match Linux' open-iscsi
and our own initiator with new MAXPHYS of 1MB. Our CTL target is
also optimized for up to 1MB I/Os, so there is also a match now.
For Windows 10 and VMware 6.7 initiators at default settings it
should make no change, since previous limits were sufficient there.
Tests of QD1 1MB writes from FreeBSD over 10GigE link show throughput
increase by 29% on idle connection and 132% with concurrent QD8 reads.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
This is a regression issue after the new UAR API was introduced
by f8f5b459d2 .
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
It's rather confusing when adapter->hw and hw are mixed and matched
within a particular function.
Some of this was missed in cd1cf2fc1d
and r353778 respectively.
nids(4) was a clever idea in the early 2000's when the market was
flooded with 10/100 NICs with Windows-only drivers, but that hasn't been
the case for ages and the driver has had no meaningful maintenance in
ages. It only supports Windows-XP era drivers.
Also remove:
- ndis support from wpa_supplicant
- ndiscvt(8)
Reviewed By: emaste, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D27609
RFC 7143 (11.2.1.8):
An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a
task by the initiator. The only instance in which it may be seen on
the wire is in a target-initiated NOP-In PDU (Section 11.19) and in
the initiator response to that PDU, if necessary.
MFC after: 1 month
This reverts commit aa37baf3d7.
The reverted commit was motivated by a problem observed on stable/12,
but it turns out that a better solution was committed in r348309 but not
MFCed. So, revert this change since it is unnecessary and not really
correct: it assumes that the order in which module metadata records is
defined determines their order in the output linker set. While this
seems to hold in my testing, it is not guaranteed.
Reported by: cem
Discussed with: imp
MFC after: 3 days
By default, axgbe driver does a receiver reset after predefined number
of retries for the link to come up. However, this receiver reset
doesn't always suffice, due to an hardware issue.
In that case, as a workaround, a complete phy reset is necessary.
This patch introduces a sysctl that can be set to 1 to let the driver
reset the phy completely, rather than just doing receiver reset.
The workaround will be removed once the issue is fixed by means
of firmware update.
This patch also fixes the handling of the direct attach cables
properly.
Submitted by: rajesh1.kumar_amd.com
Differential Revision: https://reviews.freebsd.org/D28266
Handle the case of a tagged command arriving when there's an untagged
one still outstanding gracefully instead of panicing.
While at it:
- Replace fancy arithmetics with a simple assignment as busy_itl can
only ever be either 0 or 1.
- Fix a comment typo in sym_free_ccb().
- In ___dma_getp(), remove dead code. [1]
- In sym_sir_bad_scsi_status(), add missing FALLTHROUGH. [2]
While at it:
- For getbaddrcb(), remove __unused from the nseg argument as it's in
fact used when compiling with INVARIANTS.
- In sym_int_sir(), ensure in all branches that cp is not NULL before
using it.
Reported by: Coverity
CID: 1008861 [1], 114996 [2]
PNP info definitions currently have an unfortunate requirement in that
they must follow the associated module definition in the module metadata
linker set. Otherwise devmatch can segfault while processing the linker
hints file since kldxref maintains the order in the linker set.
A number of drivers violate this requirement. In some cases this can
cause devmatch(8) to segfault when processing the linker hints file.
Work around the problem for now simply by adjusting the drivers.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28260
For hardware offload solicited data may potentially be handled more
efficiently than unsolicited due to direct data placement. Or there
can be some unsolicited write buffering limitations. It may create
situations where FirstBurstLength limit is really useful.
Software driver though has no those factors, having to do memcopy()
any way and having no so hard limit on the temporary storage. Same
time more active use of unsolicited transfers allows to avoid some
of Ready To Transfer (R2T) PDU round-trip times and processing.
This change effectively doubles from 64KB to 128KB the maximum size
of write command that can be transferred within one link RTT. Tests
of (64KB, 128KB] QD1 writes mixed with simultaneous QD8 reads over
the same connection, increasing RTT, shows almost double write speed
and half latency, while we should be able to afford few megabytes of
RAM for additional buffering on a target these days.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This was blindly moved in r360722 but the variable being printed is not
yet initialised. It's of little use and can easily be added back in the
right place if needed by someone debugging, so just delete it.
Reported by: bryanv
Rather than have every device register itself for both virtio_pci and
virtio_mmio, provide a VIRTIO_DRIVER_MODULE wrapper to declare both,
merge VIRTIO_SIMPLE_PNPTABLE with VIRTIO_SIMPLE_PNPINFO and make the
latter register for both buses. This also has the benefit of abstracting
away the available transports and their names.
Reviewed by: bryanv
Differential Revision: https://reviews.freebsd.org/D28073
We must check MagicValue not just Version before anything else, and then
we must check DeviceID and immediately abort if zero (and this must not
be an error).
Do all this when probing rather than at the start of attaching as that's
where this belongs, and provides a clear boundary between the device
detection and device initialisation parts of the specified driver
initialisation process. This also means we don't create empty device
instances for placeholder devices, reducing clutter on systems that
pre-allocate a large number, such as QEMU's AArch64 virt machine (which
provides 32).
Reviewed by: bryanv
Differential Revision: https://reviews.freebsd.org/D28070
Define the maximum numbers of segments to allow for non-page alignment
at the beginning and end of a maxphys size transfer. Also set
ccb_pathinq.maxio consistent with maxphys.
Reviewed by: imp
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D28043
on success instead of 0 to match Linux.
Imprivata binary depends on this.
Submitted by: Shunchao Hu <ankohuu_outlook.com>
Reviewed by: wulf
Differential revision: https://reviews.freebsd.org/D28218
Many I2C "compatibility" mouse devices found on touchpads continue to
return last report data in sampling mode after touch has been ended.
That results in cursor drift. Filter out such a reports with comparing
content of current report with content of previous one.
Reported by: many
Tested by: omatsuda, gllb (github.com)
Obtained from: sysutils/iichid
There is a report that reading of surface/button switch feature report
causes SYN1B7D touchpad malfunction. As specs does not require it to
be readable assume that report usages have default value on attach and
last written value during operation. Do not apply default usage values
on attachment and resume.
While here fix manpage typos and add avg@ to copyright header.
Reported by: Jakob Alvermark <jakob_AT_alvermark_DOT_net>
Reviewed by: avg
Differential revision: https://reviews.freebsd.org/D28196
This patch is a quick hack to change the internal Ethertype used
within the chip. All frames with this type are dropped silently.
This patch allows you to overwrite the factory default 0x88a8, which
is used by IEEE 802.1ad VLAN stacking.
Reviewed by: kp, philip, brueffer
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24179
Try to standardize how drivers negotiate feature and the
function names
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27930
- Add and fix a few error path counters
- Improve sysctl descriptions
- Use flags consistently to determine IPv4 vs IPv6
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27926
Prior to V1, the driver would enable interrupts and then notify the
host that DRIVER_OK. Since for V1, DRIVER_OK needs to be set before
notifying the virtqueues, there may be items in the queues waiting
to be processed by the time interrupts are enabled.
This fixes a bug where the Rx queue would appear stuck, only being
usable after an interface down/up cycle.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27922
For multiqueue, we may use fewer than the provided maximum number of
queues. Try to limit allocations of the unused queues: no interrupts,
no indirect descriptors, and no taskqueues.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27921
Verify the max_virtqueue_pairs is within the range allowed by
the spec.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27920
This useful when running on hosts that support checksum offloading
but not the GUEST_TSO (LRO) feature. Or potentially, some GRO-like
support when doing forwarding.
Only enable SW LRO when the host LRO is not available since both
tends to be harmful, and difficult to enable/disable selectively
with only a single IFCAP_LRO flag.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27919
This allows the Rx checksum and LRO to be modified without a full
reinit of the device.
Remove IFCAP_RXCSUM_IPV6 from the interface capabilities since in
VirtIO Rx checksums are just enabled or disabled for all protocols.
Properly update IFCAP_LRO if LRO is becomes disabled when Rx
checksums are disabled.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27916
In modern VirtIO, the virtqueues cannot be notified before setting
DRIVER_OK status.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27932
Defer the ether_ifattach until the interface capabilities
are configured
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27913
This improves spec compliance because the driver is not suppose
to notify the device prior to setting the DRIVER_OK status, which
could happen with the VIRTIO_NET_F_CTRL_MAC_ADDR.
The VIRTIO_NET_F_MAC feature should always be negotiated so would
be a rare situation.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27910
This may have been required in an early, early, early version of the
specification but I cannot find any reference to it, and a promiscuous
default seems very odd so remove this code.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27909
This features lets the guest driver know the speed and duplex of
the "link". Instead of trying to support many media types based
on the possible/likely speeds/duplexes, only use the speed to
set the interface baudrate.
Cleanup ifmedia code to match other drivers.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27908
This feature lets the guest driver know the maximum MTU size
supported by the host device. If set, use this to limit the
acceptable MTUs, and improve how the receive mbuf cluster size
then is selected.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27907
- Fix the NEEDS_CSUM and DATA_VALID checksum flags. The NEEDS_CSUM
checksum is incomplete (partial) so offer a fallback for the driver
to calculate the checksum. Simplify DATA_VALID because we know
the host has validated the checksum.
- Default 4K mbuf clusters for mergeable buffers. May need to
scale this down to 2K clusters in certain configurations such
many queue pairs, big queues (like 4096 in GCP), and low memory.
- Use the MTU when calculated the receive mbuf cluster size
when not doing TSO/LRO. This will need more adjustment once
the MTU feature is supported.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27906
Very basic support to get packets flowing on modern QEMU but still
several conformance issues remain that will be addressed in later
commits.
First of many passes at cleaning up various accumulated cruft
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27904
Rework the header file changes from 2cc8a52 to use our
canonical upstream, Linux.
geom_disk already checks DISKFLAG_CANDELETE for BIO_DELETE
so remove an unnecessary check.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27902
This only supports the legacy virtqueue format that is now called
"Split Virtqueues". Support for the new "Packed Virtqueues" described
in v1.1 is left for a later date.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27857
Use the existing legacy PCI driver as the basis for shared code
between the legacy and modern PCI drivers. The existing virtio_pci
kernel module will contain both the legacy and modern drivers.
Changes to the virtqueue and each device driver (network, block, etc)
for V1 support come in later commits.
Update the MMIO driver to reflect the VirtIO bus method changes, but
the modern compliance can be improved on later.
Note that the modern PCI driver requires bus_map_resource() to be
implemented, which is not the case on all archs.
The hw.virtio.pci.transitional tunable default value is zero so
transitional devices will continue to be driven via the legacy
driver.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27856
And subsequent fix 576b099a.
By adding the mergable header to the vtnet_rx_header structure, the size
was increased by 2 bytes, breaking the alignment of this structure as
described the in preceding comments.
Furthermore, the mergable header does not belong the structure. With the
mergable feature, the header is placed in line with the data, so there is
no need for a separate segment, and misleading to follow the mergable
header with any padding.
The V1 header is effectively identical to mergable header, and the driver
has long supported the mergable feature. Revert this so the later changes
that add V1 support can show how V1 is derived from the existing mergable
buffers support, and to facilitate a later MFC.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27855
The context record contains key material precomputed by the driver at
session creation time. Rather than storing various components of the
context record in each session, go a bit further and store the full
context record image so that safexcel_process() can simply copy the
image into each request submitted to the hardware. This simplifies the
data path and eliminates a bunch of unnecessary conditional logic that
was getting executed for each request.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Rather than preallocating a set of requests and moving them between
queues during state transitions, maintain a shadow of the command
descriptor ring to track the driver context of each request. This is
simpler and requires less synchronization between safexcel_process() and
the ring interrupt handler.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Rather than returning a hard error in this case, return ERESTART so that
upper layers get a chance to retry the request (or drop it, depending on
the desired policy).
This case is hard to hit due to the somewhat low bound on queued
requests, but that will no longer be true after an upcoming change.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
This gives better performance in some tests than the previous policy of
statically binding each session to a ring.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
AMD 10GbE hardware is designed to have two buffers per receive descriptor to
support split header feature. For this purpose, the driver was designed to use
2 iflib freelists per receive queue. So, that buffers from 2 freelists are used
to refill an entry in the receive descriptor. The current design holds good
with regular data traffic.
But, when netmap comes into play, the current design will not fit in. The
current netmap interfaces and netmap implementation in iflib doesn't seem
to accomodate the design of 2 freelists per receive queue. So, exercising
Netmap capability with inbuilt tools like bridge, pkt-gen doesn't work with
the 2 freelists driver design.
So, the driver design is changed to accomodate the current netmap interfaces
and netmap implementation in iflib by using single freelist per receive queue
approach when Netmap capability is exercised without disturbing the current
2 freelists approach.
The dev.ax.sph_enable tunable can be set to 0 to configure the single
free list mode.
Thanks to Stephan Dewt for his Initial set of code changes for the stated
problem.
Submitted by: rajesh1.kumar_amd.com
Approved by: vmaffione
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27797
Print warning when we can't parse a console specification (this may
not appear on the console, but will appear in dmesg).
Also, accept key:value and key=value. There's no reason not to
and it makes this more forgiving of mistakes.
Reviewed by: rpokala@
Differential Revision: https://reviews.freebsd.org/D28168
usbhid(4) is disabled by default to avoid conflicts with existing USB HID
drivers. To enable it place following lines to /boot/loader.conf:
hw.usb.usbhid.enable=1
usbhid_load="YES"
Suggested by: jhb
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D28124
OpenBSD never accepted this driver, and instead wrote their
own minimal one (sys/dev/acpi/tpm.c for suspending the device).
Reviewed by: stevek, emaste
Differential Revision: D10321
This driver supports some arm and arm64 boards equipped with
"snps,dw-wdt"-compatible watchdog device.
Tested on RK3399-based board (RockPro64).
Once started watchdog device cannot be stopped.
Interrupt handler has mode to kick watchdog even when software does not do it
properly.
This can be controlled via sysctl: dev.dwwdt.prevent_restart.
Also - driver handles system shutdown and prevents from restart when system
is asked to reboot.
Submitted by: kjopek@gmail.com
Differential Revision: https://reviews.freebsd.org/D26761
Only x86 provides optimized implementations via the blake2 module. The
software "reference" implementation is already included in the crypto(4)
module, we can drop the extra MODULE_DEPEND for other platforms.
Without this change, if_wg.ko could not be loaded due to the missing
dependency.
PR: 252156
Reported by: gbe
Sponsored by: The FreeBSD Foundation
When detaching the if_ure(4) driver, the TX active USB transfer array may
point to freed USB transfers. Given that the number of USB transfers is
very low, simply start all transfers every time there is a packet to
keep safe from use-after-free.
PR: 252608
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
-pci_get_class : This function search for a matching pci device based on
the class/subclass and returns a newly created pci_dev.
- pci_{save,restore}_state : This is analogous to ours with the same name
- pci_is_root_bus : Return true if this is the root bus
- pci_get_domain_bus_and_slot : This function search for a matching pci
device based on domain, bus and slot/function concat into a single
unsigned int (devfn) and returns a newly created pci_dev
- pci_bus_{read,write}_config* : Read/Write to the config space.
While here add some helper function to alloc and fill the pci_dev struct.
Reviewed by: hselasky, bz (older version)
Differential Revision: https://reviews.freebsd.org/D27550
pci_find_class_from help finding one or multiple device matching
a class and subclass.
If the from argument is not null we will first loop in the device list
until we find the matching device and only then start to check if the
class/subclass matches.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D27549
The cgem(4) driver was updated to support 64-bit bus addressing in
facdd1cd20. However, the committed version determines this in an
un-idiomatic way. Change the compile-time conditional to check
BUS_SPACE_MAXADDR, rather than comparing int and pointer sizes.
Reported by: jrtc27
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.
This allows an open privcmd instance to limit against which domains it
can act upon.
Sponsored by: Citrix Systems R&D
Use an interface compatible with the Linux one so that the user-space
libraries already using the Linux interface can be used without much
modifications.
This allows user-space to make use of the dm_op family of hypercalls,
which are used by device models.
Sponsored by: Citrix Systems R&D
The interface is mostly the same as the Linux ioctl, so that we don't
need to modify the user-space libraries that make use of it.
The ioctl is just a proxy for the XENMEM_acquire_resource hypercall.
Sponsored by: Citrix Systems R&D
Restore the hwofs functionality temporarily disabled by
7ba6ecf216 to prevent issues with iflib.
This patch brings the necessary changes to iflib to
enable howfs to allow interface restarts without
disrupting netmap applications actively using its
rings.
After this change, it becomes possible for multiple
non-cooperating netmap applications to use non-overlapping
subsets of the available netmap rings without clashing
with each other.
PR: 252453
MFC after: 1 week
Add 64-bit address support to Cadence CGEM Ethernet driver for use in
other SoCs such as the Zynq UltraScale+ and SiFive HighFive Unleashed.
Reviewed by: philip, 0mp (manpages)
Differential Revision: https://reviews.freebsd.org/D24304
Similarly to what done for iflib in 1d238b07d5,
this patch prevents access to the krings during the interface
reset triggered by netmap_register().
MFC after: 1 week
The netmap_reset() function is meant to be called by the driver
when they initialize (or re-initialize) a hardware ring.
However, since the introduction of support for opening (in
netmap mode) a subset of the available rings, netmap_reset()
may be called multiple times on actively used rings, causing
both kring and netmap ring to transition to an inconsistent
state.
This changes improves the situation by resetting all the
indices fields of the kring to 0, as expected after the
reinitialization of a hardware ring.
PR: 252518
MFC after: 1 week
When different processes open separate subsets of the
available rings of a same netmap interface, a device
reset may be performed while one of the processes
is actively using some rings (e.g., caused by another
process executing a nmport_open()).
With this patch, such situation will cause the
active process to get a POLLERR, so that it can
have a chance to detect the situation.
We also guarantee that no process is running a txsync
or rxsync (ioctl or poll) while an iflib device reset
is in progress.
PR: 252453
MFC after: 1 week
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.
Point hat: me
The device mapping table contains sc->max_devices entries, so only
indices in [0, sc->max_devices) are valid.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27964
Previously we copied in the request into a stack-allocated structure
that could be smaller than the request size. Furthermore, we checked
the request size only after doing the copyin.
Fix this by allocating a buffer to hold the request, then copying the
buffer's contents into a command descriptor. This is a bit heavy-handed
but I expect the overhead will not be noticeable. The approach of
coping the header in first is susceptible to TOCTOU problems.
Reviewed by: imp
Reported by: maxpl0it@protonmail.com
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27963
safexcel_ring_intr() could fail to observed that sc_blocked is set after
completing all outstanding ops for a ring, in which case blocked ops
would be deferred forever.
Request structures are managed by individual rings, so move the
"blocked" flag into the per-ring state block and use the ring lock to
synchronize with safexcel_process(). Remove sc_mtx since it is now
unused.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
mtx_init() does not make a copy of the name so the buffer must be valid
for the lifetime of the driver instance. Store each ring's lock's name
in the ring structure.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
The new UAR API already offsets the UAR map pointer the mlx5en(4) is using.
While at it remove some no longer needed variables for keeping track
of the current BF offset.
This fixes a regression issue after the new UAR allocation APIs
were introduced.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Add decoding of the Device Self-test log page and the ability to start
or abort a test.
Reviewed by: imp, mav
Tested by: Muhammad Ahmad <muhammad.ahmad@seagate.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27517
This ioctl would instantly induce a panic, likely since near inception, up
until 0861c7d3e0. Lack of previous interest in fixing it combined with
the problematic interface (exports a pointer, really a physical address)
brings us to the natural conclusion: remove it until a useful consumer
forward.
If it eventually gets resurrected, the interface should definitely not
return in this exact form and likely needs to be reimagined.
The associated KPI, efi_get_table, is left intact for the time being.
Reviewed by: imp, jrtc27
Also discussed with: brooks, jhb
Differential Revision: https://reviews.freebsd.org/D28030
Virtual PMCs could be running on multiple CPUs so this needs to be
a per-CPU value.
Submitted by: rwatson (earlier version)
Reviewed by: gnn
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27973
When testing hwpmc on arm64 we found the counter could overflow while
reading the event count. Handle this case in the armv7 code by also
checking if the overflow bit is set and incrementing the overflow
cound as needed.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27969
This change include several changes as listed below all related to UAR.
UAR is a special PCI memory area where the so-called doorbell register and
blue flame register live. Blue flame is a feature for sending small packets
more efficiently via a PCI memory page, instead of using PCI DMA.
- All structures and functions named xxx_uuars were renamed into xxx_bfreg.
- Remove partially implemented Blueflame support from mlx5en(4) and mlx5ib.
- Implement blue flame register allocator.
- Use blue flame register allocator in mlx5ib.
- A common UAR page is now allocated by the core to support doorbell register
writes for all of mlx5en and mlx5ib, instead of allocating one UAR per
sendqueue.
- Add support for DEVX query UAR.
- Add support for 4K UAR for libmlx5.
Linux commits:
7c043e908a74ae0a935037cdd984d0cb89b2b970
2f5ff26478adaff5ed9b7ad4079d6a710b5f27e7
0b80c14f009758cefeed0edff4f9141957964211
30aa60b3bd12bd79b5324b7b595bd3446ab24b52
5fe9dec0d045437e48f112b8fa705197bd7bc3c0
0118717583cda6f4f36092853ad0345e8150b286
a6d51b68611e98f05042ada662aed5dbe3279c1e
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
- call pci_iov_detach() on detaching from PCI device to take care of hang
on destroying VFs after PF is down.
- disable eswitch SRIOV support right after pci_iov_detach(),
else the eswitch cleanup sometimes occur while the SRIOV flow table
is still present.
Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Remove wi(4). pccard is going away, and wi only supports PC Card
devices, though it has a minor amount of glue to also support
PCI cards. However, removing the one without removing the other
is hard, so the whole driver is being removed.
Relnotes: Yes
PC Card support is being removed, so remove its attachment here. ndis
is slated to be removed entirely for 13, but that's not been done yet.
Relnotes: Yes
This change includes:
hpen - Generic / MS Windows compatible HID pen tablet driver.
hgame - Generic game controller and joystick driver.
xb360gp - Xbox360-compatible game controller driver.
Submitted by: Greg V <greg_unrelenting.technology>
Reviewed by: hselasky (as part of D27993)
hidmap is a kernel module that maps HID input usages to evdev events.
Following dependent drivers is included in the commit:
hms - HID mouse driver.
hcons - Consumer page AKA Multimedia keys driver.
hsctrl - System Controls page (Power/Sleep keys) driver.
ps4dshock - Sony DualShock 4 gamepad driver.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D27993
which installs /dev/uhid# alias to hidraw character device for
compatibility with some existing uhid(4) users like Firefox.
As side effect it renames traditional uhid(4) driver to hidraw
to make possible using of common unit number allocator.
Requested by: Greg V <greg_unrelenting.technology>
Reviewed by: hselasky (as part of D27992)
This driver provides raw access to HID devices through uhid(4)-compatible
interface and is based on pre-8.x uhid(4) code. Unlike uhid(4) it does
not take devices in to monopoly ownership and allows parallel access
from other drivers.
hidraw supports Linux's hidraw-compatible interface as well.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D27992
This allows to mark HID-device interrupt handlers as MP-SAFE.
Atomics-based lockless key event queue with swi_giant taskqueue is used
to pass key-press events into Giant-protected system console.
Reviewed by: hselasky (as part of D27991)
This change implements hid_if.m methods for HID-over-USB protocol [1].
Also, this change adds USBHID_ENABLED kernel option which changes
device_probe() priority and adds/removes PnP records to prefer usbhid
over ums, ukbd, wmt and other USB HID device drivers and vice-versa.
The module is based on uhid(4) driver. It is disabled by default for
now due to conflicts with existing USB HID drivers.
[1] https://www.usb.org/sites/default/files/hid1_11.pdf
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D27893
hidquirk(4) is derived from usb_quirk(4) and inherits all its HID-related
functionality. It does not support ioctl(2) interface yet.
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D27890
This driver provides support for multiple HID driver attachments
to single HID transport backend. This ability existed in Net/OpenBSD
(uhidev and ihidev drivers) but has never been ported to FreeBSD.
Unlike Net/OpenBSD we do not use report number alone to distinct report
source but we follow MS way and use a top level collection (TLC) usage
index that report belongs to as a location key.
The driver performs child device autodiscovery based on HID report
descriptor data, proxying of HID requests from child devices to parent
transport backends and broadcasting of interrupts in backward direction.
Differential revision: https://reviews.freebsd.org/D27888
Create an abstract HID interface that provides hardware independent
access to HID capabilities and functions through the device tree.
hid_if.m resembles existing USBHID KPI and consist of next methods:
HID method USBHID variant
-----------------------------------------------------------------------
hid_intr_setup usbd_transfer_setup (INTERRUPT IN xfer)
hid_intr_unsetup usbd_transfer_unsetup (INTERRUPT IN xfer)
hid_intr_start usbd_transfer_start (INTERRUPT IN xfer)
hid_intr_stop usbd_transfer_drain (INTERRUPT IN xfer)
hid_intr_poll usbd_transfer_poll (INTERRUPT IN xfer)
hid_get_rdesc usbd_req_get_report_descriptor
hid_read No direct analog. Not intended for common use.
hid_write uhid(4) write()
hid_get_report usbd_req_get_report
hid_set_report usbd_req_set_report
hid_set_idle usbd_req_set_idle
hid_set_protocol usbd_req_set_protocol
This change is part of D27888
Also hide shim code added in a previous commit under COMPAT_USBHID12.
Note: it is enough to add -DCOMPAT_USBHID12 to CFLAGS to compile old
code with new HID subsystem, but it is not enough to link it at runtime.
HID dependency has to be added explicitly with MODULE_DEPEND macro.
Reviewed by: manu, hselasky (as part of D27887)
This does an import of quirk stubs, debugging macros from USB code and
numerous usage constants used by dependent drivers.
Besides, this change renames some functions to get a better matching
with userland library and NetBSD/OpenBSD HID code. Namely:
- Old hid_report_size() renamed to hid_report_size_max()
- New hid_report_size() calculates size of given report rather than
maximum size of all reports.
- hid_get_data_unsigned() renamed to hid_get_udata()
- hid_put_data_unsigned() renamed to hid_put_udata()
Compat shim functions are provided in usbhid.h to make possible compile
of legacy code unmodified after this change.
Reviewed by: manu, hselasky
Differential revision: https://reviews.freebsd.org/D27887
It will be used by the upcoming HID-over-i2C implementation. Should be
no-op, except hid.ko module dependency is to be added to affected drivers.
Reviewed by: hselasky, manu
Differential revision: https://reviews.freebsd.org/D27867
It is possible that the client list lock is taken by other process for too
long due to e.g. IO timeouts. Allow user to terminate open() in this case.
Reviewed by: markj (as part of D27865)
At the beginning of evdev there was a LOR between hardware driver's and
evdev client list locks as they were taken in different order at
driver's interrupt and evdev open()/close() handlers.
The LOR was fixed with introduction of evdev_register_mtx() function
which allowed to use a hardware driver's lock as evdev client list lock.
While this works good with PS/2 and USB, this does not work with I2C.
Unlike PS/2 and USB, I2C open()/close() handlers do unbound sleeps
while waiting for I2C bus to release and while performing IO.
This change uses epoch(9) for traversing evdev client list in interrupt
handler to avoid the LOR thus making possible to convert evdev client
list lock to sleepable sx.
While here add brief locking protocol description.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D27865
hid_locate() currently ignores all HID items which tagged as constant,
i.e. bit 0 of main item data is set to 1. See p.6.2.2.4 of
hid1_11.pdf [1]. Such an items are unconditionally treated as
byte-alignment padding. While that may be right decision for input and
output reports that is wrong for features reports. Feature reports can
contain constant capabilities e.g. 'Contact Count Maximum'.
See: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232040
Remove check for constant from hid_locate() to make possible parsing of
such a reports.
[1] https://www.usb.org/sites/default/files/documents/hid1_11.pdf
Reviewed by: hselasky
Obtained from: sysutils/iichid
Differential Revision: https://reviews.freebsd.org/D27747
Doing a 'dd' over iscsi will reliably cause stalls. Tx
cleaning _should_ reliably happen as data is sent.
However, currently if the transmit queue fills it will
wait until the iflib timer (hz/2) runs.
This change causes the the tx taskq thread to be run
if there are completed descriptors.
While here:
- make timer interrupt delay a sysctl
- simplify txd_db_check handling
- comment on INTR types
Background on the change:
Initially doorbell updates were minimized by only writing to the register
on every fourth packet. If txq_drain would return without writing to the
doorbell it scheduled a callout on the next tick to do the doorbell write
to ensure that the write otherwise happened "soon". At that time a sysctl
was added for users to avoid the potential added latency by simply writing
to the doorbell register on every packet. This worked perfectly well for
e1000 and ixgbe ... and appeared to work well on ixl. However, as it
turned out there was a race to this approach that would lockup the ixl MAC.
It was possible for a lower producer index to be written after a higher one.
On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial
response was to add a lock around doorbell writes - fixing the problem but
adding an unacceptable amount of lock contention.
The next iteration was to use transmit interrupts to drive delayed doorbell
writes. If there were no packets in the queue all doorbell writes would be
immediate as the queue started to fill up we could delay doorbell writes
further and further. At the start of drain if we've cleaned any packets we
know we've moved the state machine along and we write the doorbell (an
obvious missing optimization was to skip that doorbell write if db_pending
is zero). This change required that tx interrupts be scheduled periodically
as opposed to just when the hardware txq was full. However, that just leads
to our next problem.
Initially dedicated msix vectors were used for both tx and rx. However, it
was often possible to use up all available vectors before we set up all the
queues we wanted. By having rx and tx share a vector for a given queue we
could halve the number of vectors used by a given configuration. The problem
here is that with this change only e1000 passed the necessary value to have
the fast interrupt drive tx when appropriate.
Reported by: mav@
Tested by: mav@
Reviewed by: gallatin@
MFC after: 1 month
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D27683
In the conversion into a tunable, we converted the
size of the s/g list used by the driver to be based
off of a hardcoded size of 128k rather than maxphys,
this caused performance problems for us. Revert this
to use the maxphys tunable.
Note that this constant is used to size dynamically allocated
things, and not static data structs, so this is safe.
Reviewed By: imp, kib, mav
Tested By:i dhw
Differential Revision: https://reviews.freebsd.org/D28023
Sponsored by: Netflix
Code changes in this commit were obtained from straight from OpenBSD's
uplcom.c with almost no modification, the list of chip names and USB
IDs was obtained from Linux.
Differential Revision: https://reviews.freebsd.org/D27952
Submitted by: tomli_tomli.me (Yifeng Li)
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
We increment the overflow count when receiving an overflow interrupt
with special care to check if it happens while reading the event counter.
Sponsored by: Innovate UK
It appears we must read MIB values as 2 4-byte words, lower address
first. A single 8-byte MIB read returns the value with the lower 4
bytes copied into the upper 4 bytes, resulting in bogus byte counter
values.
Reviewed by: mw
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D27870
Release a grabbed page's busy state only after marking it as referenced.
Otherwise there exists a narrow window where the page could be freed
before the update. Before r356902 this was not a problem since the
object lock was held.
Discussed with: kib
Sponsored by: The FreeBSD Foundation
When a break-to-debugger is triggered, kdb will grab the console and vt(4)
will generally switch back to ttyv0. If one issues a continue from the
debugger, then kdb will ungrab the console and the system rolls on.
This change adds a perhaps minor feature: when we're down to grab == 0 and
if vt actually switched away to ttyv0, switch back to the tty it was
previously on before the console was grabbed.
The justification behind this is that a typical flow is to work in
!ttyv0 to avoid console spam while occasionally dropping to ddb to inspect
system state before returning. This could easily enough be tossed behind
a sysctl or something if it's not generally appreciated, but I anticipate
indifference.
Reviewed by: ray
Differential Revision: https://reviews.freebsd.org/D27110
vt_allocate_keyboard only needs to unwind the effects of keyboard-grabbing,
rather than any associated vt window action that may have also happened.
Split out the bits that do the keyboard work into *_noswitch equivalents,
and use those in keyboard allocation. This will be less error-prone when a
later change will offer up different window state behavior when the console
is ungrabbed.
Reviewed by: ray
Differential Revision: https://reviews.freebsd.org/D27110
The firmware header loaded into an rsu(4) device has to be customized
to reflect device settings. The driver was overwriting the header
from the shared firmware image before sending it to the device. If
two devices attached at the same time with different settings, one
device could potentially get a corrupted header. The recent changes
in a095390344 exposed this bug in the
form of a panic as the firmware blobs are now marked read-only in
object files and mapped read-only by the kernel.
To avoid the bug, change the driver to allocate a copy of the firmware
header on the stack that is initialized before writing it to the
device.
PR: 252163
Reported by: vidwer+fbsdbugs@gmail.com
Tested by: vidwer+fbsdbugs@gmail.com
Reviewed by: hselasky, bz, emaste
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27850
The handshake timer can race with another thread sending a FIN or RST
to close a TOE TLS socket. Just bail from the timer without
rescheduling if the connection is closed when the timer fires.
Reported by: Sony Arpita Das @ Chelsio QA
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D27583
Xenstore watches received are queued in a list and processed in a
deferred thread. Such queuing was done without any checking, so a
guest could potentially trigger a resource starvation against the
FreeBSD kernel if such kernel is watching any user-controlled xenstore
path.
Allowing limiting the amount of pending events a watch can accumulate
to prevent a remote guest from triggering this resource starvation
issue.
For the PV device backends and frontends this limitation is only
applied to the other end /state node, which is limited to 1 pending
event, the rest of the watched paths can still have unlimited pending
watches because they are either local or controlled by a privileged
domain.
The xenstore user-space device gets special treatment as it's not
possible for the kernel to know whether the paths being watched by
user-space processes are controlled by a guest domain. For this reason
watches set by the xenstore user-space device are limited to 1000
pending events. Note this can be modified using the
max_pending_watch_events sysctl of the device.
This is XSA-349.
Sponsored by: Citrix Systems R&D
MFC after: 3 days
The issue was found while building cxgbe with gcc 10 (in illumos),
the array subscription check is warning us about outside the bounds
access.
See also: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Each entry actually stores a native pointer, not a uint64_t quantity. While
we're here, go ahead and export the pointer as-is rather than converting it
to KVA. This may be more useful as consumers can map /dev/mem and observe
the entry.
For reference, see: sys/contrib/edk2/Include/Uefi/UefiSpec.h
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27669
Account for any residual bytes. This is only relevant for vnode-backed
md(4) devices.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27738
The divider table already contains the correct HW divider value, it should
not be modified by other flags such as 'CLK_DIV_ZERO_BASED'.
MFC after: 4 weeks
These drivers should have been removed along with tl(4) as part of
7c897ca91f and r347918 respectively
as these fromer made sure to only ever attach to the latter, e. g.:
<...>
static int
tlphy_probe(device_t dev)
{
if (!mii_dev_mac_match(dev, "tl"))
return (ENXIO);
<...>
If LED state is set through evdev interface, than asynchronous nature
of USB transfer callback can lead to change of order of events echoed
back to userland as it causes LED events to be echoed with some lag.
Fix that with echoing of LED events synchronously in ioctl handler.
Reviewed by: hselasky
Obtained from: sysutils/iichid
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27750
Unlike AT keyboards, HID devices are able to send all pc105 key
states within a single report. Let evdev to transmit all key state
changes within a single report too.
Reviewed by: hselasky
Obtained from: sysutils/iichid
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D27749
Add support for LAN found on Thinkpad USB-C and Thunderbolt Gen 2
docking stations.
Submitted by: ali.abdallah@suse.com
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Hardware timestamp reporting is disabled by default as it produces many
extra events which are not handled by consumers like libinput.
Add hw.usb.wmt.timestamps=1 tunable to loader.conf to enable it.
Obtained from: sysutils/iichid
In Hybrid mode, the number of contacts that can be reported in one
report is less than the maximum number of contacts that the device
supports. For example, a device that supports a maximum of 4
concurrent physical contacts, can set up its top-level collection to
deliver a maximum of two contacts in one report. If four contact
points are present, the device can break these up into two serial
reports that deliver two contacts each.
Obtained from: sysutils/iichid
Original if_dwc driver used m_defrag as an implementation shortcut but on
1000Mb networks it affects performance. Implement multi-descriptor support for
TX path.
Tested on RK3399-Firefly, patch adds ~15% of network throughput.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D27520