This avoids mixing the use of two different enums which modern C
compilers warn about.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29301
For guests running under some kind of VMMs, configuration structure is
available in memory space but not I/O space.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: rpokala, bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28818
The MSI-X resource shouldn't be assumed to be always on BAR1.
The Virtio v1.1 Spec did not specify that MSI-X table and PBA BAR has to
be BAR1 either.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28817
While here, make sure only the PF driver attempts to program the global
RSS key (with options RSS). The VF driver doesn't have access to those
device registers.
MFC after: 1 week
Sponsored by: Chelsio Communications
A repeat call will recreate the memory windows in the hardware and move
them to their last-known positions without repeating any of the software
initialization.
MFC after: 1 week
Sponsored by: Chelsio Communications
The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criteria
was changed to checking if the CSUM_IP_TSO flag is set in the
csum-flags and then considering it to be TCP/IPv4.
However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
This fix will be MFC instantly as discussed with re(gjb).
MFC after: instantly
PR: 254366
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D29331
In some cases like broken hardware nvme(4) may wait minutes for
controller response before timeout. Doing so in a tight spin loop
made whole system unresponsive.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29309
Sponsored by: iXsystems, Inc.
After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the node they can
potentially use.
The exchange of device lists between user space and kernel is done on
nv(9). Some ioctl commands are added to /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.
This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
The netmap_ioctl() function has a reference counting bug in case of
NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which is
increased by netmap_mem_find(), causing a refcount leak.
Reported by: Xiyu Yang <sherllyyang00@gmail.com>
Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR: 254311
This file got resynced with OpenBSD to pick up fixes that had taken
place after the version initially ported to FreeBSD. KASSERT there is
more like MPASS here.
Reported by: David Wolfskill <david@catwhisker.org>
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However we need to
initialize it to zero in case older hosts which are not aware of this
field.
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; work going further will
be much smaller in nature.
MFC after: 1 month (maybe)
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)
If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf. So, do a cold reset and resync the beacon to
try and unstick the hardware.
It also happens when a bad AP decides to err, slew its TSF because they
themselves are resetting and they don't preserve the TSF "well."
This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.
For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware. But similar hangs in station mode didn't have a similar
recovery mechanism.
Tested:
* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
STA mode
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.
Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26581
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
Reviewed by: np
Reported by: Chelsio QA
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29175
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav
Differential Revision: https://reviews.freebsd.org/D29006
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.
None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.
Differential Revision: https://reviews.freebsd.org/D29137
MFC after: 2 weeks
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Reviewed by: imp, manu
Tested by: bz
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29130
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.
Reviewed by: grehan
MFC after: 3 days
Discussed with: decke
Differential Revision: https://reviews.freebsd.org/D29143
We have no use for the udphdr or this hlen local, just spell out the
addition inline.
MFC after: 3 days
Reviewed by: grehan, markj
Differential Revision: https://reviews.freebsd.org/D29142
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.
This fixes boot on PowerPC64 LE, when using ofwfb as the system console.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27475
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.
PR: 253790
MFC after: 3 days
Discussed with: decke
No sleeping allowed here, so avoid it. Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.
Reviewed by: grehan (earlier version), markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29124
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.
Fixes: df55485085
Pointy hat; kevans
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.
DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102