Hardware TCP LRO causes problems in setups where IP forwarding is
enabled. On nodes without IP forwarding, software LRO is also generally
beneficial, since it can provide better information about what was
received on the wire.
Therefore, disable hardware TCP LRO by default.
This can be changed using the loader tunable.
PR: 263229
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52684
(cherry picked from commit 6e4b811009d63f33c59d51f28fd4a030ca90843e)
Enable handling of IFCAP_RXCSUM_IPV6 by treating IFCAP_RXCSUM and
IFCAP_RXCSUM_IPV6 as a pair. Also make clear that software and
hardware LRO require receive checksum offload.
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52682
(cherry picked from commit eaf619fddcb21859311b895a0836da3171a01531)
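A minimal sketch of the capability logic described above, assuming a SIOCSIFCAP-style toggle mask (requested XOR current). The helper `apply_cap_toggle()` is hypothetical, and the bit values are placeholders that only mirror the names in FreeBSD's `<net/if.h>`:

```c
#include <stdint.h>

/* Placeholder capability bits; names mirror <net/if.h>, values are illustrative. */
#define IFCAP_RXCSUM      0x000001u
#define IFCAP_LRO         0x000400u
#define IFCAP_RXCSUM_IPV6 0x200000u
#define IFCAP_RXCSUM_PAIR (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6)

/* Hypothetical helper: apply a toggle mask and return the new enabled set. */
static uint32_t
apply_cap_toggle(uint32_t enabled, uint32_t mask)
{
	/* A request touching either RX checksum bit flips both of them together. */
	if ((mask & IFCAP_RXCSUM_PAIR) != 0) {
		if ((enabled & IFCAP_RXCSUM_PAIR) != 0)
			enabled &= ~IFCAP_RXCSUM_PAIR;
		else
			enabled |= IFCAP_RXCSUM_PAIR;
	}
	if ((mask & IFCAP_LRO) != 0)
		enabled ^= IFCAP_LRO;
	/* LRO (software or hardware) requires receive checksum offload. */
	if ((enabled & IFCAP_RXCSUM_PAIR) == 0)
		enabled &= ~IFCAP_LRO;
	return enabled;
}
```

Disabling either RX checksum capability therefore also drops LRO, and LRO cannot be enabled while receive checksum offload is off.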
When ALTQ is enabled, this driver does "hardware" accounting and soft
accounting at the same time. Prefer the "hardware" one to make the logic
simpler.
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44817
(cherry picked from commit 2a346c8993cbb92a321a7c25bd9ac4dcaae352d1)
While here, advertise the IFCAP_HWSTATS capability to prevent the
network stack from double counting the statistics.
Co-authored-by: zlei
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44816
(cherry picked from commit a14d561e58529c9686a2efc47f4828ad82026e63)
When transmitting a packet over the vtnet interface, map the
csum flags CSUM_DATA_VALID | CSUM_PSEUDO_HDR to the virtio
flag VIRTIO_NET_HDR_F_DATA_VALID.
When receiving a packet over the virtio network channel, translate
the virtio flag VIRTIO_NET_HDR_F_NEEDS_CSUM not to CSUM_DATA_VALID |
CSUM_PSEUDO_HDR, but to CSUM_TCP, CSUM_TCP_IPV6, CSUM_UDP, or
CSUM_UDP_IPV6.
The second change fixes a series of issues related to checksum
offloading for if_vtnet.
While there, improve the stats counters to allow a detailed view
on what is going on in relation to checksum offloading.
PR: 165059
Reviewed by: tuexen, manpages
Differential Revision: https://reviews.freebsd.org/D51686
(cherry picked from commit 3008f30d2c2cabdd7e17f7fb922139da8681ffbd)
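The flag translation described above can be sketched as two small mapping functions. The virtio header flag values follow the virtio specification. The CSUM_* values are placeholders that only mirror the names in FreeBSD's `<sys/mbuf.h>`, and the helper names are hypothetical. The sketch covers only the two cases the commit message mentions:

```c
#include <stdbool.h>
#include <stdint.h>
#include <netinet/in.h>	/* IPPROTO_TCP, IPPROTO_UDP */

/* Virtio network header flags (virtio specification). */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 0x01
#define VIRTIO_NET_HDR_F_DATA_VALID 0x02

/* Placeholder mbuf checksum flags; names mirror <sys/mbuf.h>. */
#define CSUM_TCP        0x0001u
#define CSUM_UDP        0x0002u
#define CSUM_TCP_IPV6   0x0004u
#define CSUM_UDP_IPV6   0x0008u
#define CSUM_DATA_VALID 0x0100u
#define CSUM_PSEUDO_HDR 0x0200u

/* TX: an already validated checksum is passed to the host as DATA_VALID. */
static uint8_t
tx_csum_to_virtio(uint32_t csum_flags)
{
	const uint32_t valid = CSUM_DATA_VALID | CSUM_PSEUDO_HDR;

	return ((csum_flags & valid) == valid) ? VIRTIO_NET_HDR_F_DATA_VALID : 0;
}

/*
 * RX: NEEDS_CSUM means the checksum has not been computed yet, so map it
 * to the offload flag for the IP version and transport protocol instead
 * of claiming a valid checksum.
 */
static uint32_t
rx_virtio_to_csum(uint8_t hdr_flags, bool ipv6, int proto)
{
	if ((hdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) == 0)
		return 0;
	switch (proto) {
	case IPPROTO_TCP:
		return ipv6 ? CSUM_TCP_IPV6 : CSUM_TCP;
	case IPPROTO_UDP:
		return ipv6 ? CSUM_UDP_IPV6 : CSUM_UDP;
	default:
		return 0;
	}
}
```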
Fix the aggregation of the interface-level counters
* dev.vtnet.X.tx_task_rescheduled,
* dev.vtnet.X.tx_tso_offloaded,
* dev.vtnet.X.tx_csum_offloaded,
* dev.vtnet.X.rx_task_rescheduled,
* dev.vtnet.X.rx_csum_offloaded, and
* dev.vtnet.X.rx_csum_failed.
Also ensure that dev.vtnet.X.tx_defrag_failed only counts the number
of times m_defrag() fails.
While there, mark sysctl variables used for exporting statistics as
such (CTLFLAG_STATS).
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D51999
(cherry picked from commit 03da4395158d374b5e38623f6744ce31302b530c)
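As a rough illustration of interface-level aggregation, the totals can be computed as the sum of per-queue counters. The sketch also shows counting only actual m_defrag() failures. The struct and function names are hypothetical and are not the driver's code. A NULL result stands in for a failed m_defrag():

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue statistics; field names echo the sysctls above. */
struct txq_stats {
	uint64_t tx_task_rescheduled;
	uint64_t tx_tso_offloaded;
	uint64_t tx_csum_offloaded;
	uint64_t tx_defrag_failed;
};

/* Interface-level totals are the sum over all transmit queues. */
static struct txq_stats
aggregate_txq_stats(const struct txq_stats *q, size_t nqueues)
{
	struct txq_stats total = { 0 };

	for (size_t i = 0; i < nqueues; i++) {
		total.tx_task_rescheduled += q[i].tx_task_rescheduled;
		total.tx_tso_offloaded += q[i].tx_tso_offloaded;
		total.tx_csum_offloaded += q[i].tx_csum_offloaded;
		total.tx_defrag_failed += q[i].tx_defrag_failed;
	}
	return total;
}

/* Count only a real defragmentation failure (modelled as a NULL result). */
static void
note_defrag_result(struct txq_stats *q, const void *defragged)
{
	if (defragged == NULL)
		q->tx_defrag_failed++;
}
```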
Originally ixgbe_if_update_admin_status() only handled 1G and 10G speeds,
causing any other speeds to display as "1 Gbps" in link status logs.
This issue is fixed by adding link-speed-to-string conversion logic
through a new helper function, ixgbe_link_speed_to_str(), so that the
logs reflect the actual link speed.
Signed-off-by: Yogesh Bhosale <yogesh.bhosale@intel.com>
PR: 288960
Reported by: Mike Belanger - QNX
Differential Revision: https://reviews.freebsd.org/D52442
(cherry picked from commit 46347b3619757e3d683a87ca03efaf2ae242335f)
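A sketch of a speed-to-string helper in the spirit of ixgbe_link_speed_to_str(). The function below is a hypothetical stand-in, not the driver's actual signature. The speed bit values are assumed to match the ixgbe shared code:

```c
#include <stdint.h>

/* Link speed bits (values assumed to match the ixgbe shared code). */
#define IXGBE_LINK_SPEED_10_FULL    0x0002
#define IXGBE_LINK_SPEED_100_FULL   0x0008
#define IXGBE_LINK_SPEED_1GB_FULL   0x0020
#define IXGBE_LINK_SPEED_10GB_FULL  0x0080
#define IXGBE_LINK_SPEED_2_5GB_FULL 0x0400
#define IXGBE_LINK_SPEED_5GB_FULL   0x0800

/* Hypothetical stand-in: map every known speed instead of defaulting to 1G. */
static const char *
link_speed_to_str(uint32_t speed)
{
	switch (speed) {
	case IXGBE_LINK_SPEED_10GB_FULL:  return "10 Gbps";
	case IXGBE_LINK_SPEED_5GB_FULL:   return "5 Gbps";
	case IXGBE_LINK_SPEED_2_5GB_FULL: return "2.5 Gbps";
	case IXGBE_LINK_SPEED_1GB_FULL:   return "1 Gbps";
	case IXGBE_LINK_SPEED_100_FULL:   return "100 Mbps";
	case IXGBE_LINK_SPEED_10_FULL:    return "10 Mbps";
	default:                          return "Unknown";
	}
}
```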
We can do so trivially, so make these tables read-only. No functional
change intended.
Reviewed by: cem, emaste
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D52003
(cherry picked from commit d5f55356a2fbf8222fb236fe509821e12f1ea456)
Add PNP info so that the module can be matched by devmatch(8) and
loaded automatically. On non-x86 platforms it is not included in GENERIC.
Reviewed by: imp
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D52557
(cherry picked from commit e13b5298ec87be03da2231bc7b44a6a4b976b850)
It may pass packets up the stack and so needs to be called in a network
epoch. When a watchdog timeout happens, we need to enter an epoch
section explicitly.
Reviewed by: zlei, glebius, adrian
MFC after: 2 weeks
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D51885
(cherry picked from commit b653a281f5a977ba73b3d405874f8af8e8b6b50d)
When MSI or legacy interrupts are used, the driver controls whether
queues can trigger an interrupt via the Interrupt Linked List.
While processing traffic, the first index of the list is set to the EOL
value to stop queues from triggering interrupts. This index was
not reset to the correct value when the driver attempted to re-enable
interrupts from queues, which prevented the driver from processing any
traffic. Fix that by setting the correct first index in the
ixl_if_enable_intr function.
While at it, fix the comment style and make ixl_if_enable_intr
and ixl_if_disable_intr more consistent.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
PR: 288077
Suggested by: Mike Belanger <mibelanger@qnx.com>
Approved by: kbowling (mentor), erj (mentor)
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D51331
(cherry picked from commit 6f41c1fc39d9fa9db989a7b4f325c3ab85b8fb45)
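A simplified model of the bug and fix described above. A single field stands in for the register that holds the first queue index of the Interrupt Linked List. The struct, the helpers and the EOL value are hypothetical, and queue 0 is assumed to be the correct head:

```c
#include <stdint.h>

/* End-of-list marker for the queue interrupt linked list (value assumed). */
#define QUEUE_EOL 0x7FF

/* Hypothetical model of the register holding the list's first queue index. */
struct intr_lnklst {
	uint16_t first_queue;
};

/* Disable: point the list head at EOL so no queue can raise an interrupt. */
static void
queues_intr_disable(struct intr_lnklst *l)
{
	l->first_queue = QUEUE_EOL;
}

/* Enable: restore the head to the first queue; omitting this was the bug. */
static void
queues_intr_enable(struct intr_lnklst *l)
{
	l->first_queue = 0;
}

static int
queues_can_interrupt(const struct intr_lnklst *l)
{
	return l->first_queue != QUEUE_EOL;
}
```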
Summary:
These came in the original DrvAPI commits in 2014, and are obsoleted by
bpf_mtap_if() and ether_bpf_mtap_if(). The `_if` suffix, rather than
prefix, conveys that it's operating on the bpf of the interface rather
than on the interface itself.
Reviewed by: glebius
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41146
(cherry picked from commit 2a3716432d209c5fef1eb1a719f4c1914e7c8b5a)
Include opt_inet.h and opt_inet6.h early in the files including
virtio_net.h, since they use INET and/or INET6.
While there, remove redundant inclusion of sys/types.h, since it is
included already by sys/param.h.
There was a discussion about including opt_inet.h and opt_inet6.h in
virtio_net.h as well. glebius suggested adding a mechanism for files
to check whether the required opt_*.h files were included. virtio_net.h
will be the first consumer of this mechanism.
Reviewed by: glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52046
(cherry picked from commit 3077532b1bb2911d3012ee90bae9d9499c960569)
Remove an always-false check for whether the request has already
completed before sleeping. Even if the request is complete, the
response tag is updated while holding the channel lock, which is also
held here.
No functional change intended.
Sponsored by: Klara, Inc.
(cherry picked from commit 28c9b13b236d25512cfe4e1902411ff421a14b64)
Replace priorities specified by a base priority and some hardcoded
offset value with symbolic constants. Hardcoded offsets prevent changing
the difference between priorities without changing their relative
ordering, and are generally a dangerous practice, since the resulting
priority may inadvertently belong to a different selection policy's
range.
Since RQ_PPQ is 4, differences of less than 4 are insignificant, so just
remove them. These small differences have not been changed for years,
so it is likely they have no real meaning (besides having no practical
effect). One can still consult the change history to recover them if
ever needed.
No functional change (intended).
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45390
(cherry picked from commit 8ecc41918066422d6788a67251b22d11a6efeddf)
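The arithmetic behind "differences of less than 4 are insignificant" can be shown with a short sketch. It assumes that a thread's run-queue index is its priority divided by RQ_PPQ and that the base priority is a multiple of RQ_PPQ. `runq_index()` is a hypothetical helper:

```c
/* Priorities per run queue, as stated above. */
#define RQ_PPQ 4

/* Hypothetical helper: the run queue a priority lands in. */
static int
runq_index(int pri)
{
	return pri / RQ_PPQ;
}
```

With a base of 80, for example, 80 + 3 still selects queue 20 while 80 + 4 selects queue 21. A base that is not a multiple of RQ_PPQ would not have this property.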
This was recently added to Linux to improve incremental update support,
as previously you could add Allowed-IPs but not remove them without
replacing the whole set (and thus potentially disrupting existing traffic).
Removal is incredibly straightforward; we'll find it in p_aips first
to ensure that it's actually valid for this peer, then we'll delete it
from the radix tree before we remove the corresponding p_aips entry.
Reviewed by: Jason A. Donenfeld, jhb
(cherry picked from commit d15d610fac97df4fefed3f14b31dcfbdcec65bf9)
(cherry picked from commit d1ac3e245f084ee0637bde9a446687621358c418)
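A simplified sketch of the removal order described above: validate against the peer's own list, delete from the lookup structure, then drop the peer's entry. A flat array stands in for the radix tree, addresses are IPv4 only, and every name here is hypothetical rather than taken from if_wg:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_AIPS 8

/* Hypothetical, simplified stand-ins for the real structures. */
struct aip {
	uint32_t addr;		/* IPv4 network address, host byte order */
	uint8_t  cidr;
};

struct peer {
	struct aip p_aips[MAX_AIPS];
	size_t     p_naips;
};

struct table_entry {		/* flat array standing in for the radix tree */
	struct aip   key;
	struct peer *owner;
	bool         used;
};

static struct table_entry table[MAX_AIPS];

static bool
aip_equal(const struct aip *a, const struct aip *b)
{
	return a->addr == b->addr && a->cidr == b->cidr;
}

/* Remove an allowed IP from a peer; fails if the peer does not own it. */
static bool
aip_del(struct peer *p, struct aip key)
{
	size_t i, j;

	/* 1. Find it in p_aips first, so we only delete this peer's entry. */
	for (i = 0; i < p->p_naips; i++)
		if (aip_equal(&p->p_aips[i], &key))
			break;
	if (i == p->p_naips)
		return false;

	/* 2. Delete it from the lookup structure. */
	for (j = 0; j < MAX_AIPS; j++) {
		if (table[j].used && table[j].owner == p &&
		    aip_equal(&table[j].key, &key)) {
			table[j].used = false;
			break;
		}
	}

	/* 3. Drop the p_aips entry by moving the last entry into its slot. */
	p->p_aips[i] = p->p_aips[--p->p_naips];
	return true;
}
```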
We'll re-use these in a future wg_aip_del() to perfectly reconstruct
what we expect to find in a_addr/a_mask.
Reviewed by: ivy, markj (both earlier version), Aaron LI, jhb
(cherry picked from commit 2475a3dab0d5c5614e303c0022a834f725e2a078)
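A sketch of the kind of computation involved in reconstructing an address/mask pair. It assumes IPv4 in host byte order and assumes that the stored address is the IP masked by the prefix. Both helpers are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: netmask for an IPv4 prefix length (host byte order). */
static uint32_t
v4_mask(uint8_t cidr)
{
	/* Shifting a 32-bit value by 32 is undefined, so /0 is handled separately. */
	return cidr == 0 ? 0 : UINT32_MAX << (32 - cidr);
}

/* Hypothetical helper: rebuild the (address, mask) pair for a prefix. */
static void
v4_addr_mask(uint32_t ip, uint8_t cidr, uint32_t *addr, uint32_t *mask)
{
	*mask = v4_mask(cidr);
	*addr = ip & *mask;
}
```

For example, 192.168.1.23/24 yields the mask 0xffffff00 and the address 192.168.1.0.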
The only difference in the wg_aip_add() call after IP validation is the
address family. Just pull that out into a variable and avoid the two
different callsites for wg_aip_add(). A future change will add a new
call for each case to remove an address from the peer, so it's nice to
avoid needing to repeat the logic for two different branches.
Reviewed by: Aaron LI, Jason A. Donenfeld, ivy, jhb, markj
(cherry picked from commit ba2607ae7dff17957d9e62ccd567ba716c168e77)
This was broken in c63d67e137f3: the early returns prevent building the
media lists as expected.
The BASE-T parts of the patch were suggested by "cyric@mm.st", while I
am adding the additional 40G AOC, 1CX, autoneg and unknown PHY fixes
based on code inspection. There may be additional work left here for
Broadcom but this is certainly better than the returns.
PR: 287395
Reported by: mickael.maillot@gmail.com, cyric@mm.st
Tested by: Einar Bjarni Halldórsson <einar@isnic.is>
(cherry picked from commit 5e6e4f752833acc96f1efc893318d3f6b74b9689)
The header file might be included after linux/stddef.h or other
headers, in which case the macros would be redefined.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D50156
(cherry picked from commit 152e6197615570e7a2f5f1c6c2ed00ecee9dd10c)
In order to be able to use MODULE_DEVICE_TABLE() with multiple bus
attachments, factor out the bus-specific MODULE_PNP_INFO() and place
it next to the structure defining the table.
As it turns out, bnxt(4) has so far been using MODULE_DEVICE_TABLE() with
PCI attachments for the "auxiliary" bus. That makes little sense.
Define MODULE_PNP_INFO() to nothing for that case. We may consider
pulling these LinuxKPI bits in semi-native drivers into LinuxKPI
one day, as that route is not really sustainable.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, dumbbell
Differential Revision: https://reviews.freebsd.org/D51049
(cherry picked from commit 2f5666c1727c949491f73e6c3277b7b542131714)
Otherwise we can end up with a lost interrupt, causing lost request
completion wakeups and hangs in the filesystem layer.
Continue processing until we enable interrupts and then observe an empty
queue, like other virtio drivers do.
Sponsored by: Klara, Inc.
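The "drain, re-enable, re-check" loop described above can be sketched against a mock queue. The mock's enable function reports whether completions slipped in while interrupts were off, which the `late` field simulates. All names are hypothetical and do not refer to the real virtqueue API:

```c
#include <stdbool.h>

/* Hypothetical mock of a virtqueue. */
struct vq {
	int  pending;		/* completions ready to be processed */
	int  late;		/* completions that arrive while interrupts are off */
	bool intr_enabled;
};

static void
vq_disable_intr(struct vq *vq)
{
	vq->intr_enabled = false;
}

/* Re-enable interrupts; return true if completions are already pending. */
static bool
vq_enable_intr(struct vq *vq)
{
	/* Simulate completions that landed just before interrupts came back on. */
	vq->pending += vq->late;
	vq->late = 0;
	vq->intr_enabled = true;
	return vq->pending > 0;
}

/* Drain until interrupts are enabled and the queue is observed empty. */
static int
vq_intr_handler(struct vq *vq)
{
	int completed = 0;

	for (;;) {
		while (vq->pending > 0) {	/* process completions */
			vq->pending--;
			completed++;
		}
		if (!vq_enable_intr(vq))
			break;			/* empty with interrupts on: done */
		vq_disable_intr(vq);		/* raced with new work: go again */
	}
	return completed;
}
```

Stopping after the first drain would leave the late completion unprocessed with no interrupt coming to report it. This is the lost-wakeup hang the commit fixes.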
In some (unknown) situations the i2c bus appears to have trouble.
Even though nothing about the current link state had changed, the driver
would react by going into a link-down state and start busy-looping
on up to 4 cores. Even if there was a valid link, such spinning
on a CPU by a kernel thread would wreak havoc on existing and
new connections.
This patch does the following:
1. If such a bus failure occurs, we keep the last known link state.
2. Prevent busy looping by using the lockmgr() facility so that we
can sleep while the i2c code waits on the i2c ISR. We cap
this with a timeout.
3. Pin the admin queues to the last CPU in the system, to prevent
other scenarios where busy looping might occur from landing on CPU
0, which especially seems to cause a lot of issues.
Given the design constraints both in hardware and in software,
lockmgr() seems to be the only viable option, even though
FreeBSD explicitly forbids sleeping in callout context without
explaining why or offering alternatives.
axgbe: revert allocating admin queues to last CPU
The issue was resolved in 52454a1e5b.
Scheduled threads such as CARP are now no longer pinned to CPU 0, making sure
they always get their time slice even if CPUs are blocked.
Since the I/O expander chip does not do a reset when soft power
cycling, the driver will first turn off all LEDs when initializing,
although no specific routine seems to be called when powering down.
This means that the LEDs will stay on until the driver has booted up,
after which the driver will be in a consistent state.
Initially, RSF (Receive Queue Store and Forward) was disabled for
unknown reasons, but the cut-through mode that's enabled as a result
seems to send 0 length packets up to the DMA when the RX queue is
full.
Since the iflib interface needs axgbe_pci_init() and its PHY starting capabilities, no data was passed in its absence.
With the NULL check of the axgbe_miibus we also fall back to an MDIO read, as a module might be capable of both
clause 22 and clause 45 methods of communication.
With the move of phy_stop() to if_detach() in d50d4e8cd4, it's better to prevent reconfiguring the PHY should the pci_init() callout trigger more than once.
Within the code path of autonegotiation for gigabit SFP modules was a bug, causing
a report of LINK_ERR for cases where an external SFP PHY was present. Fixing this issue
did not result in a link, however, as it turned out that while autonegotiation interrupts
were happening, the resulting status cannot be correctly determined in all cases. In these
specific cases we have no other option than to assume a module has negotiated to 1Gbit/s.
PHY-specific configuration has been delegated to the miibus driver, if an external PHY is present.
It's possible that the i2c bus does not recognize a PHY on the first pass, so in all cases we
retry up to a maximum of 5 times during each link poll pass to ensure we didn't miss the presence
of an external PHY.
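The bounded retry described above can be sketched as follows. The probe callback type, the mock probe and the helper names are hypothetical. The limit of 5 comes from the text:

```c
#include <stdbool.h>

#define PHY_PROBE_MAX_TRIES 5

/* Hypothetical probe callback: true if an external PHY answered on i2c. */
typedef bool (*phy_probe_fn)(void *arg);

/* Called once per link poll pass; retries before concluding "no PHY". */
static bool
detect_external_phy(phy_probe_fn probe, void *arg, int *tries_used)
{
	for (int i = 1; i <= PHY_PROBE_MAX_TRIES; i++) {
		if (probe(arg)) {
			*tries_used = i;
			return true;
		}
	}
	*tries_used = PHY_PROBE_MAX_TRIES;
	return false;
}

/* Mock probe for illustration: fails until the counter reaches zero. */
static bool
mock_probe(void *arg)
{
	int *remaining_failures = arg;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return false;
	}
	return true;
}
```

Because the retries are bounded and repeated on every poll pass, a PHY missed on one pass can still be found on the next without any single pass spinning indefinitely.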
This commit also addresses link issues on both 100 Mbit/s and 1 Gbit/s fiber modules. Not all of these modules
have the correct data set according to SFF-8472, so we first check for gigabit compliance and
the associated baud rate, and otherwise fall back to determining what type of fiber module is plugged
in by checking the baud rate, cable length and wavelength and setting the MAC speed accordingly.
It is possible for a machine to boot into a state in which the configuration register,
responsible for controlling whether an I/O signal is considered an input or output,
contains random values. It was assumed this was programmed by the BIOS.
If I/O is reversed, it's possible for the driver to think an SFPP module has been inserted
when there is none, leading to unrecoverable I2C errors.
The configuration register should contain a state which is determined and provided by the BIOS,
hence no hard-coded values are programmed here.
The current addition to the interrupt nesting level in
xen_arch_intr_handle_upcall() needs to be compensated in
xen_intr_handle_upcall(), otherwise interrupts dispatched by the upcall handler
end up seeing a td_intr_nesting_level of 2 or more, which makes them assume
there's been an interrupt nesting.
This extra interrupt nesting count leads statclock() to report idle time as
interrupt time, as the call from interrupt context will always be seen as a nested
one (td->td_intr_nesting_level >= 2) due to the nesting count increase done by
both xen_arch_intr_handle_upcall() and intr_execute_handlers().
Fix this by adjusting the nested interrupt count before dispatching interrupts
from xen_intr_handle_upcall().
PR: 277231
Reported by: Matthew Grooms <mgrooms@shrew.net>
Fixes: af610cabf1 ('xen/intr: adjust xen_intr_handle_upcall() to match driver filter')
Sponsored by: Cloud Software Group
Reviewed by: Elliott Mitchell <ehem+freebsd@m5p.com>
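A user-space model of the nesting-level accounting described above. A global counter stands in for td_intr_nesting_level, and the three functions are hypothetical stand-ins for xen_arch_intr_handle_upcall(), xen_intr_handle_upcall() and intr_execute_handlers(), with the fix applied:

```c
/* Hypothetical model of td_intr_nesting_level for the Xen upcall path. */
static int td_intr_nesting_level;
static int seen_by_handler;	/* level observed while a handler runs */

/* Stand-in for intr_execute_handlers(): bumps the level around handlers. */
static void
execute_handlers(void)
{
	td_intr_nesting_level++;
	seen_by_handler = td_intr_nesting_level;
	td_intr_nesting_level--;
}

/* Stand-in for xen_intr_handle_upcall(), with the fix applied. */
static void
handle_upcall(void)
{
	/* Compensate for the increment done by the arch upcall entry. */
	td_intr_nesting_level--;
	execute_handlers();
	td_intr_nesting_level++;
}

/* Stand-in for xen_arch_intr_handle_upcall(). */
static void
arch_handle_upcall(void)
{
	td_intr_nesting_level++;
	handle_upcall();
	td_intr_nesting_level--;
}

/* statclock()-style check: a level of 2 or more is treated as nested. */
static int
looks_nested(int level)
{
	return level >= 2;
}
```

Without the decrement in handle_upcall(), handlers would observe a level of 2 and every interrupt would look nested.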
Executing `ifconfig -v` leads to a one-second stall per interface
without a module inserted, because the timeout is set to a static 10.
This patch makes sure this is only allowed once per insertion.
Build and sysctl configuration modes are introduced for QAT SPR
devices to disable safe dc mode. A new QAT driver build option
‘QAT_DISABLE_SAFE_DC_MODE’ is required to build the QAT driver
with code that allows a request to be sent to FW to override the
‘History Buffer’ mitigation. Default QAT driver builds do not
include this ‘QAT_DISABLE_SAFE_DC_MODE’ build option. Even if the
QAT driver was built with code that allows a request to be sent to
FW to override the ‘History Buffer’ mitigation, the QAT driver must
still be configured using sysctl to request an override of the
‘History Buffer’ mitigation if desired. By default, the sysctl
dev.qat.X.disable_safe_dc_mode does not allow overriding the
mitigation. The new sysctl attribute
disable_safe_dc_mode is to be set to 1 for overriding the history
buffer mitigation. Firmware for qat_4xxx is updated for this change.
If this mode is enabled, decompression throughput increases, but this may
result in a data leak if num_user_processes is more than 1.
This option is to be enabled only if your system is not prone to
user data leaks.
Reviewed by: markj, ziaee
MFC after: 2 weeks
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D50379
(cherry picked from commit 5a8e5215cef0dac1115853889e925099f61bb5fa)
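The two gates described above (a build option plus a sysctl set to 1) can be sketched as one decision function. The function name is hypothetical. Only QAT_DISABLE_SAFE_DC_MODE and the meaning of disable_safe_dc_mode come from the text:

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: a firmware request to override the History Buffer
 * mitigation needs BOTH the build option and the sysctl set to 1.
 */
static bool
qat_should_request_unsafe_dc(int disable_safe_dc_mode_sysctl)
{
#ifdef QAT_DISABLE_SAFE_DC_MODE
	return disable_safe_dc_mode_sysctl == 1;
#else
	/* Default builds contain no code path that sends the override. */
	(void)disable_safe_dc_mode_sysctl;
	return false;
#endif
}
```

In a default build the sysctl value has no effect, which matches the statement that default builds do not include the option.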
The justification is the same as in commit
fb876eef219e ("e1000: Fix some issues in em_newitr()").
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50548
(cherry picked from commit ef062029ceffacb6bde3a5639a2bd8c4d59ca1df)
The justification is the same as in commit
a5b5220b1807 ("e1000: Initialize helper variables in em_newitr()").
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50547
(cherry picked from commit d6a9f49185797c6b67e517a3d83ef63436c8d4f3)
- Load packet and byte counters exactly once, as they can be
concurrently mutated.
- Rename bytes_packets to bytes_per_packet, which seems clearer.
- Use local variables that have the same types as the counter values,
rather than truncating unsigned long to u32.
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50416
(cherry picked from commit 731c145612dd6ffe457a562959a5c027acf13334)
Due to races with the threaded transmit and receive paths, it's possible
to have r/tx_bytes != 0 && r/tx_packets == 0, in which case the maximum
byte count could be left uninitialized. Initialize them to zero to
handle this case.
PR: 286819
Reviewed by: kbowling
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50416
(cherry picked from commit e0267657f3965a56d877075fe3d4d41b8afb2faf)
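A sketch combining the two em_newitr() fixes above. Each shared counter is read exactly once, the locals keep the counters' unsigned long type, and bytes_per_packet starts at zero, so packets == 0 cannot leave it uninitialized. The struct and function are hypothetical, and `volatile` is only a user-space stand-in for the kernel's single atomic load. Taking the maximum of the TX and RX averages is assumed from the reference to a "maximum byte count":

```c
/* Hypothetical per-queue counters, updated concurrently by TX/RX threads. */
struct q_counters {
	volatile unsigned long bytes;
	volatile unsigned long packets;
};

/* Larger of the TX and RX average packet sizes; 0 if no packets were seen. */
static unsigned long
max_bytes_per_packet(const struct q_counters *tx, const struct q_counters *rx)
{
	unsigned long bytes_per_packet = 0;	/* stays 0 if packets == 0 */
	unsigned long bytes, packets;

	bytes = tx->bytes;		/* single load of each shared counter */
	packets = tx->packets;
	if (packets != 0)
		bytes_per_packet = bytes / packets;

	bytes = rx->bytes;
	packets = rx->packets;
	if (packets != 0 && bytes / packets > bytes_per_packet)
		bytes_per_packet = bytes / packets;

	return bytes_per_packet;
}
```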
Device ID for E830-XXV adapters was changed from 12D3
to 12DE. Update driver accordingly and bump version
number.
Also remove subdevice id for E830-XXV-4 for OCP 3.0,
which was cancelled.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Approved by: kbowling (mentor), erj (mentor)
Tested by: Gowthamkumar K S <gowtham.kumar.ks@intel.com>
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D50327
(cherry picked from commit 0fed8828c95a9d2cbcb43147ff851ca6f2c21d0f)