Plug a theoretical memory/refcount leak when adding a vxlan rule.
This is not currently an actual leak, but it could become one.
PR: 287945
Reviewed by: kib
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D51883
The TLS RX context had the tcp sequence number of next TLS record set
in resync_tcp_sn parameter instead of in next_record_tcp_sn parameter
during hardware initialization. This prevented the hardware from
synchronizing with the TLS stream, and caused TLS offload to remain
inactive. Set next_record_tcp_sn to the next TCP sequence number and
resync_tcp_sn to zero to enable proper TLS record boundary detection
and activate hardware offload.
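The corrected initialization can be sketched as follows. This is a
minimal illustration, not the driver's code: the struct and the function
name are hypothetical, and only the two field names next_record_tcp_sn
and resync_tcp_sn come from the description above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the TLS RX progress parameters.  Only the two
 * field names are taken from the commit message.
 */
struct example_tls_rx_params {
	uint32_t next_record_tcp_sn;
	uint32_t resync_tcp_sn;
};

/* Fill the parameters so the hardware can find the next record boundary. */
static void
example_tls_rx_init(struct example_tls_rx_params *p, uint32_t next_tcp_sn)
{
	p->next_record_tcp_sn = next_tcp_sn;	/* start of the next TLS record */
	p->resync_tcp_sn = 0;			/* no resync requested at init */
}
```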
Reviewed by: kib, slavash
Sponsored by: NVidia networking
MFC after: 1 week
When collecting TLS information, the kernel calls the driver to get the
driver/hw TLS state. The driver queries the hw for its tracking and
authentication states and dumps them into the driver state buffer. This
requires a sleep to wait for the hw response.
Reviewed by: kib
Sponsored by: NVidia networking
When encountering a failed NIC, the mlx5 driver will wait up to 120
secs for the firmware to respond. This timeout is absurdly huge, and
leads to boot times of 40 minutes to over an hour on our servers when a
NIC fails. This is because the driver will attempt to attach to the
failed NIC multiple times (once for each driver loaded after mlx5),
and wait 2 minutes on each attempt. This happens because the mlx5
driver is still the best match for the device. This delay then
triggers watchdog timeouts in our environment, rendering servers
with a failed NIC entirely unbootable without manual intervention.
Note that FW_INIT_WARN_MESSAGE_INTERVAL must also be decreased, as
it must be less than the init timeout.
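The constraint and the boot delay arithmetic can be sketched as below.
The numeric values and the EXAMPLE_ names are assumptions chosen for
illustration; the actual new timeout is not stated in this message. With
the old 120 second timeout, 20 attach attempts (an assumed count) add up
to 2400 seconds, which matches the reported 40 minute boot times.

```c
/* Hypothetical values in milliseconds, for illustration only. */
#define EXAMPLE_FW_INIT_TIMEOUT_MS	10000
#define FW_INIT_WARN_MESSAGE_INTERVAL	5000

/* The warning interval must stay below the init timeout. */
_Static_assert(FW_INIT_WARN_MESSAGE_INTERVAL < EXAMPLE_FW_INIT_TIMEOUT_MS,
    "warn interval must be less than the init timeout");

/* Worst-case delay when every attach attempt waits for the full timeout. */
static unsigned int
example_boot_delay_secs(unsigned int attach_attempts,
    unsigned int timeout_secs)
{
	return (attach_attempts * timeout_secs);
}
```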
Reviewed by: kib (initial version, before reducing warn interval)
Sponsored by: Netflix
Only export the array of ID names if either _WANT_SFF_8024_ID or
_WANT_SFF_8472_ID is defined. Exporting them unconditionally can
trigger unused variable warnings if a consumer doesn't use the array.
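A minimal sketch of the opt-in pattern, with the header and the consumer
collapsed into one file. The enum, the array name and its contents are
hypothetical; only the two _WANT_ macro names come from the text above.

```c
/* A consumer that needs the names opts in before including the header. */
#define _WANT_SFF_8024_ID

/* Header side (hypothetical identifiers). */
enum example_sff_id {
	EXAMPLE_ID_UNKNOWN,
	EXAMPLE_ID_GBIC,
	EXAMPLE_ID_SFF,
	EXAMPLE_ID_LAST = EXAMPLE_ID_SFF
};

#if defined(_WANT_SFF_8024_ID) || defined(_WANT_SFF_8472_ID)
/* Only emitted for consumers that asked for it. */
static const char *example_sff_id_names[EXAMPLE_ID_LAST + 1] = {
	"Unknown", "GBIC", "SFF"
};
#endif

/* Consumer side: out-of-range IDs map to the "Unknown" entry. */
static const char *
example_id_name(unsigned int id)
{
	if (id > EXAMPLE_ID_LAST)
		id = EXAMPLE_ID_UNKNOWN;
	return (example_sff_id_names[id]);
}
```

A file that includes the header without defining either macro never sees
the array, so the compiler has nothing to flag as unused.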
Reviewed by: olce, bz, brooks
Differential Revision: https://reviews.freebsd.org/D49955
Replace cable type detection with connector type for more accurate media
type selection. The connector type is queried directly from the PTYS
register and provides more reliable information about the physical port
type compared to cable type.
Reviewed by: slavash
Sponsored by: NVidia networking
MFC after: 1 week
Without the wait, mlx5e_destroy_rq() might free an mbuf that has been
passed up to the network stack on receive in mlx5e_poll_rx_cq().
Sponsored by: NVidia networking
MFC after: 1 week
This is needed to accommodate more data segments in wqes for 64K receive
mbuf chains.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Define it as the size of the single data segment in wqe.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
The belief is that the 7*MCLBYTES limit was set to avoid hitting the
segment limit of the wqe busdma tag. But with the current mbuf allocator
hitting that limit is not possible, and even if it were, the
corresponding wqe fill would simply fail.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Since the driver started accepting s/g receive buffers, there has been
no sense in trying to use pre-existing mbuf cluster sizes. The only
possible optimization is to use the full page size if the wqe size is
greater than MCLBYTES.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
If the NIC is capable, just pass the full packet size, including L2/L3
headers, as the segment size. Otherwise, decrement the number of
strides by 1 to leave space for the L2/IP headers, as was done before.
But do the arithmetic on the segment number instead of the full packet
size.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
This alone does not make hw lro configurable by sysctl; it only removes
unneeded complications for users who want to access it.
Reviewed by: Ariel Ehrenberg <aehrenberg@nvidia.com>, Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
Use the correct device pointer to obtain the domain set for memory
allocation. Previously, the functions were incorrectly using the arg
parameter directly instead of accessing mlx5_core_dev.
Signed-off-by: Slava Shwartsman <slavash@nvidia.com>
Sponsored by: NVidia networking
MFC after: 1 week
The ipsec offload infra requires the EOPNOTSUPP error from the driver to
understand that the SA is valid but offload cannot be performed.
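The error convention can be sketched as follows. All names here are
hypothetical, and returning EINVAL for an invalid SA is an assumption
made only to contrast with the EOPNOTSUPP case described above.

```c
#include <errno.h>
#include <stdbool.h>

enum example_sa_result {
	EXAMPLE_SA_OFFLOADED,		/* driver accepted the SA for offload */
	EXAMPLE_SA_VALID_NO_OFFLOAD,	/* SA is fine, hardware cannot offload */
	EXAMPLE_SA_REJECTED		/* any other error */
};

/* Driver side: report "valid but not offloadable" with EOPNOTSUPP. */
static int
example_drv_install_sa(bool sa_valid, bool hw_can_offload)
{
	if (!sa_valid)
		return (EINVAL);	/* assumed error for an invalid SA */
	if (!hw_can_offload)
		return (EOPNOTSUPP);
	return (0);
}

/* Infra side: interpret the driver's return value. */
static enum example_sa_result
example_classify(int error)
{
	switch (error) {
	case 0:
		return (EXAMPLE_SA_OFFLOADED);
	case EOPNOTSUPP:
		return (EXAMPLE_SA_VALID_NO_OFFLOAD);
	default:
		return (EXAMPLE_SA_REJECTED);
	}
}
```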
Sponsored by: NVidia networking
In 4cc5d081d8, a change was introduced that manipulated
drv_ioctl_data->reqcap using IFCAP2 bits. This was noticed
when creating a mixed lagg with mce0 and ixl0 caused the
interfaces' txcsum caps to be disabled.
Fixes: 4cc5d081d8
Reviewed by: glebius
Sponsored by: Netflix
MFC After: 7 days
ipv6 flow tables were not connected to previous FS tables.
Created an additional table to serve as IPsec RX root.
This table has 2 rules for redirecting the received packets
to ipv4/ipv6 based on the IP family in the packet header.
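The dispatch done by the two rules can be modelled in software as shown
below. In the driver the match is expressed as flow steering rules in
hardware; this sketch only illustrates the decision, and every name in
it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum example_rx_target {
	EXAMPLE_RX_IPV4_TABLE,
	EXAMPLE_RX_IPV6_TABLE,
	EXAMPLE_RX_NO_MATCH
};

/*
 * Model of the IPsec RX root table: pick the next table from the IP
 * version, which is the upper nibble of the first byte of the IP header.
 */
static enum example_rx_target
example_ipsec_rx_root(const uint8_t *ip_hdr, size_t len)
{
	if (len < 1)
		return (EXAMPLE_RX_NO_MATCH);
	switch (ip_hdr[0] >> 4) {
	case 4:
		return (EXAMPLE_RX_IPV4_TABLE);
	case 6:
		return (EXAMPLE_RX_IPV6_TABLE);
	default:
		return (EXAMPLE_RX_NO_MATCH);
	}
}
```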
Sponsored by: NVidia networking
The driver already defines all flow context actions as
MLX5_FLOW_CONTEXT_ACTION_*, so there is no need to duplicate them with
mlx5_rule_fwd_action.
Sponsored by: NVidia networking
MFC after: 1 week
Change the POOL_NEXT_SIZE define from 0 to BIT(30). The define is used
to request the largest available flow table, so zero makes no sense for
it: many places in the driver pass zero explicitly, expecting the
smallest possible table, and because of this define they unknowingly
end up allocating the largest one.
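A minimal sketch of why the sentinel must differ from zero. The function
name, the minimum size and the clamping are assumptions for illustration;
only POOL_NEXT_SIZE and BIT(30) come from the text above.

```c
#define BIT(n)			(1U << (n))
#define POOL_NEXT_SIZE		BIT(30)	/* "give me the largest table" */
#define EXAMPLE_MIN_FT_SIZE	1U	/* hypothetical smallest table size */

/* Resolve a requested flow table size against what is available. */
static unsigned int
example_ft_size(unsigned int requested, unsigned int max_available)
{
	if (requested == POOL_NEXT_SIZE)
		return (max_available);		/* explicit "largest" request */
	if (requested == 0)
		return (EXAMPLE_MIN_FT_SIZE);	/* zero means smallest */
	return (requested < max_available ? requested : max_available);
}
```

With the old value of 0 for POOL_NEXT_SIZE, the first branch would also
catch every caller that passed zero, which is the behaviour being fixed.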
Sponsored by: NVidia networking
Align the code of fdb steering with the flow steering core
and add missing parts in namespace initialization and
in the prio logic.
PR: 281714
Sponsored by: NVidia networking
Formally, there are 12 bits for TCP header flags.
Use the accessor functions in more (kernel) places.
No functional change.
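The idea behind the accessors can be sketched as below: the 12 flag bits
are split between a 4-bit field and an 8-bit field, so reading or writing
only the 8-bit field loses the upper bits. The struct and function names
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical header fragment: 4 upper flag bits, offset, 8 lower bits. */
struct example_tcphdr {
	unsigned int x2:4;	/* upper 4 flag bits */
	unsigned int off:4;	/* data offset */
	uint8_t flags;		/* lower 8 flag bits */
};

/* Return all 12 flag bits as one value. */
static uint16_t
example_get_flags(const struct example_tcphdr *th)
{
	return ((uint16_t)((th->x2 << 8) | th->flags));
}

/* Store all 12 flag bits; bits above the 12th are dropped. */
static void
example_set_flags(struct example_tcphdr *th, uint16_t flags)
{
	th->x2 = (flags >> 8) & 0x0f;
	th->flags = flags & 0xff;
}
```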
Reviewed By: cc, #transport, cy, glebius, #iflib, kbowling
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47063
Remove the array of port module status and instead save module status
and module number.
At boot, the driver gets an event from fw about the module status for
each PCI function. The event contains the module number and the module
status, and the driver stores both. When the user (ifconfig) asks for
module information, the driver first queries fw for the module number
of the current PCI function and compares it to the module number it
stored earlier. If they match and the module status is "plugged and
enabled", the driver queries fw for the EPROM information of that
module number and returns it to the caller.
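The check described above can be sketched as follows. The types and
function names are hypothetical, and the set of status values is an
assumption beyond the "plugged and enabled" state named in the text.

```c
#include <stdbool.h>

enum example_module_status {
	EXAMPLE_MODULE_UNKNOWN,
	EXAMPLE_MODULE_PLUGGED_ENABLED,
	EXAMPLE_MODULE_UNPLUGGED
};

/* What the driver keeps per PCI function instead of a status array. */
struct example_module_state {
	int module_num;
	enum example_module_status status;
};

/* Called when fw reports a module event: remember number and status. */
static void
example_module_event(struct example_module_state *st, int module_num,
    enum example_module_status status)
{
	st->module_num = module_num;
	st->status = status;
}

/*
 * Called on a user query, after asking fw which module belongs to this
 * PCI function.  Reading is allowed only if that module is the one seen
 * in the last event and it is plugged and enabled.
 */
static bool
example_can_read_module(const struct example_module_state *st,
    int queried_module_num)
{
	return (st->module_num == queried_module_num &&
	    st->status == EXAMPLE_MODULE_PLUGGED_ENABLED);
}
```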
In principle, fw could deduce the required module number from the
current PCI function, but it is not implemented this way. In the
current PRM/FW design, MCIA register handling is aware only of
modules, not of the PCI function->module connections. FW takes the
module number written to MCIA and writes/reads the content to/from
the associated module's EPROM.
So, based on current FW design, we must supply the module num so fw
can find the corresponding I2C interface of the module to write/read.
Sponsored by: NVidia networking
MFC after: 1 week
Ensure all allocated tags have a hardware context associated.
The hardware context allocation is moved into the zone import
routine, as suggested by kib. This is safe because these zone
allocations are always done in a sleepable context.
I have removed the now pointless num_resources tracking,
and added sysctls / tunables to control UMA zone limits
for these tls tags, as well as a tunable to let the
driver pre-allocate tags at boot.
MFC after: 2 weeks
Only ever set the capabilities bits if kernel options are enabled.
Check for hardware capabilities before setting software bits.
Sponsored by: NVidia networking
MFC after: 1 week
Under massive connection thrashing (web server restarting), we see
long periods where the web server blocks while enabling kTLS offload
when NIC kTLS offload is enabled.
It turns out the driver uses a single-threaded linux work queue to
serialize the commands that must be sent to the nic to allocate and
free tls resources. When freeing sessions, this work is handled
asynchronously. However, when allocating sessions, the work is handled
synchronously and the driver waits for the work to complete before
returning. When under massive connection thrashing, the work queue is
first filled by TLS sessions closing. Then when new sessions arrive,
the web server enables kTLS and blocks while the tens or hundreds of
thousands of session closes queued up are processed by the NIC.
Rather than using the work queue to open a TLS session on the NIC,
switch to doing the open directly. This allows us to cut in front of
all those sessions that are waiting to close, and minimize the amount
of time the web server blocks. The risk is that the NIC may be out of
resources because it has not processed all of those session frees. So
if we fail to open a session directly, we fall back to using the work
queue.
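The direct-open-with-fallback scheme can be modelled as below. This is a
toy model under stated assumptions: the NIC is reduced to two counters,
the work queue is modelled as "process all pending frees, then open",
ENOMEM is an assumed error code, and all names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy NIC: free TLS contexts and frees still waiting in the work queue. */
struct example_nic {
	int free_slots;
	int pending_frees;
};

/* Direct path: succeeds only if a context is free right now. */
static int
example_open_direct(struct example_nic *nic)
{
	if (nic->free_slots == 0)
		return (ENOMEM);
	nic->free_slots--;
	return (0);
}

/* Work queue path: the open runs only after all queued frees. */
static int
example_open_queued(struct example_nic *nic)
{
	nic->free_slots += nic->pending_frees;
	nic->pending_frees = 0;
	return (example_open_direct(nic));
}

/* Try the direct path first and fall back to the work queue on failure. */
static int
example_tls_open(struct example_nic *nic, bool *used_queue)
{
	int error;

	*used_queue = false;
	error = example_open_direct(nic);
	if (error != 0) {
		*used_queue = true;
		error = example_open_queued(nic);
	}
	return (error);
}
```

In the common case the open does not wait behind the queued frees; the
slow path is taken only when the NIC has no resources left.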
Differential Revision: https://reviews.freebsd.org/D47260
Sponsored by: Netflix
Reviewed by: kib