The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Previously private to t4_sge.c, this allows other parts of the driver
(such as NIC TLS) to use these helpers directly.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D38578
If parse_pkt returns EINPROGRESS, return from cxgbe_transmit
without queueing the packet in a txq. Use this to move the call
to ethofld_transmit for packet pacing into parse_pkt.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D38577
Prior to Conrad's changes to replace session integer IDs with a
pointer to the driver-specific state in commit 1b0909d51a, the
driver had to find the softc pointer from the adapter before it could
locate the ccr_session structure for a completed request. Since
Conrad's changes, the ccr_session pointer can now be obtained directly
from the crp. Add a backpoint from ccr_session back to ccr_softc and
use this in place of the ccr_softc member in cxgbe's struct adapter.
Sponsored by: Chelsio Communications
This is mostly cosmetic, but it also doesn't leave a gap of time where
no structures are valid. Instead, we permit the ISR to continue to
use the previous structure if the write to update cal_current isn't
yet visible.
Reviewed by: gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D36669
This uses fixed-point math already used elsewhere in the kernel for
sub-second time values. To avoid overflows this does require updating
the calibration once a second rather than once every 30 seconds. Note
that the cxgbe driver already queries multiple registers once a second
for the statistics timers. This version also uses fewer instructions
with no branches (for the math portion) in the per-packet fast path.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D36663
Chelsio has always been recording a timestamp in the mbuf (rcv_tstmp) but
not setting the M_TSTMP bit in the mbuf flags. This is because the timestamp
was just the free running 60bit clock. This change fixes that so that
we keep a synchronization by periodically (every 30 seconds after startup)
getting the timestamp and the current nanosecond time. We always keep
several sets around and the current one we always keep the current pair
and the previous pair of timestamps. This allows us to setup a ratio
between the two so we can correctly translate the time. Note that
we use special care to split the timestamp into seconds (per the clock tick)
and nanoseconds otherwise 64bit math would overflow.
Reviewed by: np
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36315
hw.cxgbe.cong_drop=2 will generate backpressure *and* drop frames for
queues that are congested.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This could only happen on systems with PAGE_SIZE < 4K and FreeBSD
doesn't support such systems.
Reviewed by: np, imp, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D36321
A COP (Connection Offload Policy) rule can now specify that the tx
and/or rx queue for a new tid should be selected in a round-robin
manner. There is no change in default behavior.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34921
* New error_flags that can be used from the error ithread and elsewhere
without a synch_op.
* Stop the adapter immediately in t4_fatal_err but defer most of the
rest of the handling to a task. The task is allowed to sleep, unlike
the ithread. Remove async_event_task as it is no longer needed.
* Dump the devlog, CIMLA, and PCIE_FW exactly once on any fatal error
involving the firmware or the CIM block. While here, dump some
additional info (see dump_cim_regs) for these errors.
* If both reset_on_fatal_err and panic_on_fatal_err are set then attempt
a reset first and do not panic the system if it is successful.
MFC after: 1 week
Sponsored by: Chelsio Communications
The default sysctl context setup by newbus for a device is eventually
freed by device_sysctl_fini, which runs after the device driver's detach
routine. sysctl nodes associated with this context must not use any
resources (like driver locks, hardware access, counters, etc.) that are
released by driver detach.
There are a lot of sysctl nodes like this in cxgbe(4) and the fix is to
hang them off a context that is explicitly freed by the driver before it
releases any resource that might be used by a sysctl.
This fixes panics when running "sysctl dev.t6nex dev.cc" in a tight loop
and loading/unloading the driver in parallel.
Reported by: Suhas Lokesha
MFC after: 1 week
Sponsored by: Chelsio Communications
Modify the GPIO pins only on the Base-T cards and even there drive all
of them low instead of putting them in hi-z state. For the rest (this
is the common case), directly power off the PLLs of the high speed
serdes. This is the simplest method that does not involve or conflict
with the firmware but still works with all T4-T6 cards regardless of
what's plugged into the port.
This fixes a problem where the peer wouldn't always see a link down if
it is connected to the device using a -CR4 copper cable.
MFC after: 3 weeks
Sponsored by: Chelsio Communications
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'. A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).
Previously, device driver ifnet methods switched on the type to call
type-specific functions. Now, those type-specific functions are saved
in the switch structure and invoked directly. In addition, this more
gracefully permits multiple implementations of the same tag within a
driver. In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.
Reviewed by: gallatin, hselasky, kib (older version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31572
When a PDU with an error (bad padding, header digest, or data digest)
is received, log the error via ICL_WARN() and then reset the
connection via the ic_error callback.
While here, add per-rxq counters for errors.
Sponsored by: Chelsio Communications
ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31223
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
The NIC TLS and TOE TLS modes in cxgbe(4) both work with TLS key
contexts. Previously, TOE TLS supported TLS key contexts created by
two different methods, and NIC TLS had a separate bit of code copied
from NIC TLS but specific to KTLS. Now that TOE TLS only supports
KTLS, pull common code for creating TLS key contexts and programming
them into on-card memory into t4_keyctx.c.
Sponsored by: Chelsio Communications
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual suspend and resume. */
# devctl suspend t6nex0
# devctl resume t6nex0
/* Manual reset. */
# devctl reset t6nex0
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
MFC after: 1 month
Sponsored by: Chelsio Communications
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers. They can be called
from the 1Hz callout or if_get_counter. All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29527
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29383
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload. With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps. hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.
* Enable nic_ktls_ofld in the default configuration file and use the
firmware instead of direct register manipulation to apply/rollback
NIC TLS configuration. This allows the driver to switch the hardware
between TOE and NIC TLS mode in a safe manner. Note that the
configuration is adapter-wide and not per-port.
* Remove the kern_tls config file as it works with 100G T6 cards only
and leads to firmware crashes with 25G cards. The configurations
included with the driver (with the exception of the FPGA configs) are
supposed to work with all adapters.
Reported by: Veeresh U.K. at Chelsio
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29291
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so. The requested mask
must be a subset of the filter mode already. The driver will not change
the mode or ingress config just to support a new mask.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.
MFC after: 1 week
Sponsored by: Chelsio Communications
- The behavior implemented in r362905 resulted in delayed transmission
of packets in some cases, causing performance issues. Use a different
heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
entirely. It can be changed at any time. There is no change in
default behavior.
It is common for freelists to be starving when a netmap application
stops. Mailbox commands to free queues can hang in such a situation.
Avoid that by not freeing the queues when netmap is switched off.
Instead, use an alternate method to stop the queues without releasing
the context ids. If netmap is enabled again later then the same queue
is reinitialized for use. Move alloc_nm_rxq and txq to t4_netmap.c
while here.
MFC after: 1 week
Sponsored by: Chelsio Communications
r367917 fixed the backpressure on the netmap rxq being stopped but that
doesn't help if some other netmap rxq is starved (because it is stopping
too although the driver doesn't know this yet) and blocks the pipeline.
An alternate fix that works in all cases will be checked in instead.
Sponsored by: Chelsio Communications
The netmap application using the driver is responsible for replenishing
the receive freelists and they may be totally depleted when the
application exits. Packets in flight, if any, might block the pipeline
in case there aren't enough buffers left in the freelist. Avoid this by
filling up the freelists with a driver allocated buffer.
MFC after: 1 week
Sponsored by: Chelsio Communications
The firmware can allocate ingress and egress context ids anywhere from
its configured range. Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.
Reported by: Baptiste Wicht @ Verisign
MFC after: 1 week
Sponsored by: Chelsio Communications
r365732 was the first attempt to get an accurate count but it was
writing to some read-only registers to clear them and that obviously
didn't work. Instead, note the counter's value when it is supposed to
be cleared and subtract it from future readings.
dev.<port>.stats.rx_fcs_error should not be serviced from the MPS
register for T6.
The stats.* sysctls should all use T5_PORT_REG for T5 and above. This
must have been missed in the initial T5 support years ago. Fix it while
here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway. This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).
In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.
Reviewed by: gallatin, hselasky, np, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26689
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC. Fix the GL limits for VM work requests
while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Hardware assistance includes checksumming (tx and rx), TSO, and RSS on
the inner traffic in a VXLAN tunnel.
Relnotes: Yes
Sponsored by: Chelsio Communications