Implement kernel support for RFC 5549/8950.
* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. This behavior is controlled by the
net.route.rib_route_ipv6_nexthop sysctl (on by default).
* Always pass final destination in ro->ro_dst in ip_forward().
* Use ro->ro_dst to extract packet family inside if_output() routines.
Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.
* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
It leverages recent lltable changes committed in c541bd368f.
Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0
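The family selection described in the list above can be sketched in
userland C. This is only an illustration under stated assumptions, not
the kernel code: struct route_sketch and packet_family() are
hypothetical names, and only the ro=NULL fallback is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's struct route; only ro_dst is modeled. */
struct route_sketch {
    struct sockaddr ro_dst;     /* final destination of the packet */
};

/*
 * Return the address family of the packet being sent.  When a route is
 * supplied, the family comes from the final destination in ro_dst; when
 * ro is NULL, fall back to the family of the address handed to the
 * output routine (gw).
 */
int
packet_family(const struct route_sketch *ro, const struct sockaddr *gw)
{
    if (ro != NULL)
        return (ro->ro_dst.sa_family);
    return (gw->sa_family);
}
```

For an IPv4 route with an IPv6 gateway, ro_dst carries AF_INET while gw
carries AF_INET6, so the sketch reports AF_INET as the packet family.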
Differential Revision: https://reviews.freebsd.org/D30398
MFC after: 2 weeks
A socket in the FIN_WAIT_1 state is marked disconnected by
do_close_con_rpl() even though there might still be receive data pending.
This is because the socket at that point has set SBS_CANTRCVMORE which
causes the protocol layer to discard any data received before the FIN.
However, icl_cxgbei_conn_close needs to wait until all the data has
been discarded. Replace the wait for SS_ISDISCONNECTED with a wait
for final_cpl_received() to be called.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31223
The remote peer might send a FIN in the middle of a burst of data
PDUs. In the case of T6 with data PDU completion moderation, the
driver would not have seen these PDUs since the final PDU in the burst
was never received, resulting in a stale rcv_nxt when the FIN is
received.
While here, invert the logic in the condition to be more readable and
always set tp->rcv_nxt from the sequence number in the CPL. This sets
the proper value of rcv_nxt for FINs on connections with data received
but not reported via a CPL (e.g. a partial iSCSI PDU burst interrupted
by a FIN).
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30871
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
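A minimal sketch of tracking class state separately from flags follows.
The type, state, and function names here are hypothetical and the
driver's real state set may well be larger; the point is only that
attach no longer programs anything and consumers must check the state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the driver's actual enum may differ. */
enum tc_state_sketch {
    TC_UNINITIALIZED = 0,   /* never programmed; hardware left alone */
    TC_CONFIGURED           /* parameters were written to the hardware */
};

struct tc_sketch {
    enum tc_state_sketch state;     /* tracked separately from flags */
    unsigned int flags;             /* unrelated per-class flags */
    unsigned int maxrate_kbps;      /* hypothetical parameter */
};

/* Attach leaves the scheduler alone: every class starts uninitialized. */
void
tc_attach(struct tc_sketch *tc, int nclasses)
{
    for (int i = 0; i < nclasses; i++) {
        tc[i].state = TC_UNINITIALIZED;
        tc[i].flags = 0;
        tc[i].maxrate_kbps = 0;
    }
}

/* Program one class on demand and record that it is now valid. */
void
tc_configure(struct tc_sketch *tc, unsigned int maxrate_kbps)
{
    tc->maxrate_kbps = maxrate_kbps;
    tc->state = TC_CONFIGURED;
}

/* Code that used to assume every class was valid must now check. */
bool
tc_is_usable(const struct tc_sketch *tc)
{
    return (tc->state == TC_CONFIGURED);
}
```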
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
The NIC TLS and TOE TLS modes in cxgbe(4) both work with TLS key
contexts. Previously, TOE TLS supported TLS key contexts created by
two different methods, and NIC TLS had a separate bit of code copied
from TOE TLS but specific to KTLS. Now that TOE TLS only supports
KTLS, pull common code for creating TLS key contexts and programming
them into on-card memory into t4_keyctx.c.
Sponsored by: Chelsio Communications
TOE TLS offload was first supported via a customized OpenSSL developed
by Chelsio with proprietary socket options prior to KTLS being present
either in FreeBSD or upstream OpenSSL. With the addition of KTLS in
both places, cxgbe's TOE driver was extended to support TLS offload
via KTLS as well. This change removes the older interface leaving
only the KTLS bindings for TOE TLS.
Since KTLS was added to TOE TLS second, it was somewhat shoe-horned
into the existing code. In addition to removing the non-KTLS TLS
offload, refactor and simplify the code to assume KTLS, e.g. not
copying keys into a helper structure that mimicked the non-KTLS mode,
but using the KTLS session object directly when constructing key
contexts.
This also removes some unused code to send TX keys inline in work
requests for TOE TLS. This code was never enabled, and was arguably
sending the wrong thing (it was not sending the raw key context as we
do for NIC TLS when using inline keys).
Sponsored by: Chelsio Communications
If an iSCSI connection is shut down abruptly (e.g. by a RST from the
peer), pending iSCSI PDUs and page pod work requests can be in the
ulp_pduq when the final CPL is received indicating the death of the
connection.
Reported by: Jithesh Arakkan @ Chelsio
In 4427ac3675, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs. The queue of mbufs to send is
protected by the inp lock. However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread. To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.
To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: 4427ac3675
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer. This change adds zero-copy receive support
for such requests.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29908
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.
Sponsored by: Chelsio Communications
Discussed with: np
Differential Revision: https://reviews.freebsd.org/D29904
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
        if (!SW_ALLOCATED)
                init_iq/init_eq/init_fl         no-fail sw init
                alloc_iq_fl/alloc_eq/alloc_wrq  may-fail sw alloc
                add_foo_sysctls, etc.           no-fail post-alloc items
        if (!HW_ALLOCATED)
                alloc_iq_fl_hwq/alloc_eq_hwq    hw resource allocation
}
/* Idempotent */
free_foo
{
        if (HW_ALLOCATED)
                free_iq_fl_hwq/free_eq_hwq      release hw resources
        if (SW_ALLOCATED)
                free_iq_fl/free_eq/free_wrq     release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
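The idempotent pattern can be sketched as compilable userland C. This
is an illustration only: struct queue_sketch, the Q_* flags, and the
fake hardware id are hypothetical, and the hardware step is simulated.
Each resource is allocated only when its flag is clear and released
only when its flag is set, so repeated calls are harmless.

```c
#include <assert.h>
#include <stdlib.h>

#define Q_SW_ALLOCATED  0x1     /* hypothetical flag values */
#define Q_HW_ALLOCATED  0x2

struct queue_sketch {
    int flags;
    void *desc;     /* software resource: descriptor memory */
    int hw_id;      /* simulated hardware resource; -1 when none */
};

/* Idempotent: calling it again after success changes nothing. */
int
alloc_queue(struct queue_sketch *q)
{
    if (!(q->flags & Q_SW_ALLOCATED)) {
        q->desc = calloc(64, 16);       /* may-fail sw alloc */
        if (q->desc == NULL)
            return (-1);
        q->flags |= Q_SW_ALLOCATED;
    }
    if (!(q->flags & Q_HW_ALLOCATED)) {
        q->hw_id = 7;                   /* pretend the hardware gave us an id */
        q->flags |= Q_HW_ALLOCATED;
    }
    return (0);
}

/* Idempotent: hardware is released first, then software state. */
void
free_queue(struct queue_sketch *q)
{
    if (q->flags & Q_HW_ALLOCATED) {
        q->hw_id = -1;
        q->flags &= ~Q_HW_ALLOCATED;
    }
    if (q->flags & Q_SW_ALLOCATED) {
        free(q->desc);
        q->desc = NULL;
        q->flags &= ~Q_SW_ALLOCATED;
    }
}
```

Because the two flags are independent, a caller could release only the
hardware half and keep the software state around for a later retry.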
MFC after: 1 month
Sponsored by: Chelsio Communications
The mbuf allocated could be a chain and must be freed with m_freem.
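In mbuf(9), m_free() releases a single mbuf while m_freem() releases an
entire chain. The difference can be illustrated with a plain linked
list; struct buf_sketch and both helpers are hypothetical userland
stand-ins, not the mbuf API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type, linked the way an mbuf chain is via m_next. */
struct buf_sketch {
    struct buf_sketch *next;
};

/* Free every buffer in the chain; returns how many were freed. */
int
chain_free(struct buf_sketch *b)
{
    int n = 0;

    while (b != NULL) {
        struct buf_sketch *next = b->next;

        free(b);
        b = next;
        n++;
    }
    return (n);
}

/* Build a chain of n buffers; returns NULL if n is 0 or allocation fails. */
struct buf_sketch *
chain_alloc(int n)
{
    struct buf_sketch *head = NULL;

    for (int i = 0; i < n; i++) {
        struct buf_sketch *b = malloc(sizeof(*b));

        if (b == NULL) {
            chain_free(head);   /* do not leak the partial chain */
            return (NULL);
        }
        b->next = head;
        head = b;
    }
    return (head);
}
```

Freeing only the head of a three-buffer chain would leak the other two,
which is the class of bug the chain-aware free avoids.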
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29579
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29503
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
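The idea behind per-CPU counters can be sketched as follows. This is
not how counter(9) is implemented; the type, the slot count, and the
function names are hypothetical. Each writer touches only its own slot,
so no atomic operation is needed on the update path.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS_SKETCH   4       /* hypothetical number of CPUs */

/* One slot per CPU; each slot is written only by its own CPU. */
struct pcpu_counter_sketch {
    uint64_t slot[NSLOTS_SKETCH];
};

/* Increment without atomics: the caller owns slot 'cpu'. */
void
pcpu_counter_add(struct pcpu_counter_sketch *c, int cpu, uint64_t inc)
{
    c->slot[cpu] += inc;
}

/* Readers sum all slots; with concurrent writers this is a snapshot. */
uint64_t
pcpu_counter_fetch(const struct pcpu_counter_sketch *c)
{
    uint64_t sum = 0;

    for (int i = 0; i < NSLOTS_SKETCH; i++)
        sum += c->slot[i];
    return (sum);
}
```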
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29383
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload. With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps. hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.
* Enable nic_ktls_ofld in the default configuration file and use the
firmware instead of direct register manipulation to apply/rollback
NIC TLS configuration. This allows the driver to switch the hardware
between TOE and NIC TLS mode in a safe manner. Note that the
configuration is adapter-wide and not per-port.
* Remove the kern_tls config file as it works with 100G T6 cards only
and leads to firmware crashes with 25G cards. The configurations
included with the driver (with the exception of the FPGA configs) are
supposed to work with all adapters.
Reported by: Veeresh U.K. at Chelsio
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29291
This avoids mixing the use of two different enums which modern C
compilers warn about.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29301
1. Query the firmware for filter mode, mask, and related ingress config
instead of trying to figure them out from hardware registers. Read
configuration from the registers only when the firmware does not
support this query.
2. Use the firmware to set the filter mode. This is the correct way to
do it and is more flexible as well. The filter mode (and associated
ingress config) can now be changed any time it is safe to do so.
The user can specify a subset of a valid mode and the driver will
enable enough bits to make sure that the mode is maxed out -- that
is, it is not possible to set another bit without exceeding the
total width for optional filter fields. This is a hardware
requirement that was not enforced by the driver previously.
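The "maxed out" requirement can be illustrated with a small sketch. The
field widths, the 36-bit total, and the greedy order below are
hypothetical and chosen only for illustration; the driver's actual
fields and selection policy are not described here.

```c
#include <assert.h>
#include <stdbool.h>

#define NFIELDS_SKETCH      4
#define MAX_WIDTH_SKETCH    36  /* hypothetical total width, in bits */

/* Hypothetical widths of the optional filter fields, in bits. */
static const int field_width[NFIELDS_SKETCH] = { 16, 17, 8, 3 };

/* Total width of the fields enabled in mode (bit i selects field i). */
int
mode_width(unsigned int mode)
{
    int w = 0;

    for (int i = 0; i < NFIELDS_SKETCH; i++)
        if (mode & (1u << i))
            w += field_width[i];
    return (w);
}

/* A mode is maxed out if it fits and no further field can be added. */
bool
mode_is_maxed(unsigned int mode)
{
    int w = mode_width(mode);

    if (w > MAX_WIDTH_SKETCH)
        return (false);
    for (int i = 0; i < NFIELDS_SKETCH; i++)
        if (!(mode & (1u << i)) &&
            w + field_width[i] <= MAX_WIDTH_SKETCH)
            return (false);
    return (true);
}

/*
 * Greedily enable extra fields, in index order, until none fits.
 * A mode that already exceeds the total width is returned unchanged.
 */
unsigned int
mode_maximize(unsigned int mode)
{
    int w = mode_width(mode);

    for (int i = 0; i < NFIELDS_SKETCH; i++) {
        if (!(mode & (1u << i)) &&
            w + field_width[i] <= MAX_WIDTH_SKETCH) {
            mode |= 1u << i;
            w += field_width[i];
        }
    }
    return (mode);
}
```

With these made-up widths, a request for field 0 alone (16 bits) is
extended with fields 1 and 3 to reach exactly 36 bits, and field 2 no
longer fits.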
MFC after: 2 weeks
Sponsored by: Chelsio Communications
These errors do not clear so to NULL, so the existing check was
treating these failures as success. The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().
In addition, for negative return values, do not send a RST to the
peer.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28243
The handshake timer can race with another thread sending a FIN or RST
to close a TOE TLS socket. Just bail from the timer without
rescheduling if the connection is closed when the timer fires.
Reported by: Sony Arpita Das @ Chelsio QA
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D27583
By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode. This provides
a fallback if the userland SSL library does not support KTLS. In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.
The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27470
This includes mbufs waiting for data from sendfile() I/O requests, or
mbufs awaiting encryption for KTLS.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27469
If TOE TLS is requested for an unsupported cipher suite or TLS
version, disable TLS processing and fall back to plain TOE. In
addition, if an error occurs when saving the decryption keys in the
card's memory, disable TLS processing and fall back to plain TOE.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27468
If a TOE TLS socket ends up using an unsupported TLS version or
ciphersuite, it must be downgraded to a "plain" TOE socket with TLS
encryption/decryption performed on the host. The previous
implementation of this fallback was incomplete and resulted in hung
connections.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27467
TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE
rx parsing is enabled in the chip. t4_tom should pass on these SYNs to
the kernel and let it deal with them as if they arrived on the non-TOE
path.
Reported by: Sony at Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27028
This is mostly mechanical except for vmspace_exit(). There, use the new
refcount_release_if_last() to avoid switching to vmspace0 unless other
processes are sharing the vmspace. In that case, upon switching to
vmspace0 we can unconditionally release the reference.
Remove the volatile qualifier from vm_refcnt now that accesses are
protected using refcount(9) KPIs.
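The release-if-last semantics can be sketched with C11 atomics,
assuming the operation drops the reference only when the count is
exactly one and otherwise leaves it untouched. The function name is
hypothetical and this is a userland illustration, not the refcount(9)
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop the reference only if it is the last one.  Returns true when the
 * count went from 1 to 0; returns false and leaves the count unchanged
 * when other references exist.
 */
bool
release_if_last_sketch(atomic_uint *count)
{
    unsigned int expected = 1;

    return (atomic_compare_exchange_strong(count, &expected, 0));
}
```

A caller that gets false back still holds its reference and can do any
preparatory work before releasing it unconditionally.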
Reviewed by: alc, kib, mmel
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27057
In certain edge cases, the NIC might have only received a partial TLS
record which it needs to return to the driver. For example, if the
local socket was closed while data was still in flight, a partial TLS
record might be pending when the connection is closed. Receiving a
RST in the middle of a TLS record is another example. When this
happens, the firmware returns the partial TLS record as plain TCP
data via CPL_RX_DATA. Handle these requests by returning an error to
OpenSSL (via so_error for KTLS or via an error TLS record header for
the older Chelsio OpenSSL interface).
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26800
Flow control was disabled during initial TOE TLS development to
work around a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled, which was inherited from before the workaround was added, with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.
To clean this up, re-enable flow control on TOE TLS sockets. The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang. Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26799
There were quite a few places where port_info was being accessed only to
get to the adapter.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25432
fibX_lookup_nh_ext().
fibX_lookup_nh_ represents the pre-epoch generation of the fib KPI,
providing fewer guarantees over pointer validity and requiring
on-stack data copying.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D24975
o Shrink sglist(9) functions to work with multipage mbufs down from
four functions to two.
o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf.
o Rename to something matching _epg.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
The following series of patches addresses three things:
Now that the array of pages is embedded into the mbuf, we no longer need
a separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accommodate the main idea of r359919 with minimal churn.
Also, M_EXT of type EXT_PGS is just a synonym of M_NOMAP.
The namespace for the new feature is somewhat inconsistent and
sometimes has lengthy prefixes. In these patches we will
gradually bring the namespace to the "m_epg" prefix for all mbuf
fields and most functions.
Step 1 of 4:
o Anonymize mbuf_ext_pgs_data, embed in m_ext
o Embed mbuf_ext_pgs
o Start documenting all this entanglement
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
This largely reuses the TLS TOE support added in r330884. However,
this uses the KTLS framework in upstream OpenSSL rather than requiring
Chelsio-specific patches to OpenSSL. As with the existing TLS TOE
support, use of RX offload requires setting the tls_rx_ports sysctl.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24453