* Disable IFCAP_TOE automatically on all ifnets on all adapters during
unload. This is user-friendly and avoids panics due to stale ifnet
state after t4_tom is unloaded.
* Do not allow unload if tids are in use by the TOE on any adapter.
Reported by: Bimal Abraham @ Chelsio
Sponsored by: Chelsio Communications
(cherry picked from commit 9ba8670a8b175de79ea087688f51595b4f2db862)
An L2 entry in the driver's hash was marked STALE unconditionally if it
changed in the kernel when its driver refcount is 0. Fix the driver to
do this for VALID entries only.
Sponsored by: Chelsio Communications
(cherry picked from commit 3883300afe0bff5c5658274c4d8cfe773d08343e)
1. Mark the L2T entry valid only if t4_write_l2e succeeds, which won't
happen if the adapter is stopped. This prevents L2T entries from
sometimes getting (re)promoted to VALID on Tx activity during stop.
2. Discard a work request immediately instead of enqueueing it to the
arp queue if the adapter is stopped.
Fixes: c1c524852f62 cxgbe/t4_tom: Implement uld_stop and uld_restart for ULD_TOM.
Sponsored by: Chelsio Communications
(cherry picked from commit 07f47e8850d0639d474026b203013072aeb32c81)
The STALE state means the L2T entry is valid in hardware but needs to be
refreshed (ARP/NDP) in software. But stop/suspend wipes the hardware
L2T and STALE entries need to be updated just like VALID entries to match
actual hardware state.
Fixes: c1c524852f62 cxgbe/t4_tom: Implement uld_stop and uld_restart for ULD_TOM.
Sponsored by: Chelsio Communications
(cherry picked from commit 171e57967b3e53f0fb48116df5003ce17163295c)
This fixes a panic where the peer's ack to the synack arrives on a
different queue and do_pass_establish tries to remove the synqe from
synqe_list before it has been added by do_pass_accept_req.
Reported by: Sony Arpita Das @ Chelsio
Fixes: 283333c0e329 cxgbe/t4_tom: Track all synq entries in a per-adapter list.
Sponsored by: Chelsio Communications
(cherry picked from commit 674cbf38f6d0a0b307e52c4265da9f077606b035)
1. Remove toepcb from the toep_list on active open failure.
2. Purge the wr_list for an L2T entry on an adpater stop.
Fixes: c1c524852f62 cxgbe/t4_tom: Implement uld_stop and uld_restart for ULD_TOM.
Sponsored by: Chelsio Communications
(cherry picked from commit fef0e39f64a1db796ded8777dbee71fc287f6107)
This allows the adapter to be suspended or reset even when stateful TOE is
active, in some limited configurations.
The LLD has already stopped the adapter hardware and all its queues by the time
these ULD routines get called. The general approach in t4_tom is to purge the
lookup tables immediately so that they are ready for operation by the time the
adapter resumes, and park all the resources left hanging by the stopped hardware
into separate "stranded" queues that can be dealt with at leisure.
Outstanding active opens, live connections, and synq entries (for connections in
the middle of the 3-way handshake) are all treated as if the hardware had
reported an abrupt error for the tid. The servers/listeners are a bit different
in that no error is reported. They're just noted as non-functional when the
hardware stops and are recreated by the driver during restart.
Sponsored by: Chelsio Communications
(cherry picked from commit c1c524852f625cf5f420653f7850d1fe3ff6b4ca)
Live tid entries in tid_tab are either full fledged connections or synq
entries. toep_list tracks the connections already and this change adds
a synqe_list to track the synq entries. These two lists can be used to
enumerate and iterate over all live tids.
Sponsored by: Chelsio Communications
(cherry picked from commit 283333c0e329fd7aceff16fa3bf2b9892744d883)
L2T entries are used by both filters and TOE and the L2T is shared
between the base driver (LLD) and the TOM ULD. Add a flag to indicate
that the L2T is stopped, which means:
* t4_alloc_l2e and t4_l2t_alloc_switching will not allocate new entries.
* t4_tom will ignore all ARP/NDP updates from the kernel.
* Previously allocated L2T entries can still be freed.
Sponsored by: Chelsio Communications
(cherry picked from commit cd93fdee5c8bbdb00d10f8a1fa43f30f151a1ef7)
* Convert t4_uld_list to an array. There will be at most 3 items in the
list and it's simpler to track them in an array with a fixed slot for
each ULD.
* There is no need to refcount ULDs so stop doing that.
* Add uld_ prefix to all members of uld_info.
* Rename async_event to uld_stop to match its actual purpose. Call it
for all ULDs and not just ULD_IWARP.
Reviewed by: jhb
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46029
(cherry picked from commit cf5e6370f15cffabbbf508083ba7d48ec8abfa79)
Do not assume that the table starts at index 0 and is typically 4K in
size. The only thing the driver needs to verify is that its use of
F_SYNC_WR doesn't collide with the L2T hwidx range.
Reviewed by: jhb
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D46028
(cherry picked from commit cfcfd3c7bf5b60da42b13ac5d8085c762613c302)
Final CPL means the tid is done in the hardware and other resources
associated with it can be freed right away. There is no need to wait
for the kernel to detach the toepcb.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45991
(cherry picked from commit 27479403a764cf3b97194887a1f819c1e35357aa)
The kernel used to call tod_pcb_detach when entering TIME_WAIT but that
seems to have changed, likely with the TIME_WAIT overhaul in the kernel
some time ago. Catch up by having the driver perform the detach.
The hardware does not handle TIME_WAIT so it's important to detach and
let the kernel arm the 2MSL timer to deal with it.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: jhb
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D45990
(cherry picked from commit bbc326241d91ab2cee2ec2c5c0aa8a906480132f)
This affects TOE operation when multiple rx c-channels are in use for
offload, which is an unusual configuration.
Sponsored by: Chelsio Communications
(cherry picked from commit c6c6d4aff90da83a292b4c2bbbe1f4d6e01cd82e)
The driver always uses the same modulation queue as the channel and the
table is unnecessary.
Sponsored by: Chelsio Communications
(cherry picked from commit f76effed14b25bfa0c47b10f6d8a076104c48d94)
When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection. When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer. This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.
To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled. When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.
The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl. The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection. Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.
Co-authored-by: Navdeep Parhar <np@FreeBSD.org>
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44001
(cherry picked from commit eba13bbc37ab4f45a8a3502d59c37d56d9a04ca5)
Use a separate state for when a request to set RX_QUIESCE has been
sent but the resulting TCB reply has not been received. In
particular, this correctly handles the case where data has been
received and queued in the receive queue before the quiesce request
takes effect.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44435
(cherry picked from commit 9978c6289d621ac9edc95acb4e0f527a62a49b03)
Most ULP modes in cxgbe's TOE are enabled on the fly when a protocol
is needed (e.g. ULP_MODE_ISCSI is enabled by cxgbei when offloading a
connection using iSCSI, and ULP_MODE_TLS is enabled when RX TLS keys
are programmed for a TOE connection). The one exception to this is
ULP_MODE_TCPDDP.
Currently the cxgbe driver enables ULP_MODE_TCPDDP when a TOE
connection is first created. However, since DDP connections cannot be
converted to other connection types, this requires some special
handling in the driver. For example, iSCSI daemons use the SO_NO_DDP
socket option to ensure TOE connections use ULP_MODE_NONE so they can
be converted to ULP_MODE_ISCSI. Similarly, using TLS receive offload
(ULP_MODE_TLS) requires disabling TCP DDP for new connections by
default.
This commit changes cxgbe to instead switch a connection from
ULP_MODE_NONE to ULP_MODE_TCPDDP when a connection first attempts to
use TCP DDP via aio_read(2). This permits connections to always start
as ULP_MODE_NONE and switch to a protocol-specific mode as needed.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43670
(cherry picked from commit a5a965d75934ae809884f8613bcad156bb5d7148)
Similar to dcfddc8dc091e7688abc8488a0307eba425fa7a2, replace the
simpler, inlined version with the full version.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41690
(cherry picked from commit 897e564361624411c4e557e0817642e1477f0af4)
In particular, the kernel RPC layer used by the NFS client never
invokes pru_rcvd since it always reads data from the socket upcall
via MSG_SOCALLBCK which avoids calling pru_rcvd. As a result, on an
NFS client connection managed by t4_tom, RX credits were never
returned to the TOE connection to open the TCP window resulting in
connection hangs.
To fix, expand the set of conditions in do_rx_data where RX credits
are returned to match those in t4_rcvd_locked by calling the function
directly.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D41688
(cherry picked from commit dcfddc8dc091e7688abc8488a0307eba425fa7a2)
This change adds struct tcp_info fields corresponding to the following
struct tcpcb ones:
- snd_una
- snd_max
- rcv_numsacks
- rcv_adv
- dupacks
Note that while both tcp_fill_info() and fill_tcp_info_from_tcb() are
extended accordingly, no counterpart of rcv_numsacks is available in
the cxgbe(4) TOE PCB, though.
Sponsored by: NetApp, Inc. (originally)
This function actually only ever reads from the TCP PCB. Consequently,
also make the pointer to its TCP PCB parameter const.
Sponsored by: NetApp, Inc. (originally)
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Summary:
Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV()
macro to complement with the new accessors.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38199
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.
The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.
The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.
One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.
Large part of the change was done with sed script:
s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.
Differential revision: https://reviews.freebsd.org/D37127
Rather than requiring a socket to be created as a TLS socket from the
get go, switch a TOE socket from "plain" TOE to TLS mode when a
receive key is added to the socket.
The firmware is only able to switch a "plain" TOE connection to TLS
mode if the head of the pending socket data is the start of a TLS
record, so the connection is migrated to TLS mode as a multi-step
process.
When TOE TLS RX is enabled, the associated connection's receive side
is frozen via a flag in the TCB. The state of the socket buffer is
then examined to determine if the pending data in the socket buffer
ends on a TLS record boundary. If so, the connection is migrated to
TLS mode and unfrozen. Otherwise, the connection is unfrozen
temporarily until more data arrives. Once more data arrives, the
receive queue is frozen again and rechecked. This continues until the
connection is paused at a record boundary. Any records received
before TLS mode is enabled are decrypted as software records.
Note that this removes the 'rx_tls_ports' sysctl. TOE TLS offload for
receive is now enabled automatically on existing TOE connections when
using a KTLS-aware SSL library just as it was previously enabled
automatically for TLS transmit. This also enables TLS offload for TOE
connections which enable TLS after passing initial data in the clear
(e.g. STARTTLS with SMTP).
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37351
The previous commit lost an implicit struct socket * cast. Use an
inline function instead as the macro is already rather long.
Fixes: e1401f7579 cxgbe: use standard sototcpcb() accessor macro to get socket's tcpcb
Sponsored by: Chelsio Communications
Mechanically cleanup INP_TIMEWAIT from the kernel sources. After
0d7445193a, this commit shall not cause any functional changes.
Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions. Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.
Differential revision: https://reviews.freebsd.org/D36400
o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d1), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232
Remove a few more remnants from the old pre-KTLS support and instead
assume that each work request sends a single TLS record.
Sponsored by: Chelsio Communications
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b5 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.
This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152