Commit graph

313 commits

Author SHA1 Message Date
John Baldwin
bf5956c185 cxgbe: Support TCP_USE_DDP on offloaded TOE connections
When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection.  When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer.  This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.

To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled.  When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.

The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl.  The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection.  Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.

Co-authored-by:	Navdeep Parhar <np@FreeBSD.org>
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44001

(cherry picked from commit eba13bbc37)
2024-04-12 12:25:17 -07:00
John Baldwin
7748f70561 cxgbe tom: Handle a race condition when enabling TLS offload
Use a separate state for when a request to set RX_QUIESCE has been
sent but the resulting TCB reply has not been received.  In
particular, this correctly handles the case where data has been
received and queued in the receive queue before the quiesce request
takes effect.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D44435

(cherry picked from commit 9978c6289d)
2024-04-08 11:07:13 -07:00
John Baldwin
d4ad8432aa ddp: Clear active DDP buffer members to NULL to pacify an assertion
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43999

(cherry picked from commit 25429e2743)
2024-04-08 11:06:53 -07:00
John Baldwin
b680e6da13 cxgbe tom: Enable ULP_MODE_TCPDDP on demand
Most ULP modes in cxgbe's TOE are enabled on the fly when a protocol
is needed (e.g. ULP_MODE_ISCSI is enabled by cxgbei when offloading a
connection using iSCSI, and ULP_MODE_TLS is enabled when RX TLS keys
are programmed for a TOE connection).  The one exception to this is
ULP_MODE_TCPDDP.

Currently the cxgbe driver enables ULP_MODE_TCPDDP when a TOE
connection is first created.  However, since DDP connections cannot be
converted to other connection types, this requires some special
handling in the driver.  For example, iSCSI daemons use the SO_NO_DDP
socket option to ensure TOE connections use ULP_MODE_NONE so they can
be converted to ULP_MODE_ISCSI.  Similarly, using TLS receive offload
(ULP_MODE_TLS) requires disabling TCP DDP for new connections by
default.

This commit changes cxgbe to instead switch a connection from
ULP_MODE_NONE to ULP_MODE_TCPDDP when a connection first attempts to
use TCP DDP via aio_read(2).  This permits connections to always start
as ULP_MODE_NONE and switch to a protocol-specific mode as needed.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43670

(cherry picked from commit a5a965d759)
2024-04-08 10:49:51 -07:00
John Baldwin
138ed6fee2 cxgbe: Add counters for POSIX async I/O requests handled by the driver
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43668

(cherry picked from commit c3d4aea6c5)
2024-04-08 10:39:16 -07:00
John Baldwin
9c50c9b776 sys: Use mbufq_empty instead of comparing mbufq_len against 0
Reviewed by:	bz, emaste
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D43338

(cherry picked from commit 8cb9b68f58)
2024-01-18 14:37:29 -08:00
John Baldwin
969dc06e91 cxgbe t4_tls: Call t4_rcvd_locked from do_rx_tls_cmp
Similar to dcfddc8dc0, replace the
simpler, inlined version with the full version.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41690

(cherry picked from commit 897e564361)
2023-10-11 08:10:32 -07:00
John Baldwin
cb2cd58dbd cxgbe t4_tls: Don't bother returning RX credits for a protocol receive error
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41689

(cherry picked from commit 75af2d951c)
2023-10-11 08:10:32 -07:00
John Baldwin
bd8cecc466 cxgbe tom: Call t4_rcvd_locked from do_rx_data to return RX credits
In particular, the kernel RPC layer used by the NFS client never
invokes pru_rcvd since it always reads data from the socket upcall
via MSG_SOCALLBCK which avoids calling pru_rcvd.  As a result, on an
NFS client connection managed by t4_tom, RX credits were never
returned to the TOE connection to open the TCP window resulting in
connection hangs.

To fix, expand the set of conditions in do_rx_data where RX credits
are returned to match those in t4_rcvd_locked by calling the function
directly.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D41688

(cherry picked from commit dcfddc8dc0)
2023-10-11 08:10:32 -07:00
John Baldwin
0677f5ccbb cxgbe ddp: Trim stale function prototype
Sponsored by:	Chelsio Communications
2023-08-23 14:30:16 -07:00
Marius Strobl
dc485b968d tcp_info: Add and export more FreeBSD-specific fields
This change adds struct tcp_info fields corresponding to the following
struct tcpcb ones:
- snd_una
- snd_max
- rcv_numsacks
- rcv_adv
- dupacks

Note that while both tcp_fill_info() and fill_tcp_info_from_tcb() are
extended accordingly, no counterpart of rcv_numsacks is available in
the cxgbe(4) TOE PCB, though.

Sponsored by:	NetApp, Inc. (originally)
2023-08-22 20:34:01 +02:00
Marius Strobl
8c6104c48e tcp_fill_info(): Change lock assertion on INPCB to locked only
This function actually only ever reads from the TCP PCB. Consequently,
also make the pointer to its TCP PCB parameter const.

Sponsored by:	NetApp, Inc. (originally)
2023-08-22 20:33:49 +02:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Justin Hibbits
954712e8f6 Mechanically convert cxgb(4) and cxgbe(4) to IfAPI
Reviewed by:	np
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38597
2023-03-07 15:31:48 -05:00
Justin Hibbits
c255d1a401 IfAPI: Add if_llsoftc member accessors for TOEDEV
Summary:
Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV()
macro to complement with the new accessors.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius
Differential Revision: https://reviews.freebsd.org/D38199
2023-01-31 15:02:16 -05:00
Gleb Smirnoff
e68b379244 tcp: embed inpcb into tcpcb
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself.  With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa.  Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that.  However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort.  The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone.  Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision:	https://reviews.freebsd.org/D37127
2022-12-07 09:00:48 -08:00
John Baldwin
2ff447ee3b cxgbe: Enable TOE TLS RX when an RX key is provided via setsockopt().
Rather than requiring a socket to be created as a TLS socket from the
get go, switch a TOE socket from "plain" TOE to TLS mode when a
receive key is added to the socket.

The firmware is only able to switch a "plain" TOE connection to TLS
mode if the head of the pending socket data is the start of a TLS
record, so the connection is migrated to TLS mode as a multi-step
process.

When TOE TLS RX is enabled, the associated connection's receive side
is frozen via a flag in the TCB.  The state of the socket buffer is
then examined to determine if the pending data in the socket buffer
ends on a TLS record boundary.  If so, the connection is migrated to
TLS mode and unfrozen.  Otherwise, the connection is unfrozen
temporarily until more data arrives.  Once more data arrives, the
receive queue is frozen again and rechecked.  This continues until the
connection is paused at a record boundary.  Any records received
before TLS mode is enabled are decrypted as software records.

Note that this removes the 'rx_tls_ports' sysctl.  TOE TLS offload for
receive is now enabled automatically on existing TOE connections when
using a KTLS-aware SSL library just as it was previously enabled
automatically for TLS transmit.  This also enables TLS offload for TOE
connections which enable TLS after passing initial data in the clear
(e.g. STARTTLS with SMTP).

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37351
2022-11-15 12:08:51 -08:00
John Baldwin
21186bdb2d cxgbe: Various whitespace fixes.
Mostly trailing whitespace and spaces before tabs.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D37350
2022-11-15 12:03:57 -08:00
John Baldwin
e4bc19b2fa cxgbe tom: Fix jobtotid() compilation.
The previous commit lost an implicit struct socket * cast.  Use an
inline function instead as the macro is already rather long.

Fixes:		e1401f7579 cxgbe: use standard sototcpcb() accessor macro to get socket's tcpcb
Sponsored by:	Chelsio Communications
2022-11-08 11:25:58 -08:00
Gleb Smirnoff
9eb0e8326d tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.

Reviewed by:		rscheff
Differential revision:	https://reviews.freebsd.org/D37123
2022-11-08 10:24:40 -08:00
Gleb Smirnoff
b2c558c898 cxgbe: include headers required to include t4_tom.h
Before the change we would get struct tcpcb forward declaration
only with help of pollution via in_pcb.h.
2022-10-19 15:15:53 -07:00
Gleb Smirnoff
e1401f7579 cxgbe: use standard sototcpcb() accessor macro to get socket's tcpcb
Reviewed by:		np
Differential revision:	https://reviews.freebsd.org/D37041
2022-10-19 15:15:32 -07:00
Gleb Smirnoff
53af690381 tcp: remove INP_TIMEWAIT flag
Mechanically cleanup INP_TIMEWAIT from the kernel sources.  After
0d7445193a, this commit shall not cause any functional changes.

Note: this flag was very often checked together with INP_DROPPED.
If we modify in_pcblookup*() not to return INP_DROPPED pcbs, we
will be able to remove most of this checks and turn them to
assertions.  Some of them can be turned into assertions right now,
but that should be carefully done on a case by case basis.

Differential revision:	https://reviews.freebsd.org/D36400
2022-10-06 19:24:37 -07:00
Navdeep Parhar
8d2c13931b cxgbe/tom: Fix assertions in the code that maintains TCB history.
The tids used for TOE connections start from tid_base, not 0.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-09-28 20:01:14 -07:00
Gleb Smirnoff
e7d02be19d protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach.  Now this structure is
  only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw.  This was suggested
  in 1996 by wollman@ (see 7b187005d1), and later reiterated
  in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
  For most protocols these pointers are initialized statically.
  Those domains that may have loadable protocols have spacers. IPv4
  and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
  entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D36232
2022-08-17 11:50:32 -07:00
John Baldwin
782db2881b cxgbe TOE TLS: Fix handling of unusual record types.
This doesn't matter for real traffic but fixes failures in the KTLS
unit tests that use unusual record types.

Sponsored by:	Chelsio Communications
2022-08-08 11:21:54 -07:00
John Baldwin
c6b3a3772c cxgbe TOE TLS: Simplify a few routines.
Remove a few more remnants from the old pre-KTLS support and instead
assume that each work request sends a single TLS record.

Sponsored by:	Chelsio Communications
2022-08-08 11:21:54 -07:00
Gleb Smirnoff
b46667c63e sockbuf: merge two versions of sbcreatecontrol() into one
No functional change.
2022-05-17 10:10:42 -07:00
Gleb Smirnoff
4581cffb3d sockets: fix build, convert missed sbreserve_locked() calls
Fixes:	4328318445
2022-05-12 14:29:19 -07:00
Gleb Smirnoff
4328318445 sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b5 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35152
2022-05-12 13:22:12 -07:00
John Baldwin
b483b6b256 cxgbe tom: Force unsigned modulus for queue indices.
The final transmit and receive queue indices need to be positive
values.  However, since txq_idx and rxq_idx are signed (to permit
using -1 to as a marker for uninitialized values), using %= with
another integer type (vi->nofld[tr]xq) yielded a sign-extended modulus
value.  This resulted in negative queue indices and a buffer underrun
when arc4random() returned a value with the sign bit set.  Use a
temporary unsigned variable to hold the "raw" queue index to force
unsigned modulus.

This worked previously because the modulus was previously applied
directly to the return value of arc4random() which is unsigned before
the result was assigned to txq_idx and rxq_idx.

Discussed with:	np
Fixes:		db28d4a0cd cxgbe/t4_tom: Support for round-robin selection of offload queues.
Sponsored by:	Chelsio Communications
2022-05-05 16:30:14 -07:00
Mateusz Guzik
d37dca9ec9 cxgbe: plug a set-but-not-used var
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2022-04-19 12:45:56 +00:00
Navdeep Parhar
db28d4a0cd cxgbe/t4_tom: Support for round-robin selection of offload queues.
A COP (Connection Offload Policy) rule can now specify that the tx
and/or rx queue for a new tid should be selected in a round-robin
manner. There is no change in default behavior.

Reviewed by:	jhb@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34921
2022-04-14 15:49:58 -07:00
John Baldwin
2beaefe884 cxgbei: Support unmapped I/O requests.
- Add icl_pdu_append_bio and icl_pdu_get_bio methods.

- Add new page pod routines for allocating and writing page pods for
  unmapped bio requests.  Use these new routines for setting up DDP
  for iSCSI tasks with a SCSI I/O CCB which uses CAM_DATA_BIO.

- When ICL_NOCOPY is used to append data from an unmapped I/O request
  to a PDU, construct unmapped mbufs from the relevant pages backing
  the struct bio.  This also requires changes in the t4_push_pdus path
  to support unmapped mbufs.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D34383
2022-03-10 15:50:52 -08:00
John Baldwin
bca6e339ac cxgbe tom: Compile fix for disabled KTR trace.
Sponsored by:	Chelsio Communications
2022-03-08 14:30:51 -08:00
John Baldwin
87b0e7711f cxgbe tom: Use VM_PAGE_TO_PHYS().
Sponsored by:	Chelsio Communications
2022-03-08 14:30:26 -08:00
John Baldwin
44e7472d0e cxgbe tom: Use be64toh instead of htobe64 to convert to host order.
This is a no-op but more accurately conveys intent.

Sponsored by:	Chelsio Communications
2022-03-08 14:30:05 -08:00
John Baldwin
de414339c9 cxgbe tom: Use vm_paddr_t for physical addresses in page pod routines.
Sponsored by:	Chelsio Communications
2022-03-08 14:28:06 -08:00
John Baldwin
2753997438 cxgbe: Move page pods KTR traces under VERBOSE_TRACES. 2022-03-02 15:32:21 -08:00
Mark Johnston
6be8944d96 ktls: Zero out TLS_GET_RECORD control messages
Otherwise we end up copying one uninitialized byte into the socket
buffer.

Reported by:	KMSAN
Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33953
2022-01-20 15:42:46 -05:00
Navdeep Parhar
39d5cbdc1b cxgbe(4): Fix "set but not used [-Wunused-but-set-variable]" warnings.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2022-01-10 12:15:12 -08:00
Gleb Smirnoff
f64dc2ab5b tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection
when they do output on it.  The default stack never does this, thus
existing framework expects tcp_output() always to return locked and
valid tcpcb.

Provide KPI extension to satisfy demands of advanced stacks.  If the
output method returns negative error code, it means that caller must
call tcp_drop().

In tcp_var() provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that
  default stack can continue using it internally without modifications.
  For advanced stacks it would perform tcp_drop() and unlock and report
  that with negative error code.
- tcp_output_unlock() handles the negative code and always converts
  it to positive and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility
  to drop on the caller.

Sweep over the advanced stacks and use new KPI instead of using HPTS
delayed drop queue for that.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33370
2021-12-26 08:48:19 -08:00
Gleb Smirnoff
40fa3e40b5 tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:

sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)

sed:
s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/
s/to tfb_tcp_output\(\)/to tcp_output()/

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D33366
2021-12-26 08:47:59 -08:00
John Baldwin
e3ba94d4f3 Don't require the socket lock for sorele().
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference.  Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.

Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.

The sorele() macro now uses refcount_release_if_not_last() try to drop
the socket reference without locking the socket.  If that shortcut
fails, it locks the socket and calls sorele_locked().

Reviewed by:	kib, markj
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D32741
2021-11-09 10:50:12 -08:00
John Baldwin
9affbb0f52 cxgbe tom: Enter network epoch in t4_aiotx_task().
While here, don't restore the old vnet until after sorele().

Sponsored by:	Chelsio Communications
2021-09-14 13:46:15 -07:00
John Baldwin
5dbf8c1588 cxgbe tom: Update rcv_nxt for a FIN after handle_ddp_close().
For TCP DDP, handle_ddp_close() needs to see the pre-FIN rcv_nxt to
determine how much data was placed in the local buffer before the FIN
was received.  The changes in d59f1c49e2 broke this by updating
rcv_nxt before calling handle_ddp_close().

Fixes:		d59f1c49e2 cxgbe tom: Permit rcv_nxt mismatches on FIN for iSCSI connections on T6.
Sponsored by:	Chelsio Communications
2021-09-14 13:46:14 -07:00
John Baldwin
1ecbc1d8e9 cxgbe tom: Don't queue AIO requests on listen sockets.
This is similar to the fixes in 141fe2dcee.  One difference is that
TOE sockets do not change states (listen vs non-listen) once created,
so no lock is needed for SOLISTENING().

Sponsored by:	Chelsio Communications
2021-09-14 13:46:14 -07:00
Navdeep Parhar
53c17de2b4 cxgbe/t4_tom: Use stale L2T entry and avoid busy-waiting for resolution.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2021-09-08 20:55:47 -07:00