When using iSCSI PDU offload (cxgbei) on T6 adapters, a burst of
received PDUs can be reported via a single message to the driver.
Previously the driver passed these multi-PDU bursts up to the iSCSI
stack as a single "large" PDU by rewriting the buffer offset, data
segment length, and DataSN fields in the iSCSI header. The DataSN
field in particular was rewritten so that each of the "large" PDUs
used consecutively increasing values. While this worked, the forged
DataSN values did not match the ExpDataSN value in the subsequent SCSI
Response PDU. The initiator does not currently verify this value, but
the forged DataSN values prevent adding a check.
To avoid this, allow a logical iSCSI PDU (struct icl_pdu) to describe
a burst of PDUs via a new 'ip_additional_pdus' field. Normally this
field is set to zero when 'struct icl_pdu' represents a single PDU.
If a logical PDU represents a burst of on-the-wire PDUs, this field
contains the count of additional on-the-wire PDUs. The header of this
"large" PDU is still modified, but the DataSN field now contains the
DataSN value of the first on-the-wire PDU in the burst.
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31577
A socket in the FIN_WAIT_1 state is marked disconnected by
do_close_con_rpl() even though there might still be receive data pending.
This is because the socket at that point has set SBS_CANTRCVMORE which
causes the protocol layer to discard any data received before the FIN.
However, icl_cxgbei_conn_close() needs to wait until all the data has
been discarded. Replace the wait for SS_ISDISCONNECTED with a wait
for final_cpl_received() to be called.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
ISO can be disabled before establishing a connection by setting
dev.tNnex.N.toe.iso to 0.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31223
Create a struct icl_soft_conn which extends struct icl_conn and
move fields only used by icl_soft from struct icl_conn to
struct icl_soft_conn.
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31414
Recent firmware versions round down the value passed here by the MSS
and subsequently mishandle transmitted PDUs larger than the rounded
down value.
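
A small arithmetic sketch of the problem, assuming "round down by the
MSS" means rounding down to a multiple of the MSS. The helper names
and the numbers in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round len down to a multiple of mss (assumed firmware behavior). */
static uint32_t
rounddown_to_mss(uint32_t len, uint32_t mss)
{
	assert(mss != 0);
	return (len - len % mss);
}

/* Nonzero if a PDU of pdu_len exceeds the rounded-down limit. */
static int
pdu_exceeds_rounded_limit(uint32_t pdu_len, uint32_t limit, uint32_t mss)
{
	return (pdu_len > rounddown_to_mss(limit, mss));
}
```

For example, with a limit of 8192 and an MSS of 1448 the rounded-down
value is 7240, so an 8000 byte PDU is within the requested limit but
above the value the firmware would actually use.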
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
This ensures the TOE has finished processing any in-flight received
data before returning to the caller. The caller assumes it is safe to
free any open tasks or transfers (and associated buffers) after this
function returns.
Previously, data placed directly via DDP could be written to buffers
after the caller had freed the buffers.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
If a data PDU encounters an error such as a digest error, the firmware
will report that data PDU when completion moderation is active even if
it is not the final data PDU in a burst.
Sponsored by: Chelsio Communications
A non-placed PDU can be delivered by CPL_RX_ISCSI_CMP in the middle of
a burst of placed PDUs (received via DDP) in which case the rcv_nxt
will not match the start of the non-placed PDU.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
If the connection is in the process of disconnecting, ic_socket can be
NULL. For icl_cxgbei_conn_transfer_setup(), lock the connection and
check ic_socket before using it. For icl_cxgbei_conn_task_setup(),
the caller already holds the connection lock, so assert it and bail
early with ECONNRESET if the connection is disconnecting.
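
The two checks can be sketched in userland C as below. The sk_* types
and functions are hypothetical stand-ins for the connection and its
lock, and returning ECONNRESET from the transfer setup path is an
assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_socket { int placeholder; };

/* Hypothetical stand-in for the connection state. */
struct sk_conn {
	struct sk_socket *ic_socket;	/* NULL while disconnecting */
	int lock_held;			/* stands in for the conn lock */
};

static void
sk_lock(struct sk_conn *ic)
{
	assert(!ic->lock_held);
	ic->lock_held = 1;
}

static void
sk_unlock(struct sk_conn *ic)
{
	assert(ic->lock_held);
	ic->lock_held = 0;
}

/* Transfer setup: take the lock, then check the socket. */
static int
sk_transfer_setup(struct sk_conn *ic)
{
	int error = 0;

	sk_lock(ic);
	if (ic->ic_socket == NULL)
		error = ECONNRESET;	/* assumed error value */
	/* Otherwise ic->ic_socket is safe to use while locked. */
	sk_unlock(ic);
	return (error);
}

/* Task setup: the caller already holds the lock, so assert it. */
static int
sk_task_setup(struct sk_conn *ic)
{
	assert(ic->lock_held);
	if (ic->ic_socket == NULL)
		return (ECONNRESET);
	return (0);
}
```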
Reported by: Jithesh Arakkan @ Chelsio
Fixes: f949967c8e cxgbei: Fix a race between transfer setup and a peer reset.
In 4427ac3675, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs. The queue of mbufs to send is
protected by the inp lock. However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread. To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.
To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.
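
The caller-supplied queue pattern can be sketched as below, using a
simple linked list in place of an mbufq. All names here are
hypothetical; the point is only that the routine fills a local queue
and the caller decides, under its own lock, whether to append it to
the connection's queue.

```c
#include <assert.h>
#include <stddef.h>

struct wr { struct wr *next; int id; };
struct wrq { struct wr *head; struct wr *tail; int len; };

static void
wrq_init(struct wrq *q)
{
	q->head = q->tail = NULL;
	q->len = 0;
}

static void
wrq_enqueue(struct wrq *q, struct wr *w)
{
	assert(w != NULL);
	w->next = NULL;
	if (q->tail == NULL)
		q->head = w;
	else
		q->tail->next = w;
	q->tail = w;
	q->len++;
}

/* Move everything from src to the tail of dst, leaving src empty. */
static void
wrq_concat(struct wrq *dst, struct wrq *src)
{
	if (src->head == NULL)
		return;
	if (dst->tail == NULL)
		dst->head = src->head;
	else
		dst->tail->next = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	wrq_init(src);
}

/* Hypothetical page pod routine: queues n requests to the caller's queue. */
static void
write_page_pods(struct wrq *q, struct wr *wrs, int n)
{
	for (int i = 0; i < n; i++) {
		wrs[i].id = i;
		wrq_enqueue(q, &wrs[i]);
	}
}
```

The routine never sees the socket or inp. If the caller finds the
connection has been reset, it can discard the local queue instead of
concatenating it.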
Reported by: Jithesh Arakkan @ Chelsio
Fixes: 4427ac3675
T6 makes several changes relative to T5 for receive of iSCSI PDUs.
First, earlier adapters issue either 2 or 3 messages to the host for
each PDU received: CPL_ISCSI_HDR contains the BHS of the PDU,
CPL_ISCSI_DATA (when DDP is not used for zero-copy receive) contains
the PDU data as buffers on the freelist, and CPL_RX_ISCSI_DDP reports
the status of the PDU such as the result of CRC checks. In T6, a new
CPL_RX_ISCSI_CMP combines CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP. Data
PDUs which are directly placed via DDP only report a single
CPL_RX_ISCSI_CMP message. Data PDUs received on the free lists are
reported as CPL_ISCSI_DATA followed by CPL_RX_ISCSI_CMP. Control PDUs
such as R2T are still reported via CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP.
Supporting this requires changing the CPL_ISCSI_DATA handler to
allocate a PDU structure if it is not preceded by a CPL_ISCSI_HDR as
well as support for the new CPL_RX_ISCSI_CMP.
Second, when using DDP for zero-copy receive, T6 will only issue a
CPL_RX_ISCSI_CMP after a burst of PDUs has been received (indicated
by the F flag in the BHS). In this case, the CPL_RX_ISCSI_CMP can
reflect the completion of multiple PDUs and the BHS and TCP sequence
number included in the message are from the last PDU received in the
burst. Notably, the message does not include any information about
earlier PDUs received as part of the burst. Instead, the driver must
track the amount of data already received for a given transfer and use
this to compute the amount of data received in a burst. In addition,
the iSCSI layer currently has no way to permit receiving a logical PDU
which spans multiple PDUs. Instead, the driver presents each burst as
a single, "large" PDU to the iSCSI target and initiator. This is
done by rewriting the buffer offset and data length fields in the BHS
of the final PDU as well as rewriting the DataSN so that the received
PDUs appear to be in order.
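
A minimal sketch of the burst-length computation and the offset and
length rewrite described above. The struct and function names are
hypothetical, the DataSN rewrite is omitted, and the real BHS fields
are big-endian on the wire, which is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Per-transfer state: where the next burst is expected to start. */
struct burst_state {
	uint32_t next_buffer_offset;
};

/* Simplified view of the data PDU header fields involved. */
struct data_bhs {
	uint32_t buffer_offset;
	uint32_t data_segment_len;
};

/*
 * Given the BHS of the final PDU in a burst, rewrite it so that it
 * covers the whole burst and return the burst length.
 */
static uint32_t
coalesce_burst(struct burst_state *st, struct data_bhs *bhs)
{
	uint32_t end, burst_len;

	end = bhs->buffer_offset + bhs->data_segment_len;
	assert(end >= st->next_buffer_offset);
	burst_len = end - st->next_buffer_offset;

	bhs->buffer_offset = st->next_buffer_offset;
	bhs->data_segment_len = burst_len;
	st->next_buffer_offset = end;
	return (burst_len);
}
```

For example, if three 8192 byte PDUs are received in one burst, the
final PDU reports offset 16384 and length 8192, and the rewritten
header describes offset 0 and length 24576.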
To track all this, cxgbei maintains a hash table of 'cxgbei_cmp'
structures indexed by transfer tags for each offloaded iSCSI
connection. When a SCSI_DATA_IN message is received, the ITT from the
received BHS is used to find the necessary state in the hash table,
whereas SCSI_DATA_OUT replies use the TTT as the key. The structure
tracks the expected starting offset and DataSN of the next burst as
well as the rewritten DataSN value used for the previously received
PDU.
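
The lookup can be pictured as a small chained hash table keyed by
transfer tag. This is a sketch under assumptions: the field names,
table size, and hash function below are hypothetical and are not taken
from the real 'cxgbei_cmp' structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	CMP_HASH_SIZE	16	/* hypothetical; must be a power of two */

/* Hypothetical per-transfer completion state. */
struct cmp_entry {
	struct cmp_entry *next;
	uint32_t tt;			/* ITT or TTT used as the key */
	uint32_t next_buffer_offset;	/* expected start of next burst */
	uint32_t next_datasn;		/* expected DataSN of next burst */
	uint32_t last_datasn;		/* rewritten DataSN last used */
};

struct cmp_table {
	struct cmp_entry *buckets[CMP_HASH_SIZE];
};

static unsigned int
cmp_hash(uint32_t tt)
{
	return (tt & (CMP_HASH_SIZE - 1));
}

static void
cmp_insert(struct cmp_table *t, struct cmp_entry *e)
{
	unsigned int i;

	assert(e != NULL);
	i = cmp_hash(e->tt);
	e->next = t->buckets[i];
	t->buckets[i] = e;
}

/* Return the entry for a transfer tag, or NULL if none is tracked. */
static struct cmp_entry *
cmp_find(struct cmp_table *t, uint32_t tt)
{
	struct cmp_entry *e;

	for (e = t->buckets[cmp_hash(tt)]; e != NULL; e = e->next)
		if (e->tt == tt)
			return (e);
	return (NULL);
}
```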
Discussed with: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30458
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D30271
These were seemingly copied over from icl_soft.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30268
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU. Treat these transfers as an empty
transfer.
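
One way to express the check, as a sketch only. The function and
parameter names are hypothetical; the idea is that nothing remains to
be received after the unsolicited data when the buffer is smaller than
FirstBurstLength.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes still expected after the unsolicited first burst. A buffer
 * smaller than first_burst is treated as an empty transfer.
 */
static uint32_t
remaining_after_first_burst(uint32_t buf_len, uint32_t first_burst)
{
	uint32_t remaining;

	if (buf_len < first_burst)
		return (0);
	remaining = buf_len - first_burst;
	assert(remaining <= buf_len);
	return (remaining);
}
```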
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29940
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend). On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29939
- Switch to allocating the cxgbei version of icl_pdu explicitly
as a separate refcounted object allocated via malloc/free
instead of storing it in the bhs mbuf prior to the bhs.
- Support the icl_conn_pdu_queue_cb() method to set a callback
on a PDU to be invoked when the PDU is freed.
- For ICL_NOCOPY buffers, use an external mbuf to manage the
storage for the buffer via m_extaddref(). Each external mbuf
holds a reference on the associated PDU, so the callback is
invoked once all of the external mbufs have been freed.
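
The reference counting scheme in the list above can be sketched in
userland C as follows. The rc_pdu names are hypothetical and the
counter is a plain integer; kernel code shared between threads would
need an atomic reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted PDU with a callback run when it is freed. */
struct rc_pdu {
	unsigned int refs;
	void (*cb)(void *);
	void *cb_arg;
};

static struct rc_pdu *
rc_pdu_alloc(void)
{
	struct rc_pdu *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->refs = 1;	/* reference held by the owner */
	return (p);
}

static void
rc_pdu_set_cb(struct rc_pdu *p, void (*cb)(void *), void *arg)
{
	p->cb = cb;
	p->cb_arg = arg;
}

/* Taken once per external buffer that points into the PDU's data. */
static void
rc_pdu_hold(struct rc_pdu *p)
{
	p->refs++;
}

/* Drop a reference; the last release runs the callback and frees. */
static void
rc_pdu_release(struct rc_pdu *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0) {
		if (p->cb != NULL)
			p->cb(p->cb_arg);
		free(p);
	}
}

/* Example callback: counts how many times it was invoked. */
static void
count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

The callback fires exactly once, and only after the owner and every
buffer holding a reference have released the PDU.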
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29910
- Only allocate 16K jumbo mbufs if the region of data to be
appended is sufficiently large, and use a loop.
- Use m_getm2() to allocate a chain for data less than 16K, or
if m_getjcl() fails.
- Use ENOMEM as the return value instead of '1' if the hook fails due
to a memory allocation error.
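
The allocation strategy in the list above can be sketched as a
planning function. This is not the real hook: it assumes that
"sufficiently large" means at least 16K remaining, and it models a
failed jumbo allocation with a hypothetical 'jumbo_available' limit
instead of calling m_getjcl() or m_getm2().

```c
#include <assert.h>
#include <stddef.h>

#define	JUMBO16	((size_t)16 * 1024)

struct alloc_plan {
	size_t jumbo_clusters;	/* 16K clusters to allocate */
	size_t chain_bytes;	/* bytes left for an ordinary chain */
};

/*
 * Use 16K clusters in a loop while at least 16K remains and clusters
 * are available; whatever is left goes to an ordinary mbuf chain.
 */
static struct alloc_plan
plan_append(size_t len, size_t jumbo_available)
{
	struct alloc_plan plan = { 0, 0 };
	size_t total = len;

	while (len >= JUMBO16 && plan.jumbo_clusters < jumbo_available) {
		plan.jumbo_clusters++;
		len -= JUMBO16;
	}
	plan.chain_bytes = len;
	assert(plan.jumbo_clusters * JUMBO16 + plan.chain_bytes == total);
	return (plan);
}
```

For a 40000 byte region this yields two 16K clusters and a 7232 byte
chain. If only one cluster can be allocated, the remaining 23616 bytes
fall back to the chain.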
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29909
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer. This change adds zero-copy receive support
for such requests.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29908
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.
Sponsored by: Chelsio Communications
Discussed with: np
Differential Revision: https://reviews.freebsd.org/D29904
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
There were quite a few places where port_info was being accessed only to
get to the adapter.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25432
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Remove now-redundant items from toepcb and synq_entry and the code to
support them.
Let the driver calculate tx_align, rx_coalesce, and sndbuf by default.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21387
receive sockbuf's high water mark.
Calculate rx credits on the spot instead of tracking sbused/sb_cc and
rx_credits in the toepcb. The previous method worked when the high
water mark changed due to SB_AUTOSIZE but not when it was adjusted
directly (for example, by the soreserve in nfsrvd_addsock).
This fixes a connection hang while running iozone over an NFS mounted
share where nfsd's TCP sockets are being handled by t4_tom.
MFC after: 3 days
Sponsored by: Chelsio Communications
This allows replacing "sys/eventhandler.h" includes with "sys/_eventhandler.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).
No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.
Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.
However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.
Submitted by: Harsh Jain @ Chelsio (original version)
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14871
transfers.
The Initiator and Target both perform zero copy receive for transfers
greater than or equal to this threshold.
Sponsored by: Chelsio Communications