opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-21 06:07:31 -04:00

Author	SHA1	Message	Date
John Baldwin	bddf73433e	NIC KTLS for Chelsio T6 adapters. This adds support for ifnet (NIC) KTLS using Chelsio T6 adapters. Unlike the TOE-based KTLS in r353328, NIC TLS works with non-TOE connections. NIC KTLS on T6 is not able to use the normal TSO (LSO) path to segment the encrypted TLS frames output by the crypto engine. Instead, the TOE is placed into a special setup to permit "dummy" connections to be associated with regular sockets using KTLS. This permits using the TOE to segment the encrypted TLS records. However, this approach does have some limitations: 1) Regular TOE sockets cannot be used when the TOE is in this special mode. One can use either TOE and TOE-based KTLS or NIC KTLS, but not both at the same time. 2) In NIC KTLS mode, the TOE is only able to accept a per-connection timestamp offset that varies in the upper 4 bits. Put another way, only connections whose timestamp offset has the 28 lower bits cleared can use NIC KTLS and generate correct timestamps. The driver will refuse to enable NIC KTLS on connections with a timestamp offset with any of the lower 28 bits set. To use NIC KTLS, users can either disable TCP timestamps by setting the net.inet.tcp.rfc1323 sysctl to 0, or apply a local patch to the tcp_new_ts_offset() function to clear the lower 28 bits of the generated offset. 3) Because the TCP segmentation relies on fields mirrored in a TCB in the TOE, not all fields in a TCP packet can be sent in the TCP segments generated from a TLS record. Specifically, for packets containing TCP options other than timestamps, the driver will inject an "empty" TCP packet holding the requested options (e.g. a SACK scoreboard) along with the segments from the TLS record. These empty TCP packets are counted by the dev.cc.N.txq.M.kern_tls_options sysctls. Unlike TOE TLS which is able to buffer encrypted TLS records in on-card memory to handle retransmits, NIC KTLS must re-encrypt TLS records for retransmit requests as well as non-retransmit requests that do not include the start of a TLS record but do include the trailer. The T6 NIC KTLS code tries to optimize some of the cases for requests to transmit partial TLS records. In particular it attempts to minimize sending "waste" bytes that have to be given as input to the crypto engine but are not needed on the wire to satisfy mbufs sent from the TCP stack down to the driver. TCP packets for TLS requests are broken down into the following classes (with associated counters): - Mbufs that send an entire TLS record in full do not have any waste bytes (dev.cc.N.txq.M.kern_tls_full). - Mbufs that send a short TLS record that ends before the end of the trailer (dev.cc.N.txq.M.kern_tls_short). For sockets using AES-CBC, the encryption must always start at the beginning, so if the mbuf starts at an offset into the TLS record, the offset bytes will be "waste" bytes. For sockets using AES-GCM, the encryption can start at the 16 byte block before the starting offset capping the waste at 15 bytes. - Mbufs that send a partial TLS record that has a non-zero starting offset but ends at the end of the trailer (dev.cc.N.txq.M.kern_tls_partial). In order to compute the authentication hash stored in the trailer, the entire TLS record must be sent as input to the crypto engine, so the bytes before the offset are always "waste" bytes. In addition, other per-txq sysctls are provided: - dev.cc.N.txq.M.kern_tls_cbc: Count of sockets sent via this txq using AES-CBC. - dev.cc.N.txq.M.kern_tls_gcm: Count of sockets sent via this txq using AES-GCM. - dev.cc.N.txq.M.kern_tls_fin: Count of empty FIN-only packets sent to compensate for the TOE engine not being able to set FIN on the last segment of a TLS record if the TLS record mbuf had FIN set. - dev.cc.N.txq.M.kern_tls_records: Count of TLS records sent via this txq including full, short, and partial records. - dev.cc.N.txq.M.kern_tls_octets: Count of non-waste bytes (TLS header and payload) sent for TLS record requests. - dev.cc.N.txq.M.kern_tls_waste: Count of waste bytes sent for TLS record requests. To enable NIC KTLS with T6, set the following tunables prior to loading the cxgbe(4) driver: hw.cxgbe.config_file=kern_tls hw.cxgbe.kern_tls=1 Reviewed by: np Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21962	2019-11-21 19:30:31 +00:00
John Baldwin	a1b2b6e184	Create a file to hold shared routines for dealing with T6 key contexts. ccr(4) and TLS support in cxgbe(4) construct key contexts used by the crypto engine in the T6. This consolidates some duplicated code for helper functions used to build key contexts. Reviewed by: np MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D22156	2019-11-13 00:53:45 +00:00
John Baldwin	78afed1396	Move CLIP table handling out of TOM and into the base driver. - Store the clip table in 'struct adapter' instead of in the TOM softc. - Init the clip table during attach and teardown during detach. - While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the CLIP table. This does mean that we update the clip table even if TOE is not enabled, but non-TOE things need the CLIP table anyway. Reviewed by: np, Krishnamraju Eraparaju @ Chelsio Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D18010	2018-11-29 01:15:53 +00:00
Navdeep Parhar	e7e0844422	cxgbe(4): Replace T4_PKT_TIMESTAMP with something slightly less hackish.	2018-08-18 04:23:51 +00:00
Navdeep Parhar	2dae2a7487	cxgbe(4): Add code to deal with the chip's source MAC table (aka SMT). Submitted by: Krishnamraju Eraparaju @ Chelsio Sponsored by: Chelsio Communications	2018-05-31 21:31:08 +00:00
Navdeep Parhar	eff62dba61	cxgbe(4): Allocate offload Tx queues when a card has resources provisioned for NIC_ETHOFLD and the kernel has option RATELIMIT. It is possible to use the chip's offload queues for normal NIC Tx and not just TOE Tx. The difference is that these queues support out of order processing of work requests and have a per-"flowid" mechanism for tracking credits between the driver and hardware. This allows Tx for any number of flows bound to different rate limits to be submitted to a single Tx queue and the work requests for slow flows won't cause HOL blocking for the rest. Sponsored by: Chelsio Communications	2018-05-17 01:42:18 +00:00
Navdeep Parhar	e1320420d5	cxgbe(4): Move all TCAM filter code into a separate file. Sponsored by: Chelsio Communications	2018-05-01 20:17:22 +00:00
Navdeep Parhar	f856f099cb	cxgbe(4): Initial import of the "collect" component of Chelsio unified debug (cudbg) code, hooked up to the main driver via an ioctl. The ioctl can be used to collect the chip's internal state in a compressed dump file. These dumps can be decoded with the "view" component of cudbg. Obtained from: Chelsio Communications MFC after: 2 months Sponsored by: Chelsio Communications	2017-08-03 14:43:30 +00:00
Navdeep Parhar	2204b42716	cxgbe(4): Support routines for Tx traffic scheduling. - Create a new file, t4_sched.c, and move all of the code related to traffic management from t4_main.c and t4_sge.c to this file. - Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl) parameters in the PF driver. - Initialize all the cl_rl limiters with somewhat arbitrary default rates and provide routines to update them on the fly. - Provide routines to reserve and release traffic classes. MFC after: 1 month Sponsored by: Chelsio Communications	2017-05-02 20:38:10 +00:00
Enji Cooper	193d9e768b	sys/modules: normalize .CURDIR-relative paths to SRCTOP This simplifies make output/logic Tested with: `cd sys/modules; make ALL_MODULES=` on amd64 MFC after: 1 month Sponsored by: Dell EMC Isilon	2017-03-04 10:10:17 +00:00
John Baldwin	f91fca5ba7	Add a driver to create VF devices on Chelsio T4/T5 NICs. Chelsio NICs are a bit unique compared to some other NICs in that they expose different functionality on different physical functions. In particular, PF4 is used to manage the NIC interfaces ('t4nex' and 't5nex'). However, PF4 is not able to create VF devices. Instead, VFs are only supported by physical functions 0 through 3. This commit adds 't4iov' and 't5iov' drivers that attach to PF0-3. One extra wrinkle is that the iov devices cannot enable SR-IOV until the firwmare has been initialized by the main PF4 driver. To handle this case, a new t4_if kobj interface has been added to permit cross-calls between the PF drivers. The PF4 driver notifies sibling drivers when it is fully attached. It also requests sibling drivers to detach before it detaches. Sibling drivers query the PF4 driver during their attach routine to see if it is attached. If not, the sibling drivers defer their attach actions until the PF4 driver informs them it is attached. VF devices are associated with a single port on the NIC. VF devices created from PF0 are associated with the first port on the NIC, VFs from PF1 are associated with the second port, etc. VF devices can only be created from a PF device that has an associated port. Thus, on a 2-port card, VFs are only supported on PF0 and PF1. Reviewed by: np (earlier versions) MFC after: 1 month Sponsored by: Chelsio Communications	2016-07-22 22:46:41 +00:00
John Baldwin	113f2316c6	Add a 'show t4 tcb <nexus> <tid>' command to dump a TCB from DDB. This allows the contents of a TCB to be extracted from a T4/T5 card in DDB after a panic.	2016-04-10 05:06:58 +00:00
Navdeep Parhar	07d684f712	cxgbe(4): support for the kernel RSS option. You need PCBGROUP and RSS in the kernel config to use this. Relnotes: Yes Sponsored by: Chelsio Communications	2015-10-16 01:19:55 +00:00
Gleb Smirnoff	cc4a90c445	Globally enable -fms-extensions when building kernel with gcc, and remove this option from all modules that enable it theirselves. In C mode -fms-extensions option enables anonymous structs and unions, allowing us to use this C11 feature in kernel. Of course, clang supports it without any extra options. Reviewed by: dim	2015-02-17 19:27:14 +00:00
Navdeep Parhar	7cf3e8e4c3	Make sure the compiler flag to get cxgbe(4) to compile with gcc is used only when gcc is being used. This is what r277225 should have been. Suggested by: dim@	2015-01-24 04:41:14 +00:00
Navdeep Parhar	cddd227c5f	Make cxgbe(4) buildable with the gcc in base.	2015-01-16 01:28:28 +00:00
Navdeep Parhar	7951040f8a	cxgbe(4): major tx rework. a) Front load as much work as possible in if_transmit, before any driver lock or software queue has to get involved. b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This is specifically for the tx multiqueue model where one of the if_transmit producer threads becomes the consumer and other producers carry on as usual. mp_ring is implemented as standalone code and it should be possible to use it in any driver with tx multiqueue. It also has: - the ability to enqueue/dequeue multiple items. This might become significant if packet batching is ever implemented. - an abdication mechanism to allow a thread to give up writing tx descriptors and have another if_transmit thread take over. A thread that's writing tx descriptors can end up doing so for an unbounded time period if a) there are other if_transmit threads continuously feeding the sofware queue, and b) the chip keeps up with whatever the thread is throwing at it. - accurate statistics about interesting events even when the stats come at the expense of additional branches/conditional code. The NIC txq lock is uncontested on the fast path at this point. I've left it there for synchronization with the control events (interface up/down, modload/unload). c) Add support for "type 1" coalescing work request in the normal NIC tx path. This work request is optimized for frames with a single item in the DMA gather list. These are very common when forwarding packets. Note that netmap tx in cxgbe already uses these "type 1" work requests. d) Do not request automatic cidx updates every 32 descriptors. Instead, request updates via bits in individual work requests (still every 32 descriptors approximately). Also, request an automatic final update when the queue idles after activity. This means NIC tx reclaim is still performed lazily but it will catch up quickly as soon as the queue idles. This seems to be the best middle ground and I'll probably do something similar for netmap tx as well. e) Implement a faster tx path for WRQs (used by TOE tx and control queues, _not_ by the normal NIC tx). Allow work requests to be written directly to the hardware descriptor ring if room is available. I will convert t4_tom and iw_cxgbe modules to this faster style gradually. MFC after: 2 months	2014-12-31 23:19:16 +00:00
Warner Losh	aeaed50898	Move most of the 15 variations on generating opt_inet.h and opt_inet6.h into kmod.mk by forcing almost everybody to eat the same dogfood. While at it, consolidate the opt_bpf.h and opt_mroute.h targets here too.	2014-08-04 22:37:02 +00:00
Navdeep Parhar	21b787b81a	List one file per line in the Makefiles. This makes it easier to read diffs when a file is added or removed. MFC after: 2 weeks	2014-08-01 01:53:39 +00:00
Navdeep Parhar	88998d7afc	Improve compliance with style.Makefile(5). MFC after: 2 weeks	2014-08-01 01:30:16 +00:00
Navdeep Parhar	298d969c53	cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards. Netmap gets its own hardware-assisted virtual interface and won't take over or disrupt the "normal" interface in any way. You can use both simultaneously. For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface (note the 'n' prefix) in the hardware to accompany each cxl<N> interface. These two ifnet's per port share the same wire but really are separate interfaces in the hardware and software. Each gets its own L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc. You should run netmap on the 'n' interfaces only, that's what they are for. With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port of a T580 card. 2 port tx is at ~56Mpps total (28M + 28M) as of now. Single port receive is at 33Mpps but this is very much a work in progress. I expect it to be closer to 40Mpps once done. In any case the current effort can already saturate multiple 10G ports of a T5 card at the smallest legal packet size. T4 gear is totally untested. trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43🆎cd:ef 881.952141 main [1621] interface is ncxl0 881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0 881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0 881.962540 main [1804] mapped 334980KB at 0x801dff000 Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus. 10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43🆎cd:ef) 881.962562 main [1882] Sending 512 packets every 0.000000000 s 881.962563 main [1884] Wait 2 secs for phy reset 884.088516 main [1886] Ready... 884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1 884.088607 sender_body [996] start 884.093246 sender_body [1064] drop copy 885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec) 886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec) 887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec) 888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec) 889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec) 890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec) 891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec) 892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec) 893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec) 894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec) 895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec) 896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec) ... Relnotes: Yes Sponsored by: Chelsio Communications.	2014-05-27 18:18:41 +00:00
Warner Losh	c6063d0da8	Use src.opts.mk in preference to bsd.own.mk except where we need stuff from the latter.	2014-05-06 04:22:01 +00:00
Navdeep Parhar	caf20efcde	Add support for packet-sniffing tracers to cxgbe(4). This works with all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and for general purpose monitoring without tapping any cxgbe or cxl ifnet directly. Tracers on the T4/T5 chips provide access to Ethernet frames exactly as they were received from or transmitted on the wire. On transmit, a tracer will capture a frame after TSO segmentation, hw VLAN tag insertion, hw L3 & L4 checksum insertion, etc. It will also capture frames generated by the TCP offload engine (TOE traffic is normally invisible to the kernel). On receive, a tracer will capture a frame before hw VLAN extraction, runt filtering, other badness filtering, before the steering/drop/L2-rewrite filters or the TOE have had a go at it, and of course before sw LRO in the driver. There are 4 tracers on a chip. A tracer can trace only in one direction (tx or rx). For now cxgbetool will set up tracers to capture the first 128B of every transmitted or received frame on a given port. This is a small subset of what the hardware can do. A pseudo ifnet with the same name as the nexus driver (t4nex0 or t5nex0) will be created for tracing. The data delivered to this ifnet is an additional copy made inside the chip. Normal delivery to cxgbe<n> or cxl<n> will be made as usual. /* watch cxl0, which is the first port hanging off t5nex0. / # cxgbetool t5nex0 tracer 0 tx0 (watch what cxl0 is transmitting) # cxgbetool t5nex0 tracer 1 rx0 (watch what cxl0 is receiving) # cxgbetool t5nex0 tracer list # tcpdump -i t5nex0 <== all that cxl0 sees and puts on the wire If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K "frames" with no L3/L4 checksum but this will show you the frames that were actually transmitted. / all done */ # cxgbetool t5nex0 tracer 0 disable # cxgbetool t5nex0 tracer 1 disable # cxgbetool t5nex0 tracer list # ifconfig t5nex0 destroy	2013-07-26 22:04:11 +00:00
Navdeep Parhar	ae11e5c996	Assume INET, INET6, and TCP_OFFLOAD when the driver is built out of tree and KERNBUILDDIR is not set. MFC after: 2 weeks	2012-08-14 23:08:49 +00:00
Navdeep Parhar	a1ea9a8276	cxgbe(4): support for IPv6 TSO and LRO. Submitted by: bz (this is a modified version of that patch)	2012-06-29 19:51:06 +00:00
Ulrich Spörlein	898dec7830	Fix make buildworld -DMODULES_WITH_WORLD Sort opt_ srcs	2011-06-23 20:31:52 +00:00
Navdeep Parhar	4dba21f17e	L2 table code. This is enough to get the T4's switch + L2 rewrite filters working. (All other filters - switch without L2 info rewrite, steer, and drop - were already fully-functional). Some contrived examples of "switch" filters with L2 rewriting: # cxgbetool t4nex0 iport 0 dport 80 action switch vlan +9 eport 3 Intercept all packets received on physical port 0 with TCP port 80 as destination, insert a vlan tag with VID 9, and send them out of port 3. # cxgbetool t4nex0 sip 192.168.1.1/32 ivlan 5 action switch \ vlan =9 smac aa:bb:cc:dd:ee:ff eport 0 Intercept all packets (received on any port) with source IP address 192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac to aa:bb:cc:dd:ee:ff, and send it out of port 0. MFC after: 1 week	2011-05-30 21:07:26 +00:00
Navdeep Parhar	489eeba940	T4 packet timestamps. Reference code that shows how to get a packet's timestamp out of cxgbe(4). Disabled by default because we don't have a standard way today to pass this information up the stack. The timestamp is 60 bits wide and each increment represents 1 tick of the T4's core clock. As an example, the timestamp granularity is ~4.4ns for this card: # sysctl dev.t4nex.0.core_clock dev.t4nex.0.core_clock: 228125 MFC after: 1 week	2011-05-05 02:38:08 +00:00
Navdeep Parhar	230abe75dc	Allow multiple modules within sys/modules/cxgbe. The first one is if_cxgbe. MFC after: 3 days	2011-04-01 00:25:32 +00:00

29 commits