131 commits

Author SHA1 Message Date
Navdeep Parhar
2d8910854b cxgbe(4): implement if_get_counter. 2014-09-27 05:50:31 +00:00
Navdeep Parhar
acc45299f5 cxgbe(4): explicitly set various if_hw_tso* values.
MFC after:	3 days
2014-09-26 22:21:02 +00:00
Navdeep Parhar
db25c97a1a Make sure the adapter's management queue and the event queue are
available before any upper layer driver (TOE, iWARP, or iSCSI)
registers with the base cxgbe(4) driver.

Submitted by:	Hariprasad at chelsio dot com
Reviewed by:	np@
2014-09-26 18:53:00 +00:00
Navdeep Parhar
1dee8327d4 cxgbe(4): Verify that the addresses in if_multiaddrs really are multicast
addresses.  (The chip doesn't really care, it's just that it needs to be
told explicitly if unicast DMACs are checked for "hits" in the hash that
is used after the TCAM entries are all used up).
2014-09-23 22:57:11 +00:00
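The check described in 1dee8327d4 rests on a standard Ethernet property: a destination MAC is a group (multicast or broadcast) address when the least significant bit of its first octet is set. The following is a minimal userland sketch of that test, not the driver's code; the function name and the sample addresses are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userland sketch only (hypothetical name, not from the driver).
 * An Ethernet address is a group address when bit 0 of the first
 * octet is set; broadcast is the all-ones special case.
 */
static bool
is_multicast_dmac(const uint8_t addr[6])
{
	return ((addr[0] & 0x01) != 0);
}

/* Sample addresses, for illustration only. */
static const uint8_t demo_ucast[6] = { 0x00, 0x07, 0x43, 0xab, 0xcd, 0xef };
static const uint8_t demo_mcast[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };
static const uint8_t demo_bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
```

An entry like demo_ucast in a multicast list is the case the commit guards against: it would fail this test.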
Navdeep Parhar
8374717dc0 cxgbe(4): add support for the SIOCGI2C ioctl. 2014-09-12 21:56:57 +00:00
Navdeep Parhar
3eb2c201a6 cxgbe(4): knobs to enable/disable PAUSE frame based flow control.
MFC after:	1 week
2014-09-12 05:25:56 +00:00
Navdeep Parhar
bc22dc708f cxgbe(4): Let caller specify whether it's ok to sleep in
t4_sched_config and t4_sched_params.

MFC after:	2 weeks
2014-08-06 19:38:03 +00:00
Navdeep Parhar
46a646940f cxgbe(4): Do not run any sleepable code in the SIOCSIFFLAGS handler when
IFF_PROMISC or IFF_ALLMULTI is being flipped.  bpf(4) holds its global
mutex around ifpromisc in at least the bpf_dtor path.

MFC after:	3 days
2014-08-04 22:32:16 +00:00
Navdeep Parhar
0fe982772d Some hooks in cxgbe(4) for the offloaded iSCSI driver.
(I'm committing this on behalf of my colleagues in the Storage team
at Chelsio).

Submitted by:	Sreenivasa Honnur <shonnur at chelsio dot com>
Sponsored by:	Chelsio Communications.
2014-07-24 18:39:08 +00:00
Navdeep Parhar
82eff304b6 cxgbe(4): Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver.  Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet.  This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.

MFC after:	1 week
2014-07-23 22:29:22 +00:00
Navdeep Parhar
bae4e5af99 cxgbe(4): Display CF facility correctly in the device log.
MFC after:	3 days
2014-07-15 18:24:41 +00:00
Navdeep Parhar
44eb893659 Allow multi-byte reads in the private CHELSIO_T4_GET_I2C ioctl. The
firmware allows up to 48B to be read this way but the driver limits
itself to 8B at a time to remain compatible with old cxgbetool
binaries.

MFC after:	1 week
2014-07-15 01:03:29 +00:00
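Given the 8B per-request limit mentioned in 44eb893659, a caller that wants a longer range would presumably split it into several requests. The sketch below shows one way to do that splitting. The callback type, the function names, and the fake device are all hypothetical and stand in for whatever actually issues the ioctl.

```c
#include <stddef.h>
#include <stdint.h>

#define I2C_CHUNK_MAX	8	/* per-request limit stated in the commit message */

/* Hypothetical callback: reads up to I2C_CHUNK_MAX bytes, returns 0 on success. */
typedef int (*i2c_read_fn)(unsigned int offset, size_t len, uint8_t *buf);

/* Read len bytes starting at offset, at most I2C_CHUNK_MAX bytes per request. */
static int
i2c_read_chunked(i2c_read_fn rd, unsigned int offset, size_t len, uint8_t *buf)
{
	while (len > 0) {
		size_t n = len < I2C_CHUNK_MAX ? len : I2C_CHUNK_MAX;
		int rc = rd(offset, n, buf);

		if (rc != 0)
			return (rc);
		offset += (unsigned int)n;
		buf += n;
		len -= n;
	}
	return (0);
}

/* Fake device for demonstration: the byte at offset i reads back as i. */
static unsigned int demo_requests;

static int
demo_read(unsigned int offset, size_t len, uint8_t *buf)
{
	if (len > I2C_CHUNK_MAX)
		return (-1);	/* reject oversized requests */
	demo_requests++;
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(offset + i);
	return (0);
}
```

With the fake device, a 20-byte read is served as three requests of 8, 8, and 4 bytes.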
Navdeep Parhar
298d969c53 cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way.  You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface.  These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software.  Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc.  You
should run netmap on the 'n' interfaces only, that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card.  2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress.  I expect it to be closer to 40Mpps once done.  In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size.  T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43:ab:cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43:ab:cd:ef)
881.962562 main [1882] Sending 512 packets every  0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...

Relnotes:	Yes
Sponsored by:	Chelsio Communications.
2014-05-27 18:18:41 +00:00
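The pps figures in the pkt-gen output of 298d969c53 can be cross-checked with a little arithmetic. The sketch below assumes the rate is the packet count divided by the elapsed microseconds, rounded to nearest (this reproduces the first two logged samples), and that a minimum-size Ethernet frame occupies 64B plus 20B of preamble and inter-frame gap on the wire. Both function names are made up for illustration.

```c
#include <stdint.h>

/* Packets per second from a (packets, elapsed usec) sample, rounded to nearest. */
static uint64_t
pps_from_sample(uint64_t pkts, uint64_t usec)
{
	return ((pkts * 1000000ULL + usec / 2) / usec);
}

/*
 * Theoretical frames per second on an Ethernet link for a given frame
 * size (including FCS).  Each frame also costs 8B of preamble/SFD and
 * 12B of inter-frame gap on the wire.
 */
static uint64_t
ether_line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
	const uint64_t overhead_bytes = 8 + 12;

	return (link_bps / ((frame_bytes + overhead_bytes) * 8));
}
```

With 64B frames this gives about 14.88Mpps for a 10G link and about 59.5Mpps for a 40G link, so the ~45Mpps single-port transmit rate is roughly three 10G ports' worth of minimum-size frames.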
Maksim Yevmenkin
080a4b9b1c use correct (integer) type for the temperature sysctl
Reviewed by:	np, scottl
Obtained from:	Netflix
MFC after:	3 days
2014-04-17 19:29:15 +00:00
Navdeep Parhar
8b3f42d52d cxgbe(4): Recognize the "spider" configuration where a T5 card's 40G
QSFP port is presented as 4 distinct 10G SFP+ ports to the driver.

MFC after:	2 weeks
2014-03-21 00:56:56 +00:00
Navdeep Parhar
65bd4d1cb4 cxgbe(4): Use ifi_oqdrops in if_data to count drops in the tx path. 2014-03-20 02:28:05 +00:00
Navdeep Parhar
475992bdfb cxgbe(4): if_iqdrops statistic should include tunnel congestion drops.
MFC after:	1 week
2014-03-20 01:58:04 +00:00
Navdeep Parhar
38035ed6dc cxgbe(4): significant rx rework.
- More flexible cluster size selection, including the ability to fall
  back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
  case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
  mbuf chain for any kind of freelist.  This replaces two variants: one
  for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster.  It was limited to 4K clusters
  only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes.  The driver's
  man page will be updated separately.

MFC after:	5 weeks
2014-03-18 20:14:13 +00:00
Scott Long
f7a74e061b Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow.  When set, queue 0 of the port is reserved for
TX packets without a flowid.  The hash value of packets with a flowid
is bumped up by 1.  The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.

Reviewed by:	np
Obtained from:	Netflix
MFC after:	3 days
2014-02-06 18:40:38 +00:00
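The queue selection described in f7a74e061b can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the function name and parameters are hypothetical, and it assumes a packet with a flowid is hashed by taking the flowid modulo the number of queues available to it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of tx queue selection with a reserved queue.
 * When rsrv_noflow is set (and there is more than one queue), queue 0
 * is used only by packets without a flowid, and packets with a flowid
 * are spread over queues 1..ntxq-1, i.e. their hash is bumped up by 1.
 */
static unsigned int
select_txq(bool has_flowid, uint32_t flowid, unsigned int ntxq,
    bool rsrv_noflow)
{
	unsigned int rsrv = (rsrv_noflow && ntxq > 1) ? 1 : 0;

	if (!has_flowid)
		return (0);
	return (flowid % (ntxq - rsrv) + rsrv);
}
```

With 8 queues and the knob set, a flowid of 7 maps to queue 1 rather than queue 7, and no flowid ever maps to queue 0.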
Navdeep Parhar
454813ff9c cxgbe(4): Use the port's tx channel to identify it to t4_clr_port_stats.
MFC after:	3 days
2014-02-06 02:34:29 +00:00
Adrian Chadd
3af0f449ae Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.

Reviewed by:	np
2014-01-02 23:23:33 +00:00
Navdeep Parhar
93e9cae3fa Read card capabilities after firmware initialization, instead of setting
them up as part of firmware initialization (which the driver gets to do
only if it's the master driver).

Read the range of tids available for the ETHOFLD functionality if it's
enabled.

New is_ftid() and is_etid() functions to test whether a tid falls within
the range of filter tids or ETHOFLD tids respectively.

MFC after:	2 weeks
2013-12-14 03:08:03 +00:00
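The range tests that 93e9cae3fa introduces as is_ftid() and is_etid() amount to half-open interval checks. The sketch below shows the idea only. The struct, its field names, and the function signatures are assumptions made for illustration and may not match the driver.

```c
#include <stdbool.h>

/* Hypothetical container for the tid ranges read from the card. */
struct tid_ranges {
	unsigned int ftid_base;	/* first filter tid */
	unsigned int nftids;	/* number of filter tids */
	unsigned int etid_base;	/* first ETHOFLD tid */
	unsigned int netids;	/* number of ETHOFLD tids */
};

/* True if tid lies in [ftid_base, ftid_base + nftids). */
static bool
is_ftid(const struct tid_ranges *t, unsigned int tid)
{
	return (tid >= t->ftid_base && tid < t->ftid_base + t->nftids);
}

/* True if tid lies in [etid_base, etid_base + netids). */
static bool
is_etid(const struct tid_ranges *t, unsigned int tid)
{
	return (tid >= t->etid_base && tid < t->etid_base + t->netids);
}
```

A range with a count of zero (for example, ETHOFLD not enabled) matches no tid at all.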
Adrian Chadd
ac68deae6d Print out the full PCIe link negotiation in dmesg.
I found this useful when checking whether a NIC is in a PCIe 3.0 x8 slot
or not.

Reviewed by:	np
Sponsored by:	Netflix, inc.
2013-12-10 00:07:04 +00:00
Navdeep Parhar
d419aaa126 Unstaticize t4_list and t4_uld_list. This works around a clang
annoyance[1] and allows kgdb to find these symbols.

[1] http://lists.freebsd.org/pipermail/freebsd-hackers/2012-November/041166.html

MFC after:	3 days
2013-12-09 23:33:57 +00:00
Navdeep Parhar
273ef9912d cxgbe(4): save a copy of the RSS map for each port for the driver's use. 2013-12-08 17:47:37 +00:00
Navdeep Parhar
05337b80ee cxgbe(4): T4_SET_SCHED_CLASS and T4_SET_SCHED_QUEUE ioctls to program
scheduling classes in the chip and to bind tx queue(s) to a scheduling
class respectively.  These can be used for various kinds of tx traffic
throttling (to force selected tx queues to drain at a fixed Kbps rate,
or a % of the port's total bandwidth, or at a fixed pps rate, etc.).

Obtained from:	Chelsio
2013-12-03 18:34:52 +00:00
Navdeep Parhar
245a0bd40a cxgbe(4): update the internal list of device features.
MFC after:	3 days
2013-11-21 20:07:58 +00:00
Navdeep Parhar
1192eeb8a3 cxgbe(4): Tidy up the display for payload memory statistics (pm_stats).
# sysctl -n dev.t4nex.0.misc.pm_stats
# sysctl -n dev.t5nex.0.misc.pm_stats

MFC after:	1 week
2013-11-07 00:25:49 +00:00
Navdeep Parhar
be2c01211c cxgbe(4): Exclude MPS_RPLC_MAP_CTL (0x11114) from the register dump. Turns
out it's a write-only register with strange side effects on read.

Submitted by:	gnn
MFC after:	3 days
2013-11-04 21:06:21 +00:00
Navdeep Parhar
48d05478bf cxgbe(4): Update T4 and T5 firmwares to 1.9.12.0 2013-10-14 21:25:07 +00:00
Gleb Smirnoff
4cdc1f5421 There are some high performance NICs that count statistics in hardware,
and there are ifnets that do that via counter(9). Provide a flag that
would skip the cache line thrashing '+=' operation in ether_input().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
Reviewed by:	melifaro, adrian
Approved by:	re (marius)
2013-10-09 19:04:40 +00:00
Navdeep Parhar
480e603c79 Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missing
in said revision.

r254386:
Flush inactive LRO entries periodically.
2013-08-29 06:26:22 +00:00
Navdeep Parhar
319a31ea18 Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.
- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
  around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
  had to be reworked as the kernel calls ifc_match with the global
  if_cloners_mtx held.
2013-08-28 20:59:22 +00:00
Navdeep Parhar
9800517691 Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.
2013-08-28 20:45:45 +00:00
Navdeep Parhar
8a59745fca Use correct mailbox and PCIe PF number when querying RDMA parameters. 2013-08-26 19:02:52 +00:00
Navdeep Parhar
2485eeee37 Display P/N information in the description.
Submitted by:	gnn
MFC after:	3 days
2013-08-20 18:22:04 +00:00
Navdeep Parhar
82342de26d Display temperature sensor data. Shows -1 if sensor not
available on the card.

# sysctl dev.t4nex.0.temperature
# sysctl dev.t5nex.0.temperature
2013-08-02 18:05:42 +00:00
Navdeep Parhar
6e22f9f3da Display SGE tunables in the sysctl tree.
dev.t5nex.0.fl_pktshift: payload DMA offset in rx buffer (bytes)
dev.t5nex.0.fl_pad: payload pad boundary (bytes)
dev.t5nex.0.spg_len: status page size (bytes)
dev.t5nex.0.cong_drop: congestion drop setting

Discussed with:	scottl
2013-07-31 05:12:51 +00:00
Navdeep Parhar
2393220538 Display a string instead of a numeric code in the linkdnrc sysctl.
Submitted by:	gnn@
2013-07-27 07:43:43 +00:00
Navdeep Parhar
716c9e1b58 Expand the list of devices claimed by cxgbe(4). 2013-07-27 00:53:07 +00:00
Navdeep Parhar
caf20efcde Add support for packet-sniffing tracers to cxgbe(4). This works with
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.

Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire.  On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc.  It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel).  On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.

There are 4 tracers on a chip.  A tracer can trace only in one direction
(tx or rx).  For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port.  This is a
small subset of what the hardware can do.  A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip.  Normal delivery to cxgbe<n> or cxl<n> will be made as usual.

/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0  (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0  (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0   <== all that cxl0 sees and puts on the wire

If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.

/* all done */
# cxgbetool t5nex0 tracer 0 disable
# cxgbetool t5nex0 tracer 1 disable
# cxgbetool t5nex0 tracer list
# ifconfig t5nex0 destroy
2013-07-26 22:04:11 +00:00
Navdeep Parhar
2b66d73259 Attach to the 4x10G T540-CR card. 2013-07-11 19:09:31 +00:00
Navdeep Parhar
3a760ee793 - Show the reason why link is down if this information is available.
- Display the temperature and PHY firmware version of the BT PHY.

MFC after:	1 day
2013-07-05 01:53:51 +00:00
Navdeep Parhar
6eb3180fb2 - Make note of interface MTU change if the rx queues exist, and not just
  when the interface is up.
- Add a tunable to control the TOE's rx coalesce feature (enabled by
  default as it always has been).  Consider the interface MTU or the
  coalesce size when deciding which cluster zone to use to fill the
  offload rx queue's free list.  The tunable is:
  dev.{t4nex,t5nex}.<N>.toe.rx_coalesce

MFC after:	1 day
2013-07-04 21:19:01 +00:00
Navdeep Parhar
6300655cc1 On-the-fly changes to the interrupt coalescing timer should apply to the
TOE rx queues too.

MFC after:	1 day
2013-07-04 20:17:39 +00:00
Navdeep Parhar
c337fa30af - Read all TP parameters in one place.
- Read the filter mode, calculate various shifts, and use them
  properly during active open (in select_ntuple).

MFC after:	1 day
2013-07-04 17:55:52 +00:00
Navdeep Parhar
f72b68a1bf - Include the T5 firmware with the driver.
- Update the T4 firmware to the latest.
- Minor reorganization and updates to the version macros, etc.

Obtained from:	Chelsio
MFC after:	1 day
2013-07-03 23:52:15 +00:00
Navdeep Parhar
87c7afeb55 Add a sysctl to get the number of filters available.
sysctl dev.t4nex.<N>.nfilters
sysctl dev.t5nex.<N>.nfilters

MFC after:	3 days
2013-07-01 17:31:04 +00:00
Navdeep Parhar
9942898697 Update T5 register ranges. This is so that regdump skips over registers
with read side-effects.

MFC after:	3 days
2013-06-27 18:59:07 +00:00
Navdeep Parhar
ad13c6af54 cxgbe(4): Never install a firmware if hw.cxgbe.fw_install is 0.
MFC after:	1 week
2013-06-05 20:57:52 +00:00