Commit graph

395 commits

Author SHA1 Message Date
Hans Petter Selasky
cf551f955d Fix race between driver unload and dumping firmware in mlx5core.
Present code uses lock-less accesses to the dump data to prevent top
level ioctls from blocking bottom-level call to dump.  Unfortunately, this
depends on the type stability of the dump data structure, which makes it
non-functional during driver teardown.

Switch to the mutex locking scheme where top levels use the mutex in the
bound regions, while copyouts and drain for completion utilize condvars.
The mutex lifetime is guaranteed to be strictly larger than the time
interval where driver can initiate dump, and most of the control fields
of the old struct mlx5_dump_data are directly embedded into struct
mlx5_core_dev.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:08:48 +00:00
Hans Petter Selasky
39c6d43ee5 Ensure the flowtable rules are not freed twice in mlx5en(4).
This can happen when re-loading the driver.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:08:21 +00:00
Hans Petter Selasky
f5233a73d8 Undo previous steps upon returning failure in mlx5en(4).
Else flowtable resources may not be properly freed.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:08:01 +00:00
Hans Petter Selasky
47d93c5c43 Make sure the flow destination structure does not use values off the stack
in mlx5en(4).

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:07:42 +00:00
Hans Petter Selasky
a0a4fd7734 Flush command workqueue when command completion is triggered in mlx5core.
Avoid race for command completion when triggering a command completions event.
Serialize operation by queueing all commands on the same work queue.
This can happen when healthcare triggers.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:07:20 +00:00
Hans Petter Selasky
4f22751048 Make command timeout way shorter in mlx5core.
The command timeout is terribly long, whole two hours. Make it 60s so if
things do go wrong, the user gets feedback in relatively short time, so
they can take corrective actions and/or investigate using tools and such.

Linux commit:
6b6c07bdcdc97ccac2596063bfc32a5faddfe884

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:07:00 +00:00
Hans Petter Selasky
4d0e6d8452 Remove non-functional MLX5E_MAX_RX_SEGS macro in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:06:42 +00:00
Hans Petter Selasky
8b825a1857 Fix for compilation warning in mlx5en(4).
Function 'mlx5e_alloc_rx_wqe' can never be inlined because it uses alloca
(override using the always_inline attribute)

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:06:22 +00:00
Hans Petter Selasky
bd802cea53 Rename functions from mlx5_fwdump to mlx5_ctl in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:05:59 +00:00
Hans Petter Selasky
998c9a2bbc Implement firmware reset from userspace in mlx5tool(8).
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:05:09 +00:00
Hans Petter Selasky
939c79a213 Add Firmware Reset Level, MFRL, register accessors in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:04:40 +00:00
Hans Petter Selasky
c62f4d8de8 Expose per-lane counters before correction mechanism in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:03:29 +00:00
Hans Petter Selasky
5f9484f3f1 Add support for extended PCIe counters in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 11:02:36 +00:00
Hans Petter Selasky
67fd194170 Extend the counters framework in mlx5en(4).
Allow more macro arguments and split the variable type and name into
separate arguments. This allows simple and powerful copy and extraction
of values from IFC based structures into SYSCTLs with the use of a single
macro.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:59:16 +00:00
Hans Petter Selasky
c71a71bafc Update performance counter bits in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:58:41 +00:00
Hans Petter Selasky
adb6fd50c8 Implement reading PCI power status in mlx5core.
Implement a watchdog as part of the healtcare subsystem which
reads the PCI power status during startup and upon the PCI
power status change event and store it into the core device
structure. This value is then exported to user-space via a
read-only SYSCTL. A dmesg print has been added to inform
the admin about the PCI power status.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:58:06 +00:00
Hans Petter Selasky
40218d734d Move workqueue from mlx5en(4) to mlx5core.
This avoids creating more workqueues in mlx5core to do
simple firmware command polling tasks.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:57:37 +00:00
Hans Petter Selasky
ebf7e777cd Always return success for RoCE modify port in mlx5ib.
CM layer calls ib_modify_port() regardless of the link layer.

For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.

Linux Commit:
ec2558796d25e6024071b6bcb8e11392538d57bf

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:57:16 +00:00
Hans Petter Selasky
86a3977962 Add support for new rates to mlx5ib.
Submitted by:	slavash@
MFC after:      3 days
Sponsored by:   Mellanox Technologies
2019-05-08 10:56:51 +00:00
Hans Petter Selasky
52786315fc Do not add IFM_10G_LR and IFM_40G_ER4 to supported media types by default in
mlx5en(4).

IFM_10G_LR and IFM_40G_ER4 media should be added only if the device
has the needed capability bit set for it.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:55:15 +00:00
Hans Petter Selasky
ac87880ac1 Add support for 200Gb ethernet speeds to mlx5core.
Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:54:54 +00:00
Hans Petter Selasky
6d1dc6524e Remove unused speed enums in mlx5core.
Submitted by:	slavash@
MFC after:      3 days
Sponsored by:   Mellanox Technologies
2019-05-08 10:54:24 +00:00
Hans Petter Selasky
1bda99bff5 Control automatic update of firmware on driver load with a tunable in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:54:05 +00:00
Hans Petter Selasky
945f398487 Correct check for the calibration generation in mlx5en(4).
If generation is cleared due to hardware clock failure, check for it before
the divisor is used.  Actually clear generation when failure occurs.

While there, stop doing the calculations inside the generation loop.  Since
all members of mlx5e_clbr_point are used for calculations, get the
local copy of the structure and use it after generation stabilized.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:53:47 +00:00
Hans Petter Selasky
752b8aabfa Let rx_out_of_buffer be a 32-bit counter in mlx5en(4).
This fixes counting issues when the firmware resets the counter during
allocation of the counter set where the counter belongs.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:53:25 +00:00
Hans Petter Selasky
c29b90ba68 Add vnic steering drop statistics in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:53:01 +00:00
Hans Petter Selasky
94175bc92d Use software counters for rx_packets and rx_bytes in mlx5en(4).
The physical- and virtual- port counters might not reflect the amount
of data received after address filtering. Use the software counters
instead for rx_packets and rx_bytes to know exactly how much data
was received.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:52:32 +00:00
Hans Petter Selasky
be2b4a690a Add mlx5_firmware_update() in mlx5core.
Add support for upgrading firmware on mlx5 module load.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:52:11 +00:00
Hans Petter Selasky
03ee9ea9bf Fix for double bus master disable in mlx5core.
mlx5_pci_disable_device is calling pci_disable_device which disables
bus master. No need to explicitly call pci_clear_master.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:51:29 +00:00
Hans Petter Selasky
ea78f07b5e Implement userspace firmware update for ConnectX-4/5/6.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:50:35 +00:00
Hans Petter Selasky
b255ca093a Rename mlx5_fwdump_addr to more neutral mlx5_tool_addr in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:50:08 +00:00
Hans Petter Selasky
20a63539d6 Add mlxfw callbacks in mlx5core.
Add mlx5 implementation for the ones defined by the mlxfw
shared module to be used while flashing the device firmware.

The callbacks do their job through the MCQI, MCC and MCDA registers.

Linux commit:
62bd22cf326dc4ac5be673c11cef4602dc1f5e47

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:49:36 +00:00
Hans Petter Selasky
8d593abae4 Convert remaining module parameters into SYSCTLs in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:44:53 +00:00
Hans Petter Selasky
a5754bd222 Remove redundant line of code in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:44:27 +00:00
Hans Petter Selasky
07e68e62a0 Change implicit and probably erronous EPERM to EIO on command status error
in mlx5core.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:44:02 +00:00
Hans Petter Selasky
0b9cdc8e53 Fix netstat counters mapping in mlx5en(4).
The current mapping of driver counters to netstat counters is wrong.
For example, a single jabber packet, will cause the Ierrs counter to
count three times.

The work for mapping the hardware and software counters to their right
place in netstat counters were already done in Linux, take that as is
to the FreeBSD driver.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:42:33 +00:00
Hans Petter Selasky
8b1b42c150 Avoid leaking send queue mbufs during error recovery in mlx5en(4).
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:41:44 +00:00
Hans Petter Selasky
031b1981d2 Add helper functions to set/query MCC/MCDA/MCQI registers in mlx5core.
To be used by the mlx5 callbacks exposed to the mlxfw module.

Linux commit:
d2ad488b0073bd1a2c3f5d2ea50a7eb632103e5d

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:41:21 +00:00
Hans Petter Selasky
9e3c099977 Enhance MCAM reg to allow query on access reg support in mlx5core.
Enhance MCAM to allow the driver to query which access regs are
supported. For now, expose the regs needed for FW flashing.

Linux commit:
0ab87743cc8c5bcd482daf71961ed5fc45349e01

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:41:00 +00:00
Hans Petter Selasky
d5d52dd748 Add MCC (Management Component Control) register definitions in mlx5core.
MCC (Management Component Control) allows to control a firmware
component update.

MCDA (Management Component Data Access) allows to read and write
a firmware component.

MCQI (Management Component Query Information) allows to query
information about firmware components.

Linux commit:
4717628938423fcba0aa8fa889e9fed4eb6a655f

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:40:41 +00:00
Hans Petter Selasky
58e63779f2 Add reading the mcam_reg in mlx5core.
Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:40:13 +00:00
Hans Petter Selasky
5a8145f6f2 Query and cache PCAM, MCAM registers on initialization in mlx5core.
On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.

Linux commit:
71862561f3a62015a11de16d1c306481e8415c08

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:39:53 +00:00
Hans Petter Selasky
0358212d37 Implement PCAM, MCAM access register commands in mlx5core.
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.

Linux commit:
c835ad64683bd3e2d1b31ed2cb1ff4366932edb1

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:39:25 +00:00
Hans Petter Selasky
ae73b04113 Expose PCAM, MCAM registers infrastructure in mlx5core.
PCAM: Ports capabilities mask register.
MCAM: Management capabilities mask register.

PCAM and MCAM registers will provide information regarding firmware
support for different features, in order to avoid cases where new driver
combined with old firmware results in syndromes (for ex. PCIe counters
before this patchset).

Linux commit:
cfdcbceaeffc669b70d904d80a2df9c86c232566

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:39:01 +00:00
Hans Petter Selasky
fba52edb89 Add sysctl(8) to control fast unload support in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:38:31 +00:00
Hans Petter Selasky
a4130a344c Add Fast teardown support to mlx5core.
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown

This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.

Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been
   scattered to memory

Linux commit:
fcd29ad17c6ff885dfae58f557e9323941e63ba2

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:38:06 +00:00
Hans Petter Selasky
2647c1e4c5 Make sure the running variable is properly set for ratelimited SQs in mlx5en(4).
Else the SQs won't be properly released when closing rate-limited connections
leading to wrong state transitions on the SQ.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:37:31 +00:00
Hans Petter Selasky
ba11bcecd8 Implement get and set nic state as global functions in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:37:03 +00:00
Hans Petter Selasky
c2a1e80706 Ticks are integer type in FreeBSD.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:36:32 +00:00
Hans Petter Selasky
a005c157e4 Configure firmware to use RX hash format in mini CQE in mlx5en(4).
When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.

While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:55 +00:00
Hans Petter Selasky
d52ffcb71c Disable CQE zipping by default in mlx5en(4).
After doing performance measurements, it seems like CQE zipping doesn't
have any significant benefit.
Moreover, we know that this feature is disabled by default on other
operating systems (Linux for example).

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:35 +00:00
Hans Petter Selasky
f0dcb8dff5 Split mlx5e_update_stats_work() in mlx5en(4).
Split the function into the mlx5e_update_stats_locked() core and make
mlx5e_update_stats_work() call the _locked helper, similar to many other
places in the kernel. This improves the code structure, making the
locking clean.

Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:35:14 +00:00
Hans Petter Selasky
91f13f8368 Implement fast close of RX channel in mlx5en(4).
Instead of waiting for all jobs to be cancelled, simply close the completion
queue to prevent more completion events and let mlx5e_destroy_rq() cleanup
the remaining mbufs.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:42 +00:00
Hans Petter Selasky
243853215d Correct number of elements for priority to traffic class mappings in mlx5en(4).
The number of priorities is always 8, while the number of traffic classes
supported can vary. While at it convert the sysctl node into an array.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:34:14 +00:00
Hans Petter Selasky
ffadb62f20 Remove unused module parameter in mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:29 +00:00
Hans Petter Selasky
6428c27faf Make sure to error out when arming the CQ fails in mlx4ib and mlx5ib.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:33:09 +00:00
Hans Petter Selasky
069963d772 Destroy port stats debug context in correct order in mlx5en(4).
Destroy children nodes before parent nodes.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:22 +00:00
Hans Petter Selasky
c66537d7b2 Fix tx_jumbo_packets counter in mlx5en(4).
Instead of reading Ethernet RFC 2819 pXtoYoctets counters from
hardware which counts RX octets, count tx_stat_pXtoYoctets from
Ethernet extended counters which counts TX octets.

TX jumbo counters should be accumulated only after the PPCNT
counters were fetched from hardware with their latest value.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:32:03 +00:00
Hans Petter Selasky
bcfad02593 Update Ethernet extended counters in mlx5en(4).
Expose all Ethernet extended counters those counters via debug_stats
sysctl:
dev.mce.X.debug_stats

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:31:32 +00:00
Hans Petter Selasky
5169fb81ca Protect from infinite sw-reset loop in mlx5core.
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:47 +00:00
Hans Petter Selasky
192fc18d49 Disable all MSIX interrupts before shutdown in mlx5.
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:30:18 +00:00
Hans Petter Selasky
a3a31fde6d Import Linux code to implement mlx5_ib_disassociate_ucontext() in mlx5ib.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:29:45 +00:00
Hans Petter Selasky
983026ea83 Add temperature warning event to log in mlx5core.
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.

Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9

Submitted by:	slavash@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:28:18 +00:00
Hans Petter Selasky
7646dc2347 Correctly define the interface state bits in mlx5en(4).
While at it remove unused interface state bits. This also fixes and issue
during shutdown:

There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.

The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.

Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:27:29 +00:00
Hans Petter Selasky
e5eae1dc7d Enable FPGA and FPGA QP errors for EQ and call the handler in mlx5core.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:26:33 +00:00
Hans Petter Selasky
c322dbafd5 Add MLX5_FPGA_RELOAD IOCTL(2) to mlx5fpga.
Submitted by:	kib@
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:25:14 +00:00
Hans Petter Selasky
423530be04 Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).
Add support for DIM based on Linux,
with some minor adaptions specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2019-05-08 10:23:33 +00:00
Andrew Gallatin
50575ce11c Track TCP connection's NUMA domain in the inpcb
Drivers can now pass up numa domain information via the
mbuf numa domain field.  This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow on changes will use this information for lacp egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by:	markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20028
2019-04-25 15:37:28 +00:00
Andrew Gallatin
7687707dd4 Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets.  When called with a domain on a NUMA machine,
ifalloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node.  Similarly, if_alloc_dev() is a wrapper for if_alloc_domain
which uses a driver supplied device_t to call ifalloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by:	glebius, kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19930
2019-04-22 19:24:21 +00:00
Andrew Gallatin
538ff57b75 mlx5en: Enable new pfil(9) KPI ethernet filtering hooks
This allows efficient filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before*
the driver has committed to (re)allocating an mbuf for the
packet. Dropped packets are treated essentially the same as an
error. Nothing is allocated, and the existing buffer is recycled. This
allows us to drop malicious packets at close to line rate with very
little CPU use.

Reviewed by:	hselasky, slavash, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19063
2019-04-15 17:14:50 +00:00
Konstantin Belousov
a748d99a17 Fix build with option RSS, removing unused variables.
Reported by:	np
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-06 21:52:40 +00:00
Konstantin Belousov
e206dc6479 Appease gcc build, remove duplicated declaration.
Reported by:	np
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-12-06 19:20:00 +00:00
Sean Bruno
5fc3b4acab Change u32 to uint32_t to allow the native-xtools target to build
libsysdecode.

Submitted by:	kib
2018-12-06 18:59:33 +00:00
Slava Shwartsman
0f3b263d83 mlx4/mlx5: Updated driver version to 3.5.0
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:34 +00:00
Slava Shwartsman
cc971b2261 mlx5en: Implement backpressure indication.
The backpressure indication is implemented using an unlimited rate type of
mbuf send tag. When the upper layers typically the socket layer has obtained such
a tag, it can then query the destination driver queue for the current
amount of space available in the send queue.

A single mbuf send tag may be referenced multiple times and a refcount has been added
to the mlx5e_priv structure to track its usage. Because the send tag resides
in the mlx5e_channel structure, there is no need to wait for refcounts to reach
zero until the mlx4en(4) driver is detached. The channels structure is persistant
during the lifetime of the mlx5en(4) driver it belongs to and can so be accessed
without any need of synchronization.

The mlx5e_snd_tag structure was extended to contain a type field, because there are now
two different tag types which end up in the driver which need to be distinguished.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:25:03 +00:00
Slava Shwartsman
71defeda26 mlx5en: Improve configuration of HW LRO.
In order to enable HW LRO, both the "hw_lro" sysctl in the mlx5en(4) config
space must be set, and the ifconfig(8) LRO capability must be set. Any other
settings will disable HW LRO.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:33 +00:00
Slava Shwartsman
01f02abfec mlx5en: Count all transmitted and received bytes.
Add counter for all transmitted and received bytes. Currently only all
transmitted and received packets were counted. Fix description of RX LRO
counters while at it.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:24:02 +00:00
Slava Shwartsman
3230c29d72 mlx5en: Statically allocate and free the channel structure(s).
By allocating the worst case size channel structure array
at attach time we can eliminate various NULL checks in the
fast path. And also reduce the chance for use-after-free
issues in the transmit fast path.

This change is also a requirement for implementing
backpressure support.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:31 +00:00
Slava Shwartsman
b3cf149325 mlx5en: Fix race in mlx5e_ethtool_debug_stats().
Writing to the debug stats variable must be locked,
else serialization will be lost which might cause
various kernel panics due to creating and destroying
sysctls out of order.

Make sure the sysctl context is initialized after freeing
the sysctl nodes, else they can be freed twice.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:23:01 +00:00
Slava Shwartsman
42390bb884 mlx5en: Add support for IFM_10G_LR and IFM_40G_ER4 media types.
Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:22:30 +00:00
Slava Shwartsman
f292a94d6b mlx5en: Don't set rate on SQs when the SQ is already stopped.
This can happen when connections are short lived and leads to
a firmware error printout in dmesg, syndrome 0x51cfb0, because
the SQ is in the wrong state.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:59 +00:00
Slava Shwartsman
3e581cabf0 mlx5en: Fix for inlining issues in transmit path
1) Don't exceed the drivers own hardcoded TX inline limit.

The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the drivers own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.

2) Make sure the mlx5_query_min_inline() function returns an error code.

3) Header inlining is required when using TSO.

4) Catch failure to compute inline header size for TSO.

5) Add support for UDP when computing inline header size.

6) Fix for inlining issues with regards to DSCP.

Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to workaround a hardware bug extracting
the DSCP field from the IPv4/v6 header.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:21:28 +00:00
Slava Shwartsman
d51ced5fae mlx5en: Remove the DRBR and associated logic in the transmit path.
The hardware queues are deep enough currently and using the DRBR and associated
callbacks only leads to more task switching in the TX path. The is also a race
setting the queue_state which can lead to hung TX rings.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:57 +00:00
Slava Shwartsman
e870c0ab61 mlx5en: Implement support for bandwidth limiting in by ratio, ETS.
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer number between 1 and 100 inclusivly.

Implement the needed firmware commands and SYSCTLs through mlx5en(4).

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:20:26 +00:00
Slava Shwartsman
d82f1c13ad mlx5fpga: Add set and query connect/disconnect FPGA
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:55 +00:00
Slava Shwartsman
085b35bb69 mlx5fpga: IOCTL for FPGA temperature measurement
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:19:23 +00:00
Slava Shwartsman
b5f5751275 mlx5fpga: Support MorseQ board
Added and supported new enum "morseQ = 4" for fpga_id field

Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:18:52 +00:00
Slava Shwartsman
d50c55f17d mlx5fpga_tools initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:17:22 +00:00
Slava Shwartsman
e9dcd83155 mlx5fpga: Initial code import.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 14:11:20 +00:00
Slava Shwartsman
67db687393 mlx5ib: Set default active width and speed when querying port.
Make sure the active width and speed is set in case the
translate_eth_proto_oper() function doesn't recognize the
current port operation mask.

Linux commit:
7672ed33c4c15dbe9d56880683baaba4227cf940

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:49:11 +00:00
Slava Shwartsman
d3300d4aed mlx5ib: Make sure the congestion work timer does not escape the drain procedure.
If the mlx5_ib_read_cong_stats() function was running when mlx5ib was unloaded,
because this function unconditionally restarts the timer, the timer can still
be pending after the delayed work has been cancelled. To fix this simply loop
on the delayed work cancel procedure as long as it returns non-zero.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:39 +00:00
Slava Shwartsman
ac2fdeb4e7 mlx5ib: Fix null pointer dereference in mlx5_ib_create_srq
Although "create_srq_user" does overwrite "in.pas" on some paths, it
also contains at least one feasible path which does not overwrite it.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:48:10 +00:00
Slava Shwartsman
c9e9b5c104 mlx5ib: Fix sign extension in mlx5_ib_query_device
"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:41 +00:00
Slava Shwartsman
31c3f64819 mlx5: Fix driver version location
Driver description should be set by core and not by the Ethernet driver.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:47:10 +00:00
Slava Shwartsman
721a1a6a69 mlx5: Fixes to allow command polling mode to exist alongside event mode.
A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handle
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:39 +00:00
Slava Shwartsman
b084af6cdc mlx5: Fix wrong size allocation for QoS ETC TC register
The driver allocates wrong size (due to wrong struct name) when issuing
a query/set request to NIC's register.

Linux commit:
d14fcb8d877caf1b8d6bd65d444bf62b21f2070c

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:46:09 +00:00
Slava Shwartsman
8718eb63f8 mlx5: Add software tx_jumbo_packets counter
This counter will represent transmitted packets which has more than
1518 octets.
The NIC has multiple hardware counters for counting transmitted
packets larger than 1518 octets. Each counter counts the packets
in specific range.
We accumulate those counters to have a single counter.

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:37 +00:00
Slava Shwartsman
feb5f357ea mlx5: Implement support for configuring PCIe packet write ordering via a sysctl.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:45:08 +00:00
Slava Shwartsman
70b417cf90 mlx5: Extend vector argument to u64.
Else the MLX5_TRIGGERED_CMD_COMP flag will be masked away.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:38 +00:00
Slava Shwartsman
29e544513e mlx5: Add global control to disable firmware reset, for all mlx5 devices.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:44:08 +00:00
Slava Shwartsman
2119f825d1 mlx5: Fix use-after-free in self-healing flow
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.

Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:37 +00:00
Slava Shwartsman
8f7f07368d mlx5: Move hw.mlx5 node definition to mlx5_core.
Submitted by:   kib@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:43:07 +00:00
Slava Shwartsman
63cc6d1bc2 mlx5: Convert some spaces into tabs and use device_printf() instead of printf().
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:36 +00:00
Slava Shwartsman
abb28d287b mlx5: Add SRQ fixes from Linux
Combine multiple fixes from Linux to SRQ.
Linux commits:
c73b791 IB/mlx5: Assign SRQ type earlier
0fd27a8 IB/mlx5: Fix out-of-bound access
c2b37f7 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
d63c467 RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:42:06 +00:00
Slava Shwartsman
3b21d18587 mlx5: Fix for potential memory leaks.
Make sure allocated data gets freed in error cases.

Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:37 +00:00
Slava Shwartsman
07b624ed71 mlx5: Discard unused return values.
Submitted by:   hselasky@
Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:41:06 +00:00
Slava Shwartsman
843a89d37e mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Linux commit:
aba462134634b502d720e15b23154f21cfa277e5

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:36 +00:00
Slava Shwartsman
2bf40c3608 mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow.

Linux commit:
28e9091e3119933c38933cb8fc48d5618eb784c8

Approved by:    hselasky (mentor)
MFC after:      1 week
Sponsored by:   Mellanox Technologies
2018-12-05 13:40:05 +00:00
Slava Shwartsman
0c79f82cf0 mlx5: Notify user that the ConnectX-6 shutdown its port due to power limitation
If power exceed the slot limit, or slot limit is unknown the ConnectX-6
firmware will shutdown its port.
Inform the user via debug message.

MFC after:      3 days
Approved by:    hselasky (mentor), kib (mentor)
Sponsored by:   Mellanox Technologies
2018-10-22 10:38:38 +00:00
Hans Petter Selasky
6ed134c41b Make the MSIX module parameter limit per device, in mlx5en(4).
MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:41:09 +00:00
Hans Petter Selasky
16ae32f927 Add support for receive side scaling stride, RSSS, in mlx5en(4).
The receive side scaling stride parameter is a value which define the interval
between active receive side queues. The traffic for the inactive queues is
redirected to the nearest active queue by use of modulus. The default value
of this parameter is one, which means all receive side queues are used.

The point of this feature is to redirect more traffic to fewer receive side
queues in order to take more advantage of sorted large receive offload,
sorted LRO. The sorted LRO works better when more packets are accumulated
per service interval.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:28:06 +00:00
Hans Petter Selasky
0be5034007 Don't stall transmit queue on drops in mlx5en(4).
When a transmitted packet is dropped don't stall the transmit queue.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 12:19:36 +00:00
Hans Petter Selasky
2d32b0a304 Maximum number of mbuf frags is off-by-one for worst case scenario in mlx5en(4).
Inspecting the PRM no more than 0x3F data segments, DS, of size 16 bytes is
allowed.

Worst case scenario summary of DS usage:
Header is fixed:	2 DS
Maximum inlining:	98 => (98 - 2) / 16 = 6 DS
Remainder:		0x3F - 2 - 6 = 55 DS (mbuf frags)

Previously a value of 56 DS was used and this would work in the
normal case because not all inline data area was used up.

MFC after:		3 days
Approved by:		re (marius)
Sponsored by:		Mellanox Technologies
2018-09-06 11:06:07 +00:00
Hans Petter Selasky
7b9b93a8dd Update version information for the mlx5 and mlx5en(4) modules.
While at it bump some copyright dates.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:12:53 +00:00
Hans Petter Selasky
0539900214 Do not inline transmit headers and use HW VLAN tagging if supported by mlx5en(4).
Query the minimal inline mode supported by the card.
When creating a send queue, cache the queried mode and optimize the transmit
if no inlining is required.  In this case, we can avoid touching the headers
cache line and avoid dirtying several more lines by copying headers into
the send WQEs.  Also, if no inline headers are used, hardware assists in
the VLAN tag framing.

Submitted by:		kib@, slavash@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-18 10:03:30 +00:00
Hans Petter Selasky
90c8e44125 Use a mbuf header instead of a mbuf cluster for debugging interrupts in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:53:37 +00:00
Hans Petter Selasky
a6b2d28d05 Add module parameter to limit number of MSIX EQ vectors in mlx5en(4).
For setups having a large amount of PCI devices, it makes sense to limit the
number of MSIX vectors per PCI device, in order to avoid running out of IRQ
vectors.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:47:56 +00:00
Hans Petter Selasky
aa9f073c9b Add missing newline.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:43:43 +00:00
Hans Petter Selasky
2f17f76aa4 Handle jumbo frames without requiring big clusters in mlx5en(4).
The scatter list is formed by the chunks of MCLBYTES each, and larger
than default packets are returned to the stack as the mbuf chain.

Submitted by:		kib@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:42:05 +00:00
Hans Petter Selasky
f8c3349737 Enable both receive and transmit pauseframes by default in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:21:02 +00:00
Hans Petter Selasky
f2b4782c81 Add context numbers for HW elements in mlx5en(4).
To access the data, set sysctl dev.mce.N.conf.debug_stats to 1.
This enables the sysctl node dev.mce.N.hw_ctx_debug.  Its content is
the mapping of each channel' number to used receive queue and associated
completion queue, set of the transmit queues numbers and corresponding
completion queues.

Trimmed example output:
channel 30 rq 188 cq 1085
channel 30 tc 0 sq 187 cq 1084
channel 31 rq 191 cq 1087
channel 31 tc 0 sq 190 cq 1086

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:18:01 +00:00
Hans Petter Selasky
a880c1ff6a Do not hint about 'trust both' mode when the mlx5en(4) hardware does not support it.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:11:30 +00:00
Hans Petter Selasky
f0474ab919 Correctly write atomic variable in mlx5en(4).
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 11:08:40 +00:00
Hans Petter Selasky
52ff436841 Remove redundant call to mlx5_vsc_find_cap() in mlx5core.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:27:46 +00:00
Hans Petter Selasky
6d54b22db7 Make sure the state variable is set atomically instead of using a mutex in mlx5core.
Device detach and setting error state may deadlock over the interface mutex
like this:

a) Detach code in mlx5en waits until error state is set while the interface
mutex is locked.
b) The set error handler needs to lock the interface mutex before it can
set the error state.

The solution is to use atomics to set the error state.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:20:01 +00:00
Hans Petter Selasky
b575d8c850 Refactor access to CR-space into using VSC APIs in mlx5core.
Remove no longer used files and APIs.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:16:32 +00:00
Hans Petter Selasky
9fc929d2e2 Remove redundant newline character in mlx5core.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:11:00 +00:00
Hans Petter Selasky
18450a3b10 Update version information for the mlx5ib module.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 10:07:40 +00:00
Hans Petter Selasky
62bfa774ae Don't pass unsupported events to ibcore from mlx5ib.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:59:55 +00:00
Hans Petter Selasky
14a1b9bd3a Use static device naming instead of dynamic one in mlx5ib.
When resetting mlx5core instances it can happen that the order of attach and
detach for mlx5ib instances is changed. Take the unit number for mlx5_%d from
the parent PCI device, similarly to what is done in mlx5en(4), so that there
is a direct relationship between mce<N> and mlx5_<N>.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:58:11 +00:00
Hans Petter Selasky
ed0cee0bf4 Implement support for Differentiated Service Code Point, DSCP, in mlx5en(4).
The DSCP feature is controlled using a set of sysctl(8) fields under
the qos sysctl directory entry for mlx5en(4).

For Routable RoCE QPs, the DSCP should be set in the QP's address path.
The DSCP's value is derived from the traffic class.

Linux commit:
ed88451e1f2d400fd6a743d0a481631cf9f97550

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:56:40 +00:00
Hans Petter Selasky
f4546fa376 Add support for prio-tagged traffic for RDMA in ibcore.
When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-07-17 09:11:53 +00:00
Hans Petter Selasky
38535d6cab Add support for hardware rate limiting to mlx5en(4).
The hardware rate limiting feature is enabled by the RATELIMIT kernel
option. Please refer to ifconfig(8) and the txrtlmt option and the
SO_MAX_PACING_RATE set socket option for more information. This
feature is compatible with hardware transmit send offload, TSO.

A set of sysctl(8) knobs under dev.mce.<N>.rate_limit are provided to
setup the ratelimit table and also to fine tune various rate limit
related parameters.

Sponsored by:	Mellanox Technologies
2018-05-29 14:04:57 +00:00
Matt Macy
4f6c66cc9c UDP: further performance improvements on tx
Cumulative throughput while running 64
  netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15409
2018-05-23 21:02:14 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Konstantin Belousov
952e75c763 mlx5en: Always allow VLAN id 0.
According to the 802.1Q-2014 9.6 VLAN Tag Control Information, VID value 0
means that there is no VLAN tag assigned to the packet, and only PCP and
DEI values from the tag are meaningful.  Current flow table programming
filter out such packets.

When programming VLAN filter for flow table, unconditionally add rule which
accept packets with VLAN id 0.  The packets are already handled correctly
by the network stack.

Reviewed by:	hselasky, slavash
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2018-05-02 20:22:03 +00:00
Hans Petter Selasky
28cfdee769 Bump driver version number in mlx5en(4).
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-04-04 10:45:06 +00:00
Hans Petter Selasky
d77004ab47 Remove unused structure field in mlx5core.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:58:58 +00:00
Hans Petter Selasky
76ee71dcd3 Bump mlx5core driver version.
MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:55:31 +00:00
Hans Petter Selasky
4d5fdbe9b8 Fix for use after free in mlx5core.
Make sure the command completion handler is not called when the device is
in internal error state. This can easily trigger use after free situations.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:50:45 +00:00
Hans Petter Selasky
ca2345a05d Make sure Giant is locked when allocating bus resources in mlx5core.
During health care IRQ resources will be reallocated.
Newbus requires that Giant is locked before accessing
these resources.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2018-03-30 19:49:35 +00:00
Hans Petter Selasky
92d23c82cd Collect firmware dump when mlx5core is in device error state.
Firmware dump collecting should be triggered in case firmware syndrome
with request for reset bit is set.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:48:25 +00:00
Hans Petter Selasky
d28b6b55ba Reorganize health recovery in mlx5core.
- Move the semaphore locking and unlocking to the same function.
- Flags are no longer needed if the reset and crdump will be done in the
  same function.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:45:48 +00:00
Hans Petter Selasky
3c1274bd64 Prepare for FW dump in error state in mlx5core.
- Move firmware dump prep and cleanup to init_one() and remove_one() so that
the init and cleanup will happen only upon driver reload.
- Add some prints to indicate firmware dump.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:43:15 +00:00
Hans Petter Selasky
0a752b05a8 Properly check if crspace is supported in mlx5core.
The old code checked for MLX5_CR_SPACE_DOMAIN which is irrelevant here.
However, if dev->vsec_addr would be 0, an access to wrong offset would
happen.

MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:39:27 +00:00
Hans Petter Selasky
4950c6ec72 Add missing newline character in print in mlx5core.
MFC after:	3 days
Submitted by:	slavash@
Sponsored by:	Mellanox Technologies
2018-03-30 19:35:31 +00:00
Brooks Davis
541d96aaaf Use an accessor function to access ifr_data.
This fixes 32-bit compat (no ioctl command defintions are required
as struct ifreq is the same size).  This is believed to be sufficent to
fully support ifconfig on 32-bit systems.

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 week
Relnotes:	yes
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14900
2018-03-30 18:50:13 +00:00
Hans Petter Selasky
9a3d0cf097 Remove redundant prototype to fix compilation with GCC.
Reported by:	jeff@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-25 08:55:53 +00:00
Hans Petter Selasky
c0cea51b46 Don't wait for completions when a mlx5en(4) device is in internal
error state.

If the device is in internal error state the hardware will not
generate completions. Just move on to destroy the resources.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:38:12 +00:00
Hans Petter Selasky
9cd6fc88be Fix incorrect page count when mlx5core is in internal error.
Change page cleanup flow when in internal error to properly decrement
the page counts when reclaiming pages. That prevents timing out
waiting for extra pages that were actually cleaned up previously.

Submitted by:	slavash@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-03-23 18:35:59 +00:00