There is no reason to re-create the command workqueue during healthcare.
This also fixes an issue where a previous work struct may refer to a
destroyed workqueue.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Present code uses lock-less accesses to the dump data to prevent top
level ioctls from blocking bottom-level call to dump. Unfortunately, this
depends on the type stability of the dump data structure, which makes it
non-functional during driver teardown.
Switch to the mutex locking scheme where top levels use the mutex in the
bound regions, while copyouts and drain for completion utilize condvars.
The mutex lifetime is guaranteed to be strictly larger than the time
interval where driver can initiate dump, and most of the control fields
of the old struct mlx5_dump_data are directly embedded into struct
mlx5_core_dev.
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Avoid race for command completion when triggering a command completions event.
Serialize operation by queueing all commands on the same work queue.
This can happen when healthcare triggers.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Implement a watchdog as part of the healtcare subsystem which
reads the PCI power status during startup and upon the PCI
power status change event and store it into the core device
structure. This value is then exported to user-space via a
read-only SYSCTL. A dmesg print has been added to inform
the admin about the PCI power status.
MFC after: 3 days
Sponsored by: Mellanox Technologies
mlx5_pci_disable_device is calling pci_disable_device which disables
bus master. No need to explicitly call pci_clear_master.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Add mlx5 implementation for the ones defined by the mlxfw
shared module to be used while flashing the device firmware.
The callbacks do their job through the MCQI, MCC and MCDA registers.
Linux commit:
62bd22cf326dc4ac5be673c11cef4602dc1f5e47
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
To be used by the mlx5 callbacks exposed to the mlxfw module.
Linux commit:
d2ad488b0073bd1a2c3f5d2ea50a7eb632103e5d
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
On load_one, we now cache our capabilities registers internally, similar
to QUERY_HCA_CAP. Capabilities can later be queried using macros
introduced in this patch.
Linux commit:
71862561f3a62015a11de16d1c306481e8415c08
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.
Linux commit:
c835ad64683bd3e2d1b31ed2cb1ff4366932edb1
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been
scattered to memory
Linux commit:
fcd29ad17c6ff885dfae58f557e9323941e63ba2
MFC after: 3 days
Sponsored by: Mellanox Technologies
Avoid an infinite software firmware reset loop that may be caused by a
hardware bug by limiting the maximum number of resets.
The counter between resets is reset by request for reset, and not by a
successful reset.
The interval between two resets can be configured via sysctl:
hw.mlx5.sw_reset_timeout
which is global to all mlx5 devices in the system.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Make sure the interrupt handlers don't race with the fast unload one
code in the shutdown handler.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Linux commit:
1865ea9adbfaf341c5cd5d8f7d384f19948b2fe9
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
While at it remove unused interface state bits. This also fixes and issue
during shutdown:
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
Linux commit:
10a8d00707082955b177164d4b4e758ffcbd4017
b3cb5388499c5e219324bfe7da2e46cbad82bfcf
MFC after: 3 days
Sponsored by: Mellanox Technologies
Inspect the ethernet compliance code to figure out actual cable type by reading
the PDDR module info register.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
1) Don't exceed the drivers own hardcoded TX inline limit.
The blueflame register size can be much greater than the hardcoded limit
for inlining. Make sure we don't exceed the drivers own limit, because this
also means that the maximum number of TX fragments becomes invalid and
then memory size assumptions in the TX path no longer hold up.
2) Make sure the mlx5_query_min_inline() function returns an error code.
3) Header inlining is required when using TSO.
4) Catch failure to compute inline header size for TSO.
5) Add support for UDP when computing inline header size.
6) Fix for inlining issues with regards to DSCP.
Make sure we inline 4 bytes beyond the ethernet and/or
VLAN header to workaround a hardware bug extracting
the DSCP field from the IPv4/v6 header.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Add support for setting the bandwidth limit as a ratio rather than in bits per
second. The ratio must be an integer number between 1 and 100 inclusivly.
Implement the needed firmware commands and SYSCTLs through mlx5en(4).
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
Driver description should be set by core and not by the Ethernet driver.
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
A command is either polling or event driven and the mode cannot change
during execution of a command. Make sure the event handler only handle
commands which are not polled. This is done by checking the command mode
in the command handler before completing commands.
Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.
Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies