Commit graph

37 commits

Author SHA1 Message Date
Wei Hu
64610df593 mana: remove redundant doorbell in mana_poll_rx_cq()
With the previous commit refilling the rx mbufs in batches, the
doorbell in mana_poll_rx_cq() has become redundant. Remove it to save
the few microseconds spent in the MMIO call.

Reported by:	NetApp
Reviewed by:	Tallamraju, Sai
Tested by:	whu
Fixes:		9b8701b8 ("mana: refill the rx mbuf in batch")
MFC after:	3 days
Sponsored by:	Microsoft

(cherry picked from commit 47f4137e44b8079c7784604d220a298db07a19a1)
2025-03-18 04:53:56 +00:00
Wei Hu
5c97b7c296 mana: refill the rx mbuf in batch
Set the default refill threshold to one quarter of the rx queue
length. Users can change this value with the hw.mana.rx_refill_thresh
tunable in loader.conf. This change improves rx completion handling,
saving 10% to 15% of the overall time.
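The batching idea can be modeled in userspace. This is a minimal sketch under some assumptions. Each received packet consumes one rx buffer. Buffers are only reposted, with the doorbell rung once, when the number of consumed buffers reaches the threshold. The threshold defaults to a quarter of the queue length. The struct, the function names, and the fallback for an unset or oversized tunable are hypothetical; they are not the driver's actual code.

```c
#include <stdint.h>

/* Hypothetical model of an rx queue; not the mana driver's struct. */
struct rxq_model {
	uint32_t num_rx_buf;	/* rx queue length */
	uint32_t refill_thresh;	/* refill once this many buffers are consumed */
	uint32_t consumed;	/* buffers handed up but not yet replaced */
	uint32_t doorbells;	/* simulated doorbell MMIO writes */
	uint32_t refilled;	/* total buffers reposted */
};

static void
rxq_model_init(struct rxq_model *q, uint32_t len, uint32_t tunable)
{
	q->num_rx_buf = len;
	/* Assumption: 0 (unset) or a value above len falls back to len / 4. */
	q->refill_thresh = (tunable != 0 && tunable <= len) ? tunable : len / 4;
	if (q->refill_thresh == 0)
		q->refill_thresh = 1;
	q->consumed = q->doorbells = q->refilled = 0;
}

/* Called once per received packet. */
static void
rxq_consume(struct rxq_model *q)
{
	q->consumed++;
	if (q->consumed >= q->refill_thresh) {
		/* Repost all consumed buffers in one batch... */
		q->refilled += q->consumed;
		q->consumed = 0;
		/* ...and ring the doorbell once for the whole batch. */
		q->doorbells++;
	}
}
```

With a 1024-entry queue and the default threshold of 256, 1024 packets cost 4 doorbell writes instead of 1024.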

Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft

(cherry picked from commit 9b8701b81f14f0fa0787425eb9761b765d5faab0)
2025-03-12 09:09:32 +00:00
Wei Hu
dae67894b4 mana: Increase default tx and rx ring size to 1024
TCP performance tests show a high number of retries under heavy tx
traffic. The numbers of queue stops and wakeups also increase.
Further analysis suggests that the FreeBSD network stack tends to
send TSO packets with multiple sg entries, typically ranging from
10 to 16. On mana, every two sgs take one unit of the tx ring.
Therefore, adding one unit for the head, it takes 6 to 9 units
of the tx ring to send a typical TSO packet.
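The ring accounting above can be checked with a little arithmetic. This is a minimal sketch that assumes an odd number of sg entries rounds up to a whole unit; the message only describes pairs. tso_ring_units() and packets_per_ring() are illustrative names, not driver functions.

```c
#include <stdint.h>

/* One unit for the head plus one unit per two sg entries, rounded up. */
static uint32_t
tso_ring_units(uint32_t nsegs)
{
	return (1 + (nsegs + 1) / 2);
}

/* How many such packets fit in a tx ring of ring_size units. */
static uint32_t
packets_per_ring(uint32_t ring_size, uint32_t nsegs)
{
	return (ring_size / tso_ring_units(nsegs));
}
```

By this count, a 256-unit ring holds only about 28 (16 sgs) to 42 (10 sgs) typical TSO packets. A 1024-unit ring holds about 113 to 170.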

The current default tx ring size is 256, which can fill up quickly
under heavy load. When the tx ring is full, the send queue is
stopped, waiting for ring space to be freed. This can cause the
network stack to drop packets and lead to TCP retransmissions.

Increase the default tx and rx ring size to 1024 units. Also
introduce two tunables allowing users to request the tx and rx ring
size in loader.conf:
        hw.mana.rx_req_size
        hw.mana.tx_req_size
When mana is loading, the driver checks these two values and
rounds them up to a power of 2. If they are not set or the
requested values are out of the allowable range, it uses the
default ring size instead.
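The tunable handling described above might look roughly like the following sketch. HYP_RING_MIN and HYP_RING_MAX are hypothetical placeholders, since the message does not state the allowable range. Checking the range before rounding is also an assumption. In the kernel, libkern's roundup_pow_of_two() could replace the hand-rolled helper.

```c
#include <stdint.h>

/* Hypothetical bounds; the driver's real limits may differ. */
#define	HYP_RING_MIN		128u
#define	HYP_RING_MAX		8192u
#define	HYP_RING_DEFAULT	1024u

/* Round v up to the next power of two (1 <= v <= 2^31). */
static uint32_t
roundup_pow2_u32(uint32_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return (v + 1);
}

/*
 * Turn a loader.conf request (0 = unset) into a ring size: unset or
 * out-of-range requests get the default, others round up to a power of 2.
 */
static uint32_t
ring_size_from_tunable(uint32_t req)
{
	if (req < HYP_RING_MIN || req > HYP_RING_MAX)
		return (HYP_RING_DEFAULT);
	return (roundup_pow2_u32(req));
}
```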

Also change the tx and rx single loop completion budget to 8.

Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft

(cherry picked from commit a18e99945414fb1f9d455b780c6fcf2d09cc68d8)
2025-03-12 08:40:56 +00:00
John Baldwin
1ab14e185a Check for errors when detaching children first, not last
The detach routines in these drivers all ended with 'return
(bus_generic_detach())', meaning that if any child device failed to
detach, the parent driver was left in a mostly destroyed state, but
still marked attached.  Instead, bus drivers should detach child
drivers first and return errors before destroying driver state in the
parent.
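In a real driver, the fix amounts to calling bus_generic_detach(dev) first and returning its error before tearing anything down. The userspace model below shows that ordering. parent_softc, model_generic_detach(), and the child_busy flag are hypothetical stand-ins; they are not newbus APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified parent softc. */
struct parent_softc {
	bool	child_busy;	/* a child refuses to detach */
	bool	state_alive;	/* parent resources not yet freed */
};

/* Stand-in for bus_generic_detach(): fails if any child cannot detach. */
static int
model_generic_detach(struct parent_softc *sc)
{
	return (sc->child_busy ? EBUSY : 0);
}

/* Fixed ordering: detach children first, bail out before touching state. */
static int
parent_detach(struct parent_softc *sc)
{
	int error;

	error = model_generic_detach(sc);
	if (error != 0)
		return (error);		/* parent state is still intact */

	sc->state_alive = false;	/* now safe to destroy driver state */
	return (0);
}
```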

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D47387

(cherry picked from commit d412c07617eb35435668b024bc2cecda05f57f1f)
2025-02-27 10:17:49 -05:00
Doug Moore
16b7da6f7a dev/mana: replace power2 function
Replace is_power_of_2(length) with power2(length).  When length != 0, as in
this case, they produce the same result.  This will allow an implementation
of is_power_of_two to be dropped.
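The two checks differ only at 0, which this sketch makes explicit. The macro names are local stand-ins chosen to avoid clashing with system headers. bsd_powerof2 uses the same expression as powerof2() in FreeBSD's <sys/param.h>. linux_is_power_of_2 is a Linux-style test that rejects 0; it is not necessarily linuxkpi's exact definition.

```c
/* Same expression as powerof2() in <sys/param.h>: true for 0. */
#define	bsd_powerof2(x)		((((x) - 1) & (x)) == 0)

/* Linux-style test: additionally rejects 0. */
#define	linux_is_power_of_2(n)	((n) != 0 && (((n) & ((n) - 1)) == 0))
```

Because a zero-length buffer is never passed here, the two macros agree for every value that can reach this code.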

Reviewed by:	alc, markj
Differential Revision:	https://reviews.freebsd.org/D45536

(cherry picked from commit a94ed493b50752cee09245fc312c63b00331f217)
2025-02-10 04:50:32 -06:00
Doug Moore
6f309b9d56 log2: move log2 functions from linuxkpi to libkern
Linux has a header file that defines an ilog2 function and some simple
functions/macros that use it: roundup_pow_of_two, is_power_of_2,
rounddown_pow_of_two, and order_base_2.  This change moves three of
those simple functions (all but is_power_of_2) from linuxkpi to
libkern.  It also deletes a few implementations of these functions
that had previously been copied into code for various device drivers,
so that they can use the libkern version.  The is_power_of_2 macro was
not moved because powerof2 in param.h provides almost the same service
already (except that they disagree about whether 0 is a power of two).
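The helpers named above can all be expressed in terms of an integer log2. The sketch below assumes the GCC/Clang builtin __builtin_clzl() and arguments of at least 1. The my_ prefix marks these as illustrations; they are not libkern's actual definitions.

```c
#include <limits.h>

/* floor(log2(n)) for n >= 1, via count-leading-zeros. */
static int
my_ilog2(unsigned long n)
{
	return ((int)(sizeof(n) * CHAR_BIT) - 1 - __builtin_clzl(n));
}

/* Smallest power of two >= n. */
static unsigned long
my_roundup_pow_of_two(unsigned long n)
{
	return (n == 1 ? 1UL : 1UL << (my_ilog2(n - 1) + 1));
}

/* Largest power of two <= n. */
static unsigned long
my_rounddown_pow_of_two(unsigned long n)
{
	return (1UL << my_ilog2(n));
}

/* ceil(log2(n)): the exponent of my_roundup_pow_of_two(n). */
static int
my_order_base_2(unsigned long n)
{
	return (n == 1 ? 0 : my_ilog2(n - 1) + 1);
}
```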

Since the Linux definitions of these functions were copied into
FreeBSD 11 years ago, Linux has improved them, and this change
provides those improvements.  In particular, a giant table of log
values for evaluating ilog2 for constant values is no longer
necessary.

Reviewed by:	alc, markj (previous version)
Differential Revision:	https://reviews.freebsd.org/D45536

(cherry picked from commit c8b0c33b03ac072413b27bed2bdae2ae27426f3a)
2025-02-10 04:29:23 -06:00
Doug Moore
4ed1837853 libkern: add ilog2 macro
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Eliminate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
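The following sketch shows an ilog2 that works for any unsigned argument type and traps a zero argument. It is not the libkern macro. It widens the argument to unsigned long long, so one builtin, __builtin_clzll() (GCC/Clang), covers every unsigned type. The assert() stands in for the GENERIC-kernel zero check. With a constant argument, compilers typically fold the builtin at compile time, which is why no lookup table is needed.

```c
#include <assert.h>
#include <limits.h>

/* floor(log2(x)); x must be nonzero (ilog2(0) is undefined). */
static inline int
ilog2_ull(unsigned long long x)
{
	assert(x != 0);
	return ((int)(sizeof(x) * CHAR_BIT) - 1 - __builtin_clzll(x));
}

/* Accepts any unsigned (or non-negative) integer argument. */
#define	model_ilog2(x)	ilog2_ull((unsigned long long)(x))
```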

Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.

Reviewed by:	alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision:	https://reviews.freebsd.org/D45170
Differential Revision:	https://reviews.freebsd.org/D45235

(cherry picked from commit b0056b31e90029553894d17c441cbb2c06d31412)
2025-02-10 04:27:12 -06:00
Zhenlei Huang
e96a62ea65 mana: Remove stray semicolons
MFC after:	1 week

(cherry picked from commit 6ccf4f4071c5bf85a9aad593e92d1623e949c039)
2024-10-31 12:40:17 +08:00
Zhenlei Huang
0fa8566510 mana: Stop checking for failures from malloc/mallocarray/buf_ring_alloc(M_WAITOK)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D45852

(cherry picked from commit 1dc7a7b74b5ad37ff7c8dc22f1a710460a5f1dcd)
2024-09-30 12:44:24 +08:00
Zhenlei Huang
6b1f530935 net: Remove unneeded NULL check for the allocated ifnet
Change 4787572d05 made if_alloc_domain() never fail, and hence neither
do its wrappers if_alloc(), if_alloc_dev(), and if_gethandle().

No functional change intended.

Reviewed by:	kp, imp, glebius, stevek
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D45740

(cherry picked from commit aa3860851b9f6a6002d135b1cac7736e0995eedc)
2024-07-12 20:03:37 +08:00
Mark Johnston
16cc589d91 mana: Use device_set_desc()
No functional change intended.

MFC after:	1 week

(cherry picked from commit 89848b3445ad52c304e6f9c5036aa9108bedb4c8)
2024-06-27 11:27:41 -04:00
Wei Hu
dda1c30ffe mana: fix leaking pci resource problem detaching mana devices
Fix the "Device leaked memory resources" error messages shown in
dmesg when detaching the mana gdma devices.

Reported by:	NetApp
MFC after:	3 days
Sponsored by:	Microsoft

(cherry picked from commit 47e99e5bc5bcfa621fe6a3e62386f227c47e8cff)
2024-02-29 06:37:23 +00:00
Wei Hu
33cd621105 mana: Fix TX CQE error handling
For an unknown TX CQE error type (probably from newer hardware),
still free the mbuf, update the queue tail, etc.; otherwise the
accounting will be wrong.

Also, TX errors can be triggered by injecting corrupted packets, so
replace the mana_err logging with mana_dbg.
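The accounting rule can be sketched as a switch in which every CQE type falls through to the same release-and-advance path, including an unrecognized one. The CQE type values and struct fields below are hypothetical placeholders; they are not the MANA hardware definitions.

```c
#include <stdint.h>

/* Hypothetical CQE types; real values come from the hardware spec. */
enum hyp_tx_cqe_type {
	HYP_CQE_TX_OKAY = 1,
	HYP_CQE_TX_SA_DROP = 2,
	HYP_CQE_TX_MTU_DROP = 3,
};

struct hyp_txq {
	uint32_t pending;	/* packets posted, awaiting completion */
	uint32_t tail;		/* completed slots */
	uint32_t freed;		/* mbufs released */
	uint32_t errors;	/* error completions seen */
};

static void
hyp_handle_tx_cqe(struct hyp_txq *q, int cqe_type)
{
	switch (cqe_type) {
	case HYP_CQE_TX_OKAY:
		break;
	case HYP_CQE_TX_SA_DROP:
	case HYP_CQE_TX_MTU_DROP:
		q->errors++;	/* known error: debug-level logging only */
		break;
	default:
		q->errors++;	/* unknown type (e.g. newer hw): still account */
		break;
	}
	/* Common path for every type: free the mbuf, advance the tail. */
	q->freed++;
	q->tail++;
	q->pending--;
}
```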

Reported by:	NetApp
MFC after:	1 week
Sponsored by:	Microsoft

(cherry picked from commit 516b5059705b6b8bbba28821dbe05964c128f9a9)
2024-01-24 12:36:13 +00:00
Wei Hu
a72a0af819 mana: add lro and tso stat counters
Add a few stat counters for tso and lro.

MFC after:	3 days
Sponsored by:	Microsoft

(cherry picked from commit b167e449c8db01f082691503fb5c1255ad5750eb)
2023-09-18 10:27:46 +00:00
Wei Hu
4edfbe719b mana: add ioctl to support toggling offloading features
With this support, users can enable or disable offloading features
such as txcsum, rxcsum, tso and software lro through ifconfig.

To enable or disable tx features, do it on the mana interface first,
then on the hn/netvsc interface to keep it in sync with mana. For
example:

ifconfig mana0 -txcsum
ifconfig hn0 -txcsum

To enable or disable rx features, applying the change to the mana
interface alone is sufficient.

Disabling txcsum implies disabling tso. Enabling tso while txcsum
is disabled results in an error message in dmesg requesting that
txcsum be enabled first.

The above applies to IPv6 offloading features as well.
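The txcsum/tso dependency can be modeled as two setters. This is a sketch only. The struct and function names are hypothetical. The model also returns -1 when tso is refused, which is an assumption: the message only says an error message is printed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical capability state for one interface. */
struct hyp_ifcaps {
	bool	txcsum;
	bool	tso;
};

static void
hyp_set_txcsum(struct hyp_ifcaps *c, bool on)
{
	c->txcsum = on;
	if (!on)
		c->tso = false;	/* disabling txcsum implies disabling tso */
}

/* Returns 0 on success, -1 if tso is requested while txcsum is off. */
static int
hyp_set_tso(struct hyp_ifcaps *c, bool on)
{
	if (on && !c->txcsum) {
		fprintf(stderr, "enable txcsum before enabling tso\n");
		return (-1);
	}
	c->tso = on;
	return (0);
}
```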

Tested by:	whu
MFC after:	3 days
Sponsored by:	Microsoft

(cherry picked from commit ab7dc1ceb6d36fd804bedb818086ae3ff6692bf7)
2023-09-18 10:26:09 +00:00
Wei Hu
4b22565f85 mana: fix tso parameters and set hwassist bits
The parameters for tso on mana were not set correctly, and the
hwassist bits were not set. Together these prevented tso from
working on mana. Fix both issues so that tso works on mana.

Tested by:	whu
MFC after:	3 days
Sponsored by:	Microsoft

(cherry picked from commit 643fd7b4bc57de87ddfeb75a8f0bdb27dbb8c3ce)
2023-09-09 13:57:50 +00:00
Wei Hu
55b7a8233e mana: batch ringing RX queue doorbell on receiving packets
It's inefficient to ring the doorbell page every time a WQE is posted to
the receive queue. Excessive MMIO writes cause the CPU to spend more
time waiting on LOCK instructions (atomic operations), resulting in
poor scaling performance.

Move the code that rings the doorbell page to after all WQEs have
been posted to the receive queue in mana_poll_rx_cq().

In addition, use the correct WQE count for ringing the RQ doorbell.
The hardware specification states that WQE_COUNT should be set to 0
for the Receive Queue. Although the hardware doesn't currently enforce
this, future releases may check this value.
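The batching and the WQE_COUNT rule can be modeled together. In this sketch, WQEs are posted in a loop and a single doorbell carrying WQE_COUNT = 0 is written afterwards. The doorbell record layout and all names are hypothetical; they are not the real doorbell format.

```c
#include <stdint.h>

/* Hypothetical doorbell record: what one MMIO write would carry. */
struct hyp_doorbell {
	uint32_t tail;		/* producer index after posting */
	uint32_t wqe_cnt;	/* WQE_COUNT field; 0 for the receive queue */
};

struct hyp_rq {
	uint32_t head;			/* producer index */
	uint32_t mmio_writes;		/* doorbells rung */
	struct hyp_doorbell last;	/* last doorbell value written */
};

/* Post n WQEs, then ring the doorbell once instead of once per WQE. */
static void
hyp_post_rx_batch(struct hyp_rq *rq, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		rq->head++;		/* stand-in for writing one WQE */
	if (n == 0)
		return;			/* nothing posted, no doorbell */
	rq->last.tail = rq->head;
	rq->last.wqe_cnt = 0;		/* RQ doorbells carry WQE_COUNT = 0 */
	rq->mmio_writes++;
}
```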

Tested by:	whu
MFC after:	1 week
Sponsored by:	Microsoft

(cherry picked from commit e4e11c1d07f5d58ff8cf4e07ac8f61eecbbb5417)
2023-09-09 12:51:57 +00:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Wei Hu
7b9bd54ae8 mana: fix a KASSERT panic on recursed lock access in mana_cfg_vport
The panic stack looks like this:
panic: _sx_xlock_hard: recursed on non-recursive sx MANA port lock
@ /usr/src/sys/dev/mana/mana_en.c:1022

KDB: stack backtrace:
vpanic() at vpanic+0x150/frame 0xfffffe011b3c1970
panic() at panic+0x43/frame 0xfffffe011b3c19d0
_sx_xlock_hard() at _sx_xlock_hard+0x82d/frame 0xfffffe011b3c1a70
_sx_xlock() at _sx_xlock+0xb0/frame 0xfffffe011b3c1ab0
mana_cfg_vport() at mana_cfg_vport+0x79/frame 0xfffffe011b3c1b40
mana_alloc_queues() at mana_alloc_queues+0x3b/frame 0xfffffe011b3c1c50
mana_up() at mana_up+0x40/frame 0xfffffe011b3c1c70
mana_ioctl() at mana_ioctl+0x25b/frame 0xfffffe011b3c1cb0
ifhwioctl() at ifhwioctl+0xd11/frame 0xfffffe011b3c1db0
hn_xpnt_vf_init() at hn_xpnt_vf_init+0x15f/frame 0xfffffe011b3c1e10

The lock has already been held in the caller. Remove this
redundant lock attempt.

Reported by:	NetApp
Sponsored by:	Microsoft
2023-08-11 03:30:38 +00:00
Justin Hibbits
37d22ce087 Mechanically convert mana to IfAPI
Reviewed by:	zlei
Sponsored by:	Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D37835
2023-02-07 14:16:18 -05:00
Li-Wen Hsu
d0b5e4a30a mana(4): Make the code cross-platform
Discussed with:	whu
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36388
2022-11-04 11:45:35 +08:00
John Baldwin
c5eed4146f Fix various places which cast a pointer to a uint64_t or vice versa.
GCC warns about the mismatched sizes on 32-bit platforms.

Reviewed by:	imp, markj
Differential Revision:	https://reviews.freebsd.org/D36752
2022-09-28 13:58:02 -07:00
Wei Hu
9e772f203f mana: Fix a couple i386 build errors
Fix a couple i386 build errors

Fixes:	b685df314f
Sponsored by:	Microsoft
2022-08-29 06:35:02 +00:00
Wei Hu
b685df314f mana: some code refactoring and export apis for future RDMA driver
- Record the physical address for doorbell page region
  For supporting RDMA device with multiple user contexts with their
  individual doorbell pages, record the start address of doorbell page
  region for use by the RDMA driver to allocate user context doorbell IDs.

- Handle vport sharing between devices
  For outgoing packets, the PF requires the VF to configure the vport with
  the corresponding protection domain and doorbell ID for the kernel or user
  context. The vport can't be shared between different contexts.

  Implement the logic to exclusively take over the vport by either the
  Ethernet device or RDMA device.

- Add functions for allocating doorbell page from GDMA
  The RDMA device needs to allocate doorbell pages for each user context.
  Implement those functions and expose them for use by the RDMA driver.

- Export Work Queue functions for use by RDMA driver
  An RDMA device may need to create Ethernet device queues for use by Queue
  Pair type RAW. This allows a user-mode context to access Ethernet hardware
  queues. Export the supporting functions for use by the RDMA driver.

- Define max values for SGL entries
  The maximum number of SGL entries should be computed from the maximum
  WQE size for the intended queue type and the corresponding OOB data
  size. This guarantees the hardware queue can successfully queue requests
  up to the queue depth exposed to the upper layer.

- Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  When doing memory registration, the PF may respond with
  GDMA_STATUS_MORE_ENTRIES to indicate that a follow-up request is needed.
  This is not an error and should be processed as expected.

- Define data structures for protection domain and memory registration
  The MANA hardware supports protection domains and memory registration for
  use in RDMA environments. Add those definitions and expose them for use by
  the RDMA driver.
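The SGL bound described in the list above can be written as simple arithmetic: the space a WQE has left after its header and OOB data, divided by the size of one SGL entry. Every size constant in this sketch is a hypothetical placeholder for illustration. The real values come from the MANA specification and headers.

```c
#include <stdint.h>

/* Hypothetical sizes for illustration only. */
#define	HYP_SGE_SIZE		16u	/* bytes per SGL entry */
#define	HYP_MAX_SQE_SIZE	512u	/* max send-queue WQE size */
#define	HYP_MAX_RQE_SIZE	256u	/* max receive-queue WQE size */
#define	HYP_TX_OOB_SIZE		8u	/* inline OOB bytes in a TX WQE */
#define	HYP_WQE_HDR_SIZE	8u	/* WQE header bytes */

/* Largest SGL that fits in one WQE alongside its header and OOB data. */
static uint32_t
hyp_max_sgl_entries(uint32_t max_wqe, uint32_t hdr, uint32_t oob,
    uint32_t sge)
{
	if (max_wqe < hdr + oob)
		return (0);
	return ((max_wqe - hdr - oob) / sge);
}
```

With these placeholder sizes, a TX WQE fits 31 entries and an RX WQE (no OOB) fits 15.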

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-08-29 05:24:21 +00:00
Wei Hu
fa2d4a22fa mana: add rmb load fence to comply with hw spec
This ensures that software reads fresh data after observing the
ownership bits.
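The required ordering is: check the ownership bits, then fence, then read the payload. This sketch shows that pattern as a CPU-to-CPU analogue, with a C11 acquire fence playing the role of the kernel's rmb(). The driver itself reads device-written memory and uses the kernel barrier. The CQE layout and names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical completion entry; the producer writes owner last. */
struct hyp_cqe {
	uint32_t		data;
	_Atomic uint32_t	owner;
};

/* Returns 1 and stores the payload if the entry is ours, else 0. */
static int
hyp_read_cqe(struct hyp_cqe *cqe, uint32_t expected_owner, uint32_t *out)
{
	if (atomic_load_explicit(&cqe->owner, memory_order_relaxed) !=
	    expected_owner)
		return (0);		/* not handed to software yet */
	/* Payload loads must not be satisfied before the owner load. */
	atomic_thread_fence(memory_order_acquire);
	*out = cqe->data;
	return (1);
}
```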

Sponsored by:	Microsoft
2022-08-15 07:39:15 +00:00
John Baldwin
825718a331 mana: Remove unused devclass argument to DRIVER_MODULE.
2022-05-09 12:22:02 -07:00
Wei Hu
de64aa32c8 mana: Add handling of CQE_RX_TRUNCATED
The proper way to drop this kind of CQE is to advance the rxq tail
without indicating the packet to the upper network layer.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-02-15 07:27:42 +00:00
Wei Hu
aa108bc7c5 mana: Add RX fencing
RX fencing allows the driver to know that any prior change to the RQs
has finished. It is required, e.g., when the RQs are disabled/enabled
or when the hash key/indirection table is changed.

Remove the previous 'sleep' workaround and add real support for
RX fencing as the PF driver supports the MANA_FENCE_RQ request now (any
old PF driver not supporting the request won't be used in production).

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-01-14 07:34:39 +00:00
Wei Hu
027d0c1c04 mana: fix misc minor handling issues when errors happen.
- In mana_create_txq(), if the test fails we must free some resources
  as in all the other handling paths of this function.
- In mana_gd_read_cqe(), add a warning log in case of CQE read
  overflow, instead of failing silently.
- Fix error handling in mana_create_rxq() when
  cq->gdma_id >= gc->max_num_cqs.
- In mana_init_port(), use the correct port index rather than 0.
- In mana_hwc_create_wq(), if allocating the DMA buffer fails,
  mana_hwc_destroy_wq() was called without previously storing the
  pointer to the queue. In order to avoid leaking the pointer to
  the queue, store it as soon as it is allocated.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-01-13 07:22:21 +00:00
Wei Hu
623918a198 mana: Improve the HWC error handling
Currently when the HWC creation fails, the error handling is flawed,
e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails,
the resources acquired in mana_hwc_init_queues() are not released.

Enhance mana_hwc_destroy_channel() to do the proper cleanup work and
call it accordingly.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-01-13 06:08:43 +00:00
Wei Hu
ed65c80a34 Mana: report OS info to PF driver
The PF driver might use the OS info for statistical purposes.

MFC after:	2 weeks
Sponsored by:	Microsoft
2022-01-10 13:32:30 +00:00
Gordon Bergling
a506133ac9 mana: Fix a typo in a source code comment
- s/maxium/maximum/

MFC after:	1 week
2021-11-03 16:20:11 +01:00
Wei Hu
1833cf1373 Mana: move mana polling from EQ to CQ
- Each CQ starts a task queue to poll when a completion happens.
  This means every rx and tx queue has its own cleanup task
  thread to poll for completions.
- Arm the EQ every time, whether it belongs to mana or hwc. CQ
  arming depends on the budget.
- Fix a warning in mana_poll_tx_cq() when cqe_read is 0.
- Move cqe_poll from the EQ to the CQ struct.
- Support EQ sharing by up to 8 vPorts.
- Downgrade the link-down message from mana_info to mana_dbg.
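"CQ arming depends on the budget" can be read as the usual budgeted-polling pattern, shown in this sketch. Each task run handles at most budget completions. If it ran out of work before exhausting the budget, the CQ is drained, so the task re-arms it and waits for the next interrupt. Otherwise it leaves the CQ unarmed and asks to run again. This interpretation and all names are assumptions, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CQ state. */
struct hyp_cq {
	uint32_t pending;	/* completions waiting in the CQ */
	uint32_t processed;	/* completions handled so far */
	bool	 armed;		/* interrupt re-armed for this CQ */
};

/* One run of the per-CQ cleanup task; returns true to reschedule. */
static bool
hyp_cq_task(struct hyp_cq *cq, uint32_t budget)
{
	uint32_t n = cq->pending < budget ? cq->pending : budget;

	cq->pending -= n;
	cq->processed += n;
	if (n < budget) {
		cq->armed = true;	/* drained: wait for the next interrupt */
		return (false);
	}
	cq->armed = false;		/* budget used up: poll again */
	return (true);
}
```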

Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2021-10-26 12:25:22 +00:00
John Baldwin
0def501d14 mana: Cast an unused value to void to quiet a warning.
This appeases a -Wunused-value warning from GCC 9.

Reviewed by:	whu
Differential Revision:	https://reviews.freebsd.org/D31948
2021-09-25 11:28:14 -07:00
Wei Hu
f12b1b8b47 Remove unused function mana_reset_counters.
This fixes the build warning caused by this function.

Reported by:	markj
Tested by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2021-08-20 16:05:40 +00:00
Wei Hu
ce110ea12f Microsoft Azure Network Adapter(MANA) VF support
MANA is the new network adapter from Microsoft which will be available
in the Azure public cloud. It provides an SR-IOV NIC as a virtual
function to guest OSes running on Hyper-V.

The code can be divided into two major parts. gdma_main.c brings up
the hardware board and drives all of the underlying hardware queue
infrastructure. mana_en.c contains the main Ethernet driver code.
It has only been tested and is only supported on the amd64
architecture.

PR:		256336
Reviewed by:	decui@microsoft.com
Tested by:	whu
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D31150
2021-08-20 10:44:57 +00:00