The mbuf is NULL issue happens when the device sends the driver
a completion with a wrong request id.
Trigger a reset whenever this happens.
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
(cherry picked from commit da73e3a7d0)
This commit adds differentiation for a reset caused by missing tx
completions, by verifying if the driver didn't receive tx
completions caused by missing interrupts.
The cleanup_running field was added to ena_ring because
cleanup_task.ta_pending is zeroed before ena_cleanup() runs.
Also ena_increment_reset_counter() API was added in order to support
only incrementing the reset counter.
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
(cherry picked from commit a33ec635d1)
RX completion descriptors may sometimes contain errors due
to corruption. Upon identifying such a case, the driver will
trigger a reset with an explicit reset reason
ENA_REGS_RESET_RX_DESCRIPTOR_MALFORMED.
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
(cherry picked from commit 4af71159db)
TX completion descriptors may sometimes contain errors due
to corruption. Upon identifying such a case, the driver will
trigger a reset with an explicit reset reason
ENA_REGS_RESET_TX_DESCRIPTOR_MALFORMED.
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
(cherry picked from commit 3872721846)
This commit updates all the license signatures to 2024.
Approved by: cperciva (mentor)
Sponsored by: Amazon, Inc.
(cherry picked from commit 8d6806cd08)
Some of the files are using outdated linceses.
Update the license to be 2023.
Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 246aa27324)
This commit removes the usage of this API from the freebsd driver since
the relevant functionality is not supported by the device.
Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 2835752e07)
This commit fixes the usage of this function to be compatible with the
new API introduced by ena-com update to v2.7.0
Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
(cherry picked from commit 72e34ebdd0)
In most places, the req_id is printed first, and the qid is printed as a
second. To align the driver, one printout was reworked and the print
order of those variables was changed.
Suggested by: rpokala
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
We do not have atomic(9) routines for bools, and it is not guaranteed
that sizeof(bool) is 1.
This fixes the KASAN and KMSAN kernel builds, which fail because the
compiler refuses to silently cast a _Bool * to a uint8_t * when calling
the atomic(9) sanitizer interceptors.
Reviewed by: Dawid Górecki <dgr@semihalf.com>
MFC after: 2 weeks
Fixes: 0ac122c388 ("ena: Use atomic_load/store functions for first_interrupt variable")
Differential Revision: https://reviews.freebsd.org/D35683
Most of the constants in ena.h file were prefixed with ENA_*, while
others did not have this prefix. Align the constants by prefixing the
remaining constants with ENA.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
The ena_qid variable value is never used. It can be safely removed.
That also silences the compilation warning.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Surround cases of possible simultaneous access to the first_interrupt
variable with atomic_load/store functions.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Store timestamp of last cleanup in Tx ring structure. This does not
change anything during normal operation of the driver but could be
useful when the device fails for some reason.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Print information about qid if req_id is invalid. Add information about
qid and req_id if mbuf is invalid.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Since `ena_com_tx_comp_req_id_get` already checks for `req_id` validity,
the logic was exiting early, never giving `validate_tx_req_id` a chance
to trigger device reset.
Rewrite the logic so that device reset is called based on return value
of `ena_com_tx_comp_req_id_get` instead.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
ena_tx_csum function did not check if IPv6 checksum offload was
requested it only checked checksum offloading flags for IPv4 packets.
Because of that, when encountering CSUM_IP6_* flags, the function simply
returned without actually setting checksum offloading in ena_ctx.
Check CUSM_IP6_* flags to enable IPv6 checksum offload.
Additionally, only IPv4 header was being parsed regardless of EtherType
field, because of that, value of L4 protocol read when actually trying
to send IPv6 packets was wrong. Use ip6_lasthdr function to get length
of all IPv6 headers and payload protocol.
Set the DF flag to 1 in order to allow the device to offload the IPv6
checksum calculation and achieve optimal performance.
Add CSUM6_OFFLOAD and CSUM_OFFLOAD definitions into ena_datapath.h.
Submitted by: Dawid Gorecki <dgr@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.
Introduce sysctls for the following RSS functionality:
- rss.indir_table: indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key: RSS hash key (if Toeplitz used)
Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.
Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Provide the following sysctl statistics in order to stay aligned with
the Linux driver:
* rx_ring.csum_good
* tx_ring.unmask_interrupt_num
Also rename the 'bad_csum' statistic name to 'csum_bad' for alignment.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
ENA silently assumed that ena_up, ena_down and ena_start_xmit routines
should be called within locked context. Driver's logic heavily assumes
on concurrent access to those routines, so for safety and better
documentation about this assumption, the locking assertions were added
to the above functions.
The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Stay aligned with the Linux driver by adding the following logs:
* inform the user about retrying queue creation
* warn on non-empty ena_tx_buffer.mbuf prior to ena_tx_map_mbuf
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
All ena_com_prepare_tx errors other than ENA_COM_NO_MEM are fatal and
require device reset.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
In case of Low-latency Queue, one small enough descriptor can be pushed
directly to the ENA hw, thus saving one fragment. Check for this
condition before performing collapse.
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Update the driver in order not to break its compilation
and make use of the new ENA logging system
Migrate platform code to the new logging system provided by ena_com
layer.
Make ENA_INFO the new default log level.
Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
According to man style(9), only C-style comments should be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Refering to guide: https://wiki.freebsd.org/SPDX the SPDX tag should not
replace the standard license text, however it should be added over the
standard license text to make the automation easier.
Because of that, the old license was kept, but the SPDX tag was added
on top of every ENA driver file.
Submited by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27117
For the first descriptor in a chain the data may start at an offset.
It is optional feature of some devices, so the driver must ack that
it supports it.
The data pointer of the mbuf is simply shifted by the given value.
Submitted by: Maciej Bielski <mba@semihalf.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27116
* Use the new API of ena_trace_*
* Fix typo syndrom --> syndrome
* Remove validation of the Rx req ID (already performed in the ena-com)
* Remove usage of deprecated ENA_ASSERT macro
Submitted by: Ido Segev <idose@amazon.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27115
Networking is broken if the driver configures its (virtual) hardware to
use a hash algorithm (or a key) different from the one that the network
stack (software RSS) uses. This can be seen with connections initiated
from the host. The PCB will be placed into the hash table based on the
hash value calculated by the software. The hardware-calculated hash
value in reponse packets will be different, so the PCB won't be found.
Tested with a kernel compiled with 'options RSS' on an instance with ena
driver.
Reviewed by: mw, adrian
MFC after: 2 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D24733
Driver version upgrade is connected with support for the new device
fetures, like Tx drops reporting or disabling meta caching.
Moreover, the driver configuration from the sysctl was reworked to
provide safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.
Moreover, a lot of minor bug fixes and improvements were added.
Copyright date in the license of the modified files in this release was
updated to 2020.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
There is no guarantee from bus_dmamap_load_mbuf_sg() for matching
mbuf chain segments to dma physical segments.
This patch ensure correctly mapping to LLQ header and DMA segments.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc.
Determined by a flag passed from the device. No metadata is set within
ena_tx_csum when caching is disabled.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
This patch reworks how the Rx queue size is being reconfigured and how
the information from the device is being processed.
Reconfiguration of the queues and reset of the device in order to make
the changes alive isn't the best approach. It can be done synchronously
and it will let to pass information if the reconfiguration was
successful to the user. It now is done in the ena_update_queue_size()
function.
To avoid reallocation of the ring buffer, statistic counters and the
reinitialization of the mutexes when only new size has to be assigned,
the io queues initialization function has been split into 2 stages:
basic, which is just copying appropriate fields and the advanced, which
allocates and inits more advanced structures for the IO rings.
Moreover, now the max allowed Rx and Tx ring size is being kept
statically in the adapter and the size of the variables holding those
values has been changed to uint32_t everywhere.
Information about IO queues size is now being logged in the up routine
instead of the attach.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the reset triggering is no longer a simple macro that was just
setting appropriate flag, the new function for triggering reset was
added. It improves code readability a lot, as we are avoiding additional
indentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
* Removed adaptive interrupt moderation (not suported on FreeBSD).
* Use ena_com_free_q_entries instead of ena_com_free_desc.
* Don't use ENA_MEM_FREE outside of the ena_com.
* Don't use barriers before calling doorbells as it's already done in
the HAL.
* Add function that generates random RSS key, common for all driver's
interfaces.
* Change admin stats sysctls to U64.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Driver working in LLQ mode in some cases can send only few last segments
of the mbuf using DMA engine, and the rest of them are sent to the
device using direct PCI transaction. To map the only necessary data, two DMA
maps were used. That solution was very rough and was causing a bug - if
both maps were used (head_map and seg_map), there was a race in between
two flows on two queues and the device was receiving corrupted
data which could be further received on the other host if the Tx cksum
offload was enabled.
As it's ok to map whole mbuf and then send to the device only needed
segments, the design was simplified to use only single DMA map.
The driver version was updated to v2.1.1 as it's important bug fix.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buff indexes taken from the slots.
When Tx resources are being set, the new mapping structures are created
and netmap Tx rings are being reset.
When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.
When Tx interrupt occurrs, ena_netmap_tx_irq is called.
ena_netmap_txsync callback signalizes that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.
ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If error occurs,
transmitting is stopped, but if the error was cause due to HW ring being
full, information about that is not propagated to the userspace app.
When all packets are ready, doorbell is written to NIC and netmap ring
state is updated.
Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks if number of slots does not exceed NIC limit. Invalid
packets are being dropped and the error is propagated to the upper
layer. As each netmap buffer has equal size, which is typically greater
then 2KiB, there shouldn't be any packets which contain too many slots.
Then, the ena_com_tx_ctx structure is being filled. As netmap does not
support any hardware offloads, ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in the LLQ mode, the push header is being determined
by checking if the header fits within the first socket.
If so, the portion of data is being copied directly from the slot.
In other case, the data is copied to the intermediate buffer.
First slots are treated the same as as the others, because DMA mapping
has no impact on LLQ mode. Index of each netmap buffer is taken from
slot and stored in netmap_buf_idx array. In case of mapping error,
memory is unmapped and packets are put back to the netmap ring.
ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and is validated. As validate_tx_req_id from
ena.c is specific to kernels mbuf, another implementation is provided.
Each req_id is cleaned up by ena_netmap_tx_clean_one function. Buffers
are being unmaped from DMA and put back to netmap ring. In the end,
state of netmap and NIC rings are being updated.
Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Most of code used for Rx ring initialization could be reused in NETMAP.
Reset of NETMAP ring and new alloc method was added. Driver decides if
use kernels mbufs or NETMAPs slots based on IFCAP_NETMAP flag. It
allows to reuse ena_refill_rx_bufs, which provides proper handling of
Rx out of order completion.
ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from NETMAP ring. Based on queue id proper netmap_ring is found. As
NETMAP provides the "partial opening" feature not all of the rings are
avaiable. Not used points to invalid ring. If there is available slot,
it is taken from the ring. Its buffer is mapped to DMA and its index is
stored in ena_rx_buffer field in ena_rx_buffer structure. Then ena_buf
is filled with addresses and ring state is updated.
Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to ring. As we could not return more bufs than we have taken and
we should not override occupied slots, buf_index should be 0. It is
being checked by assertion.
ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.
In case of Rx error that could be handled by NIC driver (for example by
performing reset) rx sync should return 0.
ena_netmap_rx_frames first checks if NETMAP ring is in consistent
state and then in the loop receives new frames. When all available
frames are taken nr_hwtail is updated.
Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurrs, each Descriptor is loaded by ena_netmap_rx_load_desc function.
If packets take more than one segments NS_MOREFRAG flag must be set in
all, but not last slot. In case of wrong req_id packet is removed from
NETMAP ring. If packet is successful received counters are updated.
Refiling of NIC ring is performed by ena_netmap_rx_cleanup function.
It calculates number of available slots and call ena_refill_rx_bufs with
proper number.
Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in upcoming netmap code in the driver. To make it
possible, they were moved to ena.h header.
Differential Revision: https://reviews.freebsd.org/D21933
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.