Before this change the transmit taskqueue would enqueue itself when it
cannot find space on the NIC ring with the hope that eventually space
would be made. This results in the following livelock that only occurs
after passing ~200Gbps of TCP traffic for many hours:
100% CPU
┌───────────┐wait on ┌──────────┐ ┌───────────┐
│user thread│ cpu │gve xmit │wait on │gve cleanup│
│with mbuf ├────────►│taskqueue ├────────►│taskqueue │
│uma lock │ │ │ NIC ring│ │
└───────────┘ └──────────┘ space └─────┬─────┘
▲ │
│ wait on mbuf uma lock │
└───────────────────────────────────────────┘
Further details about the livelock are available on
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560.
After this change, the transmit taskqueue no longer spins till there is
room on the NIC ring. It instead stops itself and lets the
completion-processing taskqueue wake it up.
Since I'm touching the trasnmit taskqueue I've also corrected the name
of a counter and also fixed a bug where EINVAL mbufs were not being
freed and were instead living forever on the bufring.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47138
(cherry picked from commit 40097cd67c0d52e2b288e8555b12faf02768d89c)
DQO is the descriptor format for our next generation virtual NIC.
It is necessary to make full use of the hardware bandwidth on many
newer GCP VM shapes.
This patch extends the previously introduced DQO descriptor format
with a "QPL" mode. QPL stands for Queue Page List and refers to
the fact that the hardware cannot access arbitrary regions of the
host memory and instead expects a fixed bounce buffer comprising
of a list of pages.
The QPL aspects are similar to the already existing GQI queue
queue format: in that the mbufs being input in the Rx path have
external storage in the form of vm pages attached to them; and
in the Tx path we always copy the mbuf payload into QPL pages.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46691
(cherry picked from commit 2348ac893d10f06d2d84e1e4bd5ca9f1c5da92d8)
DQO is the descriptor format for our next generation virtual NIC.
It is necessary to make full use of the hardware bandwidth on many
newer GCP VM shapes.
One major change with DQO from its predecessor GQI is that it uses
dual descriptor rings for both TX and RX queues.
The TX path uses a descriptor ring to send descriptors to HW, and
receives packet completion events on a TX completion ring.
The RX path posts buffers to HW using an RX descriptor ring and
receives incoming packets on an RX completion ring.
In GQI-QPL, the hardware could not access arbitrary regions of
guest memory, which is why there was a pre-negotitated bounce buffer
(QPL: Queue Page List). DQO-RDA has no such limitation.
"RDA" is in contrast to QPL and stands for "Raw DMA Addressing" which
just means that HW does not need a fixed bounce buffer and can DMA
arbitrary regions of guest memory.
A subsequent patch will introduce the DQO-QPL datapath that uses the
same descriptor format as in this patch, but will have a fixed
bounce buffer.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46690
(cherry picked from commit d438b4ef0cfc6986b93d0754f49ebf3ead50f269)
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
(cherry picked from commit aa3860851b9f6a6002d135b1cac7736e0995eedc)
This fixes a panic caused by double free.
PR: kern/279410
Differential Revision: https://reviews.freebsd.org/D45489
(cherry picked from commit b81cbb12410b000074483899e61e9e767ba3ec1d)
Each Rx descriptor points to a packet buffer of size 2K, which means
that MTUs greater than 2K see multi-descriptor packets. The TCP-hood of
such packets was being incorrectly determined by looking for a flag on
the last descriptor instead of the first descriptor.
Also fixed and progressed the version number.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41754
(cherry picked from commit 5f62584a9adb7887bae33af617cfa4f43017abf8)
Ringing the doorbell before making the BPF call can result in the
mbuf being freed before the BPF call.
Reviewed-by: markj
MFC-after: 3 days
Differential Revision: https://reviews.freebsd.org/D41189
While there, also make MODULE_PNP_INFO to reflect that the device
description is provided.
Reported-by: jrtc27
Reviewed-by: jrtc27, imp
Differential Revision: https://reviews.freebsd.org/D40430
gVNIC is a virtual network interface designed specifically for
Google Compute Engine (GCE). It is required to support per-VM Tier_1
networking performance, and for using certain VM shapes on GCE.
The NIC supports TSO, Rx and Tx checksum offloads, and RSS.
It does not currently do hardware LRO, and thus the software-LRO
in the host is used instead. It also supports jumbo frames.
For each queue, the driver negotiates a set of pages with the NIC to
serve as a fixed bounce buffer, this precludes the use of iflib.
Reviewed-by: markj
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39873