According to section 5.1.6.2.1 of version 1.3 of the virtio
specification, the driver MUST NOT set VIRTIO_NET_HDR_F_DATA_VALID in
the flags. So don't do that.
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D53650
(cherry picked from commit 836b3cd9d7910aff5225e9e58189067ca03fae30)
Transmit segment offloading depends on transmit checksum offloading.
Enforce that constraint. This also fixes a bug, since if_hwassist bits
are from the CSUM_ space, not from the IFCAP_ space.
PR: 290773
Reviewed by: Timo Völker
Tested by: lg@efficientip.com
Differential Revision: https://reviews.freebsd.org/D53629
(cherry picked from commit 4c50ac68166caf7e08c5a9984d63fa91490fa50d)
The type of variable promisc and allmulti was changed from int to bool
by commit [1].
[1] 7dce56596f Convert to if_foreach_llmaddr() KPI
MFC after: 3 days
(cherry picked from commit 80dfed11fc1c61ce9168db01dee263447619e859)
Keep the hwassist flags for transmit checksum offload and transmit
segment offload in sync with the enabled capabilities.
Reported by: Timo Völker
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52765
(cherry picked from commit f2575d56c8c9a8acad4a61a3586546dff4febce1)
Hardware TCP LRO results in problems in settings with IP forwarding
being enabled. In case of nodes without IP forwarding, using
software LRO is also beneficial in general, since it can provide better
information about what was received on the wire.
Therefore, disable hardware TCP LRO by default.
By tuning the loader tunable, this can be changed.
PR: 263229
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52684
(cherry picked from commit 6e4b811009d63f33c59d51f28fd4a030ca90843e)
Enable the handling of the IFCAP_RXCSUM_IPV6 handling by handling
IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 as a pair. Also make clear, that
software and hardware LRO require receive checksum offload.
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D52682
(cherry picked from commit eaf619fddcb21859311b895a0836da3171a01531)
When ALTQ is enabled, this driver does "hardware" accounting and soft
accounting at the same time. Prefer the "hardware" one to make the logic
simpler.
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44817
(cherry picked from commit 2a346c8993cbb92a321a7c25bd9ac4dcaae352d1)
While here, advertise the IFCAP_HWSTATS capability to avoid the net
stack from double counting it.
Co-authored-by: zlei
Reviewed by: zlei
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D44816
(cherry picked from commit a14d561e58529c9686a2efc47f4828ad82026e63)
When transmitting a packet over the vtnet interface, map the
csum flags CSUM_DATA_VALID | CSUM_PSEUDO_HDR to the virtio
flag VIRTIO_NET_HDR_F_DATA_VALID.
When receiving a packet over the virtio network channel, translate
the virtio flag VIRTIO_NET_HDR_F_NEEDS_CSUM not to CSUM_DATA_VALID |
CSUM_PSEUDO_HDR, but to CSUM_TCP, CSUM_TCP_IPV6, CSUM_UDP, or
CSUM_UDP_IPV6.
The second change fixes a series of issue related to checksum
offloading for if_vtnet.
While there, improve the stats counters to allow a detailed view
on what is going on in relation to checksum offloading.
PR: 165059
Reviewed by: tuexen, manpages
Differential Revision: https://reviews.freebsd.org/D51686
(cherry picked from commit 3008f30d2c2cabdd7e17f7fb922139da8681ffbd)
Fix the aggregation of the interface level counters
* dev.vtnet.X.tx_task_rescheduled,
* dev.vtnet.X.tx_tso_offloaded,
* dev.vtnet.X.tx_csum_offloaded,
* dev.vtnet.X.rx_task_rescheduled,
* dev.vtnet.X.rx_csum_offloaded, and
* dev.vtnet.X.rx_csum_failed.
Also ensure that dev.vtnet.X.tx_defrag_failed only counts the number
of times m_defrag() fails.
While there, mark sysctl-variables used for exporting statistics as
such (CTLFLAG_STATS).
Reviewed by: Timo Völker
Differential Revision: https://reviews.freebsd.org/D51999
(cherry picked from commit 03da4395158d374b5e38623f6744ce31302b530c)
We can do so trivially, so make these tables read-only. No functional
change intended.
Reviewed by: cem, emaste
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D52003
(cherry picked from commit d5f55356a2fbf8222fb236fe509821e12f1ea456)
Include opt_inet.h and opt_inet6.h early in the files including
virtio_net.h, since they use INET and/or INET6.
While there, remove redundant inclusion of sys/types.h, since it is
included already by sys/param.h.
There was a discussion to include opt_inet.h and opt_inet6.h also
in virtio_net.h. glebius suggested to add a mechanism for files
to check, if required opt_*.h files were included. virtio_net.h
will be the first consumer of this mechanism.
Reviewed by: glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D52046
(cherry picked from commit 3077532b1bb2911d3012ee90bae9d9499c960569)
Remove an always-false check for whether the request has already
completed before sleeping. Even if the request is complete, the
response tag is updated while holding the channel lock, which is also
held here.
No functional change intended.
Sponsored by: Klara, Inc.
(cherry picked from commit 28c9b13b236d25512cfe4e1902411ff421a14b64)
Otherwise we can end up with a lost interrupt, causing lost request
completion wakeups and hangs in the filesystem layer.
Continue processing until we enable interrupts and then observe an empty
queue, like other virtio drivers do.
Sponsored by: Klara, Inc.
Some implementations of the virtio 9p transport are implemented on
virtio_mmio, e.g. the Arm FVP. Use the correct macro so the driver
attaches when this is the case.
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D49600
If, when submitting a request, the virtqueue is full, we sleep until an
interrupt has fired, then restart the request. However, while sleeping
the channel lock is dropped, and in the meantime another thread may have
reset the per-channel SG list, so upon retrying we'd (re)submit whatever
happened to be left over in the previous request.
Fix the problem by rebuilding the SG list after sleeping.
Sponsored by: Klara, Inc.
- Remove superfluous newlines.
- Use bool literals.
- Replace an unneeded SYSINIT with static initialization.
No functional change intended.
Sponsored by: Klara, Inc.
When the module is loaded on a system running on qemu/kvm the "modern"
virtio infrastructure is used and virtio_read_device_config() will end
up calling vtpci_modern_read_dev_config(). This function cannot read
values of arbitrary sizes and will panic if the p9fs mount tag size is
not supported by it.
Use virtio_read_device_config_array() instead. It was tested on both
bhyve and qemu/kvm.
PR: 280098
Co-authored-by: Mark Peek <mp@FreeBSD.org>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1320
device_attach routines are allowed to sleep, and this routine already
has other M_WAITOK allocations.
Reported by: markj
Reviewed by: markj
Fixes: 1efd69f933b6 ("p9fs: move NULL check immediately after alloc...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45721
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.
Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.
To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:
vfs.root.mountfrom="p9fs:sharename"
for non-root filesystems add something like this to /etc/fstab:
sharename /mnt p9fs rw 0 0
In both examples, substitute the share name used on the bhyve command
line.
The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations. This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.
Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
This blit an ARGB image on the dedicated vd.
This also adds vt_fb_bitblt_argb which will works for most of the vt backends
Differential Revision: https://reviews.freebsd.org/D45929
Reviewed by: tsoome
Sponsored by: Beckhoff Automation GmbH & Co. KG
(cherry picked from commit b93028d8cd3aafc883b5f0ecec65a8a2a30af7f3)
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
(cherry picked from commit aa3860851b9f6a6002d135b1cac7736e0995eedc)
Some platforms require an adjustment of the ethernet hearders. Rather
than make this be on __NO_STRICT_ALIGNMENT being defined, define
VTNET_ETHER_ALIGN to be either 0 or ETHER_ALIGN (aka 2). Add a test to
the if statements to only do them when != 0. This eliminates the #ifdef
sprinkled in the code, still communicates the intent and gives the same
compiled results.
Sponsored by: Netflix
Reviewed by: bz, bryanv
Differential Revision: https://reviews.freebsd.org/D43654
(cherry picked from commit 0ea4b4084845bfeedc8c692e4d34252023b78cb3)
While we account for the padding in the length of the mbuf we use, we do
not account for it when we 'guess' the size of the mbuf to allocate
based in the MTU of the device. This leads to a situation where we might
fail if the mtu is close to a bucket size (say 2018) such that the added
padding would push us over the edge for a full-sized packet. mtu of 2018
is super rare (2016 and 2020 would both work), but fix it none-the-less.
It's a shame we can't just set VTNET_RX_HEADER_PAD to 2 in this case. The 4
seems hard-coded somewhere I've not found documented (I think it's in the
protocol given the comments about VIRTIO_F_ANY_LAYOUT).
Sponsored by: Netflix
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D43656
(cherry picked from commit d9e0e42627613b56abf0f8fa1ad601e5690d775c)
If the header that we add to the packet's size is 0 % 4 and we're
strictly aligning, then we need to adjust where we store the header so
the packet that follows will have it's struct ip header properly
aligned. We do this on allocation (and when we check the length of the
mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
softc, because it's also used to allocate the mbufs and it needs to be
the proper size for that. Since we otherwise use the size of the mbuf
(or sometimes the smaller size of the received packet) to compute how
much we can buffer, this ensures no overflows. The 2 byte adjustment
also does not affect how many packets we can receive in the lro_nomrg
case.
PR: 271288
Sponsored by: Netflix
Reviewed by: bryanv
Differential Revision: https://reviews.freebsd.org/D43224
(cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95)
xpt_path_string() is now a wrapper around xpt_path_sbuf(). Using it
to than concatenate result to another sbuf makes no sense. Just call
xpt_path_sbuf() directly.
MFC after: 1 month
(cherry picked from commit 8c4ee0b22c98fc1e208dd133f617bd329cd10728)
Queue an initial fetch of data during attach and after every read
rather than synchronously fetching data and polling for completion.
If data has not been returned from an previous fetch during read,
just return EAGAIN rather than blocking.
Co-authored-by: John Baldwin <jhb@FreeBSD.org>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D41656
(cherry picked from commit f1c5a2e3a625053e2b70d5b1777d849a4d9328f2)
Use the correct endian switching function when switching to a little
endian 64-bit address. Even on a little-endian machine this will
truncate the address to a 32-bit value.
Sponsored by: Arm Ltd
(cherry picked from commit 4386935191c576fa62a52d52734e264fe8329a67)
Add a driver to connect vt to the VirtIO GPU device in 2D mode. This
provides a output on the display when a qemu virtio gpu device is
added, e.g. with -device virtio-gpu-pci.
Tested on qemu using UTM, and a Hetzner arm64 VM instance.
Reviewed by: bryanv (earlier version)
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D40094
If the host doesn't announce VIRTIO_NET_F_CTRL_RX we cannot disable all
multicast traffic. Previously we'd refuse to set the IFF_ALLMULTI flag,
which is the exact opposite of what is actually happening.
This broke things such as igmpproxy.
See also: https://redmine.pfsense.org/issues/14301
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41356
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
The code paths while dumping do not got through busdma. As such,
safeguard against calling bus_dmamap_sync() with a NULL map. The x86
implementation tolerates this but others do not, resulting in a
NULL dereference panic when dumping to a vtblk device on arm64, riscv,
etc.
Fixes: 782105f7c8 ("vtblk: Use busdma")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37990
Virtio operates with physical addresses, while busdma is designed to
map these to produce bus addresses. On most supported platforms,
these two are interchangeable; on powerpc platforms, they are not.
When on powerpc, set an IOMMU of NULL, which causes the powerpc busdma
code to bypass the iommu mapping; this leaves us with the physical
buffer addresses which the virtio host expects to see.
Tested by: alfredo
Fixes: 782105f7c8 ("vtblk: Use busdma")
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D37891
Now that vtblk uses busdma, it keeps important information inside its
request structures. The functions used for kernel dumps synthesize
their own request structures rather than using structures initialized
with the necessary bits for busdma.
Add busdma-bypass paths. Since dumping writes contiguous blocks of
physical memory, vtblk doesn't need busdma in that case.
Reported by: glebius
Tested by: glebius
Differential Revision: https://reviews.freebsd.org/D37243
We assume that bus addresses from busdma are the same thing as
"physical" addresses in the Virtio specification; this seems to
be true for the platforms on which Virtio is currently supported.
For block devices which are limited to a single data segment per
I/O, we request PAGE_SIZE alignment from busdma; this is necessary
in order to support unaligned I/Os from userland, which may cross
a boundary between two non-physically-contiguous pages. On devices
which support more data segments per I/O, we retain the existing
behaviour of limiting I/Os to (max data segs - 1) * PAGE_SIZE.
Reviewed by: bryanv
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D36667