Replace is_power_of_2(length) with power2(length). When length != 0, as in
this case, they produce the same result. This will allow an implementation
of is_power_of_two to be dropped.
Reviewed by: alc, markj
Differential Revision: https://reviews.freebsd.org/D45536
(cherry picked from commit a94ed493b50752cee09245fc312c63b00331f217)
Linux has a header file that defines an ilog2 function and some simple
functions/macros that use it: roundup_pow_of_two, is_power_of_2,
rounddown_pow_of_two, and order_base_2. This change moves three of
those simple functions (all but is_power_of_2) from linuxkpi to
libkern. It also deletes a few implementations of these functions
that have previously been copied into code for various device drivers,
so that they can use the libkern version. The is_power_of_2 macro was
not moved because powerof2 in param.h provides almost the same service
already (except that they disagree about whether 0 is a power of two).
Since the linux definitions of these functions were copied into
FreeBSD 11 years ago, linux has improved them, and this change
provides those improvements. In particular, a giant table of log
values for evaluating ilog2 for constant values is no longer
necessary.
Reviewed by: alc, markj (previous version)
Differential Revision: https://reviews.freebsd.org/D45536
(cherry picked from commit c8b0c33b03ac072413b27bed2bdae2ae27426f3a)
The kernel source contains several definitions of an ilog2 function;
some are slower than necessary, and one of them is incorrect.
Elimininate them all and define an ilog2 macro in libkern to replace
them, in a way that is fast, correct for all argument types, and, in a
GENERIC kernel, includes a check for an invalid zero parameter.
Folks at Microsoft have verified that having a correct ilog2
definition for their MANA driver doesn't break it.
Reviewed by: alc, markj, mhorne (older version), jhibbits (older version)
Differential Revision: https://reviews.freebsd.org/D45170
Differential Revision: https://reviews.freebsd.org/D45235
(cherry picked from commit b0056b31e90029553894d17c441cbb2c06d31412)
We don't use legacy receive descriptors and masking out the vlan ID
isn't necessary since the tag is in the standard format, so remove it.
(cherry picked from commit 124b7722aad7d4cf12d96c030659aef78175aa9c)
This got lost many years ago in 8eb6488ebb
It is used by the driver's DBG printfs.
(cherry picked from commit bf6f0db8a762966b08430692c92ae34e667948db)
Remove an always-false check for whether the request has already
completed before sleeping. Even if the request is complete, the
response tag is updated while holding the channel lock, which is also
held here.
No functional change intended.
Sponsored by: Klara, Inc.
Otherwise we can end up with a lost interrupt, causing lost request
completion wakeups and hangs in the filesystem layer.
Continue processing until we enable interrupts and then observe an empty
queue, like other virtio drivers do.
Sponsored by: Klara, Inc.
If, when submitting a request, the virtqueue is full, we sleep until an
interrupt has fired, then restart the request. However, while sleeping
the channel lock is dropped, and in the meantime another thread may have
reset the per-channel SG list, so upon retrying we'd (re)submit whatever
happened to be left over in the previous request.
Fix the problem by rebuilding the SG list after sleeping.
Sponsored by: Klara, Inc.
- Remove superfluous newlines.
- Use bool literals.
- Replace an unneeded SYSINIT with static initialization.
No functional change intended.
Sponsored by: Klara, Inc.
When the module is loaded on a system running on qemu/kvm the "modern"
virtio infrastructure is used and virtio_read_device_config() will end
up calling vtpci_modern_read_dev_config(). This function cannot read
values of arbitrary sizes and will panic if the p9fs mount tag size is
not supported by it.
Use virtio_read_device_config_array() instead. It was tested on both
bhyve and qemu/kvm.
PR: 280098
Co-authored-by: Mark Peek <mp@FreeBSD.org>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1320
device_attach routines are allowed to sleep, and this routine already
has other M_WAITOK allocations.
Reported by: markj
Reviewed by: markj
Fixes: 1efd69f933b6 ("p9fs: move NULL check immediately after alloc...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45721
This is derived from swills@ fork of the Juniper virtfs with many
changes by me including bug fixes, style improvements, clearer layering
and more consistent logging. The filesystem is renamed to p9fs to better
reflect its function and to prevent possible future confusion with
virtio-fs.
Several updates and fixes from Juniper have been integrated into this
version by Val Packett and these contributions along with the original
Juniper authors are credited below.
To use this with bhyve, add 'virtio_p9fs_load=YES' to loader.conf. The
bhyve virtio-9p device allows access from the guest to files on the host
by mapping a 'sharename' to a host path. It is possible to use p9fs as a
root filesystem by adding this to /boot/loader.conf:
vfs.root.mountfrom="p9fs:sharename"
for non-root filesystems add something like this to /etc/fstab:
sharename /mnt p9fs rw 0 0
In both examples, substitute the share name used on the bhyve command
line.
The 9P filesystem protocol relies on stateful file opens which map
protocol-level FIDs to host file descriptors. The FreeBSD vnode
interface doesn't really support this and we use heuristics to guess the
right FID to use for file operations. This can be confused by privilege
lowering and does not guarantee that the FID created for a given file
open is always used for file operations, even if the calling process is
using the file descriptor from the original open call. Improving this
would involve changes to the vnode interface which is out-of-scope for
this import.
Differential Revision: https://reviews.freebsd.org/D41844
Reviewed by: kib, emaste, dch
MFC after: 3 months
Co-authored-by: Val Packett <val@packett.cool>
Co-authored-by: Ka Ho Ng <kahon@juniper.net>
Co-authored-by: joyu <joyul@juniper.net>
Co-authored-by: Kumara Babu Narayanaswamy <bkumara@juniper.net>
The current addition to the interrupt nesting level in
xen_arch_intr_handle_upcall() needs to be compensated in
xen_intr_handle_upcall(), otherwise interrupts dispatched by the upcall handler
end up seeing a td_intr_nesting_level of 2 or more, which makes them assume
there's been an interrupt nesting.
Such extra interrupt nesting count lead to statclock() reporting idle time as
interrupt, as the call from interrupt context will always be seen as a nested
one (td->td_intr_nesting_level >= 2) due to the nesting count increase done by
both xen_arch_intr_handle_upcall() and intr_execute_handlers().
Fix this by adjusting the nested interrupt count before dispatching interrupts
from xen_intr_handle_upcall().
PR: 277231
Reported by: Matthew Grooms <mgrooms@shrew.net>
Fixes: af610cabf1 ('xen/intr: adjust xen_intr_handle_upcall() to match driver filter')
Sponsored by: Cloud Software Group
Reviewed by: Elliott Mitchell <ehem+freebsd@m5p.com>
This implementation had various bugs. bde@ reported that the unit
conversion/scaling is wrong, and it also does not handle 82574L or
igb(4) devices correctly.
With the new AIM code, it is expected most users will not need to
manually tune this.
If you do need static control:
hw.em.enable_aim=0 for all interfaces at boot or dev.em.X.enable_aim=0
for individual interfaces at runtime and they will track the
hw.em.max_interrupt_rate tunable. That codepath has been bugfixed for
all supported chipsets.
You may view the current rate with dev.em.X.queue_rx_0.interrupt_rate
which has been bugfixed for all supported chipsets.
If you need to set different rates per interface for some reason let me
know and I will rethink how to add this back. Otherwise you can leave
AIM on for general purpose interfaces and disable it at runtime on
special purpose low or high latency interfaces that would track
hw.em.max_interrupt_rate if you have a mix of concerns.
PR: 235031
Reported by: Bruce Evans <bde@FreeBSD.org>
Relnotes: yes
Sponsored by: BBOX.io
(cherry picked from commit 4020351325c02cc27aa4992c199ff18a9542a52c)
This is a retread of https://reviews.freebsd.org/D34449 which I think
will fix the issue for the remote side not supporting autoneg. We now
attempt an autoneg, and if that fails fall back to the current code
that forces the link speed/duplex.
The original intent of this patch is to inform the remote switch of
duplex settings when we (the client) are specifying a fixed 10 or 100
speed. Otherwise it may get the duplex setting wrong.
The tricky case is when the remote (switch) side is fixing its
speed AND duplex while disabling autoneg and we (client) need to do
the same, which still seems to be common enough at some ISPs.
Original commit message follows:
Currently if an e1000 interface is set to a fixed media configuration,
for gigabit, it will participate in auto-negotiation as required by
IEEE 802.3-2018 Clause 37. However, if set to fixed media configuration
for 100 or 10, it does NOT participate in auto-negotiation.
By my reading of Clauses 28 and 37, while auto-negotiation is optional
for 100 and 10, it is not prohibited and is, in fact, "highly
recommended".
This patch enables auto-negotiation for fixed 100 and 10 media
configuration, in a similar manner to that already performed for 1000.
I.e., the patch enables advertising of just the manually configured
settings with the goal of allowing the remote end to match the manually
configured settings if it has them available.
To be clear, this patch does NOT allow an em(4) interface that has been
manually configured with specific media settings to respond to
auto-negotiation by then configuring different parameters to those that
were manually configured. The intent of this patch is to fully comply
with the requirements of Clause 37, but for 100 and 10.
The need for this has arisen on an em(4) link where the other end is
under a different administrative control and is set to full
auto-negotiation. Due to the cable length GigE is not working well. It
is desired to set the em(4) end to "media 100baseTX mediatype
full-duplex" which does work when both ends are configured that way.
Currently, because em(4) does not participate in autoneg for this
setting, the remote defaults to half-duplex - i.e., there's a duplex
mismatch and things don't work. With this patch, em(4) would inform the
remote that it has only 100baseTX full, the remote would match that and
it will work.
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Tested by: Franco Fichtner <franco@opnsense.org>
Tested by: J.R. Oldroyd <fbsd@opal.com> (previous version)
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47336
(cherry picked from commit bceec3d80a3caf9249e24247fb937674bf5b46b5)
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to change the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44259
(cherry picked from commit 90853dfac851afa9e8840f5a38383256d75458b6)
* Adds support for SFPs that are not correctly coded as an SFP
transceiver. i.e. Coherent-Finisar FCLF8522P2BTL.
* Configures multi-rate SFPs i.e. Coherent-Finisar FCLF8522P2BTL as
SGMII so they can do 10/100/1000 auto-negotiation.
* Adds support for 100BaseLX SGMII transceivers.
* Some code cleanup and additional debugging.
Reviewed by: emaste, markj, Franco Fichtner <franco@opnsense.org>
Tested by: Natalino Picone <natalino.picone@nozominetworks.com>
Sponsored by: Nozomi Networks
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47337
(cherry picked from commit 15853a5fc9548d9805a2ef22f24e2eb580198341)
As explained in PR 277038, iflib calls IFDI_DETACH() and then
IFDI_QUEUES_FREE(). With igc, the latter writes to a register after it
has been unmapped.
igc_if_detach() already calls igc_release_hw_control(), and looking at
callers of igc_if_queues_free(), that appears to be sufficient. So,
just remove the igc_release_hw_control() call.
PR: 277038
Reported by: Mike Belanger <mibelanger@qnx.com>
Reviewed by: kbowling
Tested by: kbowling
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D47293
(cherry picked from commit 35d05a14ed7e9935be1ed0fe965b91aaaa4c92ef)
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to change the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Sponsored by: Netflix
(cherry picked from commit ab540d44ba3201ff8313b90ba0096004603b2e34)
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to fix the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Michael notes emperically 82599 has an unexpected middle mask:
Chip First Middle Last
82599 0xFF6 0xFF6 0xF7F
which should be fixed up to 0xF76 (RFC 3168) in a future commit.
Reviewed by: rrs, rscheff
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44258
(cherry picked from commit eea2e089f8dadf850a30ed837edd7a386427a9ed)
This is a relatively well known trick for the X520 (82599), can be
useful for testing and lab settings. It's not an official standard or
particularly common but ubiquitous Broadcom switch ASICs deal with it.
We'll call it 1000Base-KX because it's SerDes on the passive cable and
I don't think it's worth adding another media type for this.
Reviewed by: emaste
Sponsored by: BBOX.io
Differential Revision: https://reviews.freebsd.org/D47352
(cherry picked from commit 48ddd1b9f88753c6875566fbb67bc622453e4993)
Otherwise the ifmedia subsystem may call unguarded NULL function
pointers. Same issue that was fixed for cxgb(4) in f2daf8995.
Also see: https://github.com/opnsense/src/issues/228
MFC after: 1 week
Header ice_rss.h uses the kernel RSS interface if option RSS is defined.
However when ice_rss.h is included by ice_lib.h there is no prior
inclusion of ice_opts.h to set RSS causing ifdef RSS to always fail. Add
ice_opts.h to the top of ice_lib.h (like ice_iflib.h) so RSS can be
defined when ice_rss.h is parsed.
With that in place, compilation fails due to a missing defintion of
ICE_DEFAULT_RSS_HASH_CONFIG. It is defined in ice_rss.h only when RSS is
not defined. Since this define is not part of the kernel RSS interface
but ice-specific, it should always be defined. Move its definition
outside of ifdef RSS.
PR: 255309
Reviewed by: mhorne, erj (earlier version)
MFC after: 3 days
Pull Request: https://github.com/freebsd/freebsd-src/pull/1460
(cherry picked from commit 6e5650896fe47398e49e3d81af60cc60dbb09e6e)
In (unknown) situations it seems the i2c bus can have trouble,
while nothing about the current link state has changed, the driver
would react by going into a link down state, and start busylooping
on up to 4 cores. Even if there was a valid link, such spinning
on a cpu by a kernel thread would wreak havoc to existing and
new connections.
This patch does the following:
1. If such a bus failure occurs, we keep the last known link state.
2. Prevent busy looping by implementing the lockmgr() facility to
be able to sleep while the i2c code waits on the i2c ISR. We cap
this with a timeout.
3. Pin the admin queues to the last CPU in the system, to prevent
other scenarios where busy looping might occur from landing on CPU
0, which especially seems to cause a lot of issues.
Given the design constraints both in hardware and in software,
the lockmgr() seems to be the only viable option, even though
FreeBSD explicitly forbids sleeping in callout context, but
fails to explain why this is or offer alternatives.
axgbe: revert allocating admin queues to last CPU
The issue was resolved in 52454a1e5b.
Scheduled threads such as CARP are now no longer pinned to CPU 0, making sure
they always get their time slice even if CPUs are blocked.
Since the I/O expander chip does not do a reset when soft power
cycling, the driver will first turn off all LEDs when initializing,
although no specific routine seems to be called when powering down.
This means that the LEDs will stay on until the driver has booted up,
after which the driver will be in a consistent state.
Initially, RSF (Receive Queue Store and Forward) was disabled for
unknown reasons, but the cut-through mode that's enabled as a result
seems to send 0 length packets up to the DMA when the RX queue is
full.
Since the iflib interface needs axgbe_pci_init() and its phy starting capabilities, no data was passed in its absence.
With the NULL check of the axgbe_miibus we also resort back to an MDIO read as a module might be capable of both
clause 22 and clause 45 methods of communication.
with the move of phy_stop() to if_detach() in d50d4e8cd4, it's better to prevent reconfiguring the phy should the pci_init() callout trigger more than once.
Within the code path of autonegotiation for gigabit SFP modules was a bug, causing
a report of LINK_ERR for cases where an external SFP PHY was present. Fixing this issue
did not resolve to a link however, as it turned out that while autonegotiation interrupts
were happening, it's resulting status cannot be correctly determined in all cases. In these
specific cases we have no other option than to assume a module has negotiated to 1Gbit/s.
PHY-specific configuration has been delegated to the miibus driver, if an external PHY is present.
It's possible that the i2c bus does not recognize a PHY on the first pass, so in all cases we
retry up to a maximum of 5 times during each link poll pass to ensure we didn't miss the presence
of an external PHY.
This commit also addresses link issues on both 100 mbit and 1Gb fiber modules. Not all of these modules
have the correct data set according to SFF-8472, as such we first check for gigabit compliance and
the associated baudrate, otherwise we resort back to determining what type of fiber module is plugged
in by checking the baudrate, cable length and wavelength and setting the MAC speed accordingly.
It is possible for a machine to boot into a state in which the configuration register,
responsible for controlling wether an I/O signal is considered an input or output,
contains randomized values. It was assumed this was programmed by the BIOS.
If I/O is reversed, it's possible for the driver to think an SFPP module has been inserted
when there is none, leading to unrecoverable I2C errors.
The configuration register should contain a state which is determined and provided by the BIOS,
hence no hard-coded values are programmed here.