In (unknown) situations it seems the i2c bus can have trouble,
while nothing about the current link state has changed, the driver
would react by going into a link down state, and start busylooping
on up to 4 cores. Even if there was a valid link, such spinning
on a cpu by a kernel thread would wreak havoc to existing and
new connections.
This patch does the following:
1. If such a bus failure occurs, we keep the last known link state.
2. Prevent busy looping by implementing the lockmgr() facility to
be able to sleep while the i2c code waits on the i2c ISR. We cap
this with a timeout.
3. Pin the admin queues to the last CPU in the system, to prevent
other scenarios where busy looping might occur from landing on CPU
0, which especially seems to cause a lot of issues.
Given the design constraints both in hardware and in software,
the lockmgr() seems to be the only viable option, even though
FreeBSD explicitly forbids sleeping in callout context, but
fails to explain why this is or offer alternatives.
axgbe: revert allocating admin queues to last CPU
The issue was resolved in 52454a1e5b.
Scheduled threads such as CARP are now no longer pinned to CPU 0, making sure
they always get their time slice even if CPUs are blocked.
Since the I/O expander chip does not do a reset when soft power
cycling, the driver will first turn off all LEDs when initializing,
although no specific routine seems to be called when powering down.
This means that the LEDs will stay on until the driver has booted up,
after which the driver will be in a consistent state.
Initially, RSF (Receive Queue Store and Forward) was disabled for
unknown reasons, but the cut-through mode that's enabled as a result
seems to send 0 length packets up to the DMA when the RX queue is
full.
Since the iflib interface needs axgbe_pci_init() and its phy starting capabilities, no data was passed in its absence.
With the NULL check of the axgbe_miibus we also resort back to an MDIO read as a module might be capable of both
clause 22 and clause 45 methods of communication.
with the move of phy_stop() to if_detach() in d50d4e8cd4, it's better to prevent reconfiguring the phy should the pci_init() callout trigger more than once.
Within the code path of autonegotiation for gigabit SFP modules was a bug, causing
a report of LINK_ERR for cases where an external SFP PHY was present. Fixing this issue
did not resolve to a link however, as it turned out that while autonegotiation interrupts
were happening, it's resulting status cannot be correctly determined in all cases. In these
specific cases we have no other option than to assume a module has negotiated to 1Gbit/s.
PHY-specific configuration has been delegated to the miibus driver, if an external PHY is present.
It's possible that the i2c bus does not recognize a PHY on the first pass, so in all cases we
retry up to a maximum of 5 times during each link poll pass to ensure we didn't miss the presence
of an external PHY.
This commit also addresses link issues on both 100 mbit and 1Gb fiber modules. Not all of these modules
have the correct data set according to SFF-8472, as such we first check for gigabit compliance and
the associated baudrate, otherwise we resort back to determining what type of fiber module is plugged
in by checking the baudrate, cable length and wavelength and setting the MAC speed accordingly.
It is possible for a machine to boot into a state in which the configuration register,
responsible for controlling wether an I/O signal is considered an input or output,
contains randomized values. It was assumed this was programmed by the BIOS.
If I/O is reversed, it's possible for the driver to think an SFPP module has been inserted
when there is none, leading to unrecoverable I2C errors.
The configuration register should contain a state which is determined and provided by the BIOS,
hence no hard-coded values are programmed here.
Change 4787572d05 made if_alloc_domain() never fail, then also do the
wrappers if_alloc(), if_alloc_dev(), and if_gethandle().
No functional change intended.
Reviewed by: kp, imp, glebius, stevek
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45740
(cherry picked from commit aa3860851b9f6a6002d135b1cac7736e0995eedc)
Since d49e83eac3, iflib(9) is ready
for this change.
While at it, make isc_driver_version strings (static) const where
not apparently un-const on purpose, too.
This reduces the size of the amd64 GENERIC by about 10 KiB.
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
Keep the register read just in case though it seems possible it is not
needed as the function later clears specific interrupts via a write to
the same register.
if (sb == NULL) { ... sb->s_error } is going to be a bad time. Return
ENOMEM when we cannot allocate an sbuf for the sysctl rather than
dereferencing the NULL pointer just returned.
Reviewed by: manu@, allanjude@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30373
By default, axgbe driver does a receiver reset after predefined number
of retries for the link to come up. However, this receiver reset
doesn't always suffice, due to an hardware issue.
In that case, as a workaround, a complete phy reset is necessary.
This patch introduces a sysctl that can be set to 1 to let the driver
reset the phy completely, rather than just doing receiver reset.
The workaround will be removed once the issue is fixed by means
of firmware update.
This patch also fixes the handling of the direct attach cables
properly.
Submitted by: rajesh1.kumar_amd.com
Differential Revision: https://reviews.freebsd.org/D28266
AMD 10GbE hardware is designed to have two buffers per receive descriptor to
support split header feature. For this purpose, the driver was designed to use
2 iflib freelists per receive queue. So, that buffers from 2 freelists are used
to refill an entry in the receive descriptor. The current design holds good
with regular data traffic.
But, when netmap comes into play, the current design will not fit in. The
current netmap interfaces and netmap implementation in iflib doesn't seem
to accomodate the design of 2 freelists per receive queue. So, exercising
Netmap capability with inbuilt tools like bridge, pkt-gen doesn't work with
the 2 freelists driver design.
So, the driver design is changed to accomodate the current netmap interfaces
and netmap implementation in iflib by using single freelist per receive queue
approach when Netmap capability is exercised without disturbing the current
2 freelists approach.
The dev.ax.sph_enable tunable can be set to 0 to configure the single
free list mode.
Thanks to Stephan Dewt for his Initial set of code changes for the stated
problem.
Submitted by: rajesh1.kumar_amd.com
Approved by: vmaffione
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27797
Doing a 'dd' over iscsi will reliably cause stalls. Tx
cleaning _should_ reliably happen as data is sent.
However, currently if the transmit queue fills it will
wait until the iflib timer (hz/2) runs.
This change causes the the tx taskq thread to be run
if there are completed descriptors.
While here:
- make timer interrupt delay a sysctl
- simplify txd_db_check handling
- comment on INTR types
Background on the change:
Initially doorbell updates were minimized by only writing to the register
on every fourth packet. If txq_drain would return without writing to the
doorbell it scheduled a callout on the next tick to do the doorbell write
to ensure that the write otherwise happened "soon". At that time a sysctl
was added for users to avoid the potential added latency by simply writing
to the doorbell register on every packet. This worked perfectly well for
e1000 and ixgbe ... and appeared to work well on ixl. However, as it
turned out there was a race to this approach that would lockup the ixl MAC.
It was possible for a lower producer index to be written after a higher one.
On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial
response was to add a lock around doorbell writes - fixing the problem but
adding an unacceptable amount of lock contention.
The next iteration was to use transmit interrupts to drive delayed doorbell
writes. If there were no packets in the queue all doorbell writes would be
immediate as the queue started to fill up we could delay doorbell writes
further and further. At the start of drain if we've cleaned any packets we
know we've moved the state machine along and we write the doorbell (an
obvious missing optimization was to skip that doorbell write if db_pending
is zero). This change required that tx interrupts be scheduled periodically
as opposed to just when the hardware txq was full. However, that just leads
to our next problem.
Initially dedicated msix vectors were used for both tx and rx. However, it
was often possible to use up all available vectors before we set up all the
queues we wanted. By having rx and tx share a vector for a given queue we
could halve the number of vectors used by a given configuration. The problem
here is that with this change only e1000 passed the necessary value to have
the fast interrupt drive tx when appropriate.
Reported by: mav@
Tested by: mav@
Reviewed by: gallatin@
MFC after: 1 month
Sponsored by: iXsystems
Differential Revision: https://reviews.freebsd.org/D27683
* uninitialised variable use
* Using AXGBE_SET_ADV() where it was intended; using AXGBE_ADV()
seems wrong and also causes a compiler warning.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D26839
This patch has the driver for 10Gigabit Ethernet controller in AMD
SoC. This driver is written compatible to the Iflib framework. The
existing driver is for the old version of hardware. The submitted
driver here is for the recent versions of the hardware where the Ethernet
controller is PCI-E based.
Submitted by: Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25793
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.
Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
we will import a newer version of the Linux code so the linuxkpi was not
used.
This is still missing 10G support, and multicast has not been tested.
Reviewed by: gnn
Obtained from: ABT Systems Ltd
Sponsored by: SoftIron Inc
Differential Revision: https://reviews.freebsd.org/D8549