The mvneta device requires MVNETA_TX_CMD_L4_CHECKSUM_NONE bit to be set in the tx descriptor is checksum not required. However, mvneta_tx_set_csumflag() is not setting this flag currently, causing the hardware to randomly corrupt IP header during transmission.
This affects injected IPv4 packets that skips kernel IP stack processing (e.g. DHCP), as well as all IPv6 packets, since the driver currently does not offload csum for IPv6.
The fix is to remove all the early return paths from mvneta_tx_set_csumflag() which do not set the MVNETA_TX_CMD_L4_CHECKSUM_NONE flag.
PR: 248306
Submitted by: Mike Cui <cuicui@gmail.com>
Reported by: Mike Cui <cuicui@gmail.com>
so x86 can support Intel DMAR and AMD IOMMU simultaneously.
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25894
Re-implement clocks for these SoC by using now standard extres/clk framework.
This is necessary for future expansion of these. The new implementation
is (due to the size of the patch) only the initial (minimum) version.
It will be updated/expanded with a subsequent set of particular patches.
This patch is also not tested on OMAP4 based boards (BeagleBone),
so all possible issues should be (and will be) fixed by ASAP once
identified.
Submited by: Oskar Holmlund (oskar.holmlund@ohdata.se)
Differential Revision: https://reviews.freebsd.org/D25118
On Gen2 VMs, Hyper-V provides mmio space for framebuffer.
This mmio address range is not useable for other PCI devices.
Currently only efifb driver is using this range without reserving
it from system.
Therefore, vmbus driver reserves it before any other PCI device
drivers start to request mmio addresses.
PR: 222996
Submitted by: weh@microsoft.com
Reported by: dmitry_kuleshov@ukr.net
Reviewed by: decui@microsoft.com
Sponsored by: Microsoft
PowerPC support was fixed in r357596 by changing PCI bustag to BE as
part of the solution, but this caused regression on mips. This change
implements byte swapping of virtio PCI config area in the driver,
leaving lower layer untouched.
Submittnd by: Fernando Valle <fernando.valle@eldorado.org.br>
Reported by: arichardson
Reviewed by: alfredo, arichardson
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25416
Use the existing PMC_CPUID_LEN to size pmc_cpuid in the kernel and various
buffers for reading it in libpmc. This avoids some extra syscalls and
malloc/frees.
While in here, use strlcpy to copy a user-provided cpuid string instead of
memcpy, to make sure we terminate the buffer.
Reviewed by: mav
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25679
APEI allows platform to report different kinds of errors to OS in several
ways. We've found that Supermicro X10/X11 motherboards report PCIe errors
appearing on hot-unplug via this interface using NMI. Without respective
driver it ended up in kernel panic without any additional information.
This driver introduces support for the APEI Generic Hardware Error Source
reporting via NMI, SCI or polling. It decodes the reported errors and
either pass them to pci(4) for processing or just logs otherwise. Errors
marked as fatal still end up in kernel panic, but some more informative.
When somebody get to native PCIe AER support implementation both of the
reporting mechanisms should get common error recovery code. Since in our
case errors happen when the device is already gone, there is nothing to
recover, so the code just clears the error statuses, practically ignoring
the otherwise destructive NMIs in nicer way.
MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.
This device was originally used as part of the goldfish virtual hardware
platform used for emulating Android on QEMU, but is now also used as the
RTC for the RISC-V virt machine in QEMU. It provides a simple 64-bit
nanosecond timer exposed via a pair of memory-mapped 32-bit registers,
although only with 1s granularity.
Reviewed by: brooks (mentor), jhb (mentor), kp
Approved by: brooks (mentor), jhb (mentor), kp
Obtained from: CheriBSD
Differential Revision: https://reviews.freebsd.org/D25717
from Intel DMAR support, so it can be used on other IOMMU systems.
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25743
As Emanuel poited me the Linux processes these clock assignments in forward
order, not in reversed. I misread the original code.
Tha problem with wrong order for assigned clocks found in tegra (and some imx)
DT should be reanalyzed and solved by different way.
MFC with: r363123
Reported by; manu
This is a generic function start a scan request for the given
cam_sim.
Other driver can now just use this function to request a new rescan.
Submitted by: kibab
Handle the fact that parts of usb(4) can be compiled into the boot
loader, where M_WAITOK does not guarantee a successful allocation.
PR: 240545
Submitted by: Andrew Reiter <arr@watson.org> (original version)
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25706
I2C communication is done by a combination of driving a line low or
letting it float, so that it is either pulled up or driven low by
another party.
r355276 besides the stated goal of the change -- using the new GPIO API
-- also changed the logic, so that active state is signaled by actively
driving a line.
That worked with iicbb prior to r362042, but stopped working after that
commit on at least some hardware. My guess that the breakage was
related to getting an ACK bit. A device expected to be able to drive
SDA actively low, but controller was actively driving it high for some
time.
Anyway, this change seems to fix the problem.
Tested using gpioiic on Orange Pi PC Plus with HTU21 sensor.
Reported by: Nick Kostirya <nikolay.kostirya@i11.co>
Reviewed by: manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25684
This new function allows us to find the SMMU instance assigned
for a particular PCI RID.
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D25687
If the hypervisor advertises support for the DISCARD command then the
guest can perform TRIM commands, freeing space on the backing store.
If VIRTIO_BLK_F_DISCARD is enabled, advertise DISKFLAG_CANDELETE
Tested with FreeBSD guests on bhyve and KVM
Reviewed by: jhb
Tested by: freqlabs
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21708
pmc_cpuid was uninitialized for most AMD processor families. We can still
populate this string for unimplemented families.
Also added a CPUID_TO_STEPPING macro and converted existing code to use it.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25673
It follows the equivalent Linux change to be able to differentiate
skylakex and cascadelakex, sharing the same model but not stepping.
This fixes skylakex handling broken by r363144.
MFC after: 6 days
The EIP-97 is a packet processing module found on the ESPRESSObin. This
commit adds a crypto(9) driver for the crypto and hash engine in this
device. An initial skeleton driver that could attach and submit
requests was written by loos and others at Netgate, and the driver was
finished by me.
Support for separate AAD and output buffers will be added in a separate
commit, to simplify merging to stable/12 (where those features don't
exist).
Reviewed by: gnn, jhb
Feedback from: andrew, cem, manu
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D25417
to coalesce tx work requests.
Note that Coverity will still treat this as an out-of-bounds access. We
do want to compare 16B starting from ethmacdst but cmp_l2hdr was was
going beyond that by 2B.
cmp_l2hdr was introduced in r362905.
Reported by: Coverity (CID 1430284)
Sponsored by: Chelsio Communications
Linux processes these clocks in reverse order and some DT relies
on this fact. For example, the frequency setting for a given PLL
is the last in the list, preceded by the frequency setting of its
following divider or so...
MFC after: 1 week
Every revision of twsi after the A20 have a bug where we need to
write again the control register after each interrupts. We also need
to add some delay before writing to this register, a simple read of the
same register does the job so do that.
Also fix the case when we have finish sending all the bytes, it only worked
for 1 byte transfer (the same kind that we do for talking to the PMIC on A20
boards).
While here add more debug messages and rework some of them.
This was tested by talking to a AT23C32 eeprom and a DS3231 RTC from an
H3 and A20 board.
PR: 247576
Reported by: Manuel Stühn (freebsd@justmail.de)
MFC after: 1 week
On some boards there is a lot of of syscon node that are unused as
more specific drivers is probed before, no need to flood the console
for the mostly-unused generic ones.
MFC after: 1 week
This adds support for the Broadcom bcm2711 PCI express controller, found
on the Raspberry Pi 4 (aka the bcm2838 SoC). The driver has only been
developed against the soldered-on VIA XHCI controller and not tested
with other end points.
Submitted by: Robert Crowston <crowston_protonmail.com>
Differential Revision: https://reviews.freebsd.org/D25068
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.
Fix this by using module_xxx_order() instead of module_xxx().
Differential Revision: https://reviews.freebsd.org/D23973
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Ask the firmware for the number of frames that can be stuffed in one
work request.
- Modify mp_ring to increase the likelihood of tx coalescing when there
are just one or two threads that are doing most of the tx. Add teeth
to the abdication mechanism by pushing the consumer lock into mp_ring.
This reduces the likelihood that a consumer will get stuck with all
the work even though it is above its budget.
- Add support for coalesced tx WR to the VF driver. This, with the
changes above, results in a 7x improvement in the tx pps of the VF
driver for some common cases. The firmware vets the L2 headers
submitted by the VF driver and it's a big win if the checks are
performed for a batch of packets and not each one individually.
Reviewed by: jhb@
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25454
Use fence instead of barrier, which is optimized to take advantage of
the x86 TSO memory model.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
- Add CCM driver and clocks implementations for i.MX 8M
- Add GPC driver for iMX8
- Add clock tree for i.MX 8M Quad
- Add clocks support and new compat strings (where required) for existing i.MX 6 UART, I2C, and GPIO drivers
- Enable aarch64-compatible drivers form i.MX 6 in arm64 GENERIC kernel config
- Add dtb/imx8 kernel module with DTBs for Nitrogen8M and iMX8MQ EVK
With this patch both Nitrogen8M and iMX8MQ EVK boot with NFS root up to multiuser login prompt
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D25274
Create an acpi attachment for the DWC USB OTG device. This is present in
the Raspberry Pi 4 in the USB-C port normally used to power the board. Some
firmware presents the kernel with ACPI tables rather than FDT so we need
an ACPI attachment.
Submitted by: Greg V <greg_unrelenting.technology>
Approved by: hselasky (removal of All rights reserved)
Differential Revision: https://reviews.freebsd.org/D25203
This is a flag from the MAC that says the received packet didn't match
a keycache slot. This isn't technically a problem as WEP keys don't
match keycache slots (they're "global" keys), but it could be useful
for tracking down CCMP decryption failures.
Right now it's a no-op - it mirrors what the AR9300 HAL does and it
just increments a counter. But, hey, maybe one day I'll use it for
diagnosing keycache/CCMP decrypt issues.
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
Not all interrupt sources that affect CIS bit were acknowledged.
Specifically, bits in STATESTS (aka WAKESTS) were left set.
The fix is to disable WAKEEN and clear STATESTS bits before the HDA
interrupt is enabled. This way we should never get any STATESTS bits.
I also added placeholders for all event bits that we currently do not
enable, do not handle and do not clear. This might get useful when / if
we enable any of them.
Reported by: kib (Apollo Lake hardware)
Tested by: kib (earlier, different change)
MFC after: 2 weeks
X-MFC with: r362294
- Move temporary sglists into the session structure and protect them
with a per-session lock instead of a per-adapter lock.
- Retire an unused session field, and move a debugging field under
INVARIANTS to avoid using the session lock for completion handling
when INVARIANTS isn't enabled.
- Use counter_u64 for per-adapter statistics.
Note that this helps for cases where multiple sessions are used
(e.g. multiple IPsec SAs or multiple KTLS connections). It does not
help for workloads that use a single session (e.g. a single GELI
volume).
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25457
These bzero's should have been explicit_bzero's.
Reviewed by: cem, delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25437
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().
Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435
There were quite a few places where port_info was being accessed only to
get to the adapter.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25432
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329
This is the key on the right side of the function keys, with the
"hamburger menu" icon on it.
Submitted by: GregV <greg@unrelenting.technology>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25390
This mode was added in r362496. Rename it to make the meaning more
clear.
PR: 247306
Suggested by: rpokala
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC with: r362496
The ACPI Specification defines a Generic Address Structure (GAS),
which is used to describe UART controller register layout in the
SPCR table. The driver responsible for parsing it (uart_cpu_acpi)
wrongly associates the Access Size field to the uart_bas's regshft
and the register BitWidth to the regiowidth - according to
the definitions it should be opposite.
This problem remained hidden most likely because the majority of platforms
use 32-bit registers (BitWidth) which are accessed with the according
size (Dword). However on Marvell Armada 8k / Cn913x platforms,
the 32-bit registers should be accessed with Byte granulity, which
unveiled the issue.
This patch fixes above by proper values assignment and slightly improved
parsing.
Note that handling of the AccessWidth set to EFI_ACPI_6_0_UNDEFINED is
needed to work around a buggy SPCR table on EC2 x86 "bare metal" instances.
Reviewed by: manu, imp, cperciva, greg_unrelenting.technology
Obtained from: Semihalf
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25373
Networking is broken if the driver configures its (virtual) hardware to
use a hash algorithm (or a key) different from the one that the network
stack (software RSS) uses. This can be seen with connections initiated
from the host. The PCB will be placed into the hash table based on the
hash value calculated by the software. The hardware-calculated hash
value in reponse packets will be different, so the PCB won't be found.
Tested with a kernel compiled with 'options RSS' on an instance with ena
driver.
Reviewed by: mw, adrian
MFC after: 2 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D24733
When the PCI address != physical address we need to translate from the
former to the latter before passing to the parent to map into the kernels
virtual address space.
Sponsored by: Innovate UK
It's interesting that similar messages from gpiobus_acquire_pin never
had any prefix while gpiobus_release_pin messages were prefixed with
"gpiobus_acquire_pin".
Anyway, the prefix is not that useful and can be deduced from context.
MFC after: 2 weeks
fibX_lookup_nh_ext().
fibX_lookup_nh_ represents pre-epoch generation of fib kpi,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D24975
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223
Once tx mbufs have been handed to hardware, nothing serializes the tx
path against completion and potential use-after-free of the outbound
mbuf. Perform accounting and BPF tap before queueing to hardware to
avoid this race.
Submitted by: Steve Wirtz <steve_wirtz AT dell.com>
Reviewed by: markj, rstone
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25364
For the sake of the record, this is the last use of the words master and slave
in the FreeBSD's USB stack, drivers and subsystems.
MFC after: 1 week
Sponsored by: Mellanox Technologies
- refactorize packet receive path. Make sure that we don't leak mbufs
and/or that we don't create holes in RX descriptor ring
- slightly simplify handling with TX descriptors
MFC after: 4 weeks
By using DWC TRM terminology, normal descriptor format should be named
extended and alternate descriptor format should be named normal.
Should not been functional change.
MFC after: 4 weeks
Use naming nomenclature used in DesignWare TRM.
This driver was written by using Altera (now Intel) documentation for Arria
FPGA manual. Unfortunately this manual used very different (and in some cases
opposite naming) for registers and descriptor fields. Unfortunately,
this makes future expansion extremely hard.
Should not been functional change.
MFC after: 4 weeks
When the PCI and CPU physical addresses are identical it doesn't matter
which is used to create the resources, however on some systems, e.g.
qemu armv7 virt, they are different. This leads to a panic as we try to
map the wrong physical address into the kernel address space.
Reported by: Jenkins via trasz
Sponsored by: Innovate UK
- temporarily disable handling with phy, we don't have driver for it yet
- always clear cause for administartive interrupt.
While I'm in, fix style(9) (mainly whitespace).
MFC after: 4 weeks
- only normal memory window is mandatory, prefetchable memory and
I/O windows should be optional
- full PCIe configuration space is supported
- remove duplicated check from function for accessing configuration space.
It is already contained in pci_dw_check_dev()
MFC after: 2 weeks
This chip is used in the Rasperry Pi 4, and is supported by the if_genet
driver. Currently we use the ukphy mii driver, this patch switches over
to the brgphy mii driver instead. To support the rgmii-rxid phy mode,
which is now the default in the Linux dtb, we add support for clock
skewing.
These changes are taken from OpenBSD and NetBSD, except for the bailout
in brgphy_bcm54xx_clock_delay() in rgmii mode, which was found necessary
after testing.
Submitted by: Robert Crowston, crowston at protomail.com
Differential Revision: https://reviews.freebsd.org/D25251
Instead of panic after one second of polling, make the normal timeout
handler to activate, reset the controller and abort the outstanding
requests. If all of it won't happen within 10 seconds then something
in the driver is likely stuck bad and panic is the only way out.
In particular this fixed device hot unplug during execution of those
polled commands, allowing clean device detach instead of panic.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
It is plausible that the hardware interrupts a host only when GIS goes
from zero to one. GIS is formed by OR-ing multiple hardware statuses,
so it's possible that a previously cleared status gets set again while
another status has not been cleared yet. Thus, there will be no new
interrupt as GIS always stayed set. If we don't re-examine GIS then we
can leave it set and never get another interrupt again.
Without this change I frequently saw a problem where snd_hda would stop
working. Setting dev.hdac.1.polling=1 would bring it back to life and
afterwards I could set polling back to zero. Sometimes the problem
started right after a boot, sometimes it happened after resuming from
S3, frequently it would occur when sound output and input are active
concurrently (such as during conferencing). I looked at HDAC_INTSTS
while the sound was not working and I saw that both HDAC_INTSTS_GIS and
HDAC_INTSTS_CIS were set, but there were no interrupts.
I have collected some statistics over a period of several days about how
many loops (calls to hdac_one_intr) the new code did for a single
interrupt:
+--------+--------------+
|Loops |Times Happened|
+--------+--------------+
|0 |301 |
|1 |12857746 |
|2 |280 |
|3 |2 |
|4+ |0 |
+--------+--------------+
I believe that previously the sound would get stuck each time we had to loop
more than once.
The tested hardware is:
hdac1: <AMD (0x15e3) HDA Controller> mem 0xfe680000-0xfe687fff at device 0.6 on pci4
hdacc1: <Realtek ALC269 HDA CODEC> at cad 0 on hdac1
No objections: mav
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D25128
- Support Prefetchable Memory.
- Use the correct rman when allocating memory and ioports.
- Translate PCI addresses in bus_alloc_resource to allow physical
addresses that are different than pci addresses.
Reviewed by: Robert Crowston <crowston_protonmail.com>
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25121
Changes in the mbuf layout regarding HW TLS, resulted in wrong detection
of starting mbuf. Use a boolean variable to handle this and pass m_adj()
the top mbuf, so that the packet header is adjusted correctly.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Make sure we disable the multicast filter in promiscious mode aswell as when
the all multicast flag is set.
MFC after: 1 week
Found by: Tycho Nightingale <tychon@freebsd.org>
Sponsored by: Mellanox Technologies
Since the two functions are similar, introduce a common function
(vtnet_rx_vq_process()) to share common code.
This also improves locking, by ensuring vrxs_rescheduled is accessed
under the RXQ lock, and taskqueue_enqueue() is not called under the
lock (therefore avoiding a spurious duplicate lock warning).
Reported by: jrtc27
MFC after: 2 weeks
Remove TSO from the toggle mask when automatically disabled by TXCKSUM* in
various NIC drivers.
Reviewed by: hselasky, np, gallatin, jpaetzel
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25120
For legacy devices that don't support MrgRxBuf (such as bhyve pre-r358180),
r361944 failed to update the receive handler to account for the additional
padding introduced by the unused num_buffers field that is now always present
in struct vtnet_rx_header. Thus, calculate the padding dynamically based on
vtnet_hdr_size.
PR: 247242
Reported by: thj
Tested by: thj
The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.
MFC after: 1 week
Parts of the z8530 driver were still using the SUN channel spacing.
This was invalid on PowerMac and QEMU, where the attachment was to escc,
not escc-legacy. This means the driver has apparently NEVER worked properly
on Macintosh hardware.
Add documentation for the channel spacing details, and change to using
driver-specific initialization instead of hardcoded spacing so either
spacing can be used.
Fixes boot hang in QEMU when using the serial console, and fixes use on
Xserve serial (and presumably PowerMacs that have a Stealth Serial port
or similar)
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D24661
Prepare support to be able to handle font data in loader, consolidate
data structures to sys/font.h and update vtfontcvt.
vtfontcvt update is about to output set of glyphs in form of C source,
the implementation does allow to output compressed or uncompressed font
bitmaps.
Reviewed by: bcr
Differential Revision: https://reviews.freebsd.org/D24189
This function returns NULL if the ring identified by
queue id and direction is in netmap mode. Otherwise
return the corresponding kring.
Use this function to replace vtnet_netmap_queue_on().
MFC after: 1 week
This partially reverts r361053 since there have been reports
by users that this breaks some functionality for em(4)
devices; it seems at first glance that some sort of interface
restart is required for those cards.
This isn't a proper fix; this unbreaks those users until a proper
fix is found for their issues.
PR: 240818
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
Allow the TCP header to reside in the mbuf following the IP header.
Else such packets will get dropped.
Backtrace:
mlx5e_sq_xmit()
mlx5e_xmit()
ether_output_frame()
ether_output()
ip_output_send()
ip_output()
rip_output()
sosend_generic()
sosend()
kern_sendit()
sendit()
sys_sendto()
amd64_syscall()
fast_syscall_common()
MFC after: 1 week
Sponsored by: Mellanox Technologies
Typically the TCP/IP headers fit within the first mbuf and should not
trigger any of the error cases. Use unlikely() for these cases.
No functional change.
MFC after: 1 week
Sponsored by: Mellanox Technologies
When parsing the TCP/IP header in the fast path, make it clear by using
the const keyword, no fields are to be modified inside the transmitted
packet.
No functional change.
MFC after: 1 week
Sponsored by: Mellanox Technologies
I2C_SET was quite inflexible, it used too long delays as well as some
unnecessary delays. The new building blocks are iicbb_clockin and
iicbb_clockout. The former sets SDA and starts the high period of SCL,
the latter executes the low period of SCL. What happens during the high
phase depends on the operation. For writes we just hold both lines, for
reads we poll SDA. S, Sr and P change SDA in the middle of the high
period.
Also, the calculation of udelay has been updated, so that the resulting
period more closely corresponds the requested bus frequency. There is a
new knob, io_delay, that allows to further adjust udelay based on the
estimated latency of pin toggling operations.
Finally, I slightly changed debug tracing and added error indicators to
it. The debug prints are compiled in but disabled by default. This can
be of use if there is any fallout from this change.
Some ideas for further improvements:
- add a function for sub-microsecond delays (e.g., in units of 1/10th of
a microsecond) and use it for more precise timing of short delays;
- account for the actual time spent in the pin I/O.
Some sample debug output with the new code follows.
Reading temperature and humidity from HTU21 in the bus hold mode:
<<w80+ we3+ <w81+ .....r6d+ rac+ r94- >>
<<w80+ we5+ <w81+ .............r47+ re2+ r84- >>
where '<<' is S, '<' is Sr, '>>' is P, '.' is one millisecond of clock
stretching by the slave.
Reading temperature and humidity in the no-hold mode:
<<w80+ wf3+ >>
<<w81- >>
<<w81+ r6d+ r54+ raf- >>
<<w80+ wf5+ >>
<<w81- >>
<<w81+ r48+ r4e+ r9c- >>
where '+' is Ack and '-' is NoAck.
We see that first read attempts are not acknowledged.
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D22206
It seems that second call does not add any useful state change for all
implemented timecounters.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Writing segment id to I2C device 0x30 only required if the segment is
non-zero. On the devices without E-DCC support writing to that address
fails and whole transaction then fails too. To avoid this do
not attempt write to the segment selection device unless required.
MFC after: 2 weeks
- crypto_apply() is only used for reading a buffer to compute a
digest, so change the data pointer to a const pointer.
- To better match m_apply(), change the data pointer type to void *
and the length from uint16_t to u_int. The length field in
particular matters as none of the apply logic was splitting requests
larger than UINT16_MAX.
- Adjust the auth_xform Update callback to match the function
prototype passed to crypto_apply() and crypto_apply_buf(). This
removes the needs for casts when using the Update callback.
- Change the Reinit and Setkey callbacks to also use a u_int length
instead of uint16_t.
- Update auth transforms for the changes. While here, use C99
initializers for auth_hash structures and avoid casts on callbacks.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25171
The original PCIe hot-plug code required a couple of things which cause
PCI probing errors on the QEMU Q35 system and possibly physical systems
(Dell R6515).
Allocate the hot-plug interrupt as shared to support INTx interrupts.
The hot-plug interrupt mechanism should normally be MSI as PCIe mandates
MSI support, but QEMU's Q35 bridge only provides INTx interrupts.
Second, the code required the Electromechanical Interlock (Slot Status
EIS) to be engaged if present (Slot Capability EIP). Some platforms
including QEMU Q35 set EIP but not EIS. Fix by deleting the check.
Reviewed by: imp, mav, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24877
Update the iflib version of ixl driver based on the OOT version ixl-1.11.29.
Major changes:
- Extract iflib specific functions from ixl_pf_main.c to ixl_pf_iflib.c
to simplify code sharing between legacy and iflib version of driver
- Add support for most recent FW API version (1.10), which extends FW
LLDP Agent control by user to X722 devices
- Improve handling of device global reset
- Add support for the FW recovery mode
- Use virtchnl function to validate virtual channel messages instead of
using separate checks
- Fix MAC/VLAN filters accounting
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: erj@
Tested by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D24564
DEBUG is a kernel configuration flag and if used cpufreq_dt.c will fail the
build of kernel.
PR: 246867
Submitted by: Oskar Holmund (oskar.holmlund@ohdata.se)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25080
The non-legacy interface always defines num_buffers in the header,
regardless of whether VIRTIO_NET_F_MRG_RXBUF, just leaving it unused. We
also need to ensure our virtqueue doesn't filter out VIRTIO_F_VERSION_1
during negotiation, as it supports non-legacy transports just fine. This
fixes network packet transmission on TinyEMU.
Reviewed by: br, brooks (mentor), jhb (mentor)
Approved by: br, brooks (mentor), jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D25132
The feature bits are exposed as a 32-bit register with 2 banks, so we
should negotiate both halves. Notably, VIRTIO_F_VERSION_1 is in the
upper half, and will be used in an upcoming commit.
The PCI bus driver also has this bug, but the legacy BAR layout did not
include selector registers and is rather different from the modern
layout, so it remains solely as legacy.
Reviewed by: br, brooks (mentor), jhb (mentor)
Approved by: br, brooks (mentor), jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D25131
Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
This logic is running the beacon receive bits in STA+AP mode on both the
STA and AP side. The STA side sees its beacons from the BSS fine; the
AP side is seeing other beacons on the same channel but with the BSS
node for some odd reason. (I think it's a valid reason, but I currently
forget what that valid reason is.)
So, just to be cleaner about things, don't run the nexttbtt/etc bits
at all if we're in hostap mode. If I ever get mesh working then maybe
I'll make sure it works right on mesh+ap and mesh+sta modes.
Whilst here, log the VAP i'm being called on to make it clearer what
is going on. I may end up adding a VAP dprintf version of this at
some point.
Tested:
* AR9380, STA (DWDS client) + hostap on the same NIC
description of items residing in a so-called union. FreeBSD currently
only supports 4 such pop levels.
If the push level is not restored within the processing of the same
HID item, an invalid memory location may be used for subsequent HID
item processing.
Verify that the push level is always valid when processing HID items.
Reported by: Andy Nguyen (Google)
MFC after: 3 days
Sponsored by: Mellanox Technologies
These are from the linux iwlwifi driver ;the default use smaller
maximum AMPDUs (8k) and a much smaller density (none.) The latter
could cause stability issues.
Tested:
* Tested on Intel 6300, STA mode.
Differential Revision: https://reviews.freebsd.org/D25113
My AMD Ryzen system has 4 AHCI controllers, each supporting 16 MSI vectors.
Since two of the controllers have only one SATA port, limit to single MSI
saves system 30 interrupt vectors for free.
It may be possible to also limit number of MSI vectors to 4 and 8 for the
other two controllers, but according to the AHCI specification after that
controllers may revert to only one vector, that would be a bigger loss to
risk.
MFC after: 2 weeks
This change introduces Comet Lake Mobile Platform support in the e1000
driver along with shared code patches described below.
- Cast return value of e1000_ltr2ns() to higher type to avoid overflow
- Remove useless statement of assigning act_offset
- Add initialization of identification LED
- Fix flow control setup after connected standby:
After connected standby the driver blocks resets during
"AdapterStart" and skips flow control setup. This change adds
condition in e1000_setup_link_ich8lan() to always setup flow control
and to setup physical interface only when there is no need to block
resets.
Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Submitted by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed by: erj@
Tested by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D25035
That assumption should be true when superio(4) uses the hardware
exlusively. But it turns out to not hold on some real systems.
So, err on the side of correctness rather than performance.
Clear current_ldn in sio_conf_exit.
Reported by: bz
Tested by: bz
MFC after: 1 week
This specifically fixes that TX frames are large enough now to hold a 3900 odd
byte AMSDU (the little ones); me flipping it on earlier messed up transmit!
Tested:
* if_run, STA mode, TX/RX TCP/UDP iperf. TCP is now back to normal and
correctly does ~ 3200 byte AMSDU/fast frames (2x1600ish byte MSDUs).
This flips on basic 11n for 2GHz/5GHz station operation.
* It flips on HT20 and MCS rates;
* It enables A-MPDU decap - the payload format is a bit different;
* It does do some basic checks for HT40 but I haven't yet flipped on
HT40 support;
* It enables software A-MSDU transmit; I honestly don't want to make
A-MPDU TX work and there are apparently issues with QoS and A-MPDU TX.
So I totally am ignoring A-MPDU TX;
* MCS rate transmit is fine.
I haven't:
* A-MPDU TX, as I said above;
* made radiotap work fully;
* HT40;
* short-GI support;
* lots of other stuff that honestly no-one is likely to use.
But! Hey, this is another ye olde 11n USB NIC that now works pretty OK
in 11n rates. A-MPDU receive seems fine enough given it's a draft-n
device from before 2010.
Tested:
* Ye olde UB82 Test NIC (AR9170 + AR9104) - 2GHz/5GHz
This change prevents a race that happens when rxsync dequeues
N-1 rx packets (with N being the size of the netmap rx ring).
In this situation, the loop exits without re-enabling the
rx interrupts, thus causing the VQ to stall.
MFC after: 1 week
The new index tracks the next netmap slot that is going
to be enqueued into the virtqueue. The index is necessary
to prevent the receive VQ and the netmap rx ring from going
out of sync, considering that we never enqueue N slots, but
at most N-1. This change fixes a bug that causes the VQ
and the netmap ring to go out of sync after N-1 packets
have been received.
MFC after: 1 week
The netmap_rx_irq() function normally wakes up user-space threads
waiting for more packets. In this case, it is not necessary to
call it under the driver queue lock. However, if the interface is
attached to a VALE switch, netmap_rx_irq() ends up calling rxsync
on the interface (see netmap_bwrap_intr_notify()). Although
concurrent rxsyncs are serialized through the kring lock
(see nm_kr_tryget()), the lock acquire operation is not blocking.
As a result, it may happen that netmap_rx_irq() is called on
an RX ring while another instance is running, causing the
second call to fail, and received packets stall in the receive VQ.
We fix this issue by calling netmap_irx_irq() under the VQ lock.
MFC after: 1 week
The netmap_rx_irq() function may return NM_IRQ_RESCHED to inform the
driver that more work is pending, and that netmap expects netmap_rx_irq()
to be called again as soon as possible.
This change implements this behaviour in the vtnet driver.
MFC after: 1 week
* Set the tx/rx chains based on the existing MIMO eeprom reads
* Add 3-chain rates
Tested:
* MAC/BBP RT5390 (rev 0x0502), RF RT5370 (MIMO 1T1R), 2g/5g STA
* MAC/BBP RT3593 (rev 0x0402), RF RT3053 (MIMO 3T3R), 2g/5g STA
Now that I'm a proud owner of an ASUS USB-N66, I can test 2G/5G and
3-stream configurations.
For now, just flip on 5G HT rates. I've tested this in both
5G HT20 and 5G 11a modes. It's still one stream for now until
we verify that the number of streams reported (ie the MIMO below)
is actually the number of 11n streams, NOT the number of antennas.
(They don't have to match! You can have more antennas than MIMO
streams!)
Tested:
* run0: MAC/BBP RT3593 (rev 0x0402), RF RT3053 (MIMO 3T3R)
r361601 implemented basic support for cleaing the console history buffer.
But after clearing the history buffer, it's not especially useful to be
able to scroll back through that buffer, or for the cursor position to
remain at (very likely) the bottom of the screen.
PR: 224436
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D25079
which happens on some laptops after returning to legacy multiplexing mode
at initialization stage.
PR: 242542
Reported by: Felix Palmen <felix@palmen-it.de>
MFC after: 1 week
netmap assumes the one "slot" is left unused to distinguish
the empty ring and full ring conditions. This assumption was
violated by vtnet_netmap_rxq_populate().
MFC after: 1 week
The functionality contained in this function is duplicated,
as it is already available in vtnet_txq_free_mbufs()
and vtnet_rxq_free_mbufs().
MFC after: 1 week
The vtnet_netmap_rxq_populate() function erroneously assumed
that kring->nr_hwcur = 0, i.e. the kring was in the initial
state. However, this is not always the case: for example,
when a vtnet reinit is triggered by some changes in the
interface flags or capenable.
This patch changes the behaviour of vtnet_netmap_kring_refill()
so that it always starts publishing the netmap buffers starting
from the current value of kring->nr_hwcur.
MFC after: 1 week
This is something I added a few years ago to handle resyncing the beacon if
we miss a beacon or need to sync after association/reassociation/powersave.
However, if we're doing STA+AP mode (eg DWDS) then we don't want
to reprogram the beacons here; this may upset normal AP operation.
I missed checking for the sc->sc_swbmiss flag so I was reinitialising
the beacon timers after every beacon miss / TSFOOR option, and
that isn't likely good.
This plus ensuring that STA's are created with "-beacon" to disable
BMISS/TSFOOR processing will hopefully quieten some of the issues
I've seen with missed beacons / TSFOOR (out of range) interrupts
coming in when operating in STA mode.
Tested:
* AR9380/AR9580, STA+AP modes
* Enable self-generated 11n frames
* add MCS rates for 1-stream and 2-stream rates; will do 3-stream
once the rest of this tests out OK with other people.
* Hard-code 1 stream for now
* Add A-MPDU RX mbuf tagging
* RTS/CTS if doing RTSCTS in HT protmode as well as legacy; they're
separate configuration flags
* Update the amrr rate index stuff - walk the rates array like others
to find the right one - this now works for MCS and CCK/OFDM rates
* Add support for atheros fast frames/AMSDU support as we can generate
those in net80211.
TODO:
* HT40 isn't enabled yet
* No A-MPDU support just yet; that requires some more firmware research
and maybe porting some ath(4) A-MPDU support/tracking into net80211
* Short preamble flags aren't set yet for MCS; need to check the linux
driver and see what's going on there
* Add 3x3 rates and set tx/rx stream configuration appropriately
* More 5GHz testing; I have a 3x3 dual band USB NIC coming soon that'll
let me test this.
* Figure out why the RX path isn't performing as fast as it could -
there's only a single buffer loaded at a time for the receive path
in the USB bulk handler and this may not be super useful.
Tested:
* RT5390 usb, 1x1, RF5370 (2GHz radio), STA mode - A-MSDU TX, A-MPDU RX
Submitted by: Ashish Gupta <ashishgu@andrew.cmu.edu>
Differential Revision: https://reviews.freebsd.org/D22840
PCI bus driver restores most but not all of a child PCI-PCI bridge
configuration. The bridge's I/O windows are restored by pcib driver and
that happens later in time. This can be problematic because the Command
register is restored before the windows are restored. If the firmware
programs the windows incorrectly or even does not program them at all,
then the bridge can start claiming I/O cycles that are not intended for
it. This will continue until the correct windows are restored.
I have observed this problem with a buggy BIOS where after resuming from
S3 an I/O port window of a PCI-PCI bridge was configured with zero base
and limit causing the bridge to claim 0x0 - 0xFFF port range. That
interfered with ACPI port access including ACPI PM Timer at port 0x808,
thus wreaking havoc in the time keeping.
The solution is to restore the Command register of PCI-PCI bridges after
the windows are restored in pcib driver. While here, I decided that for
other PCI device types (normal and cardbus) it's better to restore the
Command register after their BARs are restored.
To do: per jhb's suggestion, move the window handling to pci driver.
Reviewed by: imp, jhb, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25028
Extract scrollback buffer initialization into a common routine, used both
during vt(4) init and in handling the CONS_CLRHIST ioctl.
PR: 224436
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D24815
If there's no data to read from xenstore short-circuit
xctrl_on_watch_event to return early, there's no reason to continue
since the lack of data would prevent matching against any known event
type.
Sponsored by: Citrix Systems R&D
MFC with: r352925
MFC after: 1 week
The correct type to use to represent disk sectors is blkif_sector_t
(which is an uint64_t underneath). This avoid truncation of the disk
size calculation when resizing on i386, as otherwise the calculation
of d_mediasize in xbd_connect is truncated to the size of unsigned
long, which is 32bits on i386.
Note this issue didn't affect amd64, because the size of unsigned long
is 64bits there.
Sponsored by: Citrix Systems R&D
MFC after: 1 week
Until net80211 grows a specific ticks type that matches the system,
manually use the same type as the kernel/net80211 'ticks' type
(signed int.)
Tested:
* AR9380, STA mode
The ice(4) driver is the driver for the Intel E8xx series Ethernet
controllers; currently with codenames Columbiaville and
Columbia Park.
These new controllers support 100G speeds, as well as introducing
more queues, better virtualization support, and more offload
capabilities. Future work will enable virtual functions (like
in ixl(4)) and the other functionality outlined above.
For full functionality, the kernel should be compiled with
"device ice_ddp" like in the amd64 NOTES file, and/or
ice_ddp_load="YES" should be added to /boot/loader.conf so that
the DDP package file included in this commit can be downloaded
to the adapter. Otherwise, the adapter will fall back to a single
queue mode with limited functionality.
A man page for this driver will be forthcoming.
MFC after: 1 month
Relnotes: yes
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21959
Driver version upgrade is connected with support for the new device
fetures, like Tx drops reporting or disabling meta caching.
Moreover, the driver configuration from the sysctl was reworked to
provide safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.
Moreover, a lot of minor bug fixes and improvements were added.
Copyright date in the license of the modified files in this release was
updated to 2020.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
There is no guarantee from bus_dmamap_load_mbuf_sg() for matching
mbuf chain segments to dma physical segments.
This patch ensure correctly mapping to LLQ header and DMA segments.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc.
There is ena_free_all_io_rings_resources() called twice on device
detach:
ena_detach():
ena_destroy_device():
/* First call */
ena_free_all_io_rings_resources()
/* Second call */
ena_free_all_io_rings_resources()
The double-free causes panic() on kldunload, for example.
As the ena_destroy_device() is also called by ena_reset_task() it is
better to stay unchanged. Thus, remove the "Second call" of the function.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Determined by a flag passed from the device. No metadata is set within
ena_tx_csum when caching is disabled.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If requested size of IO queues is not supported try to decrease it until
finding the highest value that can be satisfied.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
By default, in ena_attach() the driver attempts to acquire
ena_adapter::max_num_io_queues MSI-X vectors for the purpose of IO
queues, however this is not guaranteed. The number of vectors acquired
depends also on system resources availability.
Regardless of that, enable the number of effectively used IO queues to
be further limited through the sysctl node.
Example: Assumming that there are 8 IO queues configured by default, the
command
$ sysctl dev.ena.0.io_queues_nb=4
will reduce the number of available IO queues to 4. Similarly, the value
can be also increased up to maximum supported value. A value higher than
maximum supported number of IO queues is ignored. Zero is ignored too.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Make the ena_adapter::num_io_queues a number of effectively used IO
queues. While the ena_adapter::max_num_io_queues is an upper-bound
specified by the HW, the ena_adapter::num_io_queues may be lower than
that, depending on runtime system resources availability.
On reset, there are called ena_destroy_device() and then
ena_restore_device(). The latter calls, in turn, ena_enable_msix(),
which will attempt to re-acquire ena_adapter::max_num_io_queues of
MSIX vectors again.
Thus, the value of ena_adapter::num_io_queues may be different before
and after reset. For this reason, free the IO rings structures (drbr,
counters) in ena_destroy_device() and allocate again in
ena_restore_device().
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
This method has been aligned with the way how the Rx queue size is being
updated - so it's now done synchronously instead of resetting the
device.
Moreover, the input parameter is now being validated if it's a power of
2. Without this, it can cause kernel panic.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
This patch reworks how the Rx queue size is being reconfigured and how
the information from the device is being processed.
Reconfiguration of the queues and reset of the device in order to make
the changes alive isn't the best approach. It can be done synchronously
and it will let to pass information if the reconfiguration was
successful to the user. It now is done in the ena_update_queue_size()
function.
To avoid reallocation of the ring buffer, statistic counters and the
reinitialization of the mutexes when only new size has to be assigned,
the io queues initialization function has been split into 2 stages:
basic, which is just copying appropriate fields and the advanced, which
allocates and inits more advanced structures for the IO rings.
Moreover, now the max allowed Rx and Tx ring size is being kept
statically in the adapter and the size of the variables holding those
values has been changed to uint32_t everywhere.
Information about IO queues size is now being logged in the up routine
instead of the attach.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Recent changes to the epoch requires driver to notify that they knows
epoch in order to prevent input packet function to enter epoch each
time the packet is received.
ENA is using NET_TASK for handling Rx, so it's entering epoch
automatically whenever this task is being executed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
If the conditional check for ENA_FLAG_DEV_UP is negated, the body of the
function can have smaller indentation and it makes the code cleaner.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As functions which are declared in the header files are intended to be
the interface and are going to be used by other files, it's better to
include argument names in the definition, so the caller won't have to
check the .c file in order to check their meaning and order.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Currently, the driver had 2 global locks - one was sx lock used for
up/down synchronization and the second one was mutex, which was used
for link configuration and timer service callout.
It is better to have single lock for that. We cannot use mutex, as it
can sleep and cause witness errors in up/down configuration, so sx lock
seems to be the only choice.
Callout cannot use sx lock, but the timer service is MP safe, so we just
need to avoid race between ena_down() and ena_detach(). It can be
avoided by acquiring sx lock.
Simple macros were added that are encapsulating implementation of the
lock and makes the code cleaner.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
As the reset triggering is no longer a simple macro that was just
setting appropriate flag, the new function for triggering reset was
added. It improves code readability a lot, as we are avoiding additional
indentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
The function ena_enable_msix_and_set_admin_interrupts takes two
arguments while the second is not used and so can be spared. This is a
static function, only ena.c is affected.
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Tx drops statistics are fetched from HW every ena_keepalive_wd() call
and are observable using one of the commands:
* sysctl dev.ena.0.hw_stats.tx_drops
* netstat -I ena0 -d
Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
* Removed adaptive interrupt moderation (not suported on FreeBSD).
* Use ena_com_free_q_entries instead of ena_com_free_desc.
* Don't use ENA_MEM_FREE outside of the ena_com.
* Don't use barriers before calling doorbells as it's already done in
the HAL.
* Add function that generates random RSS key, common for all driver's
interfaces.
* Change admin stats sysctls to U64.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Although I added the reset type field to ath_hal_reset() years ago,
I never finished adding it both throughout the HALs and in if_ath.c.
This will eventually deprecate the ath_hal force_full_reset option
because it can be requested at the driver layer.
So:
* Teach ar5416ChipReset() and ar9300_chip_reset() about the HAL type
* Use it in ar5416Reset() and ar9300_reset() when doing a full chip reset
* Extend ath_reset() to include the HAL_RESET_TYPE parameter added in the above functions
* Use HAL_RESET_NORMAL in most calls to ath_reset()
* .. but use HAL_RESET_BBPANIC for the BB panics, and HAL_RESET_FORCE_COLD during fatal, beacon miss and other hardware related hangs.
This should be a glorified no-op outside of actual hardware issues.
I've tested things with ath_hal force_full_reset set to 1 for years now,
so I know that feature and a full reset works (albeit much slower than
a warm reset!) and it does unwedge hardware.
The eventual aim is to use this for all the places where the driver
detects a potential hang as well as if long calibration - ie, noise floor
calibration - fails to complete. That's one of the big hardware related
things that causes station mode operation to hang without easy recovery.
Differential Revision: https://reviews.freebsd.org/D24981
Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer. Using a separate output buffer avoids this copy.
- Create a new 'struct crypto_buffer' describing a crypto buffer
containing a type and type-specific fields. crp_ilen is gone,
instead buffers that use a flat kernel buffer have a cb_buf_len
field for their length. The length of other buffer types is
inferred from the backing store (e.g. uio_resid for a uio).
Requests now have two such structures: crp_buf for the input buffer,
and crp_obuf for the output buffer.
- Consumers now use helper functions (crypto_use_*,
e.g. crypto_use_mbuf()) to configure the input buffer. If an output
buffer is not configured, the request still modifies the input
buffer in-place. A consumer uses a second set of helper functions
(crypto_use_output_*) to configure an output buffer.
- Consumers must request support for separate output buffers when
creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
only permitted to queue a request with a separate output buffer on
sessions with this flag set. Existing drivers already reject
sessions with unknown flags, so this permits drivers to be modified
to support this extension without requiring all drivers to change.
- Several data-related functions now have matching versions that
operate on an explicit buffer (e.g. crypto_apply_buf,
crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).
- Most of the existing data-related functions operate on the input
buffer. However crypto_copyback always writes to the output buffer
if a request uses a separate output buffer.
- For the regions in input/output buffers, the following conventions
are followed:
- AAD and IV are always present in input only and their
fields are offsets into the input buffer.
- payload is always present in both buffers. If a request uses a
separate output buffer, it must set a new crp_payload_start_output
field to the offset of the payload in the output buffer.
- digest is in the input buffer for verify operations, and in the
output buffer for compute operations. crp_digest_start is relative
to the appropriate buffer.
- Add a crypto buffer cursor abstraction. This is a more general form
of some bits in the cryptosoft driver that tried to always use uio's.
However, compared to the original code, this avoids rewalking the uio
iovec array for requests with multiple vectors. It also avoids
allocate an iovec array for mbufs and populating it by instead walking
the mbuf chain directly.
- Update the cryptosoft(4) driver to support separate output buffers
making use of the cursor abstraction.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24545
Implement support for AHCI controller found in
NXP QorIQ Layerscape SoCs.
Submitted by: Artur Rojek <ar@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24466
This patch introduces support for Epson RX-8803 RTC controller accessible
over I2C bus. It has a resolution of 1 sec.
Support for interrupt based alarm was not implemented.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24364
Add basic TCA6416 GPIO expander support over I2C bus. The driver handles
enabling and disabling pins, setting pin mode to IN and OUT and
toggling the pins. External interrupts are not supported.
Submitted by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: manu, mmel
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D24363
For TLS v1.3 the 12 bytes of the initial vector, IV, should just be copied
as-is from the kernel to the gcm_iv field, which hold the first 4 bytes,
and the remaining 8 bytes go to the subsequent implicit_iv field.
There is no need to consider the byte order on the 12 bytes of IV like
initially done.
Sponsored by: Mellanox Technologies
Setting so_snd.sb_lowat to at least 1/8 of the socket buffer size allows
send thread more actively use PDUs coalescing, that dramatically reduces
TCP lock congestion and number of context switches, when the socket is
full and PDUs are small.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
This is all very long-standing bug stuff that is touchy and still poorly
documented. Ok, here goes.
The basic bug:
* deleting a VAP causes the RX path (and TX path too) to be restarted
without a full chip reset, which causes RX hangs on the AR9380 and later.
(ie, the ones with the newer DMA engine.)
The basic fix:
* do an RX flush when stopping RX in ath_vap_delete() to match what happens
when RX is stopped elsewhere. This ensures any pending frames are completed
and we restart at the right spot; it also ensures we don't push new RX buffers
into the hardware if we're stopping receive.
The other issues I found:
* Don't bother checking the RX packet ring in the deferred read taskqueue;
that's specifically supposed to be for completing frames rather than
just yanking them off the receive ring.
* Cancel/drain any pending deferred read taskqueue. This isn't done inside
any locks so we should be super careful here. This stops the hardware
being reprogrammed at the same time in another thread/CPU whilst we're
stopping RX.
* .. (yes, this should be better serialised, but that's for another day. maybe.)
* Add more debugging to trace what's going on here.
And the fun bit:
* Reinitialise the RX FIFO ONLY if we've been reset or stopped, rather than just
reset. I noticed that after all the above was done I was STILL seeing RXEOL.
RXEOL isn't enabled on the AR9380 so I'd only see it if I was sending TX frames
(ie a ping where it'd be transmitted but never received) so I was not being
spammed by RXEOL. So, as long as stuff is stopped, restart it.
This seems to be doing the right thing in both AP and STA modes.
What I should do next, if I ever get time:
* as I said above, serialise the receive stop/start to include taskqueues
* monitor RXEOL on the AR9380 and I keep seeing it spammed / lockups, just
go do a full chip reset to get things back on track. It sucks, but it
is better than nothing.
Tested:
* AR9380 AP/STA mode, adding/deleting a hostap VAP to trigger the TX/RX
queue stop/start; whilst also running an iperf through it. Lots of times.
Lots. Of.. Times.
I have to dig into why I'm seeing it on chips as late as the AR9380 era
stuff (as it's marked as an AR5416 bug, but who knows!) but i'm seeing
aggregate TX frames complete with no blockack bit set. So, everything
should be treated as a failure and do a hardware reset for good measure.
Tested:
* AR9380, STA mode
* AR9580 (5GHz), AP mode
I wasn't enforcing the maximum packet length when using static rates
so although the driver was enforcing it itself OK, the statistics were
sometimes going into the wrong bin.
Tested:
* AR9380, STA mode
- Consistently use 'void *' for key schedules / key contexts instead
of a mix of 'caddr_t', 'uint8_t *', and 'void *'.
- Add a ctxsize member to enc_xform similar to what auth transforms use
and require callers to malloc/zfree the context. The setkey callback
now supplies the caller-allocated context pointer and the zerokey
callback is removed. Callers now always use zfree() to ensure
key contexts are zeroed.
- Consistently use C99 initializers for all statically-initialized
instances of 'struct enc_xform'.
- Change the encrypt and decrypt functions to accept separate in and
out buffer pointers. Almost all of the backend crypto functions
already supported separate input and output buffers and this makes
it simpler to support separate buffers in OCF.
- Remove xform_userland.h shim to permit transforms to be compiled in
userland. Transforms no longer call malloc/free directly.
Reviewed by: cem (earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24855
This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.
Submitted by: Wei Hu <weh@microsoft.com>
Reviewed by: Dexuan Cui <decui@microsoft.com>
Relnotes: yes
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D24061
Fix returning from xenstore device with locks held, which triggers the
following panic:
# cat /dev/xen/xenstore
^C
userret: returning with the following locks held:
exclusive sx evtchn_ringc_sx (evtchn_ringc_sx) r = 0 (0xfffff8000650be40) locked @ /usr/src/sys/dev/xen/evtchn/evtchn_dev.c:262
Note this is not a security issue since access to the device is
limited to root by default.
Sponsored by: Citrix Systems R&D
MFC after: 1 week
Previously the driver handled the bit within itself, but did not expose
the state change to net80211 and interface layers.
This change uses net80211 KPI for rfkill signaling.
The code is modeled after similar code in iwn and wpi.
Reviewed by: adrian
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24923
My preivous logic was a bit wrong. This caused transmissions that failed due
to a mix of short and long retries to count intermediate rates as OK if the
LONG retry count indicated some retries had made it to this intermediate rate,
but the SHORT retry count was the one that caused the whole transmit to fail.
Now status is passed in again - and this is the status for the whole transmission -
and then update_stats() does some quick math to see if the current transmission
series hit its long retry count or not before updating things as a success
or failure.
into account and remove the requirement that the MCS rate is "higher" if we're
considering a new rate.
Ok, another fun one.
* In order for reliable non-software retried higher MCS rates, the TX schedules
(inconsistently!) use hard-coded lower rates at the end of the schedule.
Now, hard-coded is a problem because (a) it means that aggregate formation
is limited by the SLOWEST rate, so I never formed large AMDU frames for
3 stream rates, and (b) if the AP disables lower rates as base rates, it
complains about "unknown rix" every frame you transmit at that rate.
So, for now just disable the third and fourth schedule entry for AMPDUs.
Now I'm forming 32k and 64k aggregates for the higher density MCS rates
much more reliably.
It would be much nicer if the rate schedule stuff wasn't fixed but instead
I'd just populate ath_rc_series[] when I fetch the rates. This is all a
holdover of ye olde pre-11n stuff and I really just need to nuke it.
But for now, ye hack.
* The check for "is this MCS rate better" based on MCS itself is just garbage.
It meant things like going MCS0->7 would be fine, and say 0->8->16 is fine,
(as they're equivalent encoding but 1,2,3 spatial streams), BUT it meant
going something like MCS7->11 would fail even though it's likely that
MCS11 would just be better, both for EWMA/BER and throughput.
So for now just use the average tx time. The "right" way for this comparison
would be to compare PHY bitrates rather than MCS / rate indexes, but I'm not
yet there. The bit rates ARE available in the PHY index, but honestly
I have a lot of other cleaning up to here before I think about that.
* Don't include the RTS/CTS retry count (and thus time) into the average tx time
caluation. It just makes temporarily failures make the rate look bad by
QUITE A LOT, as RTS/CTS exchanges are (a) long, and (b) mostly irrelevant
to the actual rate being tried. If we keep hitting RTS/CTS failures then
there's something ELSE wrong on the channel, not our selected rate.
* Fix formatting, cause reasons;
* Put back the "and the chosen rate is within 90% of the current rate" logic;
* Ensure the best rate and the current rate aren't the same; this ...
* ... fixes the packets_since_switch[] tracking to actually conut how many
frames since the rate switched, so now I know how stable stuff is; and
* Ensure that MCS can go up to a higher MCS at this or any other spatial stream.
My previous quick hack attempt was doing > rather than >= so you had to go
to both a higher root MCS rate (0..7) and spatial stream. Eg, you couldn't
go from MCS0 (1ss) to MCS8 (2ss) this way.
The best rate and switching rate logic still have a bunch more work to do
because they're still quite touchy when it comes to average tx time but at least
now it's choosing higher rates correctly when it wants to try a higher rate.
Tested:
* AR9380, STA mode
Some laptops don't send ACPI "lid status changed" notifications upon
opening the lid if the system was currently suspended. In r358219
this was partially fixed, updating the "lid_status" variable upon
resume even if there is no "status changed" notification from ACPI.
Unfortunately the fix in r358219 did not include notifying userland
via devd; this causes problems on systems using upowerd (e.g. KDE),
since upowerd remembers the most recent devd notification about the
lid status rather than querying the sysctl to get the current status.
This showed up as two symptoms when KDE's "When laptop lid closed: Sleep"
option is set:
1. 50% of the time, closing the lid would not trigger S3 sleep.
2. 50% of the time, plugging/unplugging AC power would trigger S3 sleep.
PR: 246477
MFC after: 3 days
My initial rate control code was .. suboptimal. I wanted to at least get MCS
rates sent, but it didn't do anywhere near enough to handle low signal level links
or remotely keep accurate statistics.
So, 8 years later, here's what I should've done back then.
* Firstly, I wasn't at all tracking packet sizes other than the two buckets
(250 and 1600 bytes.) So, extend it to include 4096, 8192, 16384, 32768 and
65536. I may go add 2048 at some point if I find it's useful.
This is important for a few reasons. First, when forming A-MPDU or AMSDU
aggregates the frame sizes are larger, and thus the TX time calculation
is woefully, increasingly wrong. Secondly, the behaviour of 802.11 channels
isn't some fixed thing, both due to channel conditions and radios themselves.
Notably, there was some observations done a few years ago on 11n chipsets
which noticed longer aggregates showed an increase in failed A-MPDU sub-frame
reception as you got further along in the transmit time. It could be due to
a variety of things - transmitter linearity, channel conditions changing,
frequency/phase drift, etc - but the observation was to potentially form
shorter aggregates to improve BER.
* .. and then modify the ath TX path to report the length of the aggregate sent,
so as the statistics kept would line up with the correct bucket.
* Then on the rate control look-up side - i was also only using the first frame
length for an A-MPDU rate control lookup which isn't good enough here.
So, add a new method that walks the TID software queue for that node to
find out what the likely length of data available is. It isn't ALL of the
data in the queue because we'll only ever send enough data to fit inside the
block-ack window, so limit how many bytes we return to roughly what ath_tx_form_aggr()
would do.
* .. and cache that in the first ath_buf in the aggregate so it and the eventual
AMPDU length can be returned to the rate control code.
* THEN, modify the rate control code to look at them both when deciding which bucket
to attribute the sent frame on. I'm erring on the side of caution and using the
size bucket that the lookup is based on.
Ok, so now the rate lookups and statistics are "more correct". However, MCS rates
are not the same as 11abg rates in that they're not a monotonically incrementing
set of faster rates and you can't assume that just because a given MCS rate fails,
the next higher one wouldn't work better or be a lower average tx time.
So, I had to do a bunch of surgery to the best rate and sample rate math.
This is the bit that's a WIP.
* First, simplify the statistics updates (update_stats()) to do a single pass on
all rates.
* Next, make sure that each rate average tx time is updated based on /its/ failure/success.
Eg if you sent a frame with { MCS15, MCS12, MCS8 } and MCS8 succeeded, MCS15 and MCS
12 would have their average tx time updated for /their/ part of the transmission,
not the whole transmission.
* Next, EWMA wasn't being fully calculated based on the /failures/ in each of the
rate attempts. So, if MCS15, MCS12 failed above but MCS8 didn't, then ensure
that the statistics noted that /all/ subframes failed at those rates, rather than
the eventual set of transmitted/sent frames. This ensures the EWMA /and/ average
TX time are updated correctly.
* When picking a sample rate and initial rate, probe rates aroud the current MCS
but limit it to MCS0..7 /for all spatial streams/, rather than doing crazy things
like hitting MCS7 and then probing MCS8 - MCS8 is basically MCS0 but two spatial
streams. It's a /lot/ slower than MCS7. Also, the reverse is true - if we're at
MCS8 then don't probe MCS7 as part of it, it's not likely to succeed.
* Fix bugs in pick_best_rate() where I was /immediately/ choosing the highest MCS
rate if there weren't any frames yet transmitted. I was defaulting to 25% EWMA and
.. then each comparison would accept the higher rate. Just skip those; sampling
will fill in the details.
So, this seems to work a lot better. It's not perfect; I'm still seeing a lot of
instability around higher MCS rates because there are bursts of loss/retransmissions
that aren't /too/ bad. But i'll keep iterating over this and tidying up my hacks.
Ok, so why this still something I'm poking at? rather than porting minstrel_ht?
ath_rate_sample tries to minimise airtime, not maximise throughput. I have
extended it with an EWMA based on sub-frame success/failures - high MCS rates
that have partially successful receptions still show super short average frame
times, but a /lot/ of retransmits have to happen for that to work.
So for MCS rates I also track this EWMA and ensure that the rates I'm choosing
don't have super crappy packet failures. I don't mind not getting lower
peak throughput versus minstrel_ht; instead I want to see if I can make "minimise
airtime" work well.
Tested:
* AR9380, STA mode
* AR9344, STA mode
* AR9580, STA/AP mode
This function is responsible for setting pc_domain in each pcpu
structure. Call it from the main function that starts APs, rather than
a separate SYSINIT. This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized. In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24757
__builtin_unreachable doesn't raise any compile-time warnings/errors on its
own, so problems with its usage can't be easily detected. While it would be
nice for this situation to change and compilers to at least add a warning
for trivial cases where local state means the instruction can't be reached,
this isn't the case at the moment and likely will not happen.
This commit adds an __assert_unreachable, whose intent is incredibly clear:
it asserts that this instruction is unreachable. On INVARIANTS builds, it's
a panic(), and on non-INVARIANTS it expands to __unreachable().
Existing users of __unreachable() are converted to __assert_unreachable,
to improve debuggability if this assumption is violated.
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D23793
So, replicate the ATI vendor snoop configuration for the AMD vendor.
I think that this should fix a number of cases where users currently
have to resort to polling or disabling MSI.
MFC after: 1 week
Right now (well, since I did this in 2011/2012) the rate control code
makes some super bad choices for 11n aggregates/rates, and it tracks
statistics even more questionably.
It's been long enough and I'm now trying to use it again daily, so let's
start by:
* telling the rate control code if it's an aggregate or not;
* being clearer about the TID - yes it can be extracted from the
ath_buf but this way it can be overridden by the caller without
changing the TID itself.
(This is for doing experiments with voice/video QoS at some point..)
* Return an optional field to limit how long the aggregate is in
microseconds. Right now the rate control code supplies a rate table
and the ath aggr form code will look at the rate table and limit
the aggregate size to 4ms at the slowest rate. Yeah, this is pretty
terrible.
* Add some more TODO comments around handling txpower, rate and
handling filtered frames status so if I continue to have spoons for
this I can go poke at it.
Yes, people shouldn't use bitfields in C for structure parsing.
If someone ever wants a cleanup task then it'd be great to remove them
from this vendor code and other places in the ar9285/ar9287 HALs.
Alas, here we are.
AH_BYTE_ORDER wasn't defined and neither were the two values it could be.
So when compiling ath_ee_print_9300 it'd default to the big endian struct
layout and get a WHOLE lot of stuff wrong.
So:
* move AH_BYTE_ORDER into ath_hal/ah.h where it can be used by everyone.
* ensure that AH_BYTE_ORDER is actually defined before using it!
This should work on both big and little endian platforms.
There are no in-kernel consumers.
Reviewed by: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24775
It no longer has any in-kernel consumers via OCF. smbfs still uses
single DES directly, so sys/crypto/des remains for that use case.
Reviewed by: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24773
There are no longer any in-kernel consumers. The software
implementation was also a non-functional stub.
Reviewed by: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24771
Although a few drivers supported this algorithm, there were never any
in-kernel consumers. cryptosoft and cryptodev never supported it,
and there was not a software xform auth_hash for it.
Reviewed by: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24767
Pursuant to r360398, implement driver-specific versions of the
ifdi_needs_restart iflib device method.
Some (if not most?) Intel network cards don't need reinitializing when a
VLAN is added or removed from the device hardware, so these implement
ifdi_needs_restart in a way that tell iflib not to bring the interface
up or down when a VLAN is added or removed, regardless of whether the
VLAN_HWFILTER interface capability flag is set or not.
This could potentially solve several PRs relating to link flaps that
occur when VLANs are added/removed to devices.
Signed-off-by: Eric Joyner <erj@freebsd.org>
PR: 240818, 241785
Reviewed by: gallatin@, olivier@
MFC after: 3 days
MFC with: r360398
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D24659
r360870 added linux/slab.h into liunx/bitmap.h and this include linux/types.h
The qlnx driver is redefining some of those types so remove them and add an
explicit linux/types.h include.
Pointy hat: manu
Reported by: Austin Shafer <ashafer@badland.io>
Some ethernet switches have very large register windows; for example
the AR8316 switch MIB starts at 0x20000.
Submitted by: Mori Hiroki <yamori813@yahoo.co.jp>
The attach method uses GPIO_GET_BUS() to get a "newbus" device
that provides a pin. But on hints-based systems a GPIO controller
driver might not be fully initialized yet and it does not know gpiobus
hanging off it. Thus, GPIO_GET_BUS() cannot be called yet.
The reason is that controller drivers typically create a child gpiobus
using gpiobus_attach_bus() and that leads to the following call chain:
gpiobus_attach_bus() -> gpiobus_attach() ->
bus_generic_attach(gpiobus) -> gpioiic_attach().
So, gpioiic_attach() is called before gpiobus_attach_bus() returns.
I observed this bug with nctgpio driver on amd64.
I think that the problem was introduced in r355276.
The fix is to avoid calling GPIO_GET_BUS() from the attach method.
Instead, we know that on hints-based systems only the parent gpiobus can
provide the pins.
Nothing is changed for FDT-based systems.
MFC after: 1 week
Sometimes, especially when there is not much memory in the system left,
allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
and it is not guaranteed that it'll succeed. In that situation, the
fallback will work, but if the refill needs to take a place for a lot of
descriptors at once, the time spent in m_getjcl looking for memory can
cause system unresponsiveness due to high priority of the Rx task. This
can also lead to driver reset, because Tx cleanup routine is being
blocked and timer service could detect that Tx packets aren't cleaned
up. The reset routine can further create another unresponsiveness - Rx
rings are being refilled there, so m_getjcl will again burn the CPU.
This was causing NVMe driver timeouts and resets, because network driver
is having higher priority.
Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
more space than 9KB will be needed.
However, 9KB jumbo clusters can still cause issues, so by default the
page size mbuf cluster will be used for the Rx descriptors. This can have a
small (~2%) impact on the throughput of the device, so to restore
original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to
"1" in "/boot/loader.conf" file.
As a part of this patch (important fix), the version of the driver
was updated to v2.1.2.
Submitted by: cperciva
Reviewed by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: Ido Segev <idose@amazon.com>
Reviewed by: Guy Tzalik <gtzalik@amazon.com>
MFC after: 3 days
PR: 225791, 234838, 235856, 236989, 243531
Differential Revision: https://reviews.freebsd.org/D24546
The bus is independent of the device, so all devices can be attached to
either a PCI bus or an MMIO bus. For example, QEMU's virtio-rng-device
gives the MMIO variant of virtio-rng-pci, and is now detected.
Reviewed by: andrew, br, brooks (mentor)
Approved by: andrew, br, brooks (mentor)
Differential Revision: https://reviews.freebsd.org/D24730
The non-legacy virtio MMIO specification drops the use of PFNs and
replaces them with physical addresses. Whilst many implementations are
so-called transitional devices, also implementing the legacy
specification, TinyEMU[1] does not. Device-specific configuration
registers have also changed to being little-endian, and must be accessed
using a single aligned access for registers up to 32 bits, and two
32-bit aligned accesses for 64-bit registers.
[1] https://bellard.org/tinyemu/
Reviewed by: br, brooks (mentor)
Approved by: br, brooks (mentor)
Differential Revision: https://reviews.freebsd.org/D24681
With the removal of in-tree consumers of DES, Triple DES, and
MD5-HMAC, the only algorithm this driver still supports is SHA1-HMAC.
This is not very useful as a standalone algorithm (IPsec AH-only with
SHA1 would be the only user).
This driver has also not been kept up to date with the original driver
in OpenBSD which supports a few more cards and AES-CBC on newer cards.
The newest card currently supported by this driver was released in
2005.
Reviewed by: cem
MFC after: 1 week
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24691
Only _BCL and _BCM methods seem to be essential to the driver's
operation. If _BQC is missing then we can assume that the current
brightness is whatever we set by the last _BCM invocation. If _DCS or
_DGS is missing the we can make assumptions as well.
The change is based on a patch suggested by Anthony Jenkins
<Scoobi_doo@yahoo.com> in PR 207086.
PR: 207086
Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com (earlier version)
Reviewed by: manu
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24653
"F lock" is a switch between two sets of scancodes for function keys F1-F12
found on some Logitech and Microsoft PS/2 keyboards [1]. When "F lock" is
pressed, then F1-F12 act as function keys and produce usual keyscans for
these keys. When "F lock" is depressed, F1-F12 produced the same keyscans
but prefixed with E0.
Some laptops use [2] E0-prefixed F1-F12 scancodes for non-standard keys.
[1] https://www.win.tue.nl/~aeb/linux/kbd/scancodes-6.html
[2] https://reviews.freebsd.org/D21565
MFC after: 2 weeks
o Shrink sglist(9) functions to work with multipage mbufs down from
four functions to two.
o Don't use 'struct mbuf_ext_pgs *' as argument, use struct mbuf.
o Rename to something matching _epg.
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
The following series of patches addresses three things:
Now that array of pages is embedded into mbuf, we no longer need
separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accomodate the main idea r359919 with minimal churn.
Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.
The namespace for the newfeature is somewhat inconsistent and
sometimes has a lengthy prefixes. In these patches we will
gradually bring the namespace to "m_epg" prefix for all mbuf
fields and most functions.
Step 1 of 4:
o Anonymize mbuf_ext_pgs_data, embed in m_ext
o Embed mbuf_ext_pgs
o Start documenting all this entanglement
Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598
All callers are currently filtering bad nsid to this function,
however, we'll have undefined behavior if that's not true. Add the
KASSERT to prevent that.
Add the nvmeX device to the XPT_PATH_INQ nvme specific
information. while one could figure this out by looking up the
domain🚌slot:function, it's a lot easier to have the SIM set it
directly since the sim knows this.
This largely reuses the TLS TOE support added in r330884. However,
this uses the KTLS framework in upstream OpenSSL rather than requiring
Chelsio-specific patches to OpenSSL. As with the existing TLS TOE
support, use of RX offload requires setting the tls_rx_ports sysctl.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24453
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
authentication algorithms and keys as well as the initial sequence
number.
- When reading from a socket using KTLS receive, applications must use
recvmsg(). Each successful call to recvmsg() will return a single
TLS record. A new TCP control message, TLS_GET_RECORD, will contain
the TLS record header of the decrypted record. The regular message
buffer passed to recvmsg() will receive the decrypted payload. This
is similar to the interface used by Linux's KTLS RX except that
Linux does not return the full TLS header in the control message.
- Add plumbing to the TOE KTLS interface to request either transmit
or receive KTLS sessions.
- When a socket is using receive KTLS, redirect reads from
soreceive_stream() into soreceive_generic().
- Note that this interface is currently only defined for TLS 1.1 and
1.2, though I believe we will be able to reuse the same interface
and structures for 1.3.
Instead, copy the strings into a temporary buffer on the stack and
run strcmp on the copies.
Reviewed by: brooks, kib
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D24567
Some models of laptops e.g. "X1 Carbon 3rd Gen Thinkpad" have LRM buttons
wired as so called "Synaptic touchpads extended buttons" rather thah real
trackpoint buttons. Handle this case with merging of events from both
sources.
PR: 245877
Reported by: Raichoo <raichoo@googlemail.com>
MFC after: 1 week
Kernel prints the device announcement before the attach method is
called, so if the correct description is not set by the probe method,
then the announcement would have an incorrect one.
MFC after: 1 week
Use DRIVER_MODULE_ORDERED(SI_ORDER_ANY) so that ig4's ACPI attachment
happens after iicbus and acpi_iicbus drivers are registered.
I have seen a problem where iicbus attached under ig4 instead of
acpi_iicbus when ig4.ko was loaded with kldload. I believe that that
happened because ig4 driver was a first driver to register, it attached
and created an iicbus child. Then iicbus driver was registered and,
since it was the only driver that could attach to the iicbus child
device, it did exactly that. After that acpi_iicbus driver was
registered. It would be able to attach to the iicbus device, but it was
already attached, so nothing happened.
MFC after: 2 weeks
In r360131, acpi_ec probe was changed to not clobber an error status prior to
several error cases that did not explicitly set the error variable before
goto'ing the exit path. However, I did not notice that the error variable was
not set to success in the success path. That caused all successful probes to
fail, which is obviously undesirable.
PR: 245778
Reported by: Neel Chauhan <neel AT neelc.org>, Evilham <contact AT evilham.com>
Tested by: Evilham
X-MFC-With: r360131
as the dma_device during RDMA registration.
cxgbe's struct device cannot be used as-is because it's a native FreeBSD
driver and ibcore is LinuxKPI based.
MFC after: 1 week
MFC after: r360196
The sole in-tree user of this flag has been retired, so remove this
complexity from all drivers. While here, add a helper routine drivers
can use to read the current request's IV into a local buffer. Use
this routine to replace duplicated code in nearly all drivers.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24450
In r360126, I meant to have a different mask only on powerpc, not powerpc64.
Update the check to check that we're not compiling for powerpc64.
Reported by: jhibbits
Approved by: wulf (implicit)
MFC after: 2 weeks
X-MFC-Note: 12 only
X-MFC-With: r360126
Differential Revision: D24370 (followup)
All of the 'goto out;' cases in this probe routine without explicit
initialization of 'ret' indicate error cases and were clearly intended
to use the initial definition of 'ret' with ENXIO. However, 'ret' was
accidentally squashed by reuse for a subroutine call near the beginning
of probe.
Use a different variable for the subroutine status to preserve ENXIO ret
for the 'goto out's as a minimal solution to the panic reported at attach
for now.
PR: 245757
Change kern.evdev.rcpt_mask from 3 to 12 by default. This makes us much
more evdev-friendly, and will prevent everyone using xorg and wayland with
evdev devices (the default) from needing to change this locally.
powerpc32 still uses the old value for the keyboard part, becaues the adb
keyboard driver used there is not evdev compatible.
Reviewed by: wulf
Approved by: wulf
MFC after: 2 weeks
X-MFC-Note: 12 only
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D24370
elements in the middle.
This fixes a panic when detaching USB mouse.
PR: 245732
Reviewed by: wulf
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24500
don't implement link power management, LPM.
This fixes error code XHCI_TRB_ERROR_BANDWIDTH for isochronous USB 3.0
transactions.
Submitted by: Horse Ma <Shichun.Ma@dell.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
snd_hda was rewritten in r230130; one function retained a comment
referencing the previous name.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
A later change, currently being iterated on in D24459, will in-fact change
the lock type to an sx so that TTY drivers can sleep on it if they need to.
Committing this ahead of time to make the review in question a little more
palatable.
tty_lock_assert() is unfortunately still needed for now in two places to
make sure that the tty lock has not been recursed upon, for those scenarios
where it's supplied by the TTY driver and possibly a mutex that is allowed
to recurse.
Suggested by: markj
On my Dell Latitude 7390 laptop, the brightness hotkeys
(Fn+<up/down arrow>) send ACPI notifications which acpi_video
handles by adjusting its brightness setting; but ACPI does not
actually do anything with the backlight.
Announcing brightness changes via devd makes it possible to close
the loop by triggering the intel_backlight utility to perform the
required backlight adjustment.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24424
Name each ehci driver uniquely.
This remove the warning printed at each arm boot :
module_register: cannot register simplebus/ehci from kernel; already loaded from kernel
A similar fix was done in r333074 but imx_ehci wasn't renamed and generic_ehci wasn't
present at that time.