There are two ways a crypto callback can be called: directly from the
caller, or deferred from the crypto thread.
For outbound packets the direct call chain is the following:
IPSEC_OUTPUT() method -> ipsec[46]_common_output() ->
-> ipsec[46]_perform_request() -> xform_output() ->
-> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
-> xform_output_cb() -> ipsec_process_done() -> ip[6]_output().
The SA and SP references are held until crypto processing is finished.
The error handling code wrongly assumed that the crypto callback is always
called from the crypto thread context, and it released the references in
xform_output_cb(). But when the crypto callback is called directly and an
error occurs, the error handling code in ipsec[46]_perform_request() also
released the references.
To fix this, remove error handling from ipsec[46]_perform_request() and do it
in xform_output() before crypto_dispatch().
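
The ownership rule behind this fix can be illustrated with a small userland
sketch. It is not the kernel code: struct ref, struct request, dispatch(),
output_cb() and output() are hypothetical stand-ins for the SA/SP references,
crypto_dispatch(), xform_output_cb() and xform_output(). The idea shown is
that once a request has been handed to dispatch, only the callback releases
the reference, and the caller releases it only for failures that happen
before dispatch.

```c
#include <assert.h>

/* Hypothetical stand-in for an SA or SP reference counter. */
struct ref {
	int count;
};

static void
ref_release(struct ref *r)
{
	assert(r->count > 0);	/* a double release would trip this */
	r->count--;
}

/* Hypothetical request; 'fail' simulates a crypto error. */
struct request {
	struct ref *sa;
	int fail;
	int error;		/* result seen by the callback */
};

/* Stand-in for xform_output_cb(): sole release point after dispatch. */
static void
output_cb(struct request *rq)
{
	ref_release(rq->sa);
}

/* Stand-in for crypto_dispatch() invoking the callback directly. */
static int
dispatch(struct request *rq)
{
	rq->error = rq->fail ? -1 : 0;
	output_cb(rq);		/* runs in the caller's context */
	return (rq->error);
}

/* Stand-in for xform_output(): handles only pre-dispatch errors itself. */
static int
output(struct request *rq, int setup_fails)
{
	if (setup_fails) {
		ref_release(rq->sa);	/* the callback will never run */
		return (-1);
	}
	/* From here on the callback owns the reference. */
	return (dispatch(rq));
}
```

With this split, an error reported by dispatch() needs no cleanup in the
caller, so the reference is dropped exactly once on every path.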
MFC after: 10 days
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64 bits, increase the size of d_namlen to 16 bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.
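
As a rough illustration of the dirent layout change, the sketch below
contrasts an old-style and a new-style entry. The names old_dirent,
new_dirent and sketch_reclen(), the explicit padding fields, the 255-byte
name limit and the 8-byte record rounding are assumptions made for this
example; sys/dirent.h remains the authoritative definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_MAXNAMLEN	255	/* assumed name length limit */

/* Old shape: 32-bit file number, 8-bit name length. */
struct old_dirent {
	uint32_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SKETCH_MAXNAMLEN + 1];
};

/* New shape: 64-bit file number, added d_off, 16-bit name length. */
struct new_dirent {
	uint64_t d_fileno;
	int64_t  d_off;		/* directory offset of the entry */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_pad0;	/* explicit padding (assumed) */
	uint16_t d_namlen;
	uint16_t d_pad1;	/* explicit padding (assumed) */
	char     d_name[SKETCH_MAXNAMLEN + 1];
};

/*
 * Record length for a name of 'namlen' bytes plus its NUL terminator,
 * rounded up to a multiple of 8 (assumed alignment for this sketch).
 */
static size_t
sketch_reclen(size_t namlen)
{
	assert(namlen <= SKETCH_MAXNAMLEN);
	return ((offsetof(struct new_dirent, d_name) + namlen + 1 + 7) &
	    ~(size_t)7);
}
```

The fixed header grows from 8 to 24 bytes in this sketch, which is why
binaries built against the old layout cannot parse new-style entries without
compatibility shims.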
ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.
The kinfo sysctl MIBs ABI is changed in a backward-compatible way, but
there is no general mechanism to handle other sysctl MIBs which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.
Struct xvnode changed layout; no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were provided
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion was done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439
There are two ways a crypto callback can be called: directly from the
caller, or deferred from the crypto thread.
For inbound packets the direct call chain is the following:
IPSEC_INPUT() method -> ipsec_common_input() -> xform_input() ->
-> crypto_dispatch() -> crypto_invoke() -> crypto_done() ->
-> xform_input_cb() -> ipsec[46]_common_input_cb() -> netisr_queue().
The SA reference is held until crypto processing is finished.
The error handling code wrongly assumed that the crypto callback is always
called from the crypto thread context, and it released the SA reference in
xform_input_cb(). But when the crypto callback is called directly and an
error occurs (e.g. data authentication failed), the error handling in
ipsec_common_input() also released the SA reference.
To fix this, remove error handling from ipsec_common_input() and do it
in xform_input() before crypto_dispatch().
PR: 219356
MFC after: 10 days
The (eventually) upcoming ath(4) changes will include being able to load
ath(4) devices on the AHB bus (ie the on-die wifi part of the SoC)
as modules.
In order for this to happen, the calibration data needs to be
copied away before the SPI driver runs or the memory map access hack
won't work.
Now, ideally (!) there'd be some driver that can come up after the MTD
pieces (eg, SPI, NAND, etc) and load into a firmware chunk the calibration
data.
(Or, really really nicely, there would be an actual async firmware API that
would lend itself to having a driver schedule a file read - or a raw device
read - to get to the calibration data.)
Now, until all of the above is done - I'm going to perpetuate the layer
breaking atrocity here by simply doing the PCI bus fixup EEPROM/calibration
data hack here. This will work for any AR71xx (and later on, AR231x/AR531x)
device, as well as the handful of QCA MIPS + QCA9880v2 802.11ac boards with
NOR flash.
To use, this goes into the kernel config:
# Enable EEPROM hacks
options AR71XX_ATH_EEPROM
device ar71xx_caldata
device firmware
# This enables the ath_ahb driver (when I commit the change!) to
# pull data out of the firmware hack.
options ATH_EEPROM_FIRMWARE
In the hints file:
# ART calibration data mapping device
hint.ar71xx_caldata.0.at="nexus0"
hint.ar71xx_caldata.0.order=0
# Where the ART is - last 64k in the first 8MB of flash
hint.ar71xx_caldata.0.map.0.ath_fixup_addr=0x1fff0000
hint.ar71xx_caldata.0.map.0.ath_fixup_size=16384
# And now tell the ath(4) driver where to look!
hint.ath.0.eeprom_firmware="ar71xx_caldata.0.map.0.eeprom_firmware"
Tested:
* carambola2, AR933x SoC, using a set of ath and ath_hal modules to load
TODO:
* unify this bit of firmware loading code, as I will definitely need
to include both the PCI bus firmware version (for PCI ID fixups too!)
as well as AHB/on-chip calibration data.
* Commit the ath_ahb bus code
* Convert .. everything over. That'll take the majority of the time.
The textproc/glimpse port expired over 3 years ago because there weren't any
more publicly available distfiles, and because it lacked a maintainer. Remove
the target as it's no longer executable on FreeBSD.
Differential Revision: D10764
MFC after: 1 month
Reviewed by: imp
Sponsored by: Dell EMC Isilon
if it is called on a TCP socket
* with an IPv6 address and the socket is bound to an
IPv4-mapped IPv6 address.
* with an IPv4-mapped IPv6 address and the socket is bound to an
IPv6 address.
Thanks to Jonathan T. Leighton for reporting this issue.
Reviewed by: bz gnn
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D9163
ENA is a networking interface designed to make good use of modern CPU
features and system architectures.
The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and extendable command set
through an Admin Queue.
The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.
Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.
ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized
data placement.
The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.
The ENA driver and its corresponding devices implement health
monitoring mechanisms such as watchdog, enabling the device and driver
to recover in a manner transparent to the application, as well as
debug logs.
Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds. This feature will
be implemented for the driver in future releases.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Jakub Palider <jpa@semihalf.com>
Jan Medala <jan@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10427
Previously open(2) was allowed in capability mode, with a comment that
suggested this was likely the case to facilitate debugging. The system
call would still fail later on, but it's better to disallow the syscall
altogether.
We now have the kern.trap_enotcap sysctl or PROC_TRAPCAP_CTL proccontrol
to aid in debugging.
In any case libc has translated open() to the openat syscall since
r277032.
Reviewed by: kib, rwatson
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10850
Since netfront uses different locks for the RX and TX paths there's no need to
drop the RX lock before calling if_input.
Suggested by: jhb
Tested by: cperciva
Sponsored by: Citrix Systems R&D
MFC with: r318523
When doing AMSDU offload, the driver (for now!) presents 802.11 frames with
the same sequence number and crypto sequence number / IV values up to the stack.
But this will run afoul of the sequence number detection.
So drivers now have a way to signify that a frame is part of an offloaded
AMSDU group, so we can just ensure that we pass those frames up to the
stack.
The logic will be a bit messy - the TL;DR will be that if it's part of
the previously seen sequence number then it belongs in the same burst.
But if we get a repeat of the same sequence number (eg we sent an ACK
but the receiver didn't hear it) then we shouldn't be passing those frames
up. So, we can't just say "all subframes go up", we need to track
whether we've seen the end of a burst of frames for the given sequence
number or not, so we know whether to actually pass them up or not.
The first part of doing all of this is to ensure the ieee80211_rx_stats
struct is available in the RX sequence number check path and the
RX ampdu reorder path. So, start by passing the pointer into these
functions to avoid doing another lookup.
The actual support will come in a subsequent commit once I know the
functionality actually works!
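
The burst-tracking rule described above could look roughly like the following
sketch. Everything in it is hypothetical: struct rx_info, struct rx_state and
rx_accept() are illustration names, not net80211 API, and the check only
compares against the last seen sequence number rather than modelling the real
duplicate detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame receive metadata supplied by the driver. */
struct rx_info {
	uint16_t seq;		/* 802.11 sequence number */
	bool	 amsdu;		/* part of an offloaded A-MSDU group */
	bool	 amsdu_more;	/* more subframes of this group follow */
};

/* Hypothetical per-peer tracking state. */
struct rx_state {
	bool	 valid;		/* last_seq holds a real value */
	uint16_t last_seq;
	bool	 burst_open;	/* current A-MSDU burst has not ended */
};

/* Return true if the frame should be passed up the stack. */
static bool
rx_accept(struct rx_state *st, const struct rx_info *rx)
{
	bool same;

	assert(st != NULL && rx != NULL);
	same = st->valid && rx->seq == st->last_seq;

	/* A repeated sequence number is only fine inside an open burst. */
	if (same && !(rx->amsdu && st->burst_open))
		return (false);

	st->valid = true;
	st->last_seq = rx->seq;
	st->burst_open = rx->amsdu && rx->amsdu_more;
	return (true);
}
```

Once the last subframe of a burst has been seen, a retransmission carrying
the same sequence number is rejected, which is the case a blanket "all
subframes go up" rule would get wrong.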
General cleanup, for diff reduction with NetBSD and future use by FAT
support in makefs.
Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
Obtained from: NetBSD
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10821
When creating EQs to handle CQ completion events for the PF or for
VFs, we create enough EQE entries to handle completions for the max
number of CQs that can use that EQ.
When SRIOV is activated, the max number of CQs a VF (or the PF) can
obtain is its CQ quota (determined by the Hypervisor resource
tracker). Therefore, when creating an EQ, the number of EQE entries
that the VF should request for that EQ is the CQ quota value (and not
the total number of CQs available in the firmware).
Under SRIOV, the PF also must use its CQ quota, because the resource
tracker also controls how many CQs the PF can obtain.
Using the firmware total CQs instead of the CQ quota when creating EQs
resulted in wasting MTT entries, due to allocating more EQEs than were
needed.
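
A minimal sketch of the sizing decision, under stated assumptions: the
function and parameter names are hypothetical, and the spare entries and
power-of-two rounding are guesses about how a queue size might be derived,
not a description of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round n up to the next power of two (1 <= n <= 2^31). */
static uint32_t
roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	assert(n >= 1 && n <= (UINT32_C(1) << 31));
	while (p < n)
		p <<= 1;
	return (p);
}

/*
 * Number of EQEs to request for an EQ.  Under SRIOV the function's CQ
 * quota bounds how many CQs can ever use the EQ; otherwise the firmware
 * total applies.  'spare' is an assumed safety margin.
 */
static uint32_t
eq_entries(bool sriov, uint32_t fw_total_cqs, uint32_t cq_quota,
    uint32_t spare)
{
	uint32_t ncqs = sriov ? cq_quota : fw_total_cqs;

	return (roundup_pow2(ncqs + spare));
}
```

For example, with a quota of 1000 CQs against a firmware total of 65536, the
quota-based size is two orders of magnitude smaller, which is where the saved
MTT entries come from.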
MFC after: 3 days
Sponsored by: Mellanox Technologies
(TS) method is used. When packet timestamp is used, the "current_qdelay"
keeps storing the last queue delay value calculated in the dequeue
function. Therefore, when a burst of packets arrives followed by
a pause, the "current_qdelay" will store a high value caused by the
burst and stick to that value during the pause because the queue
delay measurement is done inside the dequeue function. This causes
the drop probability calculation function to calculate high drop
probability value instead of zero and prevents the burst allowance
mechanism from working properly. Fix this problem by resetting
"current_qdelay" inside the drop probability calculation function
when the queue length is zero and the TS option is used.
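
The effect of the reset can be sketched in userland as follows. This is a
simplified model, not the dummynet code: struct pie_state, pie_update_prob()
and the floating-point parameters alpha, beta and target are assumptions, and
the update follows the basic PIE control law of adjusting the probability by
the delay error and the delay trend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified PIE state; delays in milliseconds. */
struct pie_state {
	double current_qdelay;	/* last delay measured at dequeue */
	double qdelay_old;	/* delay used in the previous update */
	double drop_prob;	/* in [0, 1] */
	bool   use_timestamps;	/* TS measurement method in use */
};

/* Periodic drop probability update. */
static void
pie_update_prob(struct pie_state *st, unsigned qlen, double target,
    double alpha, double beta)
{
	assert(st != NULL);

	/*
	 * With timestamps the delay is only refreshed at dequeue, so an
	 * empty queue leaves a stale value behind.  Reset it here.
	 */
	if (st->use_timestamps && qlen == 0)
		st->current_qdelay = 0.0;

	st->drop_prob += alpha * (st->current_qdelay - target) +
	    beta * (st->current_qdelay - st->qdelay_old);
	if (st->drop_prob < 0.0)
		st->drop_prob = 0.0;
	if (st->drop_prob > 1.0)
		st->drop_prob = 1.0;
	st->qdelay_old = st->current_qdelay;
}
```

Without the reset, the stale delay from the burst would keep pushing the
probability up during the pause.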
Submitted by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
Marvell Armada 380 is a uni-processor variant of the 38x SoC
family. A function platform_mp_setmaxid() was setting a hardcoded
value, which caused boot failure on A380. Fix this by relying on
the CPU count obtained from device tree nodes.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Netgate
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10783
Before the fix for single interrupt, both percpu and non-percpu routes
were enabled/disabled at the same time.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10716
The e6000sw family automatically reflects PHY status in each port's
registers. Therefore it is not necessary to do a full PHY polling sequence;
skipping it results in much quicker operation and much lighter usage of
the SMI bus.
Care must be taken that the resulting ifmedia_active is identical to
what the PHY will compute, or gratuitous link status changes will
occur whenever the PHY's update function is called.
This patch implements the above improvement. While here, make the pointer to
the proc structure part of the software context instead of
a global variable.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10714
Make sure the RX ring lock is only released when the state of the ring is
consistent, or else concurrent calls to xn_rxeof might get an inconsistent ring
state and thus some packets might be processed twice.
Note that this is not very common, and could only happen when an interrupt is
delivered while in xn_ifinit.
Reported by: cperciva
Tested by: cperciva
MFC after: 1 week
Sponsored by: Citrix Systems R&D
For all Marvell devices, MBUS windows configuration is done
in a common place. Only CESA was an exception, so move its
related code from the driver to mv_common.c. This way it uses
the same DRAM information as all other interfaces
instead of parsing the DT /memory node directly.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10723
The previous implementation of PHY polling risked an
endless loop and very high occupation of the SMI bus. Improve the
operation by limiting the number of polling tries and adding a sleepable
pause.
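
The bounded-polling pattern described above can be sketched as follows. The
retry limit, the callback signatures and the simulated device (struct
fake_phy, fake_busy(), fake_pause()) are all assumptions for illustration; in
a driver the callbacks would be a register read over the SMI bus and a
sleepable pause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	POLL_MAX_TRIES	100	/* assumed retry limit */

/*
 * Poll until busy() reports false or the retry budget is used up.
 * Returns 0 on success and -1 on timeout.
 */
static int
poll_until_idle(bool (*busy)(void *), void (*pause_fn)(void *), void *arg)
{
	int i;

	assert(busy != NULL);
	for (i = 0; i < POLL_MAX_TRIES; i++) {
		if (!busy(arg))
			return (0);
		if (pause_fn != NULL)
			pause_fn(arg);	/* yield instead of hammering the bus */
	}
	return (-1);
}

/* Simulated device: reports busy for 'remaining' more reads. */
struct fake_phy {
	int remaining;
	int pauses;
};

static bool
fake_busy(void *arg)
{
	struct fake_phy *p = arg;

	if (p->remaining > 0) {
		p->remaining--;
		return (true);
	}
	return (false);
}

static void
fake_pause(void *arg)
{
	struct fake_phy *p = arg;

	p->pauses++;
}
```

A device that never goes idle now costs at most POLL_MAX_TRIES reads and
pauses before the caller gets an error back, instead of looping forever.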
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10713
Call disk_gone when the backend switches to the "Closing" state and blkfront
still has pending users. This allows the disk to be detached, and will call
into xbd_closing by itself when the geom layout cleanup has finished.
Reported by: bapt
Tested by: manu
Reviewed by: bapt
Sponsored by: Citrix Systems R&D
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D10772
defined. On machines without arithmetic shift instructions, zero bits
may be shifted in from the left, giving a large positive result instead
of the desired divide-by power-of-2. Fix this by operating on the
absolute value and compensating for the possible negation later.
Reverse the order of the underflow/overflow tests and the exponential
decay calculation to avoid the possibility of an erroneous overflow
detection if p is a sufficiently small non-negative value. Also
check for negative values of prob before doing the exponential decay
to avoid another instance of right shifting a negative value.
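
The "operate on the absolute value and compensate later" approach can be
sketched in portable C as below. The helper name div_pow2() and its 64-bit
types are assumptions for this example, not the names used in the code being
fixed. Note that this form rounds toward zero, whereas an arithmetic right
shift of a negative value would round toward negative infinity.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide v by 2^shift without ever right-shifting a negative value.
 * shift must be in the range 1..63.
 */
static int64_t
div_pow2(int64_t v, unsigned shift)
{
	uint64_t mag;

	assert(shift >= 1 && shift <= 63);

	/* Take the magnitude in unsigned arithmetic so INT64_MIN is safe. */
	mag = v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
	mag >>= shift;

	/* Restore the sign that was stripped above. */
	return (v < 0 ? -(int64_t)mag : (int64_t)mag);
}
```

Because the shift is applied to an unsigned magnitude, the result does not
depend on whether the machine shifts in sign bits or zero bits.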
Tested by: Rasool Al-Saadi <ralsaadi@swin.edu.au>
MFC after: 1 week
kern_yield(0) effectively causes the calling thread to be rescheduled
immediately since it resets the thread's priority to the highest possible
value. This can cause livelocks when the pattern
"while (!trylock()) kern_yield(0);" is used since the thread holding the
lock may linger on the runqueue for the CPU on which the looping thread is
running.
MFC after: 1 week
This will help Jenkins dedupe 9 warnings between the static build and
the module build of ipsec(4).
Missed in SRCTOP conversion in r314651.
MFC with: r314651
Sponsored by: Dell EMC Isilon