Part of the nvme recovery process for errors is to reset the
card. Sometimes, this results in failing the entire controller. When nda
is in use, we free the sim, which will sleep until all the I/O has
completed. However, with only one thread, the request fail task never
runs once the reset thread sleeps here. Create two threads to allow I/O
to fail until it's all processed and the reset task can proceed.
This is a temporary kludge until I can work out questions that arose
during the review, not least is what was the race that queueing to a
failure task solved. The original commit is vague and other error paths
in the same context do a direct failure. I'll investigate that more
completely before committing changing that to a direct failure. mav@
raised this issue during the review, but didn't otherwise object.
Multiple threads, though, solve the problem in the mean time until other
such means can be perfected.
Reviewed by: jhb@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30366
If an iSCSI connection is shutdown abruptly (e.g. by a RST from the
peer), pending iSCSI PDUs and page pod work requests can be in the
ulp_pduq when the final CPL is received indicating the death of the
connection.
Reported by: Jithesh Arakkan @ Chelsio
In 4427ac3675, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs. The queue of mbufs to send is
protected by the inp lock. However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread. To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.
To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: 4427ac3675
T6 makes several changes relative to T5 for receive of iSCSI PDUs.
First, earlier adapters issue either 2 or 3 messages to the host for
each PDU received: CPL_ISCSI_HDR contains the BHS of the PDU,
CPL_ISCSI_DATA (when DDP is not used for zero-copy receive) contains
the PDU data as buffers on the freelist, and CPL_RX_ISCSI_DDP with
status of the PDU such as result of CRC checks. In T6, a new
CPL_RX_ISCSI_CMP combines CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP. Data
PDUs which are directly placed via DDP only report a single
CPL_RX_ISCSI_CMP message. Data PDUs received on the free lists are
reported as CPL_ISCSI_DATA followed by CPL_RX_ISCSI_CMP. Control PDUs
such as R2T are still reported via CPL_ISCSI_HDR and CPL_RX_ISCSI_DDP.
Supporting this requires changing the CPL_ISCSI_DATA handler to
allocate a PDU structure if it is not preceded by a CPL_ISCSI_HDR as
well as support for the new CPL_RX_ISCSI_CMP.
Second, when using DDP for zero-copy receive, T6 will only issue a
CPL_RX_ISCSI_CMP after a burst of PDUs have been received (indicated
by the F flag in the BHS). In this case, the CPL_RX_ISCSI_CMP can
reflect the completion of multiple PDUs and the BHS and TCP sequence
number included in the message are from the last PDU received in the
burst. Notably, the message does not include any information about
earlier PDUs received as part of the burst. Instead, the driver must
track the amount of data already received for a given transfer and use
this to compute the amount of data received in a burst. In addition,
the iSCSI layer currently has no way to permit receiving a logical PDU
which spans multiple PDUs. Instead, the driver presents each burst as
a single, "large" PDU to the iSCSI target and initiators. This is
done by rewriting the buffer offset and data length fields in the BHS
of the final PDU as well as rewriting the DataSN so that the received
PDUs appear to be in order.
To track all this, cxgbei maintains a hash table of 'cxgbei_cmp'
structures indexed by transfer tags for each offloaded iSCSI
connection. When a SCSI_DATA_IN message is received, the ITT from the
received BHS is used to find the necessary state in the hash table,
whereas SCSI_DATA_OUT replies use the TTT as the key. The structure
tracks the expected starting offset and DataSN of the next burst as
well as the rewritten DataSN value used for the previously received
PDU.
Discussed with: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30458
It includes:
1)Newly added TMF feature.
2)Added newly Huawei & Inspur PCI ID's
3)Fixed smartpqi driver hangs in Z-Pool while running on FreeBSD12.1
4)Fixed flooding dmesg in kernel while the controller is offline during in ioctls.
5)Avoided unnecessary host memory allocation for rcb sg buffers.
6)Fixed race conditions while accessing internal rcb structure.
7)Fixed where Logical volumes exposing two different names to the OS it's due to the system memory is overwritten with DMA stale data.
8)Fixed dynamically unloading a smartpqi driver.
9)Added device_shutdown callback instead of deprecated shutdown_final kernel event in smartpqi driver.
10)Fixed where Os is crashed during physical drive hot removal during heavy IO.
11)Fixed OS crash during controller lockup/offline during heavy IO.
12)Fixed coverity issues in smartpqi driver
13)Fixed system crash while creating and deleting logical volume in a continuous loop.
14)Fixed where the volume size is not exposing to OS when it expands.
15)Added HC3 pci id's.
Reviewed by: Scott Benesh (microsemi), Murthy Bhat (microsemi), imp
Differential Revision: https://reviews.freebsd.org/D30182
Sponsored by: Netflix
The second set of USB transfer is requested by hkbd(4) and
should improve HID keyboard handling in kdb and panic contexts.
MFC after: 1 week
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30486
Which happens when USB transfer setup is failed.
MFC after: 1 week
PR: 254974
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D30485
The generated C output for aicasm_scan.l defines yylineno already, so
references to it from other files should use an extern declaration.
The STAILQ_HEAD use in aicasm_symbol.h also provided an identifier,
causing it to both define the struct type and define a variable of that
struct type, causing any C file including the header to define the same
variable. This variable is not used (and confusingly clashes with a
field name just below) and was likely caused by confusion when switching
between defining fields using similar type macros and defining the type
itself.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30525
Freed ctx is used in the later callee ocs_hw_command(),
which is a use after free bug.
Return error if sli_cmd_common_nop() failed.
PR: 255865
Reported by: lylgood@foxmail.com
Approved by:: markj
The LinuxKPI net_device actually is an ifnet; in order to further
clean that up so we can extend "net_device" replace the few macros
inline in mlx4.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30476
CTRL and OFLD tx queues do not have automatic tx credit flush enabled so
it is okay for the cidx not to be the same as the pidx when the queue is
destroyed.
Reported by: Jithesh Arakkan @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
Otherwise the resouce buffer may have been freed when
AcpiSetCurrentResources() is called, leading to a use-after-free.
PR: 255862
Submitted by: Lv Yunlong <lylgood@foxmail.com> (original version)
MFC after: 1 week
m_pullup() frees the input mbuf chain upon a failure. Set *mpp to NULL
in this case to ensure that the caller does not free the chain again.
PR: 224928
Submitted by: Lv Yunlong <lylgood@foxmail.com> (original version)
MFC after: 1 week
This removes all unused bits from linux/netdevice.h and migrates two
inline functions into the mlx4 and ofed code respectively.
This gets the mlx4/ofed (struct ifnet) specific bits down to 7 lines
in netdevice.h.
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
Reviewed by: hselasky, kib
Differential Revision: https://reviews.freebsd.org/D30461
This is intended for use in KTLS transmit where each TLS record is
described by a single mbuf that is itself queued in the socket buffer.
Using the existing CRYPTO_BUF_MBUF would result in
bus_dmamap_load_crp() walking additional mbufs in the socket buffer
that are not relevant, but generating a S/G list that potentially
exceeds the limit of the tag (while also wasting CPU cycles).
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30136
- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.
- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.
- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.
- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This fixes inability to start USB xfers in a case when FIFO has been
already open()-ed but no read() or poll() calls has been issued yet.
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30343
This fixes lose of evdev events after moused has been killed.
While here use bitwise operations for UMS_EVDEV_OPENED flag.
Reviewed by: hselasky
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30342
With this patch:
% dmesg | grep -i uart
uart2: <Intel Gemini Lake SIO/LPSS UART 0> mem 0xa1426000-0xa1426fff,0xa1425000-0xa1425fff irq 4 at device 24.0 on pci0
uart3: <Intel Gemini Lake SIO/LPSS UART 1> mem 0xa1424000-0xa1424fff,0xa1423000-0xa1423fff irq 5 at device 24.1 on pci0
uart4: <Intel Gemini Lake SIO/LPSS UART 2> mem 0xfea10000-0xfea10fff irq 6 at device 24.2 on pci0
uart5: <Intel Gemini Lake SIO/LPSS UART 3> mem 0xa1422000-0xa1422fff,0xa1421000-0xa1421fff irq 7 at device 24.3 on pci0
PR: 256101
Submitted by: Daniel Ponte <amigan@gmail.com>
MFC after: 1 week
It turns out that, silly adrian, setting it to 64 means only two
AMPDU frames of 32 subframes each. Thus, whilst those are in-flight,
any subsequent queues frames to that node get dropped.
This ends up being pretty no bueno for performance if any receive
is also going on at that point.
Instead, set it to 128 for the time being to ensure that SOME
frames get queued in the meantime. This results in some frames
being immediately available in the software queue for transmit
when the two existing A-MPDU frames have been completely sent,
rather than the queue remaining empty until at least one is sent.
It's not the best solution - I still think I'm scheduling receive
far more often than giving time to schedule transmit work -
but at least now I'm not starving the transmit side.
Before this, a bidirectional iperf would show receive at ~ 150mbit/sec.
but the transmit side at like 10kbit/sec. With it set to 128 it's
now 150mbit/sec receive, and ~ 10mbit receive. It's better than 10kbit/sec,
but still not as far as I'd like it to be.
Tested:
* AR9380/QCA934x (TL-WDR4300 AP), Macbook pro test STA + AR9380 test STA
I've been using STA+AP modes at home for a couple years now
and I've been finding and fixing a lot of weird corner cases.
This is the eventual patchset I've landed on.
* Don't force beacon resync in STA mode if we're using sw beacon tracking.
This stops a variety of stomping issues when the STA VAP is reconfigured;
the AP hardware beacons were being stomped on!
* Use the first AP VAP to configure beacons on, rather than the first VAP.
This prevents weird behaviour in ath_beacon_config() when the hardware
is being reconfigured and the STA VAP was the first one created.
* Ensure the beacon interval / timing programming is within the AR9300
HAL bounds by masking off any flags that may have been there before
shifting the value up to 1/8 TUs rather than the 1 TU resolution the
previous chips used.
Now I don't get weird beacon reprogramming during startup, STA state
changes and hardware recovery which showed up as HI-LARIOUS beacon
configurations and STAs that would just disconnect from the AP very
frequently.
Tested:
* AR9344/AR9380, STA and AP and STA+AP modes
If a regulator hasn't been enable by a driver but is enabled in hardware
(most likely enabled by U-Boot), regulator_status will returns that it
is enabled and so any call to regulator_disable will panic as it wasn't
enabled by one of our drivers.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30293
This allow us to powerup/down the card and enabling/disabling the
regulators if any.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30292
This helper can be used to enable/disable the regulator and starting
the power sequence of sd/sdio/eMMC cards.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30291
If a sd/emmc node have a pwrseq property parse it and get the corresponding
driver.
This can later be used to powerup/powerdown the SDIO card or eMMC.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30289
This driver is used to power up sdio card or eMMC.
It handle the reset-gpio, clocks and needed delays for powerup/powerdown.
Sponsored by: Diablotin Systems
Differential Revision: https://reviews.freebsd.org/D30288
For the discovery phase of SD/eMMC we need to do some transaction in a async
way.
The classic CAM XPT_{GET,SET}_TRAN_SETTING cannot be used in a async way.
This also allow us to split the discovery phase into a more complete state
machine and we don't mtx_sleep with a random number to wait for completion
of the tasks.
For mmc_sim we now do the SET_TRAN_SETTING in a taskqueue so we can call
the needed function for regulators/clocks without the cam lock(s). This part is
still needed to be done for sdhci.
We also now save the host OCR in the discovery phase as it wasn't done before and
only worked because the same ccb was reused.
Reviewed by: imp, kibab, bz
Differential Revision: https://reviews.freebsd.org/D30038
This fixes a few bugs in iSCSI backends where the backends were using
the limits they advertised initially during the login phase as the
final values instead of the values negotiated with the other end.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D30271
It's a class0 driver that implements some pcib methods and creates
a pci bus as its children.
The "ofw_pci" name will be used by a new driver that will be a subclass
of the pci bus.
No functional changes intended.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30226
There is no need to preform any voltage reconfiguration
in case the vccq regulator is not physically attached to the
slot.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30355
These were seemingly copied over from icl_soft.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30268
I botched a few of the changes when rebasing the changes in
4b6ed0758d across the changes in
43bbae1948.
- Move the counter allocations into alloc_ofld_rxq().
- Free the counters freeing an ofld rxq.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30267
The new Mikrotik 10/25G NIC is mostly compatible with AR8151 hardware,
with few exceptions:
* card supports only 32bit DMA operations
* card does not support write-one-to-clear semantics for interrupt status
register
* MDIO operations can take longer to complete
This patch adds support for Mikrotik 10/25G NIC to the alc driver
while maintaining support for all earlier HW.
The patch was tested with FreeBSD main branch as of commit
f4b38c360e
This was tested on Intel i7-4790K system with Mikrotik 10/25G NIC.
This was tested on Intel i7-4790K system with RB44Ge (AR8151 based 4-port NIC)
to verify backwards compatibility.
PR: 256000
Submitted by: Gatis Peisenieks <gatis@mikrotik.com>
MFC after: 1 week
behind USB HUBs are detected and the USB reset counter logic will kick in
preventing enumeration of continuously failing ports.
Submitted by: phk@
Tested by: bz@
PR: 237666
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
says it should be max 10 milliseconds.
This may fix some USB enumeration issues:
> usbd_req_re_enumerate: addr=3, set address failed! (USB_ERR_IOERROR, ignored)
> usbd_setup_device_desc: getting device descriptor at addr 3 failed,
Found by: Zhichao1.Li@dell.com
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
0 milliseconds and 2 seconds inclusivly. Some style fixes while at it.
The USB specification has minimum values and maximum values,
and not only minimum values.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
qoriq_gpio_pin_setflags() locks the device mutex, as does
qoriq_gpio_map_gpios(), causing a recursion on non-recursive lock. This
was missed during testing for 16e549ebe.
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
The CTL frontend might have provided a buffer that is smaller than the
FirstBurstLength and thus smaller than the amount of unsolicited data
included in the request PDU. Treat these transfers as an empty
transfer.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29940
A single union ctl_io can be reused across multiple transfers (in
particular by the ramdisk backend). On a reuse, the reservation
pointer would retain its value from the previous transfer tripping an
assertion.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29939
- Switch to allocating the cxgbei version of icl_pdu explicitly
as a separate refcounted object allocated via malloc/free
instead of storing it in the bhs mbuf prior to the bhs.
- Support the icl_conn_pdu_queue_cb() method to set a callback
on a PDU to be invoked when the PDU is freed.
- For ICL_NOCOPY buffers, use an external mbuf to manage the
storage for the buffer via m_extaddref(). Each external mbuf
holds a reference on the associated PDU, so the callback is
invoked once all of the external mbufs have been freed.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29910
- Only allocate 16K jumbo mbufs if the region of data to be
appended is sufficiently large, and use a loop.
- Use m_getm2() to allocate a chain for data less than 16K, or
if m_getjcl() fails.
- Use ENOMEM as the return value instead of '1' if the hook fails due
to a memory allocation error.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29909
A CAM target layer I/O CCB can use a S/G list of virtual address ranges
to describe its data buffer. This change adds zero-copy receive support
for such requests.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29908
As a result, CPL_FW4_ACK now returns credits for these work requests.
To support this, page pod work requests are now constructed in special
mbufs similar to "raw" mbufs used for NIC TLS in plain TX queues.
These special mbufs are stored in the ulp_pduq and dispatched in order
with PDU work requests.
Sponsored by: Chelsio Communications
Discussed with: np
Differential Revision: https://reviews.freebsd.org/D29904
We shouldn't overwrite capability register. Instead, voltages supported
by the controller have to be read from dts, as the hardware doesn't
report correct values.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30123
Add a missing call to mmc_fdt_parse, without it some dts properties
are not parsed.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30122
Add data specific for SoC, including all necessary quirks.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30121
The hotplug script will be executed only once for each backend,
regardless of the frontend triggering reconnections. Fix blkback to
deal with the hotplug script being executed only once, so that
reconnections don't stall waiting for a hotplug script execution
that will never happen.
As a result of the fix move the initialization of dev_mode, dev_type
and dev_name to the watch callback, as they should be set only once
the first time the backend connects.
This fix is specially relevant for guests wanting to use UEFI OVMF
firmware, because OVMF will use Xen PV block devices and disconnect
afterwards, thus allowing them to be used by the guest OS. Without
this change the guest OS will stall waiting for the block backed to
attach.
Fixes: de0bad0001 ('blkback: add support for hotplug scripts')
MFC after: 1 week
Sponsored by: Citrix Systems R&D
On rk3399 the VOP-little node has a single 'port' property (not a
collection of 'ports' or indexed ports).
Reviewed by: manu
Sponsored by: UKRI
Differential Revision: https://reviews.freebsd.org/D30165
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Reviewed by: whu, imp
MFC after: 2 weeks
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D30124
Virtio modern has the common data organized in little endian, but
on powerpc64 BE it was reading and writing in the wrong endian.
Submitted by: Leonardo Bianconi <leonardo.bianconi@eldorado.org.br>
Reviewed by: bryanv, alfredo
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28947
Only LS1046A and LS1028A require the base clk to be divided by 2.
Implement that by moving the divider to a SoC specific data.
This commit fixes base clk setup for the entire SoC family,
including the already suported LS2160A.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30120
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
MFC after: 10 days
Sponsored by: CyberSecure
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D30144
DRIVER_OK status is set after device_attach() succeeds. For now postpone
disk_create to attach_completed() method.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D30049
The _ext event notification includes the address being added/removed and
that gives the driver an easy way to ignore non-IPv6 addresses. Remove
'tom' from the handler's name while here, it was moved out of t4_tom a
long time ago.
MFC after: 1 week
Sponsored by: Chelsio Communications
AIM (adaptive interrupt moderation) was part of BSD11 driver. Upon IFLIB
migration, AIM feature got lost. Re-introducing AIM back into IFLIB
based IXGBE driver.
One caveat is that in BSD11 driver, a queue comprises both Rx and Tx
ring. Starting from BSD12, Rx and Tx have their own queues and rings.
Also, IRQ is now only configured for Rx side. So, when AIM is
re-enabled, we should now consider only Rx stats for configuring EITR
register in contrast to BSD11 where Rx and Tx stats were considered to
manipulate EITR register.
Reviewed by: gallatin, markj
Sponsored by: NetApp, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27344
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
Add PCI IDs for Intel Apollo Lake Series HSUARTs:
# pciconf -ll
drv selector class rev hdr vendor device subven subdev
uart0@pci0:0:24:0: 118000 0b 00 8086 5abc 8086 7270
uart1@pci0:0:24:1: 118000 0b 00 8086 5abe 8086 7270
uart2@pci0:0:24:2: 118000 0b 00 8086 5ac0 8086 7270
uart3@pci0:0:24:3: 118000 0b 00 8086 5aee 8086 7270
NB (Intel Document Number 336256-004US):
1. The E3900 and A3900 Series Processors support four LPSS_UART ports,
while the N- and J- Series Processors support only LPSS_UART [2:1]
ports.
2. The LPSS_UART1 port is dedicated for discrete Global Navigation
Satellite System (GNSS). This port can be used for generic UART
functionality if GNSS is not used.
3. The LPSS_UART2 port is dedicated for host OS debug.
4. The LPSS_UART0 and LPSS_UART3 ports are for generic UART functionality.
5. Only UART [1:0] ports support DMA.
PR: 255556
Submitted by: Jose Luis Duran <jlduran@gmail.com>
MFC after: 1 week
It is defined as a uint64_t in the UEFI spec. As it's not used as a
pointer by the kernel follow this and define it as the same in the
kernel.
Reviewed by: kib, manu, imp
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29759
The original implementation only supports getting the address from legacy
BIOS (by searching for the SMBIOS_SIG pattern in a fixed address space).
Try to get the SMBIOS table from EFI through efirt (EFI Runtime Services)
firstly. Continue to search in the legacy BIOS if a NULL address is
returned from EFI.
By this way the ipmi function supports both legacy BIOS and UEFI systems.
Reviewed by: dab, vangyzen
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D30007
There is no need to panic in if_transmit if the checksums requested are
inconsistent with the frame being transmitted. This typically indicates
that the kernel and driver were built with different INET/INET6 options,
or there is some other kernel bug. The driver should just throw away
the requests that it doesn't understand and move on.
MFC after: 1 week
Sponsored by: Chelsio Communications
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts.
Approved by: erj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29876
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual suspend and resume. */
# devctl suspend t6nex0
# devctl resume t6nex0
/* Manual reset. */
# devctl reset t6nex0
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.
In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.
82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.
This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.
PR: 211219
Reported by: Alexey <aserp3@gmail.com>
Tested by: kbowling (I210 lagg), markj (I210)
Approved by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29943
A lot more generic cam related things are done in mmc_sim so this simplify
the driver a lot.
Differential Revision: https://reviews.freebsd.org/D27487
Reviewed by: kibab
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
MFC after: 1 month
Sponsored by: Chelsio Communications
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
3e7bae0821 turns the BUS_READ_IVAR() failure from a warning into a
KASSERT. For certain PCI audio devices such like snd_csa(4) and
snd_emu10kx(4), the ac97_create() keeps the device handler generated
by device_add_child(pci_dev, "pcm"), which is not really a PCI device
handler. This in turn causes the subsequent pci_get_subdevice()
inside ac97_initmixer() triggering a panic.
This patch tries to put a bandaid for the aforementioned pcm device
children such that they can use the correct PCI handler(from parent)
to avoid a KASSERT panic in the INVARIANTS kernel.
Tested with: snd_csa(4), snd_ich(4), snd_emu10kx(4)
Reviewed by: imp
MFC after: 1 month
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers. They can be called
from the 1Hz callout or if_get_counter. All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
While the PV SCSI SG list can handle 512k of SG entries, it can only do
so for I/O that's aligned to 4k or better. newfs_msdos does unaligned
I/O, so triggers too long for host errors in cam when a 512k I/O is
attempted. Prefer power of 2 256k to the absolute maximum 508k, though
that can be revisited should the latter show to give significant
performance improvement.
MFC After: 3 days
Tested by: darius on discord (508k version of patch)
Sponsored by: Netflix
Since then, the FreeBSD USB stack has got proper USB RNDIS support.
PR: 254345
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
- SNDSTAT_LABEL_* are renamed to SNDST_DSPS_*, and SNDSTAT_LABEL_DSPS
becomes SNDST_DSPS.
- Centralize channel number/rate/formats into a single nvlist
The above nvlist is named "info_play" and "info_rec"
- Expose only encoding format in pfmts/rfmts. Userland has no direct
access to AFMT_ENCODING/CHANNEL/EXTCHANNEL macros, thus it serves no
meaning to expose too much information through this pair of labels.
However pminrate/rminrate, pmaxrate/rmaxrate, pfmts/rfmts are
deprecated and will be removed in future.
This commit keeps ioctls ABI compatibility with __FreeBSD_version
1400006 for now. In future the compat ABI with 1400006 will be removed
once audio/virtual_oss is rebuilt.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29770
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29801
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
PR: 140647
Reported by: jtl
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29789
We don't need to set the bits here since the if/else if/else statements
fully cover setting these bit pairs.
Reported by: markj
Reviewed by: markj, erj
Approved by: #intel_networking
MFC aftter: 1 week
Differential Revision: https://reviews.freebsd.org/D29827
The NAV (network allocation vector) register reflects the current MAC
tracking of NAV - when it will stay quiet before transmitting.
Other devices transmit their frame durations in their 802.11 PHY headers
and all devices that hear a frame - even if it's one in an encoding
they don't understand - will understand the low bitrate PHY header that
includes the frame duration. So, they'll set NAV to this value so
they'll stay quiet until the transmit completes.
Anyway, sometimes the PHY NAV header is garbled and sometimes, notably
older broadcom devices, will fake a long NAV so they can get "cleaner" air
for local calibration. When this happens, the hardware will stay quiet
for quite some time and this can lead to missed/stuck beacons, or
(for Very Large Values) a MAC hang.
This code just adds the ability to get/set the NAV; the driver will
need to take care of using it during transmit hangs and beacon misses
to see if it's due to a trash looking NAV.
We must make sure that incoming packets will never overflow the netmap
buffers, even when the user is using the offset feature. In the typical
scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are
~500 bytes available for user offsets.
Unfortunately, some NICs accept incoming packets even when they are
larger then the MTU. This means that the only way to stop DMA from
overflowing the netmap buffers, when offsets are allowed, is to choose
a hardware buffer length which is smaller than the netmap buffer
length. For most NICs and for 2KiB netmap buffers, this means 1024
bytes, which is unconveniently small.
The current code will select the small hardware buf size even when
offsets are not in use. The main purpose of this change is to
fix this bug by returning to the normal behavior for the no-offsets
case.
At the same time, the patch pushes the handling of the offset case
to the lower level driver code, so that it can be made NIC-specific
(in future patches).
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.
MFC after: 1 week
This is the April update to vendor/wpa committed upstream
2021/04/07.
This is MFV efec822389.
Suggested by: philip
Reviewed by: philip
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D29744
Explicitly disable ring synchronization before calling
callbacks that may result in a hardware reset.
Before this patch we relied on capturing the down/up events which,
however, may not be issued by all drivers.
"It looks like it would be less confusing to rename 'count' to
something like 'idx', since that's what it's used for in this
function."
Reviewed by: erj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29798
There is a weird limit of AGTIAPI_MAX_DMA_SEGS (128) S/G segments per
I/O since the initial driver import. I don't know why it was added,
can only guess some hardware limitation, but in worst case it means
maximum I/O size of 508KB. Respect it to be safe, rounding to 256KB.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
It is a direct request for data corruptions, one report of which we
have received. I am very surprised that only one.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Restore 525e07418c after the iflib conversion of igb(4). This
reenables random MAC address generation when attaching to a VF with a
zeroed MAC.
PR: 253535
Reported by: Balaev PA <mail@void.so>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29785
The boundary differentiating "lem" vs "em" class devices was wrong
after the iflib conversion of lem(4).
The Packet Buffer size for 82547 class chips was not set correctly
after the iflib conversion of lem(4).
These changes restore functionality on an 82547 for the submitter.
PR: 236119
Reported by: Jeff Gibbons <jgibbons@protogate.com>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29766
This is a debugging tunable that shouldn't have retained this setting
after the initial iflib conversion of the driver
PR: 248934
Reported by: Franco Fichtner <franco@opnsense.org>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29768
A doomed VI does not have a valid ifnet.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29662
There haven't been any non-obscure drivers that supported this
functionality and it has been impossible to test to ensure that it
still works. The only known consumer of this interface was the engine
in OpenSSL < 1.1. Modern OpenSSL versions do not include support for
this interface as it was not well-documented.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29736
iscsid can be sleeping in iscsi_ioctl() causing the destroy_dev() to
sleep forever if iscsi.ko is unloaded while iscsid is running.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29688
Use .o files directly. Replace the .o.uu files that we uudecode with .o files.
Adjust the kernel and module build to cope.
Suggestions by: markj@, emaste@
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29636
uudecode the .o.uu files and commit directly to the tree. Adjust the build
infrastructure to cope with the new location, both for the kernel and modules.
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29635
Store the .o files directly in the tree. We no longer need to play uuencode
games like we did in the CVS days. Adjust the build infrastructure to match.
Reviewed by: markj@
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29634
We no longer need to use uuencode to uuencode files in our tree. Store the .o
file directly instead. Adjust the build to cope with the new arrangement.
Suggestions by: emaste, bz, donner
Reviewed by: markm
Sposnored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D29632
The driver needs to provide a LinuxKPI device structure to register
itself with the IB subsystem. It was erroneously using a copy of its
FreeBSD device structure for this purpose.
Use linux_pci_attach_device() instead, following the example of the
Chelsio iwarp driver. Also ensure that we don't leak the faked device
during detach.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29595
The mbuf allocated could be a chain and must be freed with m_freem.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29579
That fixes disabled keyboard input after Xorg server has been stopped.
Reviewed by: whu
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28171
Such type cannot be used in code that is in common between
FreeBSD and Linux. Use the FreeBSD type instead.
MFC after: 3 days
Reported by: markj
Differential Revision: https://reviews.freebsd.org/D29677
similarly to the Linux driver, by a tunable read only sysctl.
Submitted by: Oleg Sidorkin <osidorkin@gmail.com>
PR: 254884
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
The size of the ATU MEM/IO windows is implicitly casted to uint32_t.
Because of that some window sizes were silently demoted to 0 and ignored.
Check the size if its too large, trim it to 4GB and print a warning message.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: mw
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29625
The lower pmc event bits were masked off to find the PMC event ID.
The doesn't work when there are more events. Switch it to use the
offser relative to the first event while also checking the ID is
in the expected range.
Reviewed by: gnn, ray
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29600
This function will be used for exposing DMI info as sysctls in the
smbios module (in an upcoming review).
While here, add __packed to the structs.
Reviewed by: dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29270
On some systems (e.g. Lenovo ThinkPad X240, Apple MacBookPro12,1)
the SMBIOS entry point is not found in the <0xFFFFF space.
Follow the SMBIOS spec and use the EFI Configuration Table for
locating the entry point on EFI systems.
Reviewed by: rpokala, dab
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29276
Commit: f2f1ab39c0 ("pci_user: call bus_translate_resource before BAR mmap")
broke build for 32-bit platforms due to rman_res_t and vm_paddr_t
incompatible types. Fix that.
On some armv8 machines it is possible that the mapping between CPU
and PCI bus BAR base addresses is not 1:1. In case a BAR is allocated
in kernel using bus_alloc_resource_any this translation is handled in
ofw_pci_activate_resource.
Do the same in pci_user.c by calling bus_translate_resource devmethod.
This fixes mmaping BARs to userspace on Marvell SoCs (Armada 7k8k/CN913x)
and possibly many other platforms.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: kib
Obtained from: Semihalf
Sponsored by: Marvell
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29604
Use viewport "2" instead of "0" and change window type from MEM to IO.
Without these changes the MEM ATU window can be overwritten with the IO one.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29516
Add flow_control to hw.ixl tunables tree to let override
initial flow control configuration for all interfaces.
Keep using configuration set by NVM by default.
Reviewed by: erj@, gallatin@
Tested by: gowtham.kumar.ks_intel.com
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D29338
Summary:
They're nearly identical, so don't use two copies. Merge the newer
driver into the older one, and move it to a common location.
Add the Semihalf and associated copyrights in addition to mine, since
it's a non-trivial amount of code merged.
Reviewed By: mw
Differential Revision: https://reviews.freebsd.org/D29520
Usually on boot the MPS is already configured by BIOS. But we've
found that on hot-plug it is not true at least for our Supermicro
X11 boards. As result, mismatch between root's configuration of
256 bytes and device's default of 128 bytes cause problems for some
devices, while others seem to work fine.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Rename the last remaining bits depending on ifnet from linux/netdevice.h
instead of using the compat macros. This helps clearing up
struct netdevice being struct ifnet from linux/netdevice.h.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky, kib
X-D-R: D29366
Differential Revision: https://reviews.freebsd.org/D29497
Under geom(4) nvme_ns_bio_process() is on the path where sleep
is prohibited as g_io_shedule_down() calls THREAD_NO_SLEEPNG()
before geom->start().
Reviewed By: imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29539
There is no change in the source of the stats (t4_get_port_stats or
t4_get_vi_stats) but the per-port callout is gone.
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29527
This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29503
LLVM12 complains if you change the symbol binding:
error: mfs_root_end changed binding to STB_WEAK [-Werror,-Winline-asm]
error: mfs_root changed binding to STB_WEAK [-Werror,-Winline-asm]
The netmap monitor intercepts any TX/RX packets on the monitored
port. However, before this change there was no way to tell
whether an intercepted packet was being transmitted or received
on the monitored port.
A TXMON flag in the netmap slot has been added for this purpose.
This feature enables applications to ask netmap to transmit or
receive packets starting at a user-specified offset from the
beginning of the netmap buffer. This is meant to ease those
packet manipulation operations such as pushing or popping packet
headers, that may be useful to implement software switches,
routers and other packet processors.
To use the feature, drivers (e.g., iflib, vtnet, etc.) must have
explicit support. This change does not add support for any driver,
but introduces the necessary kernel changes. However, offsets support
is already included for VALE ports and pipes.
Use strncmp() instead of bcmp(), so that we don't have to find the
minimum of the string lengths before comparing.
Reviewed by: kib
Reported by: KASAN
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29463
This avoids some atomics by using counter_u64 for TX and relying on
existing single-threading (single ithread per rxq) for RX.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29383
This type mirrors struct sge_ofld_rxq and holds state for TCP offload
transmit queues. Currently it only holds a work queue but will
include additional state in future changes.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29382
Remove unused #includes of LinuxKPI headers noticed while trying to
solve LinuxKPI struct net_device and related functions.
Neither netdevice.h nor inetdevice.h nor notifier.h seem to be needed.
This takes cxgbe(4) out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: np
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29432
Remove unused #includes of a LinuxKPI header noticed while trying to
solve LinuxKPI struct net_device and related functions.
This takes qlnxr out of the picture of D29366.
Sponsored-by: The FreeBSD Foundation
MFC-after: 2 weeks
X-D-R: D29366 (extracted as further cleanup)
Remove linux/inetdevice.h as neither of the two inline functions there
are used here.
Sposored-by: The FreeBSD Foundation
MFC-after: 2 weeks
Reviewed-by: hselasky
X-D-R: D29366 (extracted as further cleanup)
Differential Revision: https://reviews.freebsd.org/D29428
When debugging leaked MSI/MSI-X vectors through LinuxKPI I found
the informational printf unhelpful. Rather than just stating we
leaked also tell how many MSI or MSI-X vectors we leak.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: jhb
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29394
net80211 changed a while back to support per-VAP config for things
rather than it being global. This is to support firmware NICs that
support per-VAP flags and configuration where the firmware will figure
out how to combine them.
However, it introduced a fun timing issue - those changes used to happen
to the shared ic state before newstate() was called, but now they're
also tasks and they can happen /after/.
This isn't a problem for ath(4), but it exposed some interesting
timing and config bugs here. Notably, I saw short slot NOT being
configured in 5GHz mode during some associations, so 5GHz stuff
would hang or behave poorly. Other times the follow-up auth has
the right config, so it didn't hang.
So for now, just flip this over to using the per-VAP flags which
are correct when newstate() is called. net80211 should also have
those flags synch'ed to the global ic state before newstate() runs
and that can come in a subsequent commit.
Whilst here also fix plcp to be consistently logged as a hex value.
Tested:
* iwn(4) Intel 6205, STA mode, both 2GHz and 5GHz
Differential Revision: https://reviews.freebsd.org/D29379
Reviewed by: bz
The hw.cxgbe.kern_tls tunable was used for this in the past and if it
was set then all T6 adapters would be configured for NIC TLS operation
and could not be reconfigured for TOE without a reload. With this
change ifconfig can be used to manipulate toe and txtls caps like any
other caps. hw.cxgbe.kern_tls continues to work as usual but its
effects are not permanent any more.
* Enable nic_ktls_ofld in the default configuration file and use the
firmware instead of direct register manipulation to apply/rollback
NIC TLS configuration. This allows the driver to switch the hardware
between TOE and NIC TLS mode in a safe manner. Note that the
configuration is adapter-wide and not per-port.
* Remove the kern_tls config file as it works with 100G T6 cards only
and leads to firmware crashes with 25G cards. The configurations
included with the driver (with the exception of the FPGA configs) are
supposed to work with all adapters.
Reported by: Veeresh U.K. at Chelsio
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D29291
upper_32_bits() and lower_32_bits() are defined twice in this file.
With the extra conditinal removed on LinuxKPI in 3b1ecc9fa1
they are also included from there already. Use the LinuxKPI version
and remove the two local ones.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky
MFC-after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29392
We are not aware of any out-of-tree consumers anymore
which would need KPI support for before Linux version 5.
Update the two in-tree consumers to use the new KPI.
This allows us to remove the extra version check and
will also give access to {lower,upper}_32_bits() unconditionally.
Sponsored-by: The FreeBSD Foundation
Reviewed-by: hselasky, rlibby, rstone
MFC-after: 2 weeks
X-MFC: to 13 only
Differential Revision: https://reviews.freebsd.org/D29391
Only attempt to fetch the configuration data and connect the shared
ring once the frontend has switched to the 'Connected' state. This
seems to be inline with what Linux netback does, and is required to
make newer versions of NetBSD netfront work, since NetBSD only
publishes the required configuration before switching to the Connected
state.
MFC after: 1 week
Sponsored by: Citrix Systems R&D
This avoids mixing the use of two different enums which modern C
compilers warn about.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29301
For guests running under some kind of VMMs, configuration structure is
available in memory space but not I/O space.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: rpokala, bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28818
The MSI-X resource shouldn't be assumed to be always on BAR1.
The Virtio v1.1 Spec did not specify that MSI-X table and PBA BAR has to
be BAR1 either.
Reported by: Yuan Rui <number201724@me.com>
MFC after: 2 weeks
Reviewed by: bryanv, jhb
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D28817
While here, make sure only the PF driver attempts to program the global
RSS key (with options RSS). The VF driver doesn't have access to those
device registers.
MFC after: 1 week
Sponsored by: Chelsio Communications
A repeat call will recreate the memory windows in the hardware and move
them to their last-known positions without repeating any of the software
initialization.
MFC after: 1 week
Sponsored by: Chelsio Communications
The decision whether a TCP packet is sent over IPv4 or IPv6 was
based on ethertype, which works correctly. In D27926 the criteria
was changed to checking if the CSUM_IP_TSO flag is set in the
csum-flags and then considering it to be TCP/IPv4.
However, the TCP stack sets the flag to CSUM_TSO for IPv4 and IPv6,
where CSUM_TSO is defined as CSUM_IP_TSO|CSUM_IP6_TSO.
Therefore TCP/IPv6 packets gets mis-classified as TCP/IPv4,
which breaks TSO for TCP/IPv6.
This patch bases the check again on the ethertype.
This fix will be MFC instantly as discussed with re(gjb).
MFC after: instantly
PR: 254366
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D29331
In some cases like broken hardware nvme(4) may wait minutes for
controller response before timeout. Doing so in a tight spin loop
made whole system unresponsive.
Reviewed by: imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29309
Sponsored by: iXsystems, Inc.
After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.
Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
These ioctl commands aim to provide easier ways for user space
applications to enumerate existing audio devices and the node they can
potentially use.
The exchange of device lists between user space and kernel is done on
nv(9). Some ioctl commands are added to /dev/sndstat node:
- SNDSTAT_REFRESH_DEVS
- SNDSTAT_GET_DEVS
- SNDSTAT_ADD_USER_DEVS
- SNDSTAT_FLUSH_USER_DEVS
Bump __FreeBSD_version to reflect the addition of the ioctls.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D26884
Definitions inside usr.sbin/bhyve/virtio.h are thrown away.
Definitions in sys/dev/virtio are used instead.
This reduces code duplication.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29084
The netmap_ioctl() function has a reference counting bug in case of
NETMAP_REQ_PORT_INFO_GET command. When `hdr->nr_name[0] == '\0'`,
the function does not decrease the refcount of "nmd", which is
increased by netmap_mem_find(), causing a refcount leak.
Reported by: Xiyu Yang <sherllyyang00@gmail.com>
Submitted by: Carl Smith <carl.smith@alliedtelesis.co.nz>
MFC after: 3 days
PR: 254311
This file got resynced with OpenBSD to pick up fixes that had taken
place after the version initially ported to FreeBSD. KASSERT there is
more like MPASS here.
Reported by: David Wolfskill <david@catwhisker.org>
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However we need to
initialize it to zero in case older hosts which are not aware of this
field.
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:
- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>
Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines
Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.
The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.
There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.
Also note that this is still a work in progress; work going further will
be much smaller in nature.
MFC after: 1 month (maybe)
TSFOOR happens if a beacon with a given TSF isn't received within the
programmed/expected TSF value, plus/minus a fudge range. (OOR == out of range.)
If this happens then it could be because the baseband/mac is stuck, or
the baseband is deaf. So, do a cold reset and resync the beacon to
try and unstick the hardware.
It also happens when a bad AP decides to err, slew its TSF because they
themselves are resetting and they don't preserve the TSF "well."
This has fixed a bunch of weird corner cases on my 2GHz AP radio upstairs
here where it occasionally goes deaf due to how much 2GHz noise is up
here (and ANI gets a little sideways) and this unsticks the station
VAP.
For AP modes a hung baseband/mac usually ends up as a stuck beacon
and those have been addressed for a long time by just resetting the
hardware. But similar hangs in station mode didn't have a similar
recovery mechanism.
Tested:
* AR9380, STA mode, 2GHz/5GHz
* AR9580, STA mode, 5GHz
* QCA9344 SoC w/ on-board wifi (TL-WDR4300/3600 devices); 2GHz
STA mode
Right now ts_antenna is either 0 or 1 in each supported HAL so
this is purely a sanity check.
Later on if I ever get magical free time I may add some extensions
for the NICs that can have slightly more complicated antenna switches
for transmit and I'd like this to not bust memory.
Completions for crypto requests on port 1 can sometimes return a stale
cookie value due to a firmware bug. Disable requests on port 1 by
default on affected firmware.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26581
These fixes are only relevant for requests on the second port. In
some cases, the crypto completion data, completion message, and
receive descriptor could be written in the wrong order.
- Add a separate rx_channel_id that is a copy of the port's rx_c_chan
and use it when an RX channel ID is required in crypto requests
instead of using the tx_channel_id.
- Set the correct rx_channel_id in the CPL_RX_PHYS_ADDR used to write
the crypto result.
- Set the FID to the first rx queue ID on the adapter rather than the
queue ID of the first rx queue for the port.
- While here, use tx_chan to set the tx_channel_id though this is
identical to the previous value.
Reviewed by: np
Reported by: Chelsio QA
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29175
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
Sponsored by: Netflix, Inc
Reviewed by: jhb, mav
Differential Revision: https://reviews.freebsd.org/D29006
The pwm utility cant set the only flag defined (PWM_POLARITY_INVERTED) so this
patch add the option -I (capital letter i) to send it to the drivers.
None of existing PWM driver have implemented support for flags.
But soon:ish I will put up an review of a pwm driver using TI OMAP DMTimer.
Differential Revision: https://reviews.freebsd.org/D29137
MFC after: 2 weeks
Fix the types of period and duty in share/man/man9/pwmbus.9 to match the one in sys/dev/pmw/pwmbus.c.
Reviewed By: rpokala
Differential Revision: https://reviews.freebsd.org/D29139
MFC after: 3 days
We don't need the result before next sleep time, so no reason to
additionally increase interrupt latency.
While there, remove extra PM ticks to microseconds conversion, making
C2/C3 sleep times look 4 times smaller than really. The conversion
is already done by AcpiGetTimerDuration(). Now I see reported sleep
times up to 0.5s, just as expected for planned 2 wakeups per second.
MFC after: 1 month
It has been observed that some systems are often unable to resume from
ddb after entering with debug.kdb.enter=1. Checking the status further
shows the terminal is blocked waiting in tty_drain(), but it never makes
progress in clearing the output queue, because sc->sc_txbusy is high.
I noticed that when entering polling mode for the debugger, IER_TXRDY is
set in the failure case. Since this bit is never tracked by the softc,
it will not be restored by ns8250_bus_ungrab(). This creates a race in
which a TX interrupt can be lost, creating the hang described above.
Ensuring that this bit is restored is enough to prevent this, and resume
from ddb as expected.
The solution is to track this bit in the sc->ier field, for the same
lifetime that TX interrupts are enabled.
PR: 223917, 240122
Reviewed by: imp, manu
Tested by: bz
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29130
The names are self-explanatory; these are currently only used by the
wg(8) tool, but they are handy data points to have.
Reviewed by: grehan
MFC after: 3 days
Discussed with: decke
Differential Revision: https://reviews.freebsd.org/D29143
We have no use for the udphdr or this hlen local, just spell out the
addition inline.
MFC after: 3 days
Reviewed by: grehan, markj
Differential Revision: https://reviews.freebsd.org/D29142
Some framebuffer properties obtained from the device tree were not being
properly converted to host endian.
Replace OF_getprop calls by OF_getencprop where needed to fix this.
This fixes boot on PowerPC64 LE, when using ofwfb as the system console.
Reviewed by: bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27475
The kernel-side already accepted a persistent-keepalive-interval, so
just add a verb to ifconfig(8) for it and start exporting it so that
ifconfig(8) can view it.
PR: 253790
MFC after: 3 days
Discussed with: decke
No sleeping allowed here, so avoid it. Collect the subset of data we
want inside of the epoch, as we'll need extra allocations when we add
items to the nvlist.
Reviewed by: grehan (earlier version), markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29124
This partially reverts df55485085 but still fixes the leak. It was
overlooked (sigh) that some packets will exceed MHLEN and cannot be
physically contiguous without clustering, but we don't actually need
it to be. m_defrag() should pull up enough for any of the headers that
we do need to be accessible.
Fixes: df55485085
Pointy hat; kevans
When we enter C2+ state via memory read, it may take chipset some
time to stop CPU. Extra register read covers that time. But MWAIT
makes CPU stop immediately, so we don't need to waste time after
wakeup with interrupts still disabled, increasing latency.
On my system it reduces ping localhost latency, waking up all CPUs
once a second, from 277us to 242us.
MFC after: 1 month
Even though the information is very limited, it seems the intent of
this flag is to control ACPI_BITREG_BUS_MASTER_STATUS use for C3,
not force ACPI_BITREG_ARB_DISABLE manipulations for C2, where it was
never needed, and which register not really doing anything for years.
It wasted lots of CPU time on congested global ACPI hardware lock
when many CPU cores were trying to enter/exit deep C-states same time.
On idle 80-core system it pushed ping localhost latency up to 20ms,
since badport_bandlim() via counter_ratecheck() wakes up all CPUs
same time once a second just to synchronously reset the counters.
Now enabling C-states increases the latency from 0.1 to just 0.25ms.
Discussed with: kib
MFC after: 1 month
This structure is shared among multiple instances of a driver, so we
should ensure that it doesn't somehow get treated as if there's a
separate instance per interface. This is especially important for
software-only drivers like wg.
DEVICE_REGISTER() still returns a void * and so the per-driver sctx
structures are not yet defined with the const qualifier.
Reviewed by: gallatin, erj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29102
ath_hal is compiled into the kernel by default and so always prints a
message to dmesg even when the system has no ath hardware.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This updates the driver to align with the version included in
the "Intel Ethernet Adapter Complete Driver Pack", version 25.6.
There are no major functional changes; this mostly contains
bug fixes and changes to prepare for new features. This version
of the driver uses the previously committed ice_ddp package
1.3.19.0.
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Tested by: jeffrey.e.pieper@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28640
netmap is compiled into the kernel by default so initialization was
always reported, and netmap uses a formatting convention not used in the
rest of the kernel.
Reviewed by: vmaffione
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29099
Firmware access from t4_attach takes place without any synchronization.
The driver should not panic (debug kernels) if something goes wrong in
early communication with the firmware. It should still load so that
it's possible to poke around with cxgbetool.
MFC after: 1 week
Sponsored by: Chelsio Communications
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
MFC after: 2 weeks
Sponsored by: Microsoft
It closes tiny race when the flag could be set between being cleared
and the space is checked, that would create us some more work. The
flag setting is protected by both locks, so we can clear it in either
place, but in between both locks are dropped.
MFC after: 1 week
I think it allowed to avoid some TX thread wakeups while the socket
buffer is full. But add there another options if ic_check_send_space
is set, which means socket just reported that new space appeared, so
it may have sense to pull more data from ic_to_send for better TX
coalescing.
MFC after: 1 week
Add sysctl link_active_on_if_down, which allows user to control
if interface is kept in active state when it is brought
down with ifconfig. Set it to enabled by default to preserve
backwards compatibility.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D28028
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
There is no HW counter for frames with invalid L3/L4 checksums
so add a SW one.
Also add a "rx_errors" sysctl with a copy of netstat IERRORS
counter value to make it easier accessible from scripts.
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27639
HW keeps track of RX errors using several counters, each for
specific type of errors. Report RX errors to OS as sum
of all those counters: CRC errors, illegal bytes, checksum,
length, undersize, fragment, oversize and jabber errors.
Also, add new "rx_errs" sysctl in the dev.ix.N.mac_stats tree. This is
to provide an another way to display the sum of RX errors.
Signed-off-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com>
Reviewed By: erj
Tested By: gowtham.kumar.ks@intel.com
Sponsored By: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D27191
This fixes mpr driver on big-endian devices.
Tested on powerpc64 and powerpc64le targets using a SAS9300-8i card
(LSISAS3008 pci vendor=0x1000 device=0x0097)
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, Sreekanth Reddy <sreekanth.reddy@broadcom.com> (by email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25785
Merge the following fixes from https://github.com/pfsense/FreeBSD-src
1940e7d3 Save address of ingress packets to allow wg to work on HA
8f5531f1 Fix connection to IPv6 endpoint
825ed9ee Fix tcpdump for wg IPv6 rx tunnel traffic
2ec232d3 Fix issue with replying to INITIATION messages in server mode
ec77593a Return immediately in wg_init if in DETACH'd state
0f0dde6f Remove unnecessary wg debug printf on transmit
2766dc94 Detect and fix case in wg_init() where sockets weren't cleaned up
b62cc7ac Close the UDP tunnel sockets when the interface has been stopped
Reviewed by: kevans
Obtained from: pfSense 2.5
MFC after: 3 days
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D28962
Mainly link errors interrupts should only be activated on fully linked port,
otherwise noise on lanes can cause livelock. But we don't have error
counters yet, so leave these interrupts disabled.
PAT_WRITE_BACK is x86-only, whereas sys/dev/xen could be shared
between multiple architectures.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28831
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.
While here fix the error handling code to clean up on failure.
Reviewed by: br
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28890
Make the pwm_backlight module depend on backlight, so it
has access to the backlight interface symbols. Otherwise you'll
get an error like:
link_elf: symbol backlight_get_info_desc undefined
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
MFC after: 3 days
PR: 253765
Add it to the x86 GENERIC and MINIMAL kernels
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: rpokala
Differential Revision: https://reviews.freebsd.org/D28738
ipmi_ssif will `smbus_request_bus()` to do multiple smbus requests
(which requests the iicbus), and then here in `bread()` we also need to
request the bus because `bread()` takes multiple transactions.
This causes deadlock as it's waiting for the bus it already has without
`IIC_RECURSIVE`.
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D28742
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
MFC after: 3 days
Instead of 2-4 socket reads per PDU this can do as low as one read
per megabyte, dramatically reducing TCP overhead and lock contention.
With this on iSCSI target I can write more than 4GB/s through a
single connection.
MFC after: 1 month
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
This is part of XSA-361.
Sponsored by: Citrix Systems R&D
- Make frontends call unified CTL core method ctl_datamove_done()
to report move completion. It allows to reduce code duplication
in differerent backends by accounting DMA time in common code.
- Add to ctl_datamove_done() and be_move_done() callback samethr
argument, reporting whether the callback is called in the same
context as ctl_datamove(). It allows for some cases like iSCSI
write with immediate data or camsim frontend write save one context
switch, since we know that the context is sleepable.
- Remove data_move_done() methods from struct ctl_backend_driver,
unused since forever.
MFC after: 1 month
T5 and above have extra bits for the optional filter fields. This is a
correctness issue and not just a waste because a filter mode valid on a
T4 (36b) may not be valid on a T5+ (40b).
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Allow the filter mask (aka the hashfilter mode when hashfilters are
in use) to be set any time it is safe to do so. The requested mask
must be a subset of the filter mode already. The driver will not change
the mode or ingress config just to support a new mask.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
1. Query the firmware for filter mode, mask, and related ingress config
instead of trying to figure them out from hardware registers. Read
configuration from the registers only when the firmware does not
support this query.
2. Use the firmware to set the filter mode. This is the correct way to
do it and is more flexible as well. The filter mode (and associated
ingress config) can now be changed any time it is safe to do so.
The user can specify a subset of a valid mode and the driver will
enable enough bits to make sure that the mode is maxed out -- that
is, it is not possible to set another bit without exceeding the
total width for optional filter fields. This is a hardware
requirement that was not enforced by the driver previously.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
- Implements little-endian support (powerpc64le)
- Adds 'hw.ofwfb.physaddr' kernel parameter so user can manually
provide correct address if it's not detected correctly
- Adds 'hw.ofwfb.argb32_pixel' so user can set it manually if
colors are inverted due to incorrect pixel format (default = 1)
- Automatically selects RGBA32 pixel format if NVidia graphic adapter
is detected (sets hw.ofwfb.argb32_pixel=0)
Machines equipped with NVidia graphic adapters tend to use RGBA32
pixel format. By default ARGB32 pixel format is used, proved to work
on machines equipped with ATI graphic adapter and the onboard adapter
used on Talos II and Blackbird machines from Raptor Computing Systems.
Original patch developed by bdragon
Reviewed by: bdragon, luporl
MFC after: 3 days
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D28604
The ACPI NFIT specification defines a set of "NVDIMM State Flags". These
flags are already reported by `acpidump -t', but this change makes them
available on a per-device basis, in a format that is more easily parsed.
To simplify this, introduce acpi_nfit_get_memory_maps_by_dimm(), which
locates the (ACPI_NFIT_MEMORY_MAP)s associated with a given
(nfit_handle_t).
Reviewed by: mav, cem
Tested by: mav, rpokala (version for stable/12)
MFC after: 3 days
Sponsored by: Panasas
BUS_DMA_NOCACHE should only be used when one needs to guarantee the
created mapping has uncached memory attributes, usually as a result
of buggy hardware. Normal use cases should pass BUS_DMA_COHERENT, to
create an appropriate mapping based on the flags passed to
bus_dma_tag_create().
This should have no functional change, since the DMA tags in this driver
are created without the BUS_DMA_COHERENT flag.
Reported by: mmel
Reviewed by: mmel, Thomas Skibo <thomas-bsd@skibo.net>
MFC after: 3 days
Per the i2c spec, a slave device can stretch SCL idefinitely, so 25ms is
a bit arbitrary in general. smbus does specify an optional timeout
recovery mechanism to be done at about 25~35ms, but the IPMI SSIF spec
says that BMCs don't have any obligation to implement that.
The BMC on Altra seems to mostly respond within 25ms, but occasionally
will stretch SCL for ~300 msec.
Also, the count_us mechanism seems to actually timeout around 25%
earlier than it would claim (timeout really happening around 19ms
instead of 25ms).
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: manu, imp
Differential Revision: https://reviews.freebsd.org/D28747
In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.
As the FreeBSD isn't doing any resource allocation behind the scenes,
the driver is responsible to allocate them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) will be able to
access them.
To determine dynamically BAR on which the MSIx vector table is present
the pci_msix_table_bar() is being used and the new BAR is allocated if
needed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 3 days
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.
MFC after: 1 week
Sponsored by: Chelsio Communications
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
PR: 235016
Submitted by: David Cross
These errors do not clear so to NULL, so the existing check was
treating these failures as success. The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().
In addition, for negative return values, do not send a RST to the
peer.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28243
The fallback for __align_up() used by roundup2() uses __typeof__()
which doesn't work for bitfields. This fixes the build on GCC which
uses the fallback.
Reviewed by: arichardson, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28599
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones. But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx. It caused problems
with the later use of that cluster, including memory and/or data
corruption.
While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: np, jhb
Differential Revision: https://reviews.freebsd.org/D28716
FreeBSD when running as a dom0 under Xen is not supposed to access the
run time services directly, and instead should proxy the calls through
Xen using an hypercall interface that exposes access to selected run
time services.
Implement the efirt interface on top of the Xen provided hypercalls.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28621
Introduce a set of hooks for MI EFI public functions, so that a new
implementation can be done. This will be used to implement the Xen PV
EFI interface that's used when running FreeBSD as a Xen dom0 from UEFI
firmware. Also make the efi_status_to_errno non-static since it will
be used to evaluate status return values from the PV interface.
No functional change indented.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, imp
Differential revision: https://reviews.freebsd.org/D28620
Locking the second lock which causes the LOR, can be skipped because
the code updating the shared variables is always executing from the
same USB thread.
lock order reversal:
1st 0xfffff80005cc3840 pcm7:play:dsp7.p0 (pcm play channel, sleep mutex)
@ usb_transfer.c:2342
2nd 0xfffff80005cc3860 pcm7:record:dsp7.r0 (pcm record channel, sleep mutex)
@ uaudio.c:2317
lock order pcm record channel -> pcm play channel established at:
witness_checkorder+0x461
__mtx_lock_flags+0x98
dsp_mmap_single+0x151
vm_mmap_cdev+0x65
devfs_mmap_f+0x143
kern_mmap_req+0x594
sys_mmap+0x46
amd64_syscall+0x12e
fast_syscall_common+0xf8
lock order pcm play channel -> pcm record channel attempted at:
witness_checkorder+0xd82
__mtx_lock_flags+0x98
uaudio_chan_play_callback+0xeb
usbd_callback_wrapper+0x7ec
usb_command_wrapper+0x7e
usb_callback_proc+0x8e
usb_process+0xf3
fork_exit+0x80
fork_trampoline+0xe
Found by: Stefan Ehmann <shoesoft@gmx.net>
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
defined by hardware rather than cached one to match HIDIOCGRDESC ioctl.
This fixes errors reported by hid-tools being run against /dev/hidraw#
device node belonging to driver which overloads report descriptor.
MFC after: 1 week
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
Ignore fantom keyboard state reports entirelly rather than ignore
RollOver states for each key separatelly. Latter results in spurious
release/push pairs of events on each fantom keyboard state report.
Reported by: Jan Martin Mikkelsen <janm_AT_transactionware_DOT_com>
Submitted by: Jan Martin Mikkelsen (initial version)
PR: 253249
MFC after: 1 week
Previously, iscsi_poll() just panicked. This meant if you got a panic
on a box when using the iSCSI initiator, the attempt to shutdown would
trigger a nested panic and never write out a core. Now, CCB's sent to
iSCSI devices (such as the sychronize-cache request in dashutdown())
just fail with a timeout during a panic shutdown.
Reviewed by: scottl, mav
MFC after: 2 weeks
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D28455
Since commit 8fa6abb6f4 ("Expose clang's alignment builtins and use
them for roundup2/rounddown2"), clang emits warnings for several
alignment operations in these drivers because the operation is a no-op.
The compiler is arguably being too strict here, but in the meantime
let's silence the warnings by conditionally compiling the alignment
operations.
Reviewed by: arichardson, hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28576
My YOGA requires a minimum of 7 to parse w/o an error. Since the memory savings
are trivial and the yoga a popular system, bump the default up to 8. There's no
API/ABI issues in doing this. This hid_item struct isn't exported to userland
and the one libusbhid has is different and only shares a name...
MFC After: 3 days
Reviewed by: wulf@
Differential Revision: https://reviews.freebsd.org/D28543
It appears that production versions of EPYC firmware get the _STA method right
for these nodes. In fact, this workaround breaks on production hardware by
including too many uart nodes. This work around was for pre-release hardware
that wound up not having a large deployment. Move this work around to a kernel
option since the machines that needed it have been powered off and are difficult
to resurrect. Should there be a more significant deployment than is understood,
we can restrict it based on smbios strings.
Discussed with: mmacy@, seanc@, jhb@
MFC After: 3 days
This adds a new sysctl to Wellspring Touchpad driver for controlling
Z-Axis (2-finger vertical scroll) direction "hw.usb.wsp.z_invert".
Submitted by: James Wright <james.wright_AT_digital-chaos_DOT_com>
Reviewed by: wulf
PR: 253321
Differential revision: https://reviews.freebsd.org/D28521
vt is using static buffers for on screen data, the buffer size is
calculated based on maximum supported screen size and 8x16 font.
When using hi-res graphics and very smaller than 8x16 font, we
need to be careful not to overflow static buffers in vt.
Testing: I did test by building smaller buffers than vt currently is using,
royger was testing on actual 4k capable hardware.
MFC after: 1 week
Tested by: royger
X700 family of controllers has limited number of available VLAN
HW filters. Driver did not handle properly a case when user
assigned more VLANs to the interface which had all filters
already in use. Fix that by disabling HW filtering when
it is impossible to create filters for all requested VLANs.
Keep track of registered VLANs using bitstring to be able
to re-enable HW filtering when number of requested VLANs
drops below the limit.
Also switch all allocations to use M_IXL malloc type
to ease detecting memory leaks in the driver.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28137
Add endiannes conversions in order to support big-endian platforms
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, kadesai (on email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26531
- limit maximum segment size to 2048 bytes. Although dwmmc supports a buffer
fragment with a maximum length of 4095 bytes, use the nearest lower power
of two as the maximum fragment size. Otherwise, busdma create excessive
buffer fragments.
- fix off by one error in computation of the maximum data transfer length.
- in addition, reserve two DMA descriptors that can be used by busdma
bouncing. The beginning or end of the buffer can be misaligned.
- Don’t ignore errors passed to bus_dmamap_load() callback function.
- In theory, a DMA engine may be running at time when next dma descriptor is
constructed. Create a full DMA descriptor before OWN bit is set.
MFC after: 2 weeks
This fixes an issue where a private key contained bits that should
have been cleared by the clamping process, but were passed through
to the scalar multiplication routine and resulted in an invalid
public key.
Issue diagnosed (and an initial fix proposed) by shamaz.mazum in
PR 252894.
This fix suggested by Jason Donenfeld.
PR: 252894
Reported by: shamaz.mazum
Reviewed by: dch
MFC after: 3 days
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2T per
command at once, if we decide, or may be relax locking.
Rename the second use of that field to io_referenced_task_tag.
MFC after: 1 month
As we get started with no memory allocator, we set up static font data
for font passed by loader (if there is any). At this time, we also must
set refcount 1, and refcount will get incremented in cnprobe() callback.
At some point the memory allocator will be available, and we will set up
properly allocated font data, but we should not disturb the refcount.
PR: 253147
Memory and PCI resources are freed with no particular order. This could
cause use-after-frees when detaching following a failed attach. For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27634
- The behavior implemented in r362905 resulted in delayed transmission
of packets in some cases, causing performance issues. Use a different
heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
entirely. It can be changed at any time. There is no change in
default behavior.
Investigation of iSCSI target data corruption reports brought me to
discovery that cxgb(4) expects mbufs to be physically contiguous, that
is not true after I've started using m_extaddref() in software iSCSI
for large zero-copy transmissions. In case of fragmented memory the
driver transmitted garbage from pages following the first one due to
simple use of pmap_kextract() for the first pointer instead of proper
bus_dmamap_load_mbuf_sg(). Seems like it was done as some optimization
many years ago, and at very least it is wrong in a world of IOMMUs.
This patch just removes that optimization, plus limits packet coalescing
for mbufs crossing page boundary, also depending on assumption of one
segment per packet.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: mmacy, np
Differential revision: https://reviews.freebsd.org/D28428
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address. Then, this was extended to array of
such pointers. Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change. The new name should be generic enough to avoid further
renaming.
newer controller have a sparce bus space that can be figured
out by probing the HW. This gives the starting bus number.
When reading the PCI config. space behind the VMD controller,
the offset of the starting bus needs to be subtracted from
the bus being read.
Fixed a bug in which in which not all of the devices
directly attached to the VMD controller would be probed.
On my initial test HW, a switch was found at bus 0, slot 0
and function 0. All of the NVME drives were behind that
switch. Now scan for all slots and functions attached to
bus 0. If a something was found then run attach after the
scan. On detach also go through all slots and functions
on bus 0.
Tested with device ID's: 0x201d & 0x9a0b
Tested by: nc@
MFC after: 7 days
PR: 252253
Move software iSCSI tunables/sysctls into kern.icl.soft subtree.
Replace several hardcoded length constants there with variables.
While there, stretch the limits to better match Linux' open-iscsi
and our own initiator with new MAXPHYS of 1MB. Our CTL target is
also optimized for up to 1MB I/Os, so there is also a match now.
For Windows 10 and VMware 6.7 initiators at default settings it
should make no change, since previous limits were sufficient there.
Tests of QD1 1MB writes from FreeBSD over 10GigE link show throughput
increase by 29% on idle connection and 132% with concurrent QD8 reads.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
This is a regression issue after the new UAR API was introduced
by f8f5b459d2 .
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
It's rather confusing when adapter->hw and hw are mixed and matched
within a particular function.
Some of this was missed in cd1cf2fc1d
and r353778 respectively.
nids(4) was a clever idea in the early 2000's when the market was
flooded with 10/100 NICs with Windows-only drivers, but that hasn't been
the case for ages and the driver has had no meaningful maintenance in
ages. It only supports Windows-XP era drivers.
Also remove:
- ndis support from wpa_supplicant
- ndiscvt(8)
Reviewed By: emaste, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D27609
RFC 7143 (11.2.1.8):
An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a
task by the initiator. The only instance in which it may be seen on
the wire is in a target-initiated NOP-In PDU (Section 11.19) and in
the initiator response to that PDU, if necessary.
MFC after: 1 month