Commit graph

136012 commits

Author SHA1 Message Date
Sai Rajesh Tallamraju
38bfc6dee3 iflib: Free resources in a consistent order during detach
Memory and PCI resources are freed with no particular order.  This could
cause use-after-frees when detaching following a failed attach.  For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.

MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D27634
2021-02-01 11:15:54 -05:00
Jonah Caplan
88be0e1120 bridge: fix STP roles and protos strings
Add the missing commas that got lost in e5539fb618.

PR:		252532
Reviewd by:	kp@, donner@, freqlabs@
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28425
2021-02-01 15:27:06 +01:00
Jessica Clarke
05985a7f80 arm64: Improve DDB backtrace support
The existing implementation relies on each trap handler saving a normal
stack frame record, which is a waste of time and space when we're
already saving a trapframe to the stack. It's also wrong as it currently
saves LR not ELR.

Instead of patching it up, rewrite it based on the RISC-V implementation
with inspiration from the amd64 implementation for how to handle
vectored traps to provide an improved implementation. This includes
compressing the information down to one line like other architectures
rather than the highly-verbose old form that repeats itself by printing
LR and FP in one frame only to print them as PC and SP in the next. It
also includes printing out actually useful information about the traps
that occurred, though FAR is not saved in the trapframe so we cannot
print it (in general it can be clobbered between when the trap happened
and now), only ESR.

The AAPCS also allows the stack frame record to be located anywhere in
the frame, not just the top, so the caller's SP is not at a fixed offset
from the callee's FP like on almost all other architectures in
existence. This means there is no way to derive the caller's SP in the
unwinder, and so we have to drop that bit of (unused) state everywhere.

Reviewed by:	jhb, markj
Differential Revision:	https://reviews.freebsd.org/D28026
2021-02-01 14:15:57 +00:00
Hans Petter Selasky
db46c0d0cb Fix LINT kernel builds after 1a714ff204 .
MFC after:	1 week
Discussed with:	rrs@
Differential Revision:  https://reviews.freebsd.org/D28357
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2021-02-01 14:24:15 +01:00
Michael Tuexen
bdd4630c9a sctp: small cleanup, no functional change intended.
MFC after:	3 days
2021-02-01 14:04:57 +01:00
Mateusz Guzik
69c5fa5cd1 zfs: remove incomplete ifdefs for lockless symlink support
This wil be handled differently upstream and merged later.
2021-02-01 13:18:27 +01:00
Navdeep Parhar
3447df8bc5 cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission
  of packets in some cases, causing performance issues.  Use a different
  heuristic to predict tx requests.

- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing
  entirely.  It can be changed at any time.  There is no change in
  default behavior.
2021-02-01 03:00:09 -08:00
Oleksandr Tymoshenko
d6f9c5a6d2 mips: fix NLM platforms breakage caused by e0a0a3ef
NetLogic platforms have their own implementation of cpu_init_interrupts.
Apply the same logic to it as to intr_machdep.c.

PR:	253051
2021-01-31 23:56:22 -08:00
Mateusz Guzik
e6ff6154d2 x86: use compiler intrinsics for bswap* 2021-02-01 04:53:23 +00:00
Mateusz Guzik
aae89f6f09 amd64: use compiler intrinsics for bsf* and bsr* 2021-02-01 04:53:23 +00:00
Mateusz Guzik
6f19dc2124 cache: add delayed degenerate path handling 2021-02-01 04:53:23 +00:00
Mateusz Guzik
bbfb1edd70 cache: move hash computation into the parsing loop 2021-02-01 04:36:45 +00:00
Michael Tuexen
af885c57d6 sctp: improve input validation
Improve the handling of INIT chunks in specific szenarios and
report and appropriate error cause.
Thanks to Anatoly Korniltsev for reporting the issue for the
userland stack.

MFC after:	3 days
2021-01-31 23:46:53 +01:00
Oleksandr Tymoshenko
e0a0a3efcb mips: fix early kernel panic when setting up interrupt counters
Commit 248f0ca converted intrcnt and intrnames from u_long[]
and char[] to u_long* and char* respectively, but for non-INTRNG mips
these symbols were defined in .S file as a pre-allocated static arrays,
so the problem wasn't cought at compile time. Conversion from an array
to a pointer requires pointer initialization and it wasn't done
for MIPS, so whatever happenned to be in the begginning of intcnt[]
array was used as a pointer value.

Move intrcnt/intrnames to C code and allocate them dynamically
although with a fixed size at the moment.

Reviewed by:	emaste
PR:		253051
Differential Revision:	https://reviews.freebsd.org/D28424
MFC after:	1 day
2021-01-31 13:44:45 -08:00
Edward Tomasz Napierala
b8073b3c74 msdosfs: fix vnode leak with msdosfs_rename()
This could happen when failing due to disappearing source file.

Reviewed By:	kib
Tested by:	pho
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27338
2021-01-31 21:37:44 +00:00
Edward Tomasz Napierala
cb69621249 msdosfs: fix double unlock if the source file disappears
We would unlock fvp here, only to unlock it again below,
just before "bad".

Reviewed By:	kib
Tested by:	pho
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27339
2021-01-31 21:35:34 +00:00
Alexander Motin
9dc7c250b8 cxgb(4): Remove assumption of physically contiguous mbufs.
Investigation of iSCSI target data corruption reports brought me to
discovery that cxgb(4) expects mbufs to be physically contiguous, that
is not true after I've started using m_extaddref() in software iSCSI
for large zero-copy transmissions.  In case of fragmented memory the
driver transmitted garbage from pages following the first one due to
simple use of pmap_kextract() for the first pointer instead of proper
bus_dmamap_load_mbuf_sg().  Seems like it was done as some optimization
many years ago, and at very least it is wrong in a world of IOMMUs.

This patch just removes that optimization, plus limits packet coalescing
for mbufs crossing page boundary, also depending on assumption of one
segment per packet.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Reviewed by:	mmacy, np
Differential revision:	https://reviews.freebsd.org/D28428
2021-01-31 12:55:06 -05:00
Mateusz Guzik
f1be262ec1 amd64: move memcmp checks upfront
This is a tradeoff which saves jumps for smaller sizes while making
the 8-16 range slower (roughly in line with the other cases).

Tested with glibc test suite.

For example size 3 (most common with vfs namecache) (ops/s):
before:	407086026
after:	461391995

The regressed range of 8-16 (with 8 as example):
before:	540850489
after:	461671032
2021-01-31 16:07:20 +00:00
Mateusz Guzik
e027e24bfa cache: add trailing slash support
Tested by:	pho
2021-01-31 12:02:46 +00:00
Mateusz Guzik
8cbd164a17 cache: handle NOFOLLOW requests for symlinks
Tested by:	pho
2021-01-31 12:02:46 +00:00
Alexander V. Chernikov
78c93a1721 Use process fib for inet/inet6 fib_algo sysctls.
This allows to set/query fib algo for non-default fibs.

MFC after:	3 days
2021-01-31 10:50:08 +00:00
Michael Tuexen
8dc6a1edca sctp: fix a locking issue for old unordered data
Thanks to Anatoly Korniltsev for reporting the issue for the
userland stack.

MFC after:	3 days
2021-01-31 10:46:23 +01:00
Alexander V. Chernikov
151ec796a2 Fix the design problem with delayed algorithm sync.
Currently, if the immutable algorithm like bsearch or radix_lockless
 receives rtable update notification, it schedules algorithm rebuild.
This rebuild is executed by the callout after ~50 milliseconds.

It is possible that a script adding an interface address and than route
 with the gateway bound to that address will fail. It can happen due
 to the fact that fib is not updated by the time the route addition
 request arrives.

Fix this by allowing synchronous algorithm rebuilds based on certain
 conditions. By default, these conditions assume:
1) less than net.route.algo.fib_sync_limit=100 routes
2) routes without gateway.

* Move algo instance build entirely under rib WLOCK.
 Rib lock is only used for control plane (except radix algo, but there
  are no rebuilds).
* Add rib_walk_ext_locked() function to allow RIB iteration with
 rib lock already held.
* Fix rare potential callout use-after-free for fds by binding fd
 callout to the relevant rib rmlock. In that case, callout_stop()
 under rib WLOCK guarantees no callout will be executed afterwards.

MFC after:	3 days
2021-01-30 23:25:57 +00:00
Alexander V. Chernikov
dd9163003c Add rib_subscribe_locked() and rib_unsubsribe_locked() to support
subscriptions during RIB modifications.
Add new subscriptions to the beginning of the lists instead of
 the end. This fixes the situation when new subscription is created
 int the callback for the existing subscription, leading to the
 subscription notification handler pick it.

MFC after: 3 days
2021-01-30 23:25:57 +00:00
Alexander V. Chernikov
ab6d9aaed7 Move business logic from rebuild_fd_callout() into rebuild_fd().
This simplifies code a bit and allows for future non-callout
 callers to request rebuild.

MFC after:	3 days
2021-01-30 23:25:57 +00:00
Alexander V. Chernikov
f8b7ebea49 Improve fib_algo debug messages.
* Move per-prefix debug lines under LOG_DEBUG2
* Create fib instance counter to distingush log messages between
 instances
* Add more messages on rebuild reason.

MFC after:	3 days
2021-01-30 23:25:56 +00:00
Alexander V. Chernikov
91f2c69ec2 Fix unused-function waring when compiling with FIB_ALGO.
MFC after:	3 days
2021-01-30 23:25:56 +00:00
Bjoern A. Zeeb
4a26380ba6 LinuxKPI: add module dependency on firmware(9)
In a6c2507d1b support for LinuxKPI
firmware loading was added.  Record the dependency on firmware(9)
as otherwise (if built as module) linuxkpi will no longer load.

Reported-by:	tijl
MFC after:	1 day
X-MFC-with:	a6c2507d1b
Sponsored-by:	The FreeBSD Foundation
2021-01-30 17:50:26 +00:00
Kirk McKusick
a63eae65ff Revert 2d4422e799, Eliminate lock order reversal in UFS ffs_unmount().
After discussion with Chuck Silvers (chs@) we have decided that
there is a better way to resolve this lock order reversal which
will be committed separately.

Sponsored by: Netflix
2021-01-30 00:03:37 -08:00
Jung-uk Kim
29f37e9bcc acpica: Import ACPICA 20210105.
(cherry picked from commit a61ec1492c58c40bd0d968794c380668c157e2ef)
2021-01-29 20:53:07 -05:00
Jung-uk Kim
385fb5d933 acpica: Import ACPICA 20201217.
(cherry picked from commit a4634ed7779f0905e3bfeb781e58d40a5bdf9bb7)
2021-01-29 19:48:02 -05:00
Mateusz Guzik
d1de5698df amd64: retire sse2_pagezero
All page zeroing is using temporal stores with rep movs*, the routine is
unused for several years.

Should a need arise for zeroing using non-temporal stores, a more
optimized variant can be implemented with a more descriptive name.
2021-01-30 00:17:15 +00:00
Mateusz Guzik
164c3b8184 amd64: add missing ALIGN_TEXT to loops in memset and memmove 2021-01-30 00:01:44 +00:00
Mateusz Guzik
710e45c4b8 Reimplement strlen
The previous code neglected to use primitives which can find the end
of the string without having to branch on every character.

While here augment the somewhat misleading commentary -- strlen as
implemented here leaves performance on the table, especially so for
userspace. Every arch should get a dedicated variant instead.

In the meantime this commit lessens the problem.

Tested with glibc test suite.

Naive test just calling strlen in a loop on Haswell (ops/s):

$(perl -e "print 'A' x 3"):
before:	211198039
after:	338626619

$(perl -e "print 'A' x 100"):
before:	83151997
after:	98285919
2021-01-29 23:48:26 +00:00
Alexander V. Chernikov
cb984c62d7 Fix multipath support for rib_lookup_info().
The initial plan was to remove rib_lookup_info() before
 FreeBSD 13. As several customers are still remaining,
 fix rib_lookup_info() for the multipath use case.
2021-01-29 23:14:24 +00:00
Alexander V. Chernikov
53729367d3 Fix subinterface vlan creation.
D26436 introduced support for stacked vlans that changed the way vlans
 are configured.  In particular, this change broke setups that have
 same-number vlans as subinterfaces.

Vlan support was initially created assuming "vlanX" semantics. In this paradigm,
 automatic number assignment supported by cloning (ifconfig vlan create) was a
 natural fit.
When "ifaceX.Y" support was added, allowing to have the same vlan number on
 multiple devices, cloning code became more complex, as the is no
unified "vlan" namespace anymore. Such interfaces got the first spare
index from "vlan" cloner. This, in turn, led to the following problem:
 ifconfig ix0.333 create -> index 1
 ifconfig ix0.444 create -> index 2
 ifconfig vlan2 create -> allocation failure

This change fixes such allocations by using cloning indexes only for
 "vlanX" interfaces.

Reviewed by:            hselasky
MFC after:		3 days
Differential Revision:  https://reviews.freebsd.org/D27505
2021-01-29 21:43:20 +00:00
Gleb Smirnoff
3f43ada98c Catch up with 6edfd179c8: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer
that points to unmapped address.  Then, this was extended to array of
such pointers.  Then, such mbufs were augmented with header/trailer.
Basically, extended mbufs are extended, and set of features is subject
to change.  The new name should be generic enough to avoid further
renaming.
2021-01-29 11:46:24 -08:00
Roger Pau Monné
b6d85a5f51 stand/multiboot: adjust the protocol between loader and kernel
There's a currently ad-hoc protocol to hand off the FreeBSD kernel
payload between the loader and the kernel itself when Xen is in the
middle of the picture. Such protocol wasn't very resilient to changes
to the loader itself, because it relied on moving metadata around to
package it using a certain layout. This has proven to be fragile, so
replace it with a more robust version.

The new protocol requires using a xen_header structure that will be
used to pass data between the FreeBSD loader and the FreeBSD kernel
when booting in dom0 mode. At the moment the only data conveyed is the
offset of the start of the module metadata relative to the start of the
module itself.

This is a slightly disruptive change since it also requires a change
to the kernel which is contained in this patch. In order to update
with this change the kernel must be updated before updating the
loader, as described in the handbook. Note this is only required when
booting a FreeBSD/Xen dom0. This change doesn't affect the normal
FreeBSD boot protocol.

This fixes booting FreeBSD/Xen in dom0 mode after
3630506b9d.

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		tsoome
Differential Revision:	https://reviews.freebsd.org/D28411
2021-01-29 15:23:26 +01:00
Lutz Donnerhacke
ebc61c86b5 netgraph/ng_source: Switch queuing framework
Change the queuing framework from ifq to mbufq.

Reported by:	glebius
Reviewed by:	glebius, kp
Approved by:	kp (mentor)
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D28407
2021-01-29 12:34:53 +01:00
Mateusz Guzik
45e1f85414 poll: use fget_unlocked or fget_only_user when feasible
This follows select by eleminating the use of filedesc lock.
This is a win for single-threaded processes and a mixed bag for others
as at small concurrency it is faster to take the lock instead of
refing/unrefing each file descriptor.

Nonetheless, removal of shared lock usage is a step towards a
mtx-protected fd table.
2021-01-29 11:23:44 +00:00
Mateusz Guzik
6affe1b712 select: employ fget_only_user
Since most select users are single-threaded this avoid a lot of work
in the common case.

For example select of 16 fds (ops/s):
before:	2114536
after:	2991010
2021-01-29 11:23:44 +00:00
Mateusz Guzik
eaad8d1303 fd: add fget_only_user
This can be used by single-threaded processes which don't share a file
descriptor table to access their file objects without having to
reference them.

For example select consumers tend to match the requirement and have
several file descriptors to inspect.
2021-01-29 11:23:43 +00:00
Bjoern A. Zeeb
abd619045a __FreeBSD_version: update the references to the doc tree
Update the reference of which file to update in the doc tree when
bumping __FreeBSD_version.
2021-01-29 11:00:17 +00:00
Alex Richardson
1d15bceae6 tmpfs: implement pathconf(_PC_SYMLINK_MAX)
This fixes one of the sys/audit tests when running them on tmpfs.

Reviewed By:	delphij, kib
Differential Revision: https://reviews.freebsd.org/D28387
2021-01-29 09:30:25 +00:00
Jamie Gritton
c050ea803e jail: Handle a parent jail when a child is added to it
It's possible when adding a jail that its dying parent comes back to
life.  Only allow that to happen when JAIL_DYING is specified.  And if
it does happen, call PR_METHOD_CREATE on it.
2021-01-28 21:51:09 -08:00
Kyle Evans
0f919ed4ae tmpfs: push VEXEC check into tmpfs_lookup()
vfs_cache_lookup() has already done the appropriate VEXEC check, therefore
we must not re-check in VOP_CACHEDLOOKUP.

This fixes O_SEARCH semantics on tmpfs and removes a redundant descent into
VOP_ACCESS() in the common case.

Reported-by:	arichardson (via CheriBSD Jenkins CI)
Reviewed-by:	kib
MFC-after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28401
2021-01-28 19:25:11 -06:00
Warner Losh
8a51f14a78 newvers: tweak uname to be more useful
The current uname is branch-cXXXX-gHASH

Three changes to make uname more useful.
1. Move from using git rev-list --count to git rev-lis --count --first-parent
   since that gives a better, incrementing number.
2. Report this count as 'nXXXXX' rather than 'cXXXXX' because c is part of
   a hash and we've changed the sematnics of XXXXX
3. Remove g to make HASH cut and pastable.

Durting review, #1 & #3 had the largest consensus. There was a diversity of
opinion on #2, but on the whole it was positive so I'll acknowledge the dissent,
but move forward with something seems to have support since the dissent was all
about what letter to use where I chose 'n'.

MFC After: 3 days
Reviewed by: rgrimes, emaste (earlier version)
Differential Revision: https://reviews.freebsd.org/D28338
2021-01-28 17:44:56 -07:00
Alexander Motin
8fee65d0c7 Add missing newlines.
MFC after:	3 days
2021-01-28 18:20:00 -05:00
Doug Ambrisko
0c852bb9b9 Add support for some more Intel VMD controllers. Some of the
newer controller have a sparce bus space that can be figured
out by probing the HW.  This gives the starting bus number.
When reading the PCI config. space behind the VMD controller,
the offset of the starting bus needs to be subtracted from
the bus being read.

Fixed a bug in which in which not all of the devices
directly attached to the VMD controller would be probed.
On my initial test HW, a switch was found at bus 0, slot 0
and function 0.  All of the NVME drives were behind that
switch.  Now scan for all slots and functions attached to
bus 0.  If a something was found then run attach after the
scan.  On detach also go through all slots and functions
on bus 0.

Tested with device ID's: 0x201d & 0x9a0b

Tested by:	nc@
MFC after:	7 days
PR:		252253
2021-01-28 15:12:14 -08:00
Alexander Motin
b75168ed24 Make software iSCSI more configurable.
Move software iSCSI tunables/sysctls into kern.icl.soft subtree.
Replace several hardcoded length constants there with variables.

While there, stretch the limits to better match Linux' open-iscsi
and our own initiator with new MAXPHYS of 1MB.  Our CTL target is
also optimized for up to 1MB I/Os, so there is also a match now.
For Windows 10 and VMware 6.7 initiators at default settings it
should make no change, since previous limits were sufficient there.

Tests of QD1 1MB writes from FreeBSD over 10GigE link show throughput
increase by 29% on idle connection and 132% with concurrent QD8 reads.

MFC after:	3 days
Sponsored by:	iXsystems, Inc.
2021-01-28 16:22:45 -05:00