The routing stack control code depends on quite a tree of functions to
determine the proper attributes of a route, such as its source address (ifa)
or transmit interface (ifp).
When actually inserting a route, the stack needs to ensure that ifa and ifp
point to entities that are still valid.
Validity means slightly more than just pointer validity: the stack needs to
guarantee that the provided objects are not scheduled for deletion.
Currently, callers either ignore it (most ifp parts, historically) or try to
use refcounting (ifa parts). Even the ifa refcounting is not always
implemented in a fully safe manner. For example, some codepaths inside
rt_getifa_fib() reference an ifa while not holding any locks, making it
possible to reference an ifa that is scheduled for deletion.
Instead of trying to fix all of the callers by enforcing proper refcounting,
switch to a different model.
As rib_action() already requires epoch, do not require any stability guarantees
other than the one epoch provides.
Use the newly-added conditional versions of the refcounting functions
(ifa_try_ref(), if_try_ref()) and fail if either of them fails.
Reviewed by: donner
Differential Revision: https://reviews.freebsd.org/D28837
(cherry picked from commit 5964172837)
When we have an ifp pointer and the code is running inside epoch,
epoch guarantees the pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to refcount=0 for ifp and schedule its deletion.
* in thread 2 we use this ifp and reference it
* destroy callout kicks in
* unhappy user reports a bug
This can happen with the current implementation of ifnet_byindex_ref(),
as we're not holding any locks preventing ifnet deletion by a parallel thread.
To address it, add if_try_ref(), which returns failure when asked to
reference an ifp with refcount=0.
Additionally, guard the existing if_ref() with a KASSERT to provide a
cleaner error in such scenarios.
Finally, fix ifnet_byindex_ref() by using if_try_ref() and returning NULL
if the latter fails.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28836
(cherry picked from commit 7563019bc6)
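The conditional reference described above can be illustrated in userland
with C11 atomics. This is only a sketch of the idea, not the kernel
implementation: struct obj, obj_try_ref(), obj_ref() and obj_rele() are
hypothetical names standing in for the ifnet and its if_try_ref(), if_ref()
and release counterparts, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as struct ifnet. */
struct obj {
	atomic_uint refcount;
};

/*
 * Conditional reference: succeed only if the count is still non-zero.
 * A count of zero means the last reference was dropped and destruction
 * has been scheduled, so the caller must treat the object as gone.
 */
static bool
obj_try_ref(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Unconditional reference: the caller must already hold a reference. */
static void
obj_ref(struct obj *o)
{
	unsigned int old = atomic_fetch_add(&o->refcount, 1);

	/* Stand-in for the KASSERT: catch a reference taken from zero. */
	assert(old != 0 && "referencing an object scheduled for deletion");
	(void)old;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool
obj_rele(struct obj *o)
{
	return (atomic_fetch_sub(&o->refcount, 1) == 1);
}
```

A lookup helper in the style of ifnet_byindex_ref() would call
obj_try_ref() and return NULL when it fails, instead of handing out a
pointer to an object whose destruction is already pending.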
SCTP needs to avoid binding a given socket twice. The check used to
avoid this is racy since neither the inpcb lock nor the global info lock
is held. Fix it by synchronizing using the global info lock. In
particular, sctp_inpcb_bind() may drop the inpcb lock in some cases, but
the info lock is sufficient to prevent double insertion into PCB hash
tables.
Reported by: syzbot+548a8560d959669d0e12@syzkaller.appspotmail.com
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4a36122b1d)
Fix the following race between itimer_proc_continue() and process exit.
itimer_proc_continue() may be called via realitexpire(), the real
interval timer. Note that exit1() drains this timer _after_ draining
and freeing itimers. Moreover, itimers_exit() is called without the
process lock held; it only acquires the proc lock when deleting
individual itimers, so once they are drained we free p->p_itimers
without any synchronization. Thus, itimer_proc_continue() may load a
non-NULL p->p_itimers array and iterate over it after it has been freed.
Fix the problem by using the process lock when clearing p->p_itimers, to
synchronize with itimer_proc_continue(). Formally, accesses to this
field should be protected by the process lock anyway, and since the
array is allocated lazily this will not incur any overhead in the common
case.
Reported by: syzbot+c40aa8bf54fe333fc50b@syzkaller.appspotmail.com
Reported by: syzbot+929be2f32503bbc3844f@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 3138392a46)
We do this when creating md(4) devices, in kern_mdattach_locked(), but
not when resizing the provider. Apply the same policy when resizing, as
many GEOM classes do not expect to deal with providers for which
pp->mediasize % pp->sectorsize != 0.
Reported by: syzkaller
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 47619b6044)
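The invariant that GEOM classes rely on, mediasize % sectorsize == 0, can
be maintained with a helper like the one below. This is a sketch that
assumes the policy is to round the size down to a whole number of sectors;
round_mediasize() is a hypothetical name, not a function in md(4).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a media size down to a whole number of sectors so that
 * mediasize % sectorsize == 0 holds for the resulting provider.
 */
static int64_t
round_mediasize(int64_t mediasize, uint32_t sectorsize)
{
	assert(sectorsize > 0);
	return (mediasize - mediasize % (int64_t)sectorsize);
}
```

For example, a requested size of 1000 bytes with 512-byte sectors becomes
512, while 1024 is left unchanged.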
Eliminate a flag variable and reduce indentation. No functional change
intended.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 2496d812a9)
We only drop the inp lock when binding to a specific port. So, only
acquire an extra reference when required. This simplifies error
handling a bit.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 93908fce72)
This is used for the on-board flash on the HiFive Unmatched board.
Reviewed by: #riscv, jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31562
(cherry picked from commit 416ac155bb)
Otherwise sckmode is left uninitialised, not zero. This mode is used for
the on-board flash on the HiFive Unmatched board. Whilst here, catch
unknown modes and return an error rather than silently continuing.
Reviewed by: #riscv, jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31562
(cherry picked from commit f5d78bea1f)
The fixes for this have now been committed so we can re-enable these.
This reverts commit d9f25575a2.
MFC after: 1 week
(cherry picked from commit 83ec48b792)
Somewhat ironically, there are strict aliasing violations in Clang,
which can result in the following assertion failure:
Assertion `*(NamedDecl **)&Data == ND && "PointerUnion mangles the NamedDecl pointer!"' failed.
Upstream's clang/CMakeLists.txt specifically (not LLVM as a whole)
passes -fno-strict-aliasing if the compiler is not Clang, and this fixes
the above issue.
This was seen when cross-building from Linux using a bootstrap
compiler, but likely also affects worlds built with a new enough
external GCC toolchain.
MFC after: 1 week
Reviewed by: dim
Differential Revision: https://reviews.freebsd.org/D31533
(cherry picked from commit c1f7d8dd23)
If MK_DEBUG_FILES=no then the Clang link rule has clang as .TARGET,
rather than clang.full, causing the implicit ${CFLAGS.${.TARGET:T}} to
be CFLAGS.clang, and thus pull in flags intended for when your compiler
is Clang, not when linking Clang itself. This doesn't matter if your
compiler is in fact Clang, but it breaks using GCC as, for example,
bsd.sys.mk adds -Qunused-arguments to CFLAGS.clang. This is seen when
trying to build a bootstrap toolchain on Linux where GCC is the system
compiler.
Thus, introduce a new internal NO_TARGET_FLAGS variable that is set by
Clang to disable the addition of these implicit flags. This is a bigger
hammer than necessary, as flags for .o files would be safe, but that is
not needed for Clang.
Note that the same problem does not arise for LDFLAGS when building LLD
with BFD, since our build produces a program called ld.lld, not plain
lld (unlike upstream, where ld.lld is a symlink to lld so they can
support multiple different flavours in one binary).
Suggested by: sjg
Fixes: 31ba4ce889 ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: dim, imp, emaste
Differential Revision: https://reviews.freebsd.org/D31532
(cherry picked from commit c8edd05426)
Because MK_LLDB=no is in BSARGS, the bootstrap-tools recursive make does
not add lldb-tblgen to _clang_tblgen, causing it to not be built. This
means that the build currently always uses the host's lldb-tblgen
(which, whilst currently it appears to work, could in future break if
TableGen backends are added or altered) and, if it doesn't exist (either
because the current FreeBSD system was built with it disabled, or you're
building on macOS/Linux), fails. Linux and macOS cross-builds used to
work simply because LLDB was previously in BROKEN_OPTIONS when building
on non-FreeBSD.
Instead, move MK_LLDB=no from BSARGS to XMAKE. This ensures that the
lib/clang build in cross-tools continues to not build LLDB parts for the
bootstrap toolchain (both to save time/space on FreeBSD, and because our
vendored LLDB does not include the macOS and Linux host files so those
would fail to build).
The DIRDEPS target is updated to move MK_LLDB=no from the BSARGS block
that mirrors Makefile.inc1 to the line that disables additional
toolchain components. The DIRDEPS build likely suffers from the same
issue currently, but having never used it and not being familiar with
how it works I am leaving that as-is. If it does suffer from the same
issue it should be easily reproducible by renaming /usr/bin/lldb-tblgen
or moving it to a directory not in PATH.
Fixes: 31ba4ce889 ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: dim, emaste, imp
Differential Revision: https://reviews.freebsd.org/D31531
(cherry picked from commit 1e4c802913)
Currently we override MK_CLANG_BOOTSTRAP to no so we don't build a
bootstrap compiler, but subdirectories don't see that and so the hack in
bsd.sys.mk to prefer our includes over Clang's resource dir for external
toolchains is not enabled unless you use -DWITHOUT_CLANG_BOOTSTRAP
explicitly on top of XCC (which tools/build/make.py does not do),
causing duplicate definition errors when building rtld-elf due to the
use of -ffreestanding (Clang's stdint.h will use the system one when
hosted, but its own when freestanding, and only has glibc's preprocessor
guards, not FreeBSD's).
This broke when dropping CLANG_BOOTSTRAP from BROKEN_OPTIONS.
Fixes: 31ba4ce889 ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
Reviewed by: imp, arichardson
Differential Revision: https://reviews.freebsd.org/D31529
(cherry picked from commit ab3a18095f)
There is a __used member in glibc's posix_spawn_file_actions_t in
spawn.h, so we must temporarily undefine __used when including it,
otherwise Support/Unix/Program.inc fails to build. This is based on
similar handling for __unused in other headers.
Fixes: 31ba4ce889 ("Allow bootstrapping llvm-tblgen on macOS and Linux")
MFC after: 1 week
(cherry picked from commit 8a1895a3fa)
The current code checks the RWX bits are 0 but does not check the V bit
is non-zero, meaning not-yet-allocated L1 entries that are still zero
are regarded as being allocated. This is likely due to copying the arm64
code that checks ATTR_DESC_MASK is L1_TABLE, which encompasses both the
type and the validity in a single field, and erroneously translating
that to a check of just PTE_RWX being 0 to indicate non-leaf, forgetting
about the V bit. This then results in the following panic:
panic: Fatal page fault at 0xffffffc0005cf292: 0x00000000000050
cpuid = 1
time = 1628379581
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x2a
page_fault_handler() at page_fault_handler+0x1ba
do_trap_supervisor() at do_trap_supervisor+0x7a
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x70
--- exception 13, tval = 0x50
pmap_enter_l2() at pmap_enter_l2+0xb2
pmap_enter_object() at pmap_enter_object+0x15e
vm_map_pmap_enter() at vm_map_pmap_enter+0x228
vm_map_insert() at vm_map_insert+0x4ec
vm_map_find() at vm_map_find+0x474
vm_map_find_min() at vm_map_find_min+0x52
vm_mmap_object() at vm_mmap_object+0x1ba
vn_mmap() at vn_mmap+0xf8
kern_mmap() at kern_mmap+0x4c4
sys_mmap() at sys_mmap+0x38
do_trap_user() at do_trap_user+0x208
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 8, tval = 0x1dd
Instead, we should just check the V bit, as on amd64, and assert that
any valid L1 entries are not leaves, since an L1 leaf would render the
entire range allocated and thus we should not have attempted to map that
VA in the first place.
Reported by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Reviewed by: markj, mhorne
Differential Revision: https://reviews.freebsd.org/D31460
(cherry picked from commit 98138bbde0)
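The difference between the two checks can be shown on raw PTE values. The
sketch below uses the RISC-V PTE bit encoding (V in bit 0, R/W/X in bits
1-3); l1_has_table_buggy() and l1_has_table() are hypothetical helpers
written for illustration and are not the pmap code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	PTE_V	UINT64_C(0x001)	/* valid */
#define	PTE_R	UINT64_C(0x002)	/* readable */
#define	PTE_W	UINT64_C(0x004)	/* writable */
#define	PTE_X	UINT64_C(0x008)	/* executable */
#define	PTE_RWX	(PTE_R | PTE_W | PTE_X)

/*
 * Old check: RWX == 0 means "points to the next-level table".
 * An all-zero, never-allocated entry also has RWX == 0, so it is
 * wrongly reported as having a table behind it.
 */
static bool
l1_has_table_buggy(uint64_t l1e)
{
	return ((l1e & PTE_RWX) == 0);
}

/*
 * New check: only the V bit decides whether the entry is populated.
 * A valid L1 entry is asserted to be a non-leaf, since a leaf would
 * already map the whole range.
 */
static bool
l1_has_table(uint64_t l1e)
{
	if ((l1e & PTE_V) == 0)
		return (false);
	assert((l1e & PTE_RWX) == 0 && "unexpected L1 leaf");
	return (true);
}
```

With an entry value of 0, the old check reports a table and the caller
goes on to dereference a bogus L2 pointer, which matches the fault in
pmap_enter_l2() shown in the backtrace.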
USB is already in sys/conf/NOTES, but NVMe is not, nor of course are the
new SiFive device drivers.
MFC after: 1 week
(cherry picked from commit c5e5202a3d)
This has been present since the first revision of the file. The debugf
macros have always been unused so it doesn't actually do anything
useful, and besides, debugging should not be unconditionally turned on
for a production driver. Moreover, this breaks the riscv LINT kernel
build as sys/conf/NOTES includes options DEBUG, resulting in a macro
redefinition error. This does not show up in the arm64 LINT kernel build
since that has an explicit nooptions DEBUG, which is dubious and should
be revisited. Rather than copy such a hack to riscv's NOTES, fix this
specific instance of DEBUG breaking.
Fixes: 896e217a0e ("fu740_pci_dw: Add SiFive FU740 PCIe controller driver")
MFC after: 1 week
(cherry picked from commit 22997b7550)
The SiFive FU740 has both NVMe and USB so we need both to ensure we can
mount root, and HID is a dependency of USB.
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31036
(cherry picked from commit 6e162bd2f2)
This is present on both the FU540 and FU740, but only needed for the
FU740 in order to assert reset and power enable signals for its PCIe
controller.
Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31031
(cherry picked from commit b47e5c5dbe)
The FU740 also uses the same SPI controller.
Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31026
(cherry picked from commit 90a089cf2a)
This is needed for FU740 PCIe support. Whilst we don't need the FU540's
resets they are also defined for completeness.
Reviewed by: manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31024
(cherry picked from commit 8e7e0690ec)
This avoids noisy output from early attempts to attach before clk_fixed
has attached to the parent clocks.
Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31023
(cherry picked from commit dcbea9a2f4)
The FU740 has a very similar controller and will reuse most of the
driver. This also drops the dependency on the device-tree include for
the binding indices; the header doesn't namespace its contents (and nor
does the FU740 one) so using both would require separate translation
units which would be unnecessarily complicated just to avoid defining
local copies of the small number of constants.
Whilst here, add the missing l to gemgxlclk's name and drop the prci_
prefix from tlclk's name as we don't prefix any of the others and it's
entirely unnecessary.
Reviewed by: kp, mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31021
(cherry picked from commit 12b115ec57)
This repeats amd64's cfcbf8c6fd (r180498) and i386's cf3508519c
(r202894) but for riscv; pmap_kextract must be lock-free and so it can
race with superpage promotion and demotion, thus the L2 entry must only
be loaded once to avoid using inconsistent state.
PR: 250866
Reviewed by: markj, mhorne
Tested by: David Gilbert <dgilbert@daveg.ca>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31253
(cherry picked from commit 4a23504908)
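The "load once" rule can be sketched as follows. This is a simplified
userland illustration with C11 atomics, not the riscv pmap_kextract():
kextract_l2_leaf() and the SK_ constants are hypothetical, only the
superpage case is handled, and the reserved upper PTE bits are assumed to
be zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	PTE_V		UINT64_C(0x001)
#define	PTE_RWX		UINT64_C(0x00e)
#define	SK_PPN_SHIFT	10	/* PPN starts at bit 10 of a PTE */
#define	SK_PAGE_SHIFT	12
#define	SK_L2_OFFSET	((UINT64_C(1) << 21) - 1)	/* 2 MiB superpage */

/*
 * Translate va through a possible L2 superpage mapping without locks.
 * Returns true and stores the physical address if *l2p is a valid leaf;
 * returns false otherwise (the L3 walk is omitted from this sketch).
 */
static bool
kextract_l2_leaf(_Atomic uint64_t *l2p, uint64_t va, uint64_t *pa)
{
	/* Load exactly once; every later decision uses this snapshot. */
	uint64_t l2e = atomic_load_explicit(l2p, memory_order_relaxed);

	if ((l2e & PTE_V) == 0 || (l2e & PTE_RWX) == 0)
		return (false);
	*pa = ((l2e >> SK_PPN_SHIFT) << SK_PAGE_SHIFT) | (va & SK_L2_OFFSET);
	return (true);
}
```

If the function instead re-read *l2p between the leaf test and the address
computation, a concurrent promotion or demotion could make the two reads
disagree, and the result would mix the state of two different entries.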
We already attempt to enable the SiFive SPI controller, but since spibus
isn't enabled it isn't actually built.
Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31027
(cherry picked from commit 8c439847f0)
Currently we use the num-viewports property to decide how many outbound
regions are available for us to use, defaulting to 2. However, Linux has
stopped using that and so it no longer appears in new device trees, such
as for the SiFive FU740. Instead, it's possible to just probe the
hardware directly.
Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31030
(cherry picked from commit 4707bb0430)
This supersedes the old legacy mode where a viewport register was used
to mux multiple regions behind a single set of registers, and is used on
the SiFive FU740.
Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31029
(cherry picked from commit f240dfff22)
Currently we assume there is only one memory and one prefetch memory
window, and ignore the latter. However, the SiFive FU740 has two normal
memory windows.
As part of this, the viewports are rearranged. Previously the viewports
were memory, config then optionally I/O. Both to simplify the config
index calculation and to ensure it can always be mapped even if we have
too many memory windows for the number of viewports, config is moved to
being the first viewport.
This generalisation now also naturally supports mapping prefetch memory
windows.
Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31028
(cherry picked from commit f8c1701f23)
The size of the ATU MEM/IO windows is implicitly cast to uint32_t.
Because of that, some window sizes were silently truncated to 0 and ignored.
If the size is too large, trim it to 4GB and print a warning message.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: mw
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29625
(cherry picked from commit 243000b19f)
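The truncation is easy to reproduce in isolation. The sketch below is an
illustration only: ATU_MAX_SIZE, atu_size_truncated() and
atu_size_clamped() are hypothetical names, and the driver's actual
handling (including the warning message) may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define	ATU_MAX_SIZE	(UINT64_C(1) << 32)	/* 4 GiB */

/* What the implicit conversion did: the upper 32 bits are discarded. */
static uint32_t
atu_size_truncated(uint64_t size)
{
	return ((uint32_t)size);
}

/*
 * Keep the size in a 64-bit variable and trim it to 4 GiB.
 * *trimmed tells the caller whether a warning should be printed.
 */
static uint64_t
atu_size_clamped(uint64_t size, bool *trimmed)
{
	*trimmed = size > ATU_MAX_SIZE;
	return (*trimmed ? ATU_MAX_SIZE : size);
}
```

A window of exactly 4 GiB (0x100000000) truncates to 0, which is why such
windows were previously ignored.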
Use viewport "2" instead of "0" and change window type from MEM to IO.
Without these changes the MEM ATU window can be overwritten with the IO one.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Marvell
Differential revision: https://reviews.freebsd.org/D29516
(cherry picked from commit 57dbb3c259)
Unlike the old fmake, running make FOO=bar when using bmake doesn't put
FOO=bar in .MAKEFLAGS at the top level; instead it just puts FOO in
.MAKEOVERRIDES, and the full MAKEFLAGS will be formed for sub-makes.
Moreover, this only applies for sub-makes in rules, so this doesn't
apply to those in shell assignments. This means that the current check
does not catch make MAKEOBJDIRPREFIX=..., only those defined in config
files. Thus we must also check .MAKEOVERRIDES explicitly.
Reviewed by: sjg
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31015
(cherry picked from commit d0c737e184)
The pindex values are assigned from the L3 leaves upwards, meaning there
are NUL2E L3 tables and then NUL1E L2 tables (with a further NUL0E L1
tables in future when we implement Sv48 support). Therefore anything
below NUL2E is an L3 table's page and anything above or equal to NUL2E
is an L2 table's page (with the threshold of NUL2E + NUL1E marking the
start of the L1 tables' pages in Sv48). Thus all the comparisons and
arithmetic operations must use NUL2E to handle the L3/L2 allocation (and
thus L2/L1 entry) transition point, not NUL1E as all but pmap_alloc_l2
were doing.
To make matters confusing, the NUL1E and NUL2E definitions in the RISC-V
pmap are based on a 4-level page hierarchy but we currently use the
3-level Sv39 format (as that's the only required one, and hardware
support for the 4-level Sv48 is not widespread). This means that, in
effect, the above bug cancels out with the bloated NULxE definitions
such that things "work" (but are still technically wrong, and thus would
break when adding Sv48 support), with one exception. pmap_enter_l2 is
currently the only function to use the correct constant, but since
_pmap_alloc_l3 uses the incorrect constant, it will do complete nonsense
when it needs to allocate a new L2 table (which is rather rare). In this
instance, _pmap_alloc_l3, whilst it would correctly determine the pindex
was for an L2 table, would only subtract NUL1E when computing l1index
and thus go way out of bounds (by 511*512*512 bytes, or 127.75 MiB) of
its own L1 table and, thanks to pmap_distribute_l1, of every other
pmap's L1 table in the whole system. This has likely never been hit as
it would presumably instantly fault and panic.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31087
(cherry picked from commit ade2ea3c45)
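The role of the NUL2E threshold can be sketched with plain integer
arithmetic. The constants below assume the 4-level-style definitions the
message describes (NUL1E = 512*512, NUL2E = 512*NUL1E), which is
consistent with the 511*512*512 figure quoted above; the helper names are
hypothetical and this is not the pmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	Ln_ENTRIES	UINT64_C(512)
/* Assumed definitions, based on a 4-level page table hierarchy. */
#define	NUL1E		(Ln_ENTRIES * Ln_ENTRIES)
#define	NUL2E		(Ln_ENTRIES * NUL1E)

/* pindex values below NUL2E name L3 table pages; the rest are L2 tables. */
static bool
pindex_is_l2_table(uint64_t pindex)
{
	return (pindex >= NUL2E);
}

/* Correct: the L2 tables' pindex range starts at NUL2E. */
static uint64_t
pindex_to_l1index(uint64_t pindex)
{
	assert(pindex_is_l2_table(pindex));
	return (pindex - NUL2E);
}

/* Incorrect variant that subtracts NUL1E, as most callers did. */
static uint64_t
pindex_to_l1index_buggy(uint64_t pindex)
{
	return (pindex - NUL1E);
}
```

For any L2 table pindex, the incorrect variant yields an index that is
NUL2E - NUL1E = 511*512*512 positions too large.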
These use the raw console interface and poll. Unfortunately, the SiFive
UART puts the FIFO empty bit inside the FIFO data register, which means
that the act of checking whether a character is available also dequeues
any character from the FIFO, requiring the user to press each key twice.
However, since we configure the watermark to be 0 and, when the UART has
been grabbed for the console, we have interrupts off, we can abuse the
interrupt pending register to act as a substitute for the FIFO empty
bit.
This perhaps suggests that the console interface should move from having
rxready and getc to having getc_nonblock and getc (or make getc take a
bool), as all the places that call rxready do so to avoid blocking on
getc when there is no character available.
Reviewed by: kp, philip
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31025
(cherry picked from commit a1f9cdb1ab)
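The problem and the workaround can be shown with a toy model of the
receive path. Everything here is an illustrative assumption rather than
driver code: struct uart_model and its helpers are hypothetical, and the
bit positions (empty flag in bit 31 of the data register, receive
watermark in bit 1 of the interrupt pending register) are assumed for the
sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define	RXDATA_EMPTY	(UINT32_C(1) << 31)	/* assumed bit position */
#define	IP_RXWM		(UINT32_C(1) << 1)	/* assumed bit position */

/* Toy model of the receive FIFO, with the watermark fixed at 0. */
struct uart_model {
	uint8_t		fifo[8];
	unsigned int	head;
	unsigned int	count;
};

static void
uart_model_push(struct uart_model *u, uint8_t c)
{
	if (u->count < 8) {
		u->fifo[(u->head + u->count) % 8] = c;
		u->count++;
	}
}

/* Reading the data register always dequeues a character if present. */
static uint32_t
uart_read_rxdata(struct uart_model *u)
{
	uint8_t c;

	if (u->count == 0)
		return (RXDATA_EMPTY);
	c = u->fifo[u->head];
	u->head = (u->head + 1) % 8;
	u->count--;
	return (c);
}

/* The pending bit is set while the FIFO holds more than 0 characters. */
static uint32_t
uart_read_ip(const struct uart_model *u)
{
	return (u->count > 0 ? IP_RXWM : 0);
}

/* Broken: the readiness check itself consumes the character. */
static bool
rxready_broken(struct uart_model *u)
{
	return ((uart_read_rxdata(u) & RXDATA_EMPTY) == 0);
}

/* Workaround: the pending register is read without side effects. */
static bool
rxready(const struct uart_model *u)
{
	return ((uart_read_ip(u) & IP_RXWM) != 0);
}

static int
uart_getc(struct uart_model *u)
{
	uint32_t v = uart_read_rxdata(u);

	return ((v & RXDATA_EMPTY) != 0 ? -1 : (int)(v & 0xff));
}
```

With one character queued, rxready() followed by uart_getc() returns that
character, whereas rxready_broken() followed by uart_getc() returns -1
because the check already dequeued it.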
Note that currently Linux's device tree uses the FU540's compatible
string, as does upstream U-Boot, but the U-Boot shipped with the board
based on an older patch series has the correct FU740 name. Thankfully
they are the same, at least as far as software is concerned.
Whilst here, fix a style(9) nit.
Reviewed by: philip, kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31034
(cherry picked from commit 4c4a6884ad)
This is required for the SiFive FU740's PCIe controller. Copied from
arm64 with the only difference being changing pmap_mapdev_attr to
pmap_mapdev as riscv only has the latter.
Reviewed by: mhorne
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31032
(cherry picked from commit d9e85f2c6f)
Apparently some large-file systems out there, such as my powerpc64le
Linux box, define daddr_t as a 32-bit type, which is sad and stymies
cross-building disk images. Cast daddr_t to off_t before doing
arithmetic that overflows.
Reviewed by: arichardson, jrtc27, imp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27458
(cherry picked from commit 7ef082733b)
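The fix pattern is to widen the block number before multiplying. In the
sketch below, narrow_daddr_t and wide_off_t are hypothetical typedefs that
model a host with a 32-bit daddr_t and a 64-bit off_t, and
block_to_offset() is an illustrative helper, not a function from the tree.

```c
#include <stdint.h>

typedef int32_t	narrow_daddr_t;	/* models a host with a 32-bit daddr_t */
typedef int64_t	wide_off_t;	/* models a 64-bit off_t */

/*
 * Convert a block number to a byte offset.  The cast widens blkno
 * first, so the multiplication is carried out in 64 bits rather than
 * overflowing in 32 bits.
 */
static wide_off_t
block_to_offset(narrow_daddr_t blkno, int32_t secsize)
{
	return ((wide_off_t)blkno * secsize);
}
```

Block 8388608 with 512-byte sectors lies at byte offset 4294967296 (4 GiB),
which does not fit in 32 bits.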