Keep the definition around since it's used by userspace.
Reviewed by: alc, imp, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35791
It's unused now. Keep the OBJ_DEFAULT identifier, but make it an alias
of OBJT_SWAP for the benefit of out-of-tree code.
Reviewed by: alc, imp, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35790
With the removal of OBJT_DEFAULT, OBJ_ANON implies OBJ_SWAP.
Note, this means that vm_object_split() is more expensive than it used
to be, as it holds busy locks until the end of the range is reached,
even if the object has no swap blocks allocated.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35789
Now that OBJT_DEFAULT objects can't be instantiated, we can simplify
checks of the form object->type == OBJT_DEFAULT || (object->flags &
OBJ_SWAP) != 0. No functional change intended.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35788
With the removal of OBJT_DEFAULT, we can simply handle this in
swap_pager_dealloc(). No functional change intended.
Suggested by: alc
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35787
With the removal of OBJT_DEFAULT, we can assume that pager operations
provide an object with OBJ_SWAP set. Also, we do not need to convert
objects from type OBJT_DEFAULT. Thus, remove checks for OBJ_SWAP and
remove code which modifies the object type. In some places, replace the
check for OBJ_SWAP with a check for whether any swap blocks are
assigned.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35786
With this change, OBJT_DEFAULT objects are no longer allocated.
Instead, anonymous objects are always of type OBJT_SWAP and always have
OBJ_SWAP set.
Modify the page fault handler to check the swap block radix tree in
places where it checked for objects of type OBJT_DEFAULT. In
particular, there's no need to invoke getpages for an OBJT_SWAP object
with no swap blocks assigned.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35785
Clang 15 warns:
sys/dev/cxgb/cxgb_sge.c:1290:21: error: variable 'txsd' set but not used [-Werror,-Wunused-but-set-variable]
struct tx_sw_desc *txsd = &txq->sdesc[txqs->pidx];
^
It appears 'txsd' is a leftover from a previous refactoring (see
3f345a5d09), but is no longer used for anything, and can be removed
without any functional change.
MFC after: 3 days
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D35833
Historically, GEOM utilities (gpart(8), gstripe(8), gmirror(8),
etc) used the gctl_error() routine to report errors. If they called
gctl_error() they would exit with EXIT_FAILURE, otherwise they would
return with EXIT_SUCCESS. If they used gctl_error() to output an
informational message, for example when run with the -v (verbose)
option, they would mistakenly exit with EXIT_FAILURE. A further
limitation of the gctl_error() function was that it could only be
called once. Messages from any additional calls to gctl_error()
would be silently discarded.
To resolve these problems a new function, gctl_msg() has been added.
It can be called multiple times to output multiple messages. It
also has an additional errno argument which should be zero if it is
an informational message or an errno value (EINVAL, EBUSY, etc) if
it is an error. When done the gctl_post_messages() function should
be called to indicate that all messages have been posted. If any
of the messages had a non-zero errno, the utility will EXIT_FAILURE.
If only informational messages (with zero errno) were posted, the
utility will EXIT_SUCCESS.
Tested by: Peter Holm
PR: 265184
MFC after: 1 week
With clang 15, the following -Werror warning is produced:
sys/dev/agp/agp.c:910:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
agp_find_device()
^
void
This is because agp_find_device() is declared with a (void) argument
list, and defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
Replace struct timeval in header with struct timespec.
To differentiate header formats, add a new KTR_VERSIONED flag
set in the header type field similar to the existing KTRDROP flag.
To make it easier to extend ktrace headers in the future,
extend the existing header with a version field (version 0 is
reserved for older records without KTR_VERSIONED) as well as
new fields holding the thread ID and CPU ID.
Reviewed by: jhb, pauamma
Differential Revision: https://reviews.freebsd.org/D35774
MFC after: 2 weeks
Eliminate the unroll_entry field from struct iommu_map_entry, shrinking
the struct by 16 bytes on 64-bit architectures.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35769
If an error occurs while processing a TCP segment with some data and the FIN
flag, the back out of the sequence number advance does not take into account the
increase by 1 due to the FIN flag.
Reviewed By: jch, gnn, #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D2970
Before this patch CAM periph drivers called both disk_alloc() and
disk_create() same time on periph creation. But then prevented disks
from opening until the periph probe completion with cam_periph_hold().
As result, especially if disk misbehaves during the probe, GEOM event
thread, triggered to taste the disk, got blocked on open attempt,
potentially for a long time, unable to process other events.
This patch moves disk_create() call from periph creation to the end of
the probe. To allow disk_create() calls from non-sleepable CAM contexts
some of its duties requiring memory allocations are moved either back
to disk_alloc() or forward to g_disk_create(), so now disk_alloc() and
disk_add_alias() are the only disk methods that require sleeping. If
disk fails during the probe disk_create() may just be skipped, going
directly to disk_destroy(). Other method calls during that time are
just ignored. Since GEOM may now see the disks after CAM bus scan is
already completed, introduce per-periph boot hold functions. Enclosure
driver already had such mechanism, so just generalize it.
Reviewed by: imp
MFC after: 1 month
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D35784
Leave one band of ithread priorities available below PI_SOFT for
demoted ithreads but reclaim additional ithread priorities for use by
user time-sharing threads. This is an ABI change in that PZERO moves
up so old ps and top binaries will not format priorities correctly on
newer kernels, but that is a cosmetic rather than functional change.
Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35647
Allow high priority hardware interrupts to run at PI_REALTIME via
INTR_TYPE_CLK, but collapse all other hardware interrupt threads to
the next priority level (PI_INTR). Collapse all SWI priorities to
the same priority level (PI_SOFT) just below PI_INTR.
Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35646
If an interrupt thread runs for a full quantum without yielding the
CPU, demote its priority and schedule a preemption to give other
ithreads a turn.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35645
If an interrupt thread runs for a full quantum without yielding the
CPU, demote its priority and schedule a preemption to give other
ithreads a turn.
Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35644
Use sched_wakeup instead of sched_add when marking an ithread
runnable. This allows schedulers to reset their internal time slice
tracking state and restore the base ithread priority when an ithread
resumes from idle.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35643
Use it instead of sched_prio when setting the priority of an interrupt
thread.
Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35642
Different fields in the tdq have different synchronization protocols.
Some are constant, some are accessed only while holding the tdq lock,
some are modified with the lock held but accessed without the lock, some
are accessed only on the tdq's CPU, and some are not synchronized by the
lock at all.
Convert ULE to stop using volatile and instead use atomic_load_* and
atomic_store_* to provide the desired semantics for lockless accesses.
This makes the intent of the code more explicit, gives more freedom to
the compiler when accesses do not need to be qualified, and lets KCSAN
intercept unlocked accesses.
Thus:
- Introduce macros to provide unlocked accessors for certain fields.
- Use atomic_load/store for all accesses of tdq_cpu_idle, which is not
synchronized by the mutex.
- Use atomic_load/store for accesses of the switch count, which is
updated by sched_clock().
- Add some comments to fields of struct tdq describing how accesses are
synchronized.
No functional change intended.
Reviewed by: mav, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35737
ULE's tdq_notify() tries to avoid delivering IPIs to the idle thread.
In particular, it tries to detect whether the idle thread is running.
There are two mechanisms for this:
- tdq_cpu_idle, an MI flag which is set prior to calling cpu_idle(). If
tdq_cpu_idle == 0, then no IPI is needed;
- idle_state, an x86-specific state flag which is updated after
cpu_idleclock() is called.
The implementation of the second mechanism is racy; the race can cause a
CPU to go to sleep with pending work. Specifically, cpu_idle_*() set
idle_state = STATE_SLEEPING, then check for pending work by loading the
tdq_load field of the CPU's runqueue. These operations can be reordered
so that the idle thread observes tdq_load == 0, and tdq_notify()
observes idle_state == STATE_RUNNING.
Some counters indicate that the idle_state check in tdq_notify()
frequently elides an IPI. So, fix the problem by inserting a fence
after the store to idle_state, immediately before idling the CPU.
PR: 264867
Reviewed by: mav, kib, jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35777
The load balancer executes from statclock and periodically tries to move
threads among CPUs in order to balance load. It may move a thread to
the current CPU (the loader balancer always runs on CPU 0). When it
does so, it may need to schedule preemption of the interrupted thread.
Use sched_setpreempt() to do so, same as sched_add().
PR: 264867
Reviewed by: mav, kib, jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35744
Thread switching used to be atomic with respect to the current CPU's
tdq lock. Since commit 686bcb5c14 that is no longer the case. Now
sched_switch() does this:
1. lock tdq (might already be locked)
2. maybe put the current thread in the tdq, choose a new thread to run
2a. update tdq_lowpri
3. unlock tdq
4. switch CPU context, update curthread
Some code paths in ULE will load pc_curthread from a remote CPU with
that CPU's tdq lock held, usually to inspect its priority. But, as of
the aforementioned commit this is racy.
The problem I noticed is in tdq_notify(), which optionally sends an IPI
to a remote CPU when a new thread is added to its runqueue. If the new
thread's priority is higher (lower) than the currently running thread's
priority, then we deliver an IPI. But inspecting
pc_curthread->td_priority doesn't work, since pc_curthread might be
between steps 3 and 4 above. If pc_curthread's priority is higher than
that of the newly added thread, but pc_curthread is switching to a
lower-priority thread, then tdq_notify() might fail to deliever an IPI,
leaving a high priority thread stuck on the runqueue for longer than it
should. This can cause multi-millisecond stalls in
interactive/ithread/realtime threads.
Fix this problem by modifying tdq_add() and tdq_move() to return the
value of tdq_lowpri before the addition of the new thread. This ensures
that tdq_notify() has the correct priority value to compare against.
The other two uses of pc_curthread are susceptible to the same race. To
fix the one in sched_rem()->tdq_setlowpri() we need to have an exact
value for curthread. Thus, introduce a new tdq_curthread field to the
tdq which gets updated any time a new thread is selected to run on the
CPU. Because this field is synchronized by the thread lock, its
priority reflects the correct lowpri value for the tdq.
PR: 264867
Fixes: 686bcb5c14 ("schedlock 4/4")
Reviewed by: mav, kib, jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35736
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap. (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)
Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses. As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D35794
I mis-read the RFC w.r.t. handling of the sequenceid
when a CreateSession is done after the initial one
that confirms the ClientID. Fortunately this does
not affect most extant NFSv4.1/4.2 clients, since
they only acquire a single session for TCP for a
ClientID (Solaris might be an exception?).
This patch fixes the server to handle this case,
where the RFC requires the sequenceid be incremented
for each CreateSession and is required to reply to
a retried CreateSession with a cached reply.
It adds a field to nfsclient called lc_prevsess,
which caches the sessionid, which is the only field
in a CreateSession reply that will change for a
retry, to implement this reply cache.
The recent commits up to d4a11b3e3b that mark
session slots bad when "intr" and/or "soft" mounts
are used by the client needs this server patch.
Without this patch, the client will do a full
recovery, including a new ClientID, losing all
byte range locks. However, prior to the recent
client commits, the client would hang when all
session slots were bad, so even without this
patch it is not a regression.
PR: 260011
MFC after: 2 weeks
The K&R style in UFS and other places in the tree's days are numbered
as this syntax is removed in C2x proposal N2432:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf
Though running to nearly 6000 lines of diffs this update should cause
no functional change to the code.
Requested by: Warner Losh
MFC after: 2 weeks
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16). All changes are disabled by default, and can be
enabled by the following sysctls:
net.inet.ip.allow_net0=1
net.inet.ip.allow_net240=1
net.inet.ip.loopback_prefixlen=16
When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.
Add descriptions of the new sysctls to inet.4.
Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.
The proposals motivating this experimentation can be found in
https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127
Reviewed by: rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
- Remove a flag variable.
- Convert a runtime check of the passed cpuid to a KASSERT.
- Remove the cc_inited flag. An attempt to schedule a callout before
SI_SUB_CPU will crash anyway since the per-CPU mutexes won't have been
initialized, and that flag was only checked in the case where a cpuid
was explicitly specified by the caller.
No functional change intended.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
In preparation for removing OBJT_DEFAULT, eliminate some stale/unhelpful
comments from vm_mmap(), and remove an unused case. In particular, the
remaining callers of vm_mmap() in the tree do not specify OBJT_DEFAULT.
It's much more common to use vm_map_find() to map an object into user
memory, so rather than adjusting vm_mmap() to handle OBJT_SWAP objects,
let's further discourage its use and simply remove OBJT_DEFAULT
handling.
Reviewed by: dougm, alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35778
When ice is built as a module, it can't be loaded due to unresolved
symbol. Ensuring in Makefile that the ice_rdma.c is built fixes the
problem.
Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com>
Signed-off-by: Eric Joyner <erj@freebsd.org>
Reviewed by: erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D35687
The hw and ifp of a vsi will be NULL if such ixl_vsi is allocated
for a VF. When ixl_reconfigure_filters called, it is trying to
access vsi->ifp and hw->mac.addr and therefore is casuing panic.
This commit add checks to determine if vsi is a VF by checking
if vsi->hw is NULL, before adding IXL_VLAN_ANY filter (which
is already in a VF vsi's filter list) and VLAN filters.
(erj's Note: The SR-IOV flows need revisiting; this will help
prevent panics for now)
Reviewed by: krzysztof.galazka@intel.com, erj@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35649
vm_object_allocate_anon() automatically sets "charge" to 0 if no cred
reference is provided, so the caller doesn't need any conditional logic.
No functional change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35781
This is in preparation for removal of OBJT_DEFAULT. In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.
Reviewed by: alc, hselasky, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35779