Use the newly created llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request a datapath usage check and fetch the results
in the same fashion for both IPv4 and IPv6.
While here, simplify the llentry_provide_feedback() wrapper
by eliminating one condition check.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
With current generation clang/llvm it can pass all of our tests in
libc/ssp.
While here, remove the extra MACHINE_CPUARCH check for mips. SSP is
included in BROKEN_OPTIONS for this architecture in src.opts.mk, which
is enough to ensure normal builds won't set SSP_CFLAGS.
Reviewed by: kevans, imp, emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31400
This variable is set based on the exact CPU model detected. If this
value is set too small, it could lead to a NULL-dereference from an
improperly initialized pmc_rowindex_to_classdep array.
Though it has been fixed, this was previously the case for Broadwell.
Add two asserts to catch this in DEBUG kernels, as it represents a
configuration error that may be hard to uncover otherwise.
PR: 253687
Reported by: Zhenlei Huang <zlei.huang@gmail.com>
Sponsored by: The FreeBSD Foundation
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.
Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.
The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.
PR: 253687
Reported by: Zhenlei Huang <zlei.huang@gmail.com>
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31389
and remove repetitive code that calculates vnode locking type for write.
Reviewed by: khng, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31405
The firmware was already in the tree when I did this commit, and I
missed the message. The bug was obsolete.
This reverts commit 9e3761d126.
PR: 237466
Sponsored by: Netflix
There is no need to store a pointer to the CPU implementer and part
strings. Switch to load them directly into the sbuf used to print them
on boot.
While here print the machine ID register when we fail to determine the
implementer or part we are booting on.
Reviewed by: markj, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31346
The HWCAPS values are based on the ID registers. Move setting these
to the existing ID register parsing code.
Previously we would need to handle all possible ID field values where
a HWCAP is set. However, most ID fields follow a scheme where each
increment of the field only adds new features, so we only need to
check whether the field is at least the value at which the HWCAP
feature was added.
While here, stop setting HWCAP values that need kernel support when
that support is missing.
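The comparison scheme described above can be sketched roughly as
follows. This is a userland illustration only: the EX_* names, the
field position and the two feature bits are hypothetical and do not
correspond to any real ID register or HWCAP flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HWCAP bits and field layout, for illustration only. */
#define	EX_HWCAP_FEAT_A	(1u << 0)	/* introduced at field value 1 */
#define	EX_HWCAP_FEAT_B	(1u << 1)	/* introduced at field value 2 */

#define	EX_FIELD_SHIFT	4
#define	EX_FIELD_MASK	0xfu

/* Extract an unsigned 4-bit ID field from a 64-bit register value. */
unsigned
ex_id_field(uint64_t reg)
{
	return ((unsigned)((reg >> EX_FIELD_SHIFT) & EX_FIELD_MASK));
}

/*
 * Derive HWCAP bits with ">=" comparisons: a larger field value is
 * assumed to imply every feature introduced by the smaller values,
 * so unknown future values need no extra cases.
 */
uint32_t
ex_hwcap_from_id(uint64_t reg)
{
	unsigned field = ex_id_field(reg);
	uint32_t hwcap = 0;

	assert(field <= EX_FIELD_MASK);	/* 4 bits wide by construction */
	if (field >= 1)
		hwcap |= EX_HWCAP_FEAT_A;
	if (field >= 2)
		hwcap |= EX_HWCAP_FEAT_B;
	return (hwcap);
}
```

In this sketch a field value of 3, which the table does not list, still
yields both feature bits, which is the point of the scheme.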
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31201
After the recent arm64 GENERIC config cleanup, the ENETC MDIO driver
for the NXP LS1028A SoC should support being loaded as a module.
Obtained from: Semihalf
Sponsored by: Alstom Group
Function level reset has to be done in attach in order to put the
hardware in a known state before configuring it.
The order of DRIVER_MODULEs was changed to ensure that the miibus driver
is loaded when mii_attach is called.
Obtained from: Semihalf
Sponsored by: Alstom Group
It is found on boards equipped with LS1028A SoC.
802.1q VLAN grouping is supported.
An external MDIO device is used for communicating with PHYs.
The driver is built as a module by default; it is not included
in the GENERIC kernel config.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30923
The Felix switch found in the LS1028A supports stripping the VLAN tag
on ingress, instead of egress. The striptag flag expects the latter
behaviour.
Add a new flag to support the feature.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30922
Peter Wemm added the first CLOCK_* symbols in 0f5ed9f420 in 1997
after obtaining them from NetBSD. In NetBSD, jtc@netbsd.org committed
them in sys/sys/time.h rev 1.19 dated 1996/11/15, along with all the
system calls associated with 1003.1b. FreeBSD's values are, however,
different than NetBSD's today. The USL/UCB lawsuit was settled in 1994,
so these couldn't have been derived from material provided to University
of California covered in that settlement. This file does not need the
settlement disclaimer.
Furthermore, I rewrote most of the code (except the symbols and their
values) when merging it from time.h and sys/time.h. Most of the creative
content of the file is new, so update copyright to reflect that.
Reviewed by: kaktus
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31369
Comments on a pending kvmclock driver suggested adding a
malloc_aligned() to complement malloc_domainset_aligned(); add it now,
and document both.
Reviewed by: imp, kib, allanjude (manpages)
Differential Revision: https://reviews.freebsd.org/D31004
The spinning start time is missing from the calculation due to a
misplaced #endif. Return the #endif where it's supposed to be.
Submitted by: Alexander Alexeev <aalexeev@isilon.com>
Reviewed by: bdrewery, mjg
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D31384
The remote peer might send a FIN in the middle of a burst of data
PDUs. In the case of T6 with data PDU completion moderation, the
driver would not have seen these PDUs since the final PDU in the burst
was never received resulting in a stale rcv_nxt when the FIN is
received.
While here, invert the logic in the condition to be more readable and
always set tp->rcv_nxt from the sequence number in the CPL. This sets
the proper value of rcv_nxt for FINs on connections with data received
but not reported via a CPL (e.g. a partial iSCSI PDU burst interrupted
by a FIN).
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D30871
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.
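The batching idea above might look roughly like this in a userland
sketch. EX_BATCH, struct ex_state and ex_export_states() are
hypothetical names, and memcpy() stands in for the copy to userspace
that a kernel would do with copyout(9).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	EX_BATCH	4	/* hypothetical batch size */

struct ex_state {
	int id;
};

/*
 * Copy 'count' records from 'src' to 'dst' through a fixed-size
 * staging buffer, so scratch memory stays at EX_BATCH records no
 * matter how many states exist.  Returns the number of batches used.
 */
size_t
ex_export_states(const struct ex_state *src, size_t count,
    struct ex_state *dst)
{
	struct ex_state batch[EX_BATCH];
	size_t done = 0, batches = 0;

	assert(src != NULL && dst != NULL);
	while (done < count) {
		size_t n = count - done;

		if (n > EX_BATCH)
			n = EX_BATCH;
		/* Stage a handful of states ... */
		memcpy(batch, src + done, n * sizeof(batch[0]));
		/* ... then copy them out before staging the next ones. */
		memcpy(dst + done, batch, n * sizeof(batch[0]));
		done += n;
		batches++;
	}
	return (batches);
}
```

The trade-off in this sketch is more copy calls in exchange for a
bounded allocation.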
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Only perform this expensive operation when the unit number is a
potential candidate (i.e. not already in use), thereby reducing device
scan time on systems with many devices, unit numbers, and drivers.
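A minimal sketch of the ordering described above, assuming a simple
in-use table: the cheap "already in use" test runs first so the costly
check is only reached for real candidates. All ex_* names are
hypothetical and the "expensive" check is an arbitrary stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	EX_MAX_UNITS	8

int ex_expensive_calls;	/* counts how often the costly check ran */

/* Hypothetical stand-in for the expensive per-unit operation. */
bool
ex_expensive_check(int unit)
{
	ex_expensive_calls++;
	return (unit % 2 == 1);	/* arbitrary: only odd units qualify */
}

/* Return the first free unit that passes the check, or -1. */
int
ex_pick_unit(const bool *in_use)
{
	assert(in_use != NULL);
	for (int unit = 0; unit < EX_MAX_UNITS; unit++) {
		if (in_use[unit])
			continue;	/* cheap test first: skip taken units */
		if (ex_expensive_check(unit))
			return (unit);
	}
	return (-1);
}
```

With units 0, 1 and 3 taken, the sketch runs the costly check only for
units 2, 4 and 5.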
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
X-NetApp-PR: #61
Differential Revision: https://reviews.freebsd.org/D31381
I don't think it changes anything, but why not.
While there, make cpu_search_highest() use all 8 lower load bits for
noise, since it does not use cs_prefer and the code is not shared
with cpu_search_lowest() any more.
MFC after: 1 month
o Arm CoreLink TM CMN-600 Coherent Mesh Network controller,
o Arm CoreLink DMC-620 Dynamic Memory Controller.
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
In the hw.vmm.create sysctl handler the maximum length of the VM name
is VM_MAX_NAMELEN. However, vm_create() only allows names of up to
VM_MAX_NAMELEN - 1 chars. Bump the length of the internal buffer so
that VM names of VM_MAX_NAMELEN chars are accepted.
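The off-by-one above comes down to leaving room for the terminating
NUL. A minimal sketch, assuming a hypothetical EX_MAX_NAMELEN in place
of VM_MAX_NAMELEN and a made-up struct ex_vm; this is not the vmm(4)
code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	EX_MAX_NAMELEN	8	/* hypothetical; stands in for VM_MAX_NAMELEN */

struct ex_vm {
	char name[EX_MAX_NAMELEN + 1];	/* +1 leaves room for the NUL */
};

/* Returns 0 on success, -1 if the name is empty or too long. */
int
ex_vm_set_name(struct ex_vm *vm, const char *name)
{
	size_t len;

	assert(vm != NULL && name != NULL);
	len = strlen(name);
	if (len == 0 || len > EX_MAX_NAMELEN)
		return (-1);
	memcpy(vm->name, name, len + 1);	/* copies the NUL as well */
	return (0);
}
```

With the buffer sized EX_MAX_NAMELEN + 1, a name of exactly
EX_MAX_NAMELEN characters fits; one character more is rejected.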
MFC after: 3 days
Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31372
On amd64 XENHVM depends on the xentimer device for PVH early startup,
so both should be added or removed together (like the current
dependency with xenpci). Fix this by adding xentimer to NOTES and
updating the comments on the config files. Note that on i386 there's
no such dependency between xentimer and XENHVM, since there's no PVH
support.
While there also fix the MINIMAL i386 build to include the xentimer,
so it keeps the same functionality as before xentimer was split from
XENHVM.
Reported by: lwhsu
PR: 257549
Fixes: ae59812748 ('xen/timer: make xen timer optional')
Under some load patterns it is possible for several CPUs to try to
steal a thread from the same CPU despite the randomization introduced.
This may cause significant lock contention when an idle thread holding
one queue lock tries to acquire another one. Using trylock on the
remote queue both reduces the contention and makes lock ordering
easier to handle. If we can't get the lock inside tdq_trysteal() we
just return, allowing tdq_idled() to handle it. If it happens in
tdq_idled(), then we repeat the search for load, skipping this CPU.
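The trylock pattern above can be illustrated in userland with C11
atomics. This is only a sketch: struct ex_queue and the ex_* functions
are hypothetical, and an atomic_flag stands in for the scheduler's
queue lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ex_queue {
	atomic_flag locked;	/* minimal spin lock; set means held */
	int load;
};

/* Acquire the lock only if it is free; never wait. */
bool
ex_queue_trylock(struct ex_queue *q)
{
	return (!atomic_flag_test_and_set_explicit(&q->locked,
	    memory_order_acquire));
}

void
ex_queue_unlock(struct ex_queue *q)
{
	atomic_flag_clear_explicit(&q->locked, memory_order_release);
}

/*
 * Try to move one unit of load from 'victim' to 'self'.  The caller
 * holds self's lock.  If the remote lock is contended, give up at
 * once and let the caller choose another queue instead of spinning.
 */
bool
ex_trysteal(struct ex_queue *self, struct ex_queue *victim)
{
	bool stolen = false;

	assert(self != victim);
	if (!ex_queue_trylock(victim))
		return (false);
	if (victim->load > 0) {
		victim->load--;
		self->load++;
		stolen = true;
	}
	ex_queue_unlock(victim);
	return (stolen);
}
```

Because the second lock is never waited on while the first is held, the
sketch cannot deadlock on lock order; a false return tells the caller
to retry elsewhere.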
On a 2-socket 80-thread Xeon system I am observing a dramatic reduction
of the lock spinning time when doing random uncached 4KB reads from
12 ZVOLs, while IOPS increase from 327K to 403K.
MFC after: 1 month
When sched_highest() called for some CPU group returns nothing, the
idle thread calls it for the parent CPU group. But the parent CPU group
also includes the CPU group we've just searched, and unless there is
a race going on, it is unlikely we find anything new this time.
Avoid the double search when the parent group has only two subgroups
(the most prominent case). Instead of escalating to the parent group,
run the next search over the sibling subgroup and escalate two levels
up if that fails too. With more than two siblings the difference is
less significant, while searching the parent group can result in a
better decision if we find several candidate CPUs.
On a 2-socket 40-core Xeon system I am measuring a ~25% reduction of
CPU time spent inside cpu_search_highest() in both SMT (2x20x2) and
non-SMT (2x20) cases.
MFC after: 1 month
The current expression checks that vm_page_alloc(9) never returns a page
belonging to the preload area. This is not true if something was freed
from there, for instance when a preloaded module was unloaded or the
ucode update was freed.
Only check that we never allocate a page belonging to the kernel
proper, i.e. check against _end.
Reported and tested by: dhw
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
We hold the SOCKBUF_LOCK so use soroverflow_locked here.
This bug may manifest as a non-killable process stuck in [*so_rcv].
Approved by: scottl
Reviewed by: Roy Marples <roy@marples.name>
Fixes: 7045b1603b
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D31374
When a certain multipath route begins flapping really fast, it may
result in creating multiple identical nexthop groups. The code
responsible for unlinking unused nexthop groups had an implicit
assumption that there could be only one nexthop group for the
same combination of nexthops with weights. This assumption resulted
in always unlinking the first "identical" group, instead of the
desired one. Such action, in turn, produced a used-but-unlinked
nhg along with a freed-and-linked nhg, ending up in random crashes.
Similarly, it is possible that multiple identical nexthops get
created in the case of high route churn, resulting in the same
problem when deleting one of such nexthops.
Fix by matching the nexthop/nexthop group pointer when deleting the item.
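The pointer-matching fix above can be illustrated with a plain singly
linked list. This is a sketch under assumptions: struct ex_nhg and
ex_unlink() are hypothetical, and the real code uses its own lookup
structures rather than this list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_nhg {
	int key;		/* stands in for the nexthops-with-weights set */
	struct ex_nhg *next;
};

/*
 * Unlink exactly 'target' from the list.  Comparing pointers rather
 * than keys ensures that an identical-looking duplicate earlier in
 * the list is left alone.
 */
bool
ex_unlink(struct ex_nhg **head, struct ex_nhg *target)
{
	struct ex_nhg **pp;

	assert(head != NULL && target != NULL);
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return (true);
		}
	}
	return (false);
}
```

With two entries sharing the same key, removing the second one by
pointer leaves the first linked; a key comparison would have removed
the wrong entry.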
Reported by: avg
MFC after: 1 week
This makes the da(4) driver use UMA for its CCBs by default,
like ada(4) already does. Please let me know via email
if you notice any suspicious kernel messages.
Reviewed By: imp
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31257
Fix a bug that slipped in with 90707c4e44
by using the correct field in le32p_replace_bits().
MFC after: 3 days
Reviewed by: hselasky
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31352
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
Allow any 2M aligned contiguous location below 4G for the staging
area location. It should still be mapped by loader at KERNBASE.
The assumptions the kernel makes about the loader->kernel handoff with
regard to MMU programming are explicitly listed at the beginning of
hammer_time(), where kernphys is calculated. Now kernphys is a variable
instead of a symbol designating the physical address.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
Linux standardized what we call CLOCK_{REALTIME,MONOTONIC}_FAST as
CLOCK_{REALTIME,MONOTONIC}_COARSE. In addition, Linux spells
CLOCK_UPTIME as CLOCK_BOOTTIME.
Add aliases to time.h and document these new aliases in
clock_gettime(2).
Reviewed by: vangyzen, kib (prior), dchagin (prior)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30988
Attempt to comply with the strict namespace pollution requirements of
_POSIX_C_SOURCE. Add guards to limit visibility of CLOCK_ and TIMER_
defines as appropriate. Only define the CLOCK_ variables relevant to the
specific standards. Move all the sharing to sys/_clock_id.h and make
time.h and sys/time.h both include that rather than copy due to the
now large number of clocks and compat defines.
Please note: The old time.h previously used these newer dates:
CLOCK_REALTIME 199506
CLOCK_MONOTONIC 200112
CLOCK_THREAD_CPUTIME_ID 200112
CLOCK_PROCESS_CPUTIME_ID 200112
but glibc defines all of these for 199309, even though only
CLOCK_REALTIME was in IEEE 1003.1b. Add a comment about this to
document it. A large number of programs and libraries assume that
these will be defined for _POSIX_C_SOURCE = 199309.
In addition, leak CLOCK_UPTIME_FAST for the pocl package until it can be
updated to use a simple CLOCK_MONOTONIC.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31056
Recent firmware versions round down the value passed here by the MSS
and subsequently mishandle transmitted PDUs larger than the rounded
down value.
Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31099
Add request submission status checks before checking req->ir_compcode,
otherwise it may be zero just because of initialization.
Add checks for req->ir_compcode errors in ipmi_reset_watchdog() and
ipmi_set_watchdog(). In the first case explicitly check for 0x80, which
means the timer was not previously set; I found this happening after a
BMC cold reset. This change lets the watchdog timer recover instead of
permanently ignoring reset errors after a BMC reset or upgrade.
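The order of checks described above might be sketched as follows. Only
the 0x80 completion code and its meaning come from the text; struct
ex_request, its submit_error field and the result enum are hypothetical
and are not the ipmi(4) driver's types.

```c
#include <assert.h>
#include <stddef.h>

#define	EX_COMPCODE_OK		0x00
#define	EX_COMPCODE_NOT_SET	0x80	/* timer was not previously set */

struct ex_request {
	int submit_error;	/* hypothetical: nonzero if submission failed */
	unsigned char compcode;	/* meaningful only if submit_error == 0 */
};

enum ex_result { EX_OK, EX_SUBMIT_FAILED, EX_NEED_SET, EX_FAILED };

enum ex_result
ex_check_reset(const struct ex_request *req)
{
	assert(req != NULL);
	/* Check submission first: compcode may be 0 merely from init. */
	if (req->submit_error != 0)
		return (EX_SUBMIT_FAILED);
	if (req->compcode == EX_COMPCODE_NOT_SET)
		return (EX_NEED_SET);	/* caller can set the timer again */
	if (req->compcode != EX_COMPCODE_OK)
		return (EX_FAILED);
	return (EX_OK);
}
```

Returning a distinct result for 0x80 lets a caller in this sketch
reprogram the timer rather than treat the reset as a silent success.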
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.