Commit graph

1083 commits

Author SHA1 Message Date
Konstantin Belousov
ab23c2784b Improve TSC calibration logic.
Stop attempting to use FADT legacy hardware flag, it is almost always
a lie.

Instead, unless user explicitly disabled the calibration, calibrate
against 8254 ISA clock.  Then, obtain the rough value of the expected
TSC frequency from CPUID leafs 0x15/0x16 or even from the CPU
marketing name string.  If calibration results look unbelievably bogus
comparing with CPUID leafs report, use CPUID one.

Intel does not recommend to use CPUID leaf 0x16 for the value of the
system time frequency, indeed the error there might be up to 1% which
e.g. makes ntpd give up.  If ISA clock is present, we win, if not, we
get some frequency that allows the machine to boot without enormous
delay.

Next improvement would be to use HPET for re-calibration if we decided
that ISA clock gives bogus results, after HPETs are enumerated. This
is a much bigger change since we probably would need to re-evaluate
some constants depending on TSC frequency.

Reviewed by:	emaste, jhb, scottl
Tested by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24426
2020-04-15 22:28:51 +00:00
Konstantin Belousov
bd8a359f81 x86 tsc: fall back to CPUID if calibration results looks unbelievable.
Teested and reviewed by:	scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D24247
2020-04-01 16:21:11 +00:00
Konstantin Belousov
46ed9da5fc Do not spuriously re-enable disabled io_apic pin on EOI for some configurations.
If EOI suppression is supported but reported ioapic version is so old
that it does not has EOI register (weird virtualization setup), fix
Intel trick of eoi-ing by flipping pin type (edge/level) to account
for the disabled pin.

Reported by:	Juniper
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:34:52 +00:00
Konstantin Belousov
1314c4920f Stop (trying to) renumber io apics.
It does not serve any purpose now, the io apic id is not seen by
software, and some Intel documents claim that the register is
implemented for FUD reasons.  More, renumbering seems to not work on
new Intel machines which actually have mismatched MADT and hw IDs.

On older machines where separate APIC bus existed, unique numbering of
all APICs was required for bus arbitration to work, but it is no
longer true (that machines were SMP from pre-Pentium IV era).

When matching PCIe IOAPIC device against MADT-enumerated IOAPICs,
compare io_apic_id from BAR against io_apic_id read from the
MADT-pointed register page.

Reviewed by:	jhb
Tested by:	flo (previous version), pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:31:35 +00:00
Konstantin Belousov
c105f5c1fa Widen the stored io_apic_id to 8 bits.
It seems that the newer Intel chipset did that, and Linux reads 8
bits. The only detail is that all seen datasheets, even under NDA,
claim that io apic id is 4 bits.

Submitted by:	jeff
Reviewed by:	jhb
Tested by:	flo, pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D23965
2020-03-18 21:24:34 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Konstantin Belousov
a324b7f71d Fix IBRS for machines with IBRS_ALL capability.
When turning IBRS mitigation using sysctl, as opposed to loader tunable,
send IPI to tweak MSR on all cores.  Right now code only performed MSR write
onr the CPU where sysctl was run.

Properly report hw.ibrs_active for IBRS_ALL.  Split hw_ibrs_ibpb_active out
from ibrs_active, to keep the current semantic of guiding kernel entry and
exit handlers.

Reported and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2020-02-25 17:26:10 +00:00
Konstantin Belousov
43cd55db57 x86/identcpu.c whitespace cleanup.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-02-21 16:55:28 +00:00
Konstantin Belousov
3f490f6e57 print_svm_info: decode a CPUID 0x8000000a.edx bit 20.
This is SVM features word, the bit is defined in "PPR for AMD Family
17h Model 31h B0", document 55803 Rev 0.54.

N.B. GuesSpecCtl (no 't') is the spelling from the document.

Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	3 days
2020-02-21 16:41:17 +00:00
Mateusz Guzik
082a6b2a92 Make atomic_load_ptr type-aware
Returned value has type based on the argument, meaning consumers no longer
have to cast in the commmon case.

This commit keeps the kernel compilable without patching the rest.
2020-02-14 23:15:41 +00:00
Doug Moore
5a9447a324 In dmar_gas_lowermatch, skip searching a subtree if all its addresses are greater than lowaddr.
In dmar_gas_uppermatch, skip searching a subtree if all its gaps-between-alloctions are too small.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23391
2020-02-01 21:47:34 +00:00
Conrad Meyer
e656fa70dc hwpstate_intel(4): Save admin-set EPP/EPB and restore after suspend 2020-02-01 20:12:02 +00:00
Conrad Meyer
c93eb470b4 hwpstate_intel(4): Print failure message only on failure
X-MFC-With: r357379
2020-02-01 20:11:25 +00:00
Conrad Meyer
cd4e43b27d hwpstate_intel(4): Detect and support PKG variant
If package-level control is present, we default to using it.  Per-core
software control may be enabled by setting the machdep.hwpstate_pkg_ctrl
tunable to "0" in loader.conf(5).
2020-02-01 19:50:10 +00:00
Conrad Meyer
556a1a0bc6 hwpstate_intel(4): Add fallback EPP using PERF_BIAS MSR
Per Intel SDM (Vol 3b Part 2), if HWP indicates EPP (energy-performance
preference) is not supported, the hardware instead uses the ENERGY_PERF_BIAS
MSR.  In the epp sysctl handler, fall back to that MSR if HWP does not
support EPP and CPUID indicates the ENERGY_PERF_BIAS MSR is supported.
2020-02-01 19:49:13 +00:00
Conrad Meyer
5e3574c8cd x86: Add/amend some power-management comments/macros
No functional change.
2020-02-01 19:46:02 +00:00
Conrad Meyer
b80d476c3c hwpstate_intel(4): Error check epp sysctl & bail if HW does not support feature 2020-02-01 19:45:27 +00:00
Conrad Meyer
f591c3c847 intel_hwpstate(4): Use identcpu-cached cpuid 6 leaf
No functional change.
2020-02-01 17:54:46 +00:00
Conrad Meyer
351896d372 intel_hwpstate(4): Don't leak bound thread in error conditions
I don't know why a Skylake CPU with the HWP feature bit present would trap
on MSR reads of the HWP registers, but if this occurs, do not leave the
attach thread bound.  This could conceivably cause reported hangs, although
I have no evidence that this is the cause.

Reported by:	ae@, Andreas Nilsson <andrnils AT gmail.com>
X-MFC-With:	r357002
2020-02-01 17:30:45 +00:00
Conrad Meyer
43524989c5 hwpstate(4): Ignore CurPstateLimit by default
Add a sysctl knob to allow users to re-enable it, and document the knob and
default in cpufreq.4.  (While here, add a few unrelated updates to
cpufreq.4.)

It seems that the register value in some hardware simply reflects the
configured P-state.  This results in an inadvertent and unintended outcome
where the P-state can only walk down, and then the driver becomes "stuck" in
the slowest possible P-state.

The Linux driver never consults this register, so that's some evidence that
ignoring the contents are relatively harmless.

PR:		234733
Reported by:	sigsys AT gmail.com, Erich Dollanksy <freebsd.ed.lists AT
		sumeritec.com>
2020-01-31 17:40:41 +00:00
Mark Johnston
1c29da0279 Reimplement stack capture of running threads on i386 and amd64.
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.

Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.

Simplify the KPIs.  Remove stack_save_td_running() and add a return
value to stack_save_td().  On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP.  If the
target thread is running in user mode, stack_save_td() returns EBUSY.

Reviewed by:	kib
Reported by:	mjg, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23355
2020-01-31 15:43:33 +00:00
Conrad Meyer
07a65f9d38 hwpstate_intel(4): Silence/fix Coverity reports
These were all introduced in the initial import of hwpstate_intel(4).

Reported by:	Coverity
CIDs:		1413161, 1413164, 1413165, 1413167
X-MFC-With:	r357002
2020-01-29 03:15:34 +00:00
Conrad Meyer
d9591f0c2a x86: identcpu: Decode new Intel Structured Extended feature bits 2020-01-28 01:37:20 +00:00
Conrad Meyer
4799e1997a x86: identcpu: Decode new Zen2 AMD Feature2 bit 2020-01-28 01:36:45 +00:00
Doug Moore
f886c4ba71 Correct the use of RB_AUGMENT in the RB_TREE macros so that is invoked
at the root of every subtree that changes in an insert or delete, and
only once, and ordered from the bottom of the tree to the top.  For
intel_gas.c, the only user of RB_AUGMENT I can find, change the
augmenting routine so that it does not climb from entry to tree root
on every call, and remove a 'tree correcting' function that can be
supplanted by proper tree augmentation.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23189
2020-01-27 15:09:13 +00:00
Conrad Meyer
9ea85092d9 hwpstate(4): Log a debug line when throttled
If we're going to throttle user requested P-states, we should at least produce
a debug log line indicating the surprising behavior.

PR:		inspired by 234733
2020-01-27 06:04:32 +00:00
Conrad Meyer
08a220dd79 cpufreq(4): Fix missing MODULE_DEPEND on hwpstate_intel
DRIVER_MODULE does not actually define a MODULE_VERSION, which is required
to satisfy a MODULE_DEPENDency.  Declare one explicitly in
hwpstate_intel(4).

Reported by:	flo
X-MFC-With:	r357002
2020-01-23 23:52:57 +00:00
Konstantin Belousov
b94c55a9cb Fix r356919.
Instead of waiting for pc_curthread which is overwritten by
init_secondary_tail(), wait for non-NULL pc_curpcb, to be set by the
first context switch.
Assert that pc_curpcb is not set too early.

Reported and tested by:	rlibby
Reviewed by:	markj, rlibby
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23330
2020-01-23 17:08:33 +00:00
Cy Schubert
ca9fb12a0b Fix 32-bit build post r357002. 2020-01-23 03:38:41 +00:00
Conrad Meyer
4577cf3744 cpufreq(4): Add support for Intel Speed Shift
Intel Speed Shift is Intel's technology to control frequency in hardware,
with hints from software.

Let's get a working version of this in the tree and we can refine it from
here.

Submitted by:	bwidawsk, scottph
Reviewed by:	bcr (manpages), myself
Discussed with:	jhb, kib (earlier versions)
With feedback from:	Greg V, gallatin, freebsdnewbie AT freenet.de
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D18028
2020-01-22 23:28:42 +00:00
Konstantin Belousov
2ee49fac82 Add support for Hygon Dhyana Family 18h processor.
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.

The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).

To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.

Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.

The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.

References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:

https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef

[2] MSR and CPUID definition:

https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf

Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23163
2020-01-21 13:22:35 +00:00
Konstantin Belousov
65e5f2cdd4 x86: Wait for curthread to be set up as an indicator that the boot stack
is no longer used.

pc_curthread is set by cpu_switch after it stopped using the old
thread (or boot) stack.  This makes the smp_after_idle_runnable()
function not dependent on the internals of the scheduler operations.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23276
2020-01-20 17:23:03 +00:00
Mateusz Guzik
44c78346f6 x86: fix assertion in ipi_send_cpu to range check the passed id
Prior to the change for sufficiently bad id (and in particular NOCPU which is -1)
it would access memory outside of the cpu_apic_ids array.
2020-01-19 21:35:51 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Scott Long
757d4fbaa7 Introduce the concept of busdma tag templates. A template can be allocated
off the stack, initialized to default values, and then filled in with
driver-specific values, all without having to worry about the numerous
other fields in the tag. The resulting template is then passed into
busdma and the normal opaque tag object created.  See the man page for
details on how to initialize a template.

Templates do not support tag filters.  Filters have been broken for many
years, and only existed for an ancient make/model of hardware that had a
quirky DMA engine.  Instead of breaking the ABI/API and changing the
arugment signature of bus_dma_tag_create() to remove the filter arguments,
templates allow us to ignore them, and also significantly reduce the
complexity of creating and managing tags.

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D22906
2019-12-24 14:48:46 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Scott Long
0d42317659 Fix the TAA state machine to do the right thing when the TAA
migitation is available in microcode and the operator has set
the sysctl to automatic mode.

Reported by:	Coverity
CID: 1408334

MFC after:	3 days
Sponsored by:	Intel
2019-12-10 18:57:39 +00:00
Konstantin Belousov
ff326a1879 x86: Restore the critical section around whole ipi_bitmap_handler() if
hardclock IPI is delivered.

In the current code after r355311, critical section is taken only
around hardclockintr() call, and sched_preempt() is called after the
section is exited. If we reschedule after exit, as we typically would
due to conditions that caused IPI, in ULE the runq tdq_ipipending is
not cleared, which blocks generation of further preempt IPIs.

Since all relatively modern (10 years) hardware has per-cpu event
timers, restoring the critical section conditionally does not affect
it.

Reported and tested by: cy
Diagnosed and reviewed by: jeff (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D22716
2019-12-07 00:28:08 +00:00
Scott Long
961aacb107 Move the mds, irbs, and ssb mitigation knobs into machdep.mitigations.
They're in both the old and new places in HEAD for the moment for
discussion and transition.  The old locations will be garbage collected
in 4 weeks.  MFCs to 12 an 11 will keep the old and new for transition
purposes.

Reviewed by:	kib
MFC after:	4 weeks
Sponsored by:	Intel
Differential Revision:	https://reviews.freebsd.org/D22590
2019-12-06 02:43:05 +00:00
Conrad Meyer
ee02bd9c9c x86: Add missed break to TAA status sysctl
Just a typo that Coverity identified.

Coverity also identified an unused store in the same functional area (x86 TAA
stuff), but this commit does not address that issue (CID 1408334).

Reported by:	Coverity
CID:		1408328, 1408332
2019-12-04 02:42:22 +00:00
Jeff Roberson
0f9e06e18b Fix a few places that free a page from an object without busy held. This is
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D22611
2019-12-02 22:42:05 +00:00
Jeff Roberson
fb6a57ef89 Don't run sched_preempt() inside of an extra critical section. This disables
the sched_preempt() switch optimization and causes the sched lock to be dropped
and immediately reacquired.

Reviewed by:	jhb, kib, mav, markj (with changes)
Differential Revision:	https://reviews.freebsd.org/D22623
2019-12-02 22:34:19 +00:00
Konstantin Belousov
5c3771d272 bus_dma_dmar_load_ident(9): load identity mapping into the map.
Requested, reviewed and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22559
2019-11-27 19:57:17 +00:00
Scott Long
184b15ff07 Clean up and clarify meta commentary on TAA. Add a state to denote
that TSX doesn't exist on the CPU.

MFC after:	3 days
Sponsored by:	Intel
2019-11-27 19:12:32 +00:00
Konstantin Belousov
4f4f3c8fdc Limit bus_dma_dmar_set_buswide() definition to kernel only.
The header is abused for inclusion into userspace, and on stable
branches neither device_t nor bool types are not defined when used
from userspace.

Sponsored by:	The FreeBSD Foundation
X-MFC after:	now
2019-11-25 14:16:41 +00:00
Andrew Turner
849aef496d Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.

This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.

Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D22315
2019-11-21 11:22:08 +00:00
Konstantin Belousov
685666aaf7 bus_dma_dmar_set_buswide(9): KPI to indicate that the whole dmar
context should share page tables.

Practically it means that dma requests from any device on the bus are
translated according to the entries loaded for the bus:0:0 device.
KPI requires that the slot and function of the device be 0:0, and that
no tags for other devices on the bus were used.

The intended use are NTBs which pass TLPs from the downstream to the
host with slot:func of the downstream originator.

Reviewed and tested by:	mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22434
2019-11-18 20:56:59 +00:00
Konstantin Belousov
fa83f68917 Add x86 msr tweak KPI.
Use the KPI to tweak MSRs in mitigation code.

Reviewed by:	markj, scottl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22431
2019-11-18 20:53:57 +00:00
Scott Long
e372160177 TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.

Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.

Control knobs are:
machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
	    features.

machdep.mitigations.taa.state:
        inactive        - no mitigation is active/enabled
        TSX disable     - TSX is disabled in the bare metal CPU as well as
                        - any virtualized CPUs
        VERW            - VERW instruction clears CPU buffers
	not vulnerable	- The CPU has identified itself as not being
			  vulnerable

Nothing in the base FreeBSD system uses TSX.  However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.

Reviewed by:	emaste, imp, kib, scottph
Sponsored by:	Intel
Differential Revision:	22374
2019-11-16 00:26:42 +00:00
Scott Long
22d13bfd34 Revert a patch that accidentally was committed with r354729 2019-11-15 11:54:51 +00:00
Scott Long
99a6085fde Fix a typo in how the AVX512DQ feature bit is checked.
Reviewed by:	kib
Sponsored by:	Intel
2019-11-15 11:53:06 +00:00
Scott Long
837d733265 Add new bit definitions for TSX, related to the TAA issue. The actual
mitigation will follow in a future commit.

Sponsored by:	Intel
2019-11-12 19:15:16 +00:00
Konstantin Belousov
c08973d09c Workaround for Intel SKL002/SKL012S errata.
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs.  For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs.  The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.

Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.

Reviewed by:	alc, emaste, markj, scottl
Security:	CVE-2018-12207
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D21884
2019-11-12 18:01:33 +00:00
Roger Pau Monné
b2802351c1 xen: fix dispatching of NMIs
Currently NMIs are sent over event channels, but that defeats the
purpose of NMIs since event channels can be masked. Fix this by
issuing NMIs using a hypercall, which injects a NMI (vector #2) to the
desired vCPU.

Note that NMIs could also be triggered using the emulated local APIC,
but using a hypercall is better from a performance point of view
since it doesn't involve instruction decoding when not using x2APIC
mode.

Reported and Tested by:	avg
Sponsored by:		Citrix Systems R&D
2019-11-12 10:31:28 +00:00
Scott Long
c47c10a1f3 Add the text attribute for MDS_NO in the IA32_ARCH_CAP MSR. 2019-11-11 22:18:05 +00:00
Andriy Gapon
e688e78187 revert r354482, checking for XENHVM was a wrong way of checking for Xen 2019-11-07 21:43:31 +00:00
Andriy Gapon
bff7f83d39 IPI_TRACE is not really supported on xen
x86 stack_save_td_running() can work safely only if IPI_TRACE is a
non-maskable interrupt.  But at the moment FreeBSD/Xen does not provide
support for the NMI delivery mode.  So, mark the functionality as
unsupported similarly to other platforms without NMI.
Maybe there is a way to provide a Xen-specific working
stack_save_td_running(), but I couldn't figure it out.

MFC after:	3 weeks
Sponsored by:	Panzura
2019-11-07 21:14:59 +00:00
Andrew Gallatin
bb7aaac379 Add tunable to allow interrupts on hyperthreaded cores
Enabling interrupts on htt cores has benefits to workloads which are primarily
interrupt driven by increasing the logical cores available for interrupt handling.
The tunable is named machdep.hyperthreading_intr_allowed

Reviewed by:	kib, jhb
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22233
2019-11-04 19:30:19 +00:00
Conrad Meyer
ebcfcba8f8 amd64: Fix typo: RDPRU bit is 0x10, not 0x04
Bit 4 != 4, of course.

X-MFC-With:	r354162
2019-10-30 04:00:44 +00:00
Conrad Meyer
706bc29b7b amd64: Define and decode new AMD64 feature bits
These are documented in revisions 3.32 of the public AMD64 Vol. 2 and
revision 3.28 of Vol. 3, published October and September 2019, respectively.
2019-10-30 01:41:14 +00:00
Conrad Meyer
1a352c3c1b hw.intrbalance: Make sysctl tunable
This allows specifying a boot-time preference in loader.conf.
2019-10-19 16:37:49 +00:00
Andriy Gapon
3aa8a8ed2d remove wmb() call from x86 cpu_reset()
The rationale is pretty much the same as in r353747.
There is no subsequent dependent store.
The store is to the regular (TSO) memory anyway.

MFC after:	23 days
2019-10-19 07:13:15 +00:00
Conrad Meyer
b1f22a0083 x86: Remove unused variable from r353712
It was in my git tree (uncommitted) and didn't get carried over to SVN in
r353712.

X-MFC-With:	r353712
2019-10-18 02:25:30 +00:00
Conrad Meyer
bb044eaf54 x86: Fetch and save standard CPUID leaf 6 in identcpu
Rather than a few scattered places in the tree.  Organize flag names in a
contiguous region of specialreg.h.

While here, delete deprecated PCOMMIT from leaf 7.

No functional change.
2019-10-18 02:18:17 +00:00
Conrad Meyer
d23e252dfa x86: Use canonical spelling of MOVDIR64B feature/instruction
The former spelling probably confused MOVDIR64B with MOVDIRI64.

MOVDIR_64B is the 64-*byte* direct store instruction; MOVDIR_I64 is the
64-*bit* direct store instruction (underscores added here for clarity; they are
not part of the canonical instruction name).

No functional change.

Sponsored by:	Dell EMC Isilon
2019-10-14 20:55:01 +00:00
Mateusz Guzik
fa43c5d49e amd64: plug spurious cld instructions
ABI already guarantees the direction is forward. Note this does not take care
of i386-specific cld's.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21906
2019-10-08 21:14:11 +00:00
Eric van Gyzen
a912616493 Make the hw.intrs sysctl OID read-only
The handler ignores the new value, so make the OID read-only.

I found this while working on r353111.

MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2019-10-04 21:46:11 +00:00
Mark Johnston
b119329d81 Complete the removal of the "wire_count" field from struct vm_page.
Convert all remaining references to that field to "ref_count" and update
comments accordingly.  No functional change intended.

Reviewed by:	alc, kib
Sponsored by:	Intel, Netflix
Differential Revision:	https://reviews.freebsd.org/D21768
2019-09-25 16:11:35 +00:00
Konstantin Belousov
a9d0e0071c x86: Fall back to leaf 0x16 if TSC frequency is obtained by CPUID and
leaf 0x15 is not functional.

This should improve automatic TSC frequency determination on
Skylake/Kabylake/... families, where 0x15 exists but does not provide
all necessary information.  SDM contains relatively strong wording
against such uses of 0x16, but Intel does not give us any other way to
obtain the frequency. Linux did the same in the commit
604dc9170f2435d27da5039a3efd757dceadc684.

Based on submission by:	Neel Chauhan <neel@neelc.org>
PR:	240475
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D21777
2019-09-25 13:36:56 +00:00
Jeff Roberson
2194393787 Move phys_avail definition into MI code. It is consumed in the MI layer and
doing so adds more flexibility with less redundant code.

Reviewed by:	jhb, markj, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21250
2019-08-16 00:45:14 +00:00
John Baldwin
ea32110781 Stop listing "on motherboard" as the parent of nexus devices on x86.
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D21256
2019-08-14 22:13:11 +00:00
Ed Maste
ba084c18de sys/{x86,amd64}: remove one of doubled ;s
MFC after:	1 week
2019-08-13 19:39:36 +00:00
Warner Losh
c1ab04fce5 Floppy driver really only works on x86
Move the floppy driver to the x86 specific notes file.

Reviewed by: jhb, manu, jhibbits, emaste
Differential Revision: https://reviews.freebsd.org/D21208
2019-08-12 22:58:50 +00:00
Warner Losh
99e1c5ab38 Move sc out of the global file
x86 needs sc, as does sparc64. powerpc doesn't use it by default, but some old
powermac notebooks do not work with vt yet for reasons unknonw. Even so, I've
removed it from powerpc LINT. It's not in daily use there, and the intent is to
100% switch to vt now that it works for that platform to limit support burden.

All the other architectures omit some or all of the screen savers from their
lint config. Move them to the x86 NOTES files and remove the exclusions. This
reduces slightly the number of savers sparc64 compiles, but since they are in
GENERIC, the overage is adequate and if someone reaelly wants to sort them out
in sparc64 they can sweat the details and the testing.

Reviewed by: jhb (earlier version), manu (earlier version), jhibbits
Differential Revision: https://reviews.freebsd.org/D21233
2019-08-12 22:58:44 +00:00
Warner Losh
0d89c934cb Start to split out the really x86 specific NOTES from the global notes file.
Start with COMPAT_43, since it's really only relevant to x86.

Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D21203
2019-08-12 22:58:13 +00:00
Konstantin Belousov
b7b6b7a9c5 PR: 239143
Reported and tested by:	Wes Maag <jwmaag@gmail.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-07-14 21:08:54 +00:00
Konstantin Belousov
e2e0470dfa Ensure that mds_handler always points to a valid method.
Depending on system configuration, version, and architecture,
mds_handler might be dereferenced from doreti before
hw_mds_recalculate_boot() initialized it.  Statically assign void
method to cover all cases.

Reported by:	"Schuendehuette, Matthias (LDA IT PLM)" <matthias.schuendehuette@siemens.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-07-11 16:22:49 +00:00
Jonathan T. Looney
ca8929d2a3 Currently, MCA entries remain on an every-growing linked list. This means
that it becomes increasingly expensive to process a steady stream of
correctable errors. Additionally, the memory used by the MCA entries can
grow without bound.

Change the code to maintain two separate lists: a list of entries which
still need to be logged, and a list of entries which have already been
logged. Additionally, allow a user-configurable limit on the number of
entries which will be saved after they are logged. (The limit defaults
to -1 [unlimited], which is the current behavior.)

Reviewed by:	imp, jhb
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20482
2019-06-08 18:26:48 +00:00
Tycho Nightingale
56db4ebd34 another occurrence where a very large dma mapping can cause integer overflow
Submitted by:	rlibby
Sponsored by:	Dell EMC Isilon
2019-06-05 13:08:21 +00:00
Tycho Nightingale
88e9fbe568 very large dma mappings can cause integer overflow
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20505
2019-06-03 19:19:35 +00:00
John Baldwin
bebcdc0073 Add a constant for the LS config MSR on AMD CPUs.
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19506
2019-05-23 23:37:11 +00:00
Conrad Meyer
c63f1e21da Decode and name additional x86 feature bits
These are all enumerated in Intel's ISA extension reference, 37th ed.

Sponsored by:	Dell EMC Isilon
2019-05-22 23:22:36 +00:00
Andrew Gallatin
18f9bb6fe0 x86 MCA: introduce MCA hooks for different vendor implementations
This is needed for AMD SMCA processors, as SMCA uses different
MSR address for access MCA banks.

Use IA32 specific msr_ops as defualt, and use SMCA-specific msr_ops
when on an SMCA-enabled processor

Submitted by:	chandu from amd dot com
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D18055
2019-05-22 13:44:15 +00:00
Konstantin Belousov
48ec6d3bc9 Do not call hw_mds_recalculate() from initializecpu().
If MDS mitigation is enabled by the tunable but MDS microcode is not
early-loaded, software mitigation is selected.  This causes
initializecpu() to try to allocate memory which makes boot process
very unhappy.

Create SYSINIT that runs sufficiently late to succeed.

Reported by:	naddy
PR:	237968
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-05-21 22:56:21 +00:00
Stephen J. Kiernan
1177d38ce1 The older detection methods (smbios.bios.vendor and smbios.system.product)
are able to determine some virtual machines, but the vm_guest variable was
still only being set to VM_GUEST_VM.

Since we do know what some of them specifically are, we can set vm_guest
appropriately.

Also, if we see the CPUID has the HV flag, but we were unable to find a
definitive vendor in the Hypervisor CPUID Information Leaf, fall back to
the older detection methods, as they may be able to determine a specific
HV type.

Add VM_GUEST_PARALLELS value to VM_GUEST for Parallels.

Approved by:	cem
Differential Revision:	https://reviews.freebsd.org/D20305
2019-05-21 13:29:53 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Stephen J. Kiernan
c5c8916278 Add missing setting of hv_base to the leaf that we used.
Correct setting hv_high to use regs[0], not leaf.
2019-05-19 15:07:14 +00:00
Stephen J. Kiernan
949f834a61 Instead of individual conditional statements to look for each hypervisor
type, use a table to make it easier to add more in the future, if needed.

Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor
vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST
enumeration to indicate VirtualBox.

Save the CPUID base for the hypervisor entry that we detected. Driver code
may need to know about it in order to obtain additional CPUID features.

Approved by:	bryanv, jhb
Differential Revision:	https://reviews.freebsd.org/D16305
2019-05-17 17:21:32 +00:00
Konstantin Belousov
8f7f38457f Free microcode memory later.
With lockless DI, pmap_remove() requires operational thread lock,
which is initialized at SI_SUB_RUN_QUEUE for thread0.  Move it even
later where APs are started, the moment after which other boot memory
like trampoline stacks is already being freed.

Reported by:	gtetlow
Sponsored by:	The FreeBSD Foundation
MFC after:	30 days
2019-05-17 17:11:01 +00:00
Konstantin Belousov
7c5a46a1bc Remove resolver_qual from DEFINE_IFUNC/DEFINE_UIFUNC macros.
In all practical situations, the resolver visibility is static.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	so (emaste)
Differential revision:	https://reviews.freebsd.org/D20281
2019-05-16 22:20:54 +00:00
Tycho Nightingale
b961c0f244 Allow loading the same DMA address multiple times without any prior
unload for the LinuxKPI.

Reviewed by:	kib, zeising
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20181
2019-05-16 17:41:16 +00:00
Ryan Libby
244081120e iommu static analysis cleanup
A static analyzer complained about a couple instances of checking a
variable against NULL after already having dereferenced it.
 - dmar_gas_alloc_region: remove the tautological NULL checks
 - dmar_release_resources / dmar_fini_fault_log: don't deref unit->regs
   unless initialized.

And while here, fix an inverted initialization check in dmar_fini_qi.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential revision:	https://reviews.freebsd.org/D20263
2019-05-16 04:24:08 +00:00
Conrad Meyer
e7e3d5223f x86: Correctly identify bhyve hypervisor
Spotted after a similar report by Olivier Cochard-Labbé.

Sponsored by:	Dell EMC Isilon
2019-05-16 01:32:54 +00:00
Konstantin Belousov
b55d4ebe5f Properly announce MD_CLEAR.
Submitted by:	Petr Lampa <lampa@fit.vutbr.cz>
MFC after:	3 days
2019-05-15 17:55:41 +00:00
Konstantin Belousov
7355a02bdd Mitigations for Microarchitectural Data Sampling.
Microarchitectural buffers on some Intel processors utilizing
speculative execution may allow a local process to obtain a memory
disclosure.  An attacker may be able to read secret data from the
kernel or from a process when executing untrusted code (for example,
in a web browser).

Reference: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00233.html
Security:	CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091
Security:	FreeBSD-SA-19:07.mds
Reviewed by:	jhb
Tested by:	emaste, lwhsu
Approved by:	so (gtetlow)
2019-05-14 17:02:20 +00:00
Mateusz Guzik
a8c2fcb287 x86: store pending bitmapped IPIs in per-cpu areas
This gets rid of the global cpu_ipi_pending array.

While replace cmpset with fcmpset in the delivery code and opportunistically
check if given IPI is already pending.

Sponsored by:	The FreeBSD Foundation
2019-05-12 06:36:54 +00:00
Konstantin Belousov
078116a662 amd64: fix BUS_SPACE_MAXSIZE to 64bit max value.
Reviewed by:	jhb, tychon (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D20154
2019-05-07 01:18:57 +00:00
Tycho Nightingale
8d2a55ca67 zero inputs to vm_page_initfake() for predictable results
Reviewed by:	kib
Submitted by:	Anton Rang <rang at acm.org>
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20162
2019-05-06 00:57:05 +00:00
Conrad Meyer
665919aaaf x86: Implement MWAIT support for stopping a CPU
IPI_STOP is used after panic or when ddb is entered manually.  MONITOR/
MWAIT allows CPUs that support the feature to sleep in a low power way
instead of spinning.  Something similar is already used at idle.

It is perhaps especially useful in oversubscribed VM environments, and is
safe to use even if the panic/ddb thread is not the BSP.  (Except in the
presence of MWAIT errata, which are detected automatically on platforms with
known wakeup problems.)

It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0
(off).  This commit also introduces the tunable
"machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has
known errata, but may be set to "1" in loader.conf to signal that mwait
wakeup is broken on CPUs FreeBSD does not yet know about.

Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't
help bhyve hypervisors running FreeBSD guests.

Submitted by:   Anton Rang <rang AT acm.org> (earlier version)
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 20:34:26 +00:00
Conrad Meyer
83dc49beaf x86: Define pc_monitorbuf as a logical structure
Rather than just accessing it via pointer cast.

No functional change intended.

Discussed with:	kib (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20135
2019-05-04 17:35:13 +00:00