ACPI SRAT table on arm64 uses GICC entries to provide CPU locality
information. These entries use an AcpiProcessorUid to identify the
CPU (unlike on x86 where the entries have an APIC ID).
Update acpi_pxm.c to extend the cpu_add/cpu_find/cpu_get_info
functions to handle AcpiProcessorUid. Use the updated functions
while parsing ACPI_SRAT_GICC_AFFINITY entry for arm64.
Also update sys/conf/files.arm64 to build acpi_pxm.c when ACPI is
enabled.
Reviewed by: markj (previous version)
Differential Revision: https://reviews.freebsd.org/D17942
This moves the architecture independent parts of sys/x86/acpica/srat.c
to sys/dev/acpica/acpi_pxm.c, to be used later on arm64. The function
declarations are moved to sys/dev/acpica/acpivar.h
We also need to update sys/conf/files.{i386,amd64} to use the new file.
No functional changes.
Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D17941
Specifically, assume that the device is present if evaluation of _STA
method fails.
Before r330957 we ignored any _STA evaluation failure (which was
performed by AcpiGetObjectInfo in ACPICA contrib code) for the purpose
of acpi_DeviceIsPresent and acpi_BatteryIsPresent. ACPICA 20180313
removed evaluation of _STA from AcpiGetObjectInfo. So, we added
evaluation of _STA to acpi_DeviceIsPresent and acpi_BatteryIsPresent.
One important difference is that the new code ignored a failure only if
_STA did not exist (AE_NOT_FOUND). Any other kind of failure was
treated as a fatal failure. Apparently, on some systems we can get
AE_NOT_EXIST when evaluating _STA. And that error is not an evil twin
of AE_NOT_FOUND, despite a very similar name, but a distinct error
related to a missing handler for an ACPI operation region.
It's possible that for some people the problem was already fixed by
changes in ACPICA and/or in acpi_ec driver (or even in BIOS) that fixed
the AE_NOT_EXIST failure related to EC operation region.
This work is based on a great analysis by cem and an earlier patch by
Ali Abdallah <aliovx@gmail.com>.
PR: 227191
Reported by: 0mp
MFC after: 2 weeks
After r340644 there were two things wrong in cases where there is both
an ECDT, and an EC device exposed via acpica. The first is a rather
trivial situation where the device desc would say ECDT even when it was
not implicitly created via ECDT (not really sure why the compiler
doesn't seem to warn about this).
The other more pervasive issue is that the code is designed to
essentially not do anything for EC probe when its uid was already
created an EC based on the ECDT's uid. The issue was that probe would
still return 0 in this case, and so we'd end up with some weird
duplication. Now to be honest, I'm not actually sure what exactly broke,
but it was definitely not working as intended. To fix this, all that is
really needed is to make sure we return ENXIO when we're probing the
device already added for the ECDT entry. While here though, move the
check for this earlier to avoid wasted cycles when we know after
obtaining the uid that it's duplicative.
There remains one questionable bit here which I don't want to touch -
when doing probe for PNP0C09, if acquiring _UID for the device fails, 0
is assumed, which is a valid UID used by the implicit ECDT.
Reported by: Charlie Li, et al.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D18311
Remove the requirement that a device be a ACPI method battery to be supported
as a battery.
Require now that the device be in the battery devclass and implement the
get_status and get_info functions. This allows batteries which are not ACPI
method batteries to be supported.
Reviewed by: jtl
Approved by: jtl (mentor)
MFC after: 1 Month
Differential Revision: https://reviews.freebsd.org/D17434
This patch utilizes the fixed_devclass attribute in order to make sure
other acpi devices with params don't get confused for an EC device.
The existing code assumes that acpi_ec_probe is only ever called with a
dereferencable acpi param. Aside from being incorrect because other
devices of ACPI_TYPE_DEVICE may be probed here which aren't ec devices,
(and they may have set acpi private data), it is even more nefarious if
another ACPI driver uses private data which is not dereferancable. This
will result in a pointer deref during boot and therefore boot failure.
On X86, as it stands today, no other devices actually do this (acpi_cpu
checks for PROCESSOR type devices) and so there is no issue. I ran into
this because I am adding such a device which gets probed before
acpi_ec_probe and sets private data. If ARM ever has an EC, I think
they'd run into this issue as well.
There have been several iterations of this patch. Earlier
iterations had ECDT enumerated ECs not call into the probe/attach
functions of this driver. This change was Suggested by: jhb@.
Reviewed by: jhb
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D16635
Now that we are handling PCI resources in pci_host_generic_acpi.c, we
don't need these change (made by r336129)
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17792
This is a major update for pci_host_generic_acpi.c, the current
implementation has some gaps that are better fixed up in one go.
The changes are to:
* Follow x86 method of not adding PCI resources to PCI host bridge in
ACPI code. This has been moved to pci_host_generic_acpi.c, where we
walk thru its resources of the host bridge and add them.
* Fixup code in pci_host_generic_acpi.c to read all decoded ranges
and update the 'ranges' property. This allows us to share most of
the code with generic implementation (and the FDT one).
* Parse and setup IO ranges and bus ranges when walking the resources
above. Drop most of the changes related to this from acpica code.
* Add the ECAM memory area as mem resource 0. Implement the logic to
get the ECAM area from MCFG (using bus range which we now decode),
or from _CBA (using _BBN/bus range). Drop aarch64 ifdefs from acpica
code which did part of this.
* Switch resource activation to similar code as FDT implementation,
this can be moved into generic implementation in a later pass.
* Drop the mechanism of using the 7th bit of bus number as the domain,
this is not correct and will work only in very specific cases. Use
_SEG as PCI domain and use the bus ranges of the host bridge to
provide start bus number.
This commit should not make any functional change to dev/acpica/acpi.c
for other architectures, almost all the changes there are to revert
earlier additions in this file done for aarch64.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17791
On arm64 (where INTRNG is enabled), the interrupts have to be mapped
with ACPI_BUS_MAP_INTR() before adding them as resources to devices.
The earlier code did the mapping before calling acpi_set_resource(),
which bypassed code that checked for PCI link interrupts.
To fix this, move the call to map interrupts into acpi_set_resource()
and that requires additional work to lookup interrupt properties.
The changes here are to:
* extend acpi_lookup_irq_handler() to lookup an irq in the ACPI
resources
* create a helper function acpi_map_intr() which uses the updated
acpi_lookup_irq_handler() to look up an irq, and then map it
with ACPI_BUS_MAP_INTR()
* use acpi_map_intr() in acpi_pcib_route_interrupt() to map
pci link interrupts.
With these changes, we can drop the ifdefs in acpi_resource.c, and
we can also drop the call for mapping interrupts in generic_timer.c
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D17790
Curiously, the in-kernel routines always use the design voltage to
convert from mA to mW, but acpiconf in userland uses the current
voltage. As a result, this can report a different mW rate than
acpiconf.
Submitted by: Manuel Stühn <freebsdnewbie@freenet.de>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D17077
The AML is even stupider than always returning 0. It will only return
non-zero for an OS that reports itself as "Windows 2015", at least
on the Threadripper board's AML that I've examined.
Those AMLs also suggest we may need this quirk for AMI0030 as well.
There may be other cases where we need to override the _STA in a
generic way, so we should consider writing code to do that.
The Device Specific Method (_DSM) is on optional object that defines
device specific controls. This will be useful for our power management
controller in upcoming patches. More information can be found in ACPI
spec 6.2 section 9.1.1
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
This patch had a minor modification changing ENOMEM to AE_NO_MEMORY
after it got review and approval but before committing.
Test Plan: Tested in my s0ix branch
Reviewed by: kib
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D17121
This is an amalgam of a patch by Doug Ambrisko to
generalize uart_acpi_find_device, imp moving the
ACPI table to uart_dev_ns8250.c and advice by jhb
to work around a bug in the EPYC 3151 BIOS
(the BIOS incorrectly marks the serial ports as
disabled)
Reviewed by: imp
MFC after: 8 weeks
Differential Revision: https://reviews.freebsd.org/D16432
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725
It's likely that the header was needed in the past for swi(9).
But now that code does not use swi(9) or any other interfaces defined
in sys/interrupt.h.
MFC after: 1 week
work when called by members of the 'operator' group. They are already
allowed to eg power off the system (via suid shutdown(8)), so they
might as well be permitted to suspend it.
Tested by: xmj@
Reviewed by: delphij@
MFC after: 2 weeks
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16062
I want to do this change because this call (actually,
AcpiHwLegacyWakePrep) does a memory allocation and ACPI namespace
evaluation. Although it is not very likely to run into any trouble, it
is still not safe to make those calls with interrupts disabled.
witness(4) and malloc(9) do not currently check for a context with
interrupts disabled via intr_disable and we lack a facility for doing
that. So, those unsafe operations fly under the radar. But if
intr_disable in acpi_EnterSleepState was replaced with spinlock_enter
(which it probably should be), then witness and malloc would immediately
complain.
Also, AcpiLeaveSleepStatePrep is documented as called when interrupts
are enabled. It used to require disabled interrupts, but that
requirement was changed a long time ago when support for _BFS and _GTS
was removed from ACPICA.
The ACPI wakeup sequence is very sensitive to changes. I consider this
change to be correct, but there can be fallouts from it.
What AcpiHwLegacyWakePrep essentially does is writing a value
corresponding to S0 into SLP_TYPx bits of PM1 Control Register(s).
According to ACPI specifications that write should be a NOP as SLP_EN
bit is not set. But I see in some chipset specifications that they
allow to ignore SLP_EN altogether and to act on a change of SLP_TYPx
alone.
Also, there are a couple of accesses to ACPI hardware before the new
location of the call to AcpiLeaveSleepStatePrep. One is to clear the
power button status and the other is to enable SCI. So, the move may
affect the interaction between then OS and ACPI platform.
I have not seen any regressions on my test system, but it's a desktop.
MFC after: 5 weeks
The TSC-s are checked and synchronized only if they were good
originally. That is, invariant, synchronized, etc.
This is necessary on an AMD-based system where after a wakeup from STR I
see that BSP clock differs from AP clocks by a count that roughly
corresponds to one second. The APs are in sync with each other. Not
sure if this is a hardware quirk or a firmware bug.
This is what I see after a resume with this change:
SMP: passed TSC synchronization test after adjustment
acpi_timer0: restoring timecounter, ACPI-fast -> TSC-low
Reviewed by: kib
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D15551
I have a system that is very unstable after resuming from suspend-to-RAM
but only if HPET is used as the event timer. The theory is that SMM
code / firmware could be enabling HPET for its own uses and unexpected
interrupts cause a trouble for it. Originally I wanted to solve the
problem in hpet_suspend() method, but that was insufficient as the event
timer could get reprogrammed again.
So, it's better, for my case and in general, to stop the event timer(s)
before entering the hardware suspend.
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D15413
This sysctl allows a deeper dive into the sleep abyss comparing to
debug.acpi.suspend_bounce. When the new sysctl is set the system will
execute the suspend sequence up to the call to AcpiEnterSleepState().
That includes saving processor contexts and parking APs. Then, instead
of actually entering the sleep state, the BSP will call resumectx() to
emulate the wakeup. The APs should get restarted by the sequence of
Init and Startup IPIs that BSP sends to them.
MFC after: 8 days
ACPI I/O port descriptors use _MIN and _MAX fields to specify the set
of allowable base (start) addresses for an I/O port resource along with
a _LEN field specifying the length. A fixed resource is supposed to be
encoded with _MIN == _MAX, but some buggy firmwares instead set _MAX to
the end of the fixed range. Relocating I/O ranges only make sense in
_PRS (possible resource settings), not in _CRS (current resource settings),
so if an I/O port range with _MAX set set to the end of the range is
present in _CRS, treat it as a fixed I/O port resource starting at
_MIN.
PR: 224096
Submitted by: Harald Böhm <harald@boehm.codes>
Pointy hat to: jhb (taking so long to actually commit this)
MFC after: 1 week
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
bus provide it with its needed memory resources.
This allows us to use PCIe on the ThunderX2 and, with a previous version
of the patch, on the SoftIron 3000 with ACPI.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Differential Revision: https://reviews.freebsd.org/D8767
6.0 spec 6.4.3.5 bit 0 is ignored on QWord, DWord, and Word Address Space
Descriptors, but not Extended Address Space Descriptors.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Sponsored by: Cavium (Hardware)
Differential Revision: https://reviews.freebsd.org/D14516
allocated with a tag to come from the specified domain if it meets the
other constraints provided by the tag. Automatically create a tag at
the root of each bus specifying the domain local to that bus if
available.
Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13545
This adds a new acpi_bus interface with a map_intr method. This is similar
to the Open Firmware map_intr method and allows us to create the needed
mapping from ACPI space to INTRNG space.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8617
By the ACPI standard (ACPI 5 chapter 8.4 Declaring Processors) Processors
can be implemented in 2 distinct ways:
* Through a Processor object type (which provides P_BLK)
* Through a Device object type
Prior to this change, the FreeBSD driver only supported the former. AMD
Epyc / Poweredge systems we are testing both implement the latter only. Add
the missing support.
Because P_BLK is not defined in the device object case, C-states entering
must be completely controlled via _CST methods rather than P_LVL2/3.
John Baldwin points out that ACPI 6.0 formally deprecates the Processor
keyword, so eventually processors will only be enumerated as Device objects.
Submitted by: attilio
Reviewed by: jhb, markj, Anton Rang <rang AT acm.org>
Relnotes: maybe
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13457
From ACPICA 20170929, AcpiOsGetTimer() should be available early because
While() loop timeout mechanism was reimplemented with it. Unfortunately,
it means AcpiLoadTables() may cause panic when a While() loop is executed.
After having lengthy discussions with ACPICA developers, I have concluded
that dummy timecounter is good enough for the purpose and it is the least
intrusive solution for now. Also, they reminded me the ACPI specification
implies OS timer function should be available before loading tables.
This addresses a deadlock during boot when EARLY_AP_STARTUP is configured:
a taskqueue thread may call pause() with an ACPI mutex held, and thread0
may block on this mutex before configuring the eventtimer. In this case
the taskqueue thread will sleep forever waiting for its callout to fire.
PR: 220277
Submitted by: jhb
MFC after: 3 days
For GEN1 Hyper-V, vmbus is attached to pcib0, which contains the
resources for PCI passthrough and SR-IOV. There is no
acpi_syscontainer0 on GEN1 Hyper-V.
For GEN2 Hyper-V, vmbus is attached to acpi_syscontainer0, which
contains the resources for PCI passthrough and SR-IOV. There is
no pcib0 on GEN2 Hyper-V.
The ACPI VMBUS device now only holds its _CRS, which is empty as
of this commit; its existence is mainly for upward compatibility.
Device tree structure is suggested by jhb@.
Tested-by: dexuan@
Collabrated-wth: dexuan@
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D10565
- Rename the default implementation of 'pcib_request_feature' and add
a pcib_request_feature() wrapper function (as is often done for
new-bus APIs implemented via kobj) that accepts a single function.
Previously the call to pcib_request_feature() ended up invoking the
method on the great-great-grandparent of the bridge device instead
of the grandparent. For a bridge that was a direct child of pci0 on
x86 this resulted in the method skipping over the Host-PCI bridge
driver and being invoked against nexus0
- When invoking _OSC from a Host-PCI bridge driver, invoke
device_get_softc() against the Host-PCI bridge device instead of the
child bridge that is requesting HotPlug. Using the wrong softc data
resulted in garbage being passed for the ACPI handle causing the
_OSC call to fail.
- While here, perform some other cleanups to _OSC handling in the ACPI
Host-PCI bridge driver:
- Don't invoke _OSC when requesting a control that has already been
granted by the firmware.
- Don't set the first word of the capability array before invoking
_OSC. This word is always set explicitly by acpi_EvaluateOSC()
since it is UUID-independent.
- Don't modify the set of granted controls unless _OSC doesn't exist
(which is treated as always successful), or the _OSC method
doesn't fail.
- Don't require an _OSC status of 0 for success. _OSC always
returns the updated control mask even if it returns a non-zero
status in the first word.
- Whine if _OSC ever tries to revoke a previously-granted control.
(It is not supposed to do that.)
- While here, add constants for the _OSC status word in acpivar.h
(though currently unused).
Reported by: adrian
Reviewed by: imp
MFC after: 1 week
Tested on: Lenovo x220
Differential Revision: https://reviews.freebsd.org/D10520
The MFC will include a compat definition of smp_no_rendevous_barrier()
that calls smp_no_rendezvous_barrier().
Reviewed by: gnn, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10313
During acpi_cmbat_attach() the acpi_cmbat_init_battery() notification
handler is registered. It has been observed this notification handler
can be called instantly, before the attach routine has returned. In
the notification handler there is a call to device_is_attached() which
returns false. Because the softc is set we know an attach is in
progress and the fix is simply to wait and try again in this case.
Reviewed by: avg @
MFC after: 1 week
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host pci connections to allowing all valid feature
requests.
Sponsored by: Netflix
On Core2 and older Intel CPUs, where TSC stops in C2, system does not
allow C2 entrance if timecounter hardware is TSC. This is done by
tc_windup() which tests for TC_FLAGS_C2STOP flag of the new
timecounter and increases cpu_disable_c2_sleep if flag is set. Right
now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but
TSC is initialized too early for this variable to be set by
acpi_cpu.c.
There is no reason to require that ACPI reported C2 and deeper states
to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from
init_TSC_tc() condition. And since this is the only use of the
variable, remove it at all.
Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com>
Suggested by: jhb
MFC after: 2 weeks
using the ACPI C1/mwait sleep method.
Previously, the mwait instruction would return when an interrupt was
pending; however, the idle loop did not actually enable interrupts when
this occurred. This led to a situation where the idle loop could quickly
spin through the C1/mwait sleep method a number of times when an interrupt
was pending. (Eventually, the situation corrected itself when something
other than an interrupt triggered the idle loop to either enable interrupts
or schedule another thread.)
Reviewed by: kib, imp (earlier version)
Input from: jhb
MFC after: 1 week
Sponsored by: Netflix
memory-mapped devices that are normally PCIe drives. Devices can then use
the existing pci_get_class, etc. accessors to query this data.
The ivar values are different enough from the existing ACPI and ISA values
to not conflict.
Reviewed by: jhb
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8721
In order to make Prometheus do graphing/alerting on thermal sensors in a
generic fashion, we should attach the name of the thermal zone device as
a label. That way there is only a single metric for the temperature of a
thermal zone, with its name attached as a label.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775
If the bus number assigned to a Host-PCI bridge doesn't match the first
bus number in the associated producer range from _CRS, print a warning and
fail to attach rather than panicking due to an assertion failure.
At least one single-socket Dell machine leaves a "ghost" Host-PCI bridge
device in the ACPI namespace that seems to correspond to the I/O hub in
the second socket of a two-socket machine. However, the BIOS doesn't
configure the settings for this "ghost" bridge correctly, nor does it have
any PCI devices behind it.
Tested by: royger
MFC after: 2 weeks
When handling a GPE ACPI interrupt object the EcSpaceHandler()
function can be called which checks the EC_EVENT_SCI bit and then
recurse on the EcGpeQueryHandler() function. If there are multiple GPE
events pending the EC_EVENT_SCI bit will be set at the next call to
EcSpaceHandler() causing it to recurse again via the
EcGpeQueryHandler() function. This leads to a slow never ending
recursion during boot which prevents proper system startup, because
the EC_EVENT_SCI bit never gets cleared in this scenario.
The behaviour is reproducible with the ALASKA AMI in combination with
a newer Skylake based mainboard in the following way:
Enter BIOS and adjust the clock one hour forward. Save and exit the
BIOS. System fails to boot due to the above mentioned bug in
EcGpeQueryHandler() which was observed recursing multiple times.
This patch adds a simple recursion guard to the EcGpeQueryHandler()
function and also also adds logic to detect if new GPE events occurred
during the execution of EcGpeQueryHandler() and then loop on this
function instead of recursing.
Reviewed by: jhb
MFC after: 2 weeks
Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.
Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.
Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/__vdso_gettc.c into one source file in the new
libc/x86/sys location. __vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.
Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge
machines.
Tested by: Howard Su <howard0su@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473
time, by, by default disallow writes to the mmaped HPET pages.
Intent is to allow userspace to use HPET as fast (i.e. no-syscall)
timecounter for gettimeofday(2). Unfortunately, the permission model
does not make it possible to safely unhide /dev/hpet in the jails even
if default mode is set to 0444, because untrusted jailed root may
change device permissions to writeable.
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
The SYSINIT runs at SI_SUB_KICK_SCHEDULER after the scheduler is fully
initialized and timers are working. This fixes booting in the
EARLY_AP_STARTUP case.
the graphics drivers can benefit from access to the lid handle for querying and getting notifications
Submitted by: kmacy
Differential Revision: https://reviews.freebsd.org/D6643
After r285994, sysctl(8) was fixed to use 273.15 instead of 273.20 as 0C
reference and as result, the temperature read in sysctl(8) now exibits a
+0.1C difference.
This commit fix the kernel references to match the reference value used in
sysctl(8) after r285994.
Sponsored by: Rubicon Communications (Netgate)
- Add a pcib_detach() function for the PCI-PCI bridge driver. It
tears down the NEW_PCIB and hotplug state including destroying
resource managers, deleting child devices, and disabling hotplug
events.
- Add a detach method to the ACPI PCI-PCI bridge driver which calls
pcib_detach() and then frees the copy of the _PRT interrupt routing
table.
- Add a detach method to the PCI-Cardbus bridge driver which frees
the PCI bus resources in addition to calling cbb_detach().
- Explicitly clear any pending hotplug events during attach to ensure
future events will generate an interrupt.
- If a the Command Completed bit is set in the slot status register
when the command completion timeout fires, treat it as if the
command completed and the completion interrupt was just lost rather
than forcing a detach.
- Don't wait for a Command Completed notification if Command Completion
interrupts are disabled. The spec explicitly says no interrupt is
enabled when clearing CCIE, and on my T400 no interrupt is generated
when CCIE is changed from cleared to set, either. In addition, the
T400 doesn't appear to set the Command Completed bit in the cases
where it doesn't generate an interrupt, so don't schedule the timer
either. (If the CC bit were always set, one could always set the timer
and rely on the logic of treating CC set as a missed interrupt.)
Reviewed by: imp (older version)
Differential Revision: https://reviews.freebsd.org/D6424
Some ACPI operations such as mutex acquires and event waits accept a
timeout. The ACPI OSD layer implements these timeouts by using regular
sleep timeouts. However, this doesn't work during early boot before
event timers are setup. Instead, use polling combined with DELAY()
to spin.
This fixes booting on upcoming Intel systems with Kaby Lake processors.
Tested by: "Jeffrey E Pieper" <jeffrey.e.pieper@intel.com>
Reviewed by: jimharris
MFC after: 1 week
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.
This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed). This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP. It also permits all CPUs to be available for
handling interrupts before any devices are probed.
This last feature fixes a problem on with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot. Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.
However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system. In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU. Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.
Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code. This removes the need to treat the single-CPU boot environment
as a special case.
As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP). This will allow the option to be turned off
if need be during initial testing. I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0. Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.
These changes have only been tested on x86. Other platform maintainers
are encouraged to port their architectures over as well. The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
PR: kern/199321
Reviewed by: markj, gnn, kib
Sponsored by: Netflix
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage)
Differential Revision: https://reviews.freebsd.org/D5519
Arguably we should only be doing the probe/attach to children of
these devices as well.
Tested by: Michal Stanek <mst_semihalf.com> (arm64)
Differential Revision: https://reviews.freebsd.org/D6133
This allows the PCI-PCI bridge driver to save a reference to the child
device in its softc.
Note that this required moving the "pci" device creation out of
acpi_pcib_attach(). Instead, acpi_pcib_attach() is renamed to
acpi_pcib_fetch_prt() as it's sole action now is to fetch the PCI
interrupt routing table.
Differential Revision: https://reviews.freebsd.org/D6021
Both of the callers were expecting the input cap_set to be modified.
This fixes them to request cap_set to be updated with the returned buffer.
Reviewed by: jkim
Differential Revision: https://reviews.freebsd.org/D6040
Eventually with earlier AP startup this code will change to call the
startup function synchronously instead of queueing the task. Moving
the time we queue the task should be a no-op since taskqueue threads
don't start executing tasks until much later, but this reduces the diff
with the earlier AP startup patches.
Sponsored by: Netflix
Tell the firmware that we support PCI-express config space access
and MSI.
Reviewed by: jkim
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6023
This wrapper does not translate errors in the first word to ACPI
error status returns. Use this wrapper in the acpi_cpu(4) driver in
place of the existing _OSC code. While here, fix a bug where the wrong
count of words was passed when invoking _OSC.
Reviewed by: jkim
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6022
The ACPI and OFW PCI bus drivers as well as CardBus override this to
allocate the larger ivars to hold additional info beyond the stock PCI ivars.
This removes the need to pass the size to functions like pci_add_iov_child()
and pci_read_device() simplifying IOV and bus rescanning implementations.
As a result of this and earlier changes, the ACPI PCI bus driver no longer
needs its own device_attach and pci_create_iov_child methods but can use
the methods in the stock PCI bus driver instead.
Differential Revision: https://reviews.freebsd.org/D5891
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system. DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().
MAXMEMDOM must still be set to a value greater than for any NUMA support
to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5782
Previously, the ACPI PCI bus driver did a single pass over the devices in
the namespace that were a child of a given PCI bus to associate the
PCI bus-enumerated device_t devices with the corresponding ACPI handles.
However, this meant that handles were only established at runtime for devices
found during the initial PCI bus scan.
PCI_IOV adds devices that show up after the initial PCI bus scan, and coming
changes to add a bus rescan can also add devices after the initial scan.
This change adds a pci_child_added() callback to the ACPI PCI bus that walks
the namespace to find the ACPI handle for each device that is added. Using
a callback means that the handle is correctly set for any device no matter
how it is added (initial scan, IOV, or a bus rescan).
Instead of providing a wrapper around device_delete_child() that the PCI
bus and child bus drivers must call explicitly, move the bulk of the logic
from pci_delete_child() into a bus_child_deleted() method
(pci_child_deleted()). This allows PCI devices to be safely deleted via
device_delete_child().
- Add a bus_child_deleted method to the ACPI PCI bus which clears the
device_t associated with the corresponding ACPI handle in addition to
the normal PCI bus cleanup.
- Change cardbus_detach_card to call device_delete_children() and move
CardBus-specific delete logic into a new cardbus_child_deleted() method.
- Use device_delete_child() instead of pci_delete_child() in the SRIOV code.
- Add a bus_child_deleted method to the OpenFirmware PCI bus drivers which
frees the OpenFirmware device info for each PCI device.
Reviewed by: imp
Tested on: amd64 (CardBus and PCI-e hotplug)
Differential Revision: https://reviews.freebsd.org/D5831
On some architectures, u_long isn't large enough for resource definitions.
Particularly, powerpc and arm allow 36-bit (or larger) physical addresses, but
type `long' is only 32-bit. This extends rman's resources to uintmax_t. With
this change, any resource can feasibly be placed anywhere in physical memory
(within the constraints of the driver).
Why uintmax_t and not something machine dependent, or uint64_t? Though it's
possible for uintmax_t to grow, it's highly unlikely it will become 128-bit on
32-bit architectures. 64-bit architectures should have plenty of RAM to absorb
the increase on resource sizes if and when this occurs, and the number of
resources on memory-constrained systems should be sufficiently small as to not
pose a drastic overhead. That being said, uintmax_t was chosen for source
clarity. If it's specified as uint64_t, all printf()-like calls would either
need casts to uintmax_t, or be littered with PRI*64 macros. Casts to uintmax_t
aren't horrible, but it would also bake into the API for
resource_list_print_type() either a hidden assumption that entries get cast to
uintmax_t for printing, or these calls would need the PRI*64 macros. Since
source code is meant to be read more often than written, I chose the clearest
path of simply using uintmax_t.
Tested on a PowerPC p5020-based board, which places all device resources in
0xfxxxxxxxx, and has 8GB RAM.
Regression tested on qemu-system-i386
Regression tested on qemu-system-mips (malta profile)
Tested PAE and devinfo on virtualbox (live CD)
Special thanks to bz for his testing on ARM.
Reviewed By: bz, jhb (previous)
Relnotes: Yes
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D4544