r326454.
bwn(4)/bhnd(4) has been tested with most chipsets currently supported by
bwn(4), and this change should be transparent to existing bwn(4) users;
please report any regressions that you do encounter.
To revert to using siba_bwn(4) instead of bhnd(4), place the following
lines in loader.conf(5):
hw.bwn_pci.preferred="0"
Once we're satisfied that the switch to bhnd(4) has seen sufficient broader
testing, bwn(4) will be migrated to use the native bhnd(9) interface
directly, and support for siba_bwn(4) will be dropped (see D13518).
Sponsored by: The FreeBSD Foundation
pmcstat request for close will generate a close event.
This event will be in turn received by pmcstat to close the file.
Reviewed by: kib
Tested by: pho
MFC after: 1 week
Sponsored by: Stormshield
The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability. PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.
The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:
kernel text (but not modules text)
PCPU
GDT/IDT/user LDT/task structures
IST stacks for NMI and doublefault handlers.
Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords. Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.
IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.
On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed. The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame. Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().
Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps. They are registered using
pmap_pti_add_kva(). Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use. In principle, they can be installed into kernel page table per
pmap with some work. Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.
PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation. I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.
The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case. Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled. PCID can be forced on only for developer's benefit.
MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal. The fix is pending.
Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
On a SPROM-less device, the PCI(e) bridge core will be initialized with its
power-on-reset defaults; this can leave the SPROM-derived BHND_PCI_SRSH_PI
value pointing to the wrong backplane address. This value is used by the
PCI core when performing address translation between the static register
windows in BAR0 that map the PCI core's register block, and backplane
address space.
Previously, bhndb_pci(4) incorrectly used the potentially invalid static
BAR0 PCI register windows when attempting to correct the BHND_PCI_SRSH_PI
value in the PCI core's SPROM shadow.
Instead, we now read/update BHND_PCI_SRSH_PI by fetching the PCI core's
backplane address from the core enumeration table, and then using a dynamic
register window to explicitly map the PCI core's register block into BAR0.
Sponsored by: The FreeBSD Foundation
The implementation will follow (D12723). For now, get the changes to
commit-protected files out of the way.
Approved by: secteam (gordon)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13925
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
Since we have no control over the name, the MAKEDEV_CHECKNAME flag must be
used to return an error on an invalid (to devfs) name instead of panicing.
r305900 that originally added this feature also introduced a few other bugs:
- Proper locking not performed
- Theoretically broke the expectation that the control event buffer would
not span more than one pages, but did not update the CTASSERT that was
in place to prevent this. However, since the struct virtio_console_control
and the bulk buffer together were quite small, this could not have happened.
Also workaround an QEMU VirtIO spec violation in that it includes the NUL
terminator in the buffer length when the spec says it is not included.
PR: 223531
MFC after: 1 week
Attaching syscon_generic earlier than BUS_PASS_DEFAULT makes it more
difficult for specific syscon drivers to attach to the syscon node and to
get ordering right. Further discussion yielded the following set of
decisions:
- Move syscon_generic to BUS_PASS_DEFAULT
- If a platform needs a syscon with different attach order or probe
behavior, it should subclass syscon_generic and match on the SoC specific
compat string
- When we come across a need for a syscon that attaches earlier but only
specifies compatible = "syscon", we should create a syscon_exclusive driver
that provides generic access but probes earlier and only matches if "syscon"
is the only compatible. Such fdt nodes do exist in the wild right now, but
we don't really use them at the moment.
Additionally:
- Any syscon provider that has needs any more complex than a spinlock solely
for syscon access and a single memory resource should subclass syscon
directly rather than attempting to subclass syscon_generic or add complexity
to it. syscon_generic's attach/detach methods may be made public should the
need arise to subclass it with additional attach/detach behavior.
We introduce aw_syscon(4) that just subclasses syscon_generic but probes
earlier to meet our requirements for if_awg and implements #2 above for this
specific situation. It currently only matches a64/a83t/h3 since these are
the only platforms that really need it at the time being.
Discussed with: ian
Reviewed by: manu, andrew, bcr (manpages, content unchanged since review)
Differential Revision: https://reviews.freebsd.org/D13793
that userland doesn't switch partitions on its own, compare against
the partition mmcsd_ioctl_cmd() is going to switch to (based on the
device node used) rather than the currently selected partition.
executed, the interrupt aggregation code might have disabled the
SDHCI_INT_DMA_END and/or SDHCI_INT_RESPONSE bits in slot->intmask
and the SDHCI_SIGNAL_ENABLE register respectively. So when restoring
the interrupt masks based on the previous contents of slot->intmask
in sdhci_exec_tuning(), ensure that the SDHCI_INT_ENABLE register
doesn't lose these two bits.
While at it and in the spirit of r327339, let sdhci_tuning_intmask()
set the tuning error and re-tuning interrupt bits based on the
SDHCI_TUNING_ENABLED rather than the SDHCI_TUNING_SUPPORTED flag
being set, i. e. only when (re-)tuning is actually used. Currently,
this changes makes no net difference, though.
allocated with a tag to come from the specified domain if it meets the
other constraints provided by the tag. Automatically create a tag at
the root of each bus specifying the domain local to that bus if
available.
Reviewed by: jhb, kib
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13545
The value written to E1000_TARC(0) wasn't intended to have every bit but
E1000_TARC0_CB_MULTIQ_3_REQ cleared; a ~ was missing.
Also change the referenced spec update section in the comment to the correct
section.
Sponsored by: Intel Corporation
This adds a new acpi_bus interface with a map_intr method. This is similar
to the Open Firmware map_intr method and allows us to create the needed
mapping from ACPI space to INTRNG space.
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8617
Unconditional 32-bit shift is not endianness-safe.
Modify the logic to work both on LE and BE.
Submitted by: Wojciech Macek <wma@freebsd.org>
Reviewed by: np
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13102
The driver now fully observes watchdog(9) protocol.
Previously a too large timeout was silently clamped while the correct
behavior is to disable the watchdog and leave the error as is
(i.e. to not report success).
Also, previously a too small value caused the timer to stop while the
correct behavior is to use the minimal supported value.
MFC after: 2 weeks
RTC chips that have a control register bit for am/pm mode, the DS13xx series
uses one of the high bits in the hour register. Thus, when setting the time
in am/pm mode, the am/pm mode flag has to be ORed into the hour.
If the MD_VERIFY flag is set, we should use O_VERIFY. If the MD_VERIFY flag
is not set, we should not.
Reviewed by: stevek
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13814
ian@ pointed out that BUS_PASS_DEFAULT + $anything is bogus, given that
BUS_PASS_DEFAULT is defined as __INT_MAX. Instead, we take a page out of
imx6_usbphy's book and use BUS_PASS_DEFAULT - 1000 to achieve the desired
effect of syscon_generic attaching before if_awg and other potential
consumers, but late enough that more specialized implementations should have
no problem attaching instead.
Reported by: ian
It still needs to be before if_awg at least in order to be available for
other operations, but it should not be attaching before interrupt
controllers at the very least.
This should make errors involving syscon register space colliding with other
devices a little more innocent, but these conflicts should really be tracked
down and resolved. One such conflict is with the Raspberry Pi 3 local
interrupt controller, noticed by tuexen@
Reported by: tuexen
Add cpuctl(4) ioctl CPUCTL_EVAL_CPU_FEATURES which forces re-read of
cpu_features, cpu_features2, cpu_stdext_features, and
std_stdext_features2.
The intent is to allow the kernel to see the changes in the CPU
features after micocode update. Of course, the update is not atomic
across variables and not synchronized with readers. See the man page
warning as well.
Reviewed by: imp (previous version), jilles
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D13770
The default 80MHz clock speed returned by bhnd_pmu_si_clock() was already
correct; this just prevents the "No backplane clock specified" warning
printf from being emitted when querying backplane clock speed.
Sponsored by: The FreeBSD Foundation
Enable the hardclock-based watchdog previously conditional on the
SW_WATCHDOG option whenever hardware watchdogs are not found, and
watchdogd attempts to enable the watchdog. The SW_WATCHDOG option
still causes the sofware watchdog to be enabled even if there is a
hardware watchdog. This does not change the other software-based
watchdog enabled by the --softtimeout option to watchdogd.
Note that the code to reprime the watchdog during kernel core dumps is
no longer conditional on SW_WATCHDOG. I think this was previously a bug.
Reviewed by: imp alfred bjk
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D13713
Apply the fix from r327499 to additional ioctl handlers.
Reported by: Ilja van Sprundel <ivansprundel@ioactive.com>
MFC after: 1 week
MFC with: r327499
Sponsored by: The FreeBSD Foundation