opnsense-src/sys/amd64
Kyle Evans d9320cd313 amd64: fix INVLPGB range invalidation
AMD64 Architecture Programmer's Manual Volume 3 says the following:

> ECX[15:0] contains a count of the number of sequential pages to
> invalidate in addition to the original virtual address, starting from
> the virtual address specified in rAX. A count of 0 invalidates a
> single page. ECX[31]=0 indicates to increment the virtual address at
> the 4K boundary. ECX[31]=1 indicates to increment the virtual address
> at the 2M boundary. The maximum count supported is reported in
> CPUID function 8000_0008h, EDX[15:0].

ECX[31] is what we call INVLPGB_2M_CNT; setting it tells the CPU to
increment the VA by 2M instead of 4K.

> This instruction invalidates the TLB entry or entries, regardless of
> the page size (4 Kbytes, 2 Mbytes, 4 Mbytes, or 1 Gbyte). [...]

Combined with this, my interpretation of the current code is: if
<va> is aligned on a PDE boundary, we'll use INVLPGB_2M_CNT to try to
invalidate <cnt> PDEs with a single call, but that only works if <va> is
the start of at least <cnt> 2M pages.  Otherwise, if the PDE at <va> or
any of the subsequent PDEs isn't actually a superpage, then we would only
invalidate the *first* 4K page within that PDE before skipping to the
next PDE, leaving the TLB entries for the remaining 4K pages in between
untouched.
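
To put a number on the failure mode described above, here is a small
user-space model (not the kernel code). It assumes each step of a
2M-stride INVLPGB touches exactly one translation, so a PDE that is
backed by 4K pages loses only its first page and keeps up to 511 others.
The function name and the NPTE_PER_PDE constant are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 4K pages per 2M PDE on amd64 (2M / 4K). Name is local to this sketch. */
#define NPTE_PER_PDE 512u

/*
 * Model only: given one flag per PDE in the range (true = mapped as a
 * 2M superpage, false = mapped with 4K pages), return how many 4K pages
 * can be left stale when the range is invalidated with a 2M stride.
 * A superpage PDE is fully covered by its single step; a 4K-mapped PDE
 * only has its first page invalidated.
 */
static unsigned
stale_4k_pages_with_2m_stride(const bool *is_superpage, size_t npde)
{
	unsigned stale = 0;

	assert(is_superpage != NULL || npde == 0);
	for (size_t i = 0; i < npde; i++) {
		if (!is_superpage[i])
			stale += NPTE_PER_PDE - 1;
	}
	return (stale);
}
```

In this model a range of three PDEs where only the middle one is not a
superpage leaves 511 pages untouched, which is the gap the commit
message describes.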

The implication would seem to be that we would need to inspect the range
that we're trying to invalidate if we're planning on using
INVLPGB_2M_CNT at all, so this patch just simplifies it to a series of
4K invalidations.  My gut feeling is that we likely still come out on
top vs. the TLB shootdown we're avoiding.
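
For illustration only, the "series of 4K invalidations" could be batched
along the following lines. This is a sketch under stated assumptions,
not the code from the patch: invalidate_range_4k(), the invlpgb_issue_fn
callback and the max_extra parameter are hypothetical names, with
max_extra standing in for the limit read from CPUID 8000_0008h
EDX[15:0]. Since the count is "in addition to" the first page, one
instruction covers at most max_extra + 1 pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT64_C(4096)

/* Hypothetical stand-in for issuing one INVLPGB; not a kernel API. */
typedef void (*invlpgb_issue_fn)(uint64_t va, uint16_t extra_pages,
    void *arg);

/*
 * Model only: split the page-aligned range [sva, eva) into 4K-stride
 * batches of at most max_extra + 1 pages each, calling issue() once per
 * batch (if non-NULL).  Returns the number of batches issued.
 */
static unsigned
invalidate_range_4k(uint64_t sva, uint64_t eva, uint16_t max_extra,
    invlpgb_issue_fn issue, void *arg)
{
	unsigned ninstr = 0;

	assert((sva & (PAGE_SIZE_4K - 1)) == 0);
	assert((eva & (PAGE_SIZE_4K - 1)) == 0);
	assert(sva <= eva);

	while (sva < eva) {
		uint64_t batch = (eva - sva) / PAGE_SIZE_4K;

		/* Clamp to what a single instruction may cover. */
		if (batch > (uint64_t)max_extra + 1)
			batch = (uint64_t)max_extra + 1;
		if (issue != NULL)
			issue(sva, (uint16_t)(batch - 1), arg);
		sva += batch * PAGE_SIZE_4K;
		ninstr++;
	}
	return (ninstr);
}
```

With a limit of 7 extra pages, a single 2M region (512 pages) takes 64
batches in this model, against one instruction with the 2M stride. That
extra work is the trade-off the message weighs against the TLB
shootdown being avoided.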

This seems to explain some issues we've seen lately with fdgrowtable()
and kqueue on recent Zen4/Zen5 EPYC hardware, where we saw corruption
that we couldn't otherwise explain.

Approved by:	so
Security:	FreeBSD-EN-26:10.amd64
PR:		293382
Reviewed by:	alc, kib, markj

(cherry picked from commit 1b8e5c02f5c07521129e06ff8ab7c660238fd75c)
(cherry picked from commit ff11ae166cd9f8ae16a5c384d46aa1218f3ff013)
2026-04-29 22:13:01 +02:00
acpica    x86: AMD Zen2: Zenbleed chicken bit mitigation                      2023-10-10 09:34:31 -04:00
amd64     amd64: fix INVLPGB range invalidation                               2026-04-29 22:13:01 +02:00
conf      amd64/conf: Remove a config committed by accident                   2026-02-24 19:22:13 +01:00
ia32      syscalls: fix missing SIGSYS for several ENOSYS errors              2023-10-09 06:24:31 +03:00
include   libkern: add ilog2 macro                                            2025-02-10 04:27:12 -06:00
linux     amd64/linux*: mark brandlists as static                             2024-02-14 05:42:40 +02:00
linux32   Abstract UIO allocation and deallocation.                           2024-03-08 23:27:20 -05:00
pci       x86: Adjust base addr for PCI MCFG regions                          2024-01-18 15:24:35 -08:00
sgx       sys: Remove $FreeBSD$: one-line .c pattern                          2023-08-16 11:54:36 -06:00
vmm       bhyve: style, add comma to the last line of designated initializer  2025-04-04 03:54:06 +03:00
Makefile  sys: Remove $FreeBSD$: one-line sh pattern                          2023-08-16 11:54:58 -06:00