A race in g_part_wither() can lead to I/O being performed with a freed GEOM
when the device disappears. Close the race as best as we can for now,
following the code patterns from g_part_ctl_destroy() and g_part_ctl_undo().
This also fixes a leak, as g_wither_geom() does not wither providers, it
only orphans them, so the partition entries would never get destroyed in
g_wither_washer().
Note, this is not a complete fix, it can still race with g_part_start(), the
race has merely been narrowed.
Reviewed by: markj
Sponsored by: Dell EMC Isilon
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.
Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826
Sponsored by: Mellanox Technologies
When the owner of the wire reference releases the last reference, it
might be that the page was already attempted to be freed (but free
cannot be performed at that time due to wire). Check that the page
was removed from the object as the indicator of the free attempt and
finish the free operation if so.
Reported and tested by: Slava Shwartsman
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
Suppose that we have an object with a mapped superpage, and that all
pages in the superpages are held (by some driver). Additionally,
suppose that the object is terminated, e.g. because the only process
mapping it is exiting. Then the reservation is broken, but the pages
cannot be freed until later, when they are unheld. In this situation,
the reservation code cannot clean psind, since no pages are freed, and
the page is freed and then reused with invalid psind.
Clean psind on vm_reserv_break() to avoid the situation.
Reported and tested by: Slava Shwartsman
Reviewed by: markj
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D14335
Do not use C constant suffixes. Bit values are small enough to not
require typing, despite they are used for 64bit MSR writes. The added
cast in hw_ibrs_recalculate() is redundand but I prefer to add it for
clarity.
Reported by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.
Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU. Update the routes so they do not provide
stale MTUs.
This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.
Reported and tested by: Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by: dab, melifaro
Discussed with: ae
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14257
Previously, the address regions described by disabled admatch entries would
be treated as being mapped to the given core; while incorrect, this was
essentially harmless given that the entries describe unused address space
on the few affected devices.
We now perform parsing of per-core admatch registers and interrupt flags in
siba_erom, correctly skip any disabled admatch entries, and use the
siba_erom API in siba_add_children() to perform enumeration of attached
cores.
read or write of all registered realtime clocks. In the read case, the
values read are simply discarded. For writes, there's no alternative but
to actually write the current system time to the device.
makes them consistent with the way other page-table pages are allocated.
It also provides the rest of the VM system a good clue that these pages
are used.
Reviewed by: alc, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14269
liblua glues the lua run time into the boot loader. It implements all
the runtime routines that lua expects. In addition, it has a few
standard 'C' headers that nueter various aspects of the LUA build that
are too specific to lua to be in libsa. Many refinements from the
original code to improve implementation and the number of included lua
libraries. Use int64_t for lua_Number. Have "/boot/lua" be the default
module path. Numerous cleanups from the original GSoC project,
including hacking libsa to allow lua to be built with only one change
outside luaconf.h.
Add the final bit of lua glue to bring in liblua and plug into the
multiple interpreter framework, previously committed.
Add LOADER_LUA option, currently off by default.
Presently, this is an experimental option. One must opt-in to using
this by defining WITH_LOADER_LUA and WITHOUT_FORTH. It's been
lightly tested, so keep a backup copy of your old loader handy.
The menu code, coming in the next commit, hasn't been exhaustively
tested. A LUA boot loader is 60k larger than a FORTH one, which is
80k larger than a no-interpreter one. Subtle changes in size
may tip things past some subtle limit (the binary is ~430k now
when built with LUA). A future version may offer coexistance.
Bump FreeBSD version to 1200058 to mark the milestone.
Pedro Souza's 2014 Summer of Code project. Rui Paulo, Pedro Arthur,
Zakary Nafziger and Wojciech A. Koszek also contributed. Warner Losh
reworked it extensively into its current form.
Obtained from: https://wiki.freebsd.org/SummerOfCode2014/LuaLoader
Sponsored by: Google Summer of Code
Relnotes: Yes
MFC After: 1 month
Differential Review: https://reviews.freebsd.org/D14295
We don't support older compilers. Most of the code in these files is
for pre-3.0 gcc, which is at least 15 years obsolete. Move to using
phk's sys/_stdargs.h for all these platforms.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14323
The usage is simpler, documented, and more common.
Reviewed by: cem
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14227
Line up the macro definitions and names here like the machine/stdarg.h
files that this replaced. This is easier to read and also makes it
easier to match up with other includes. Also two space indent va_end
to match the rest of the surrounding if block.
later by TCP-MD5 code.
This fixes the problem with broken TCP-MD5 over IPv4 when NIC has
disabled TCP checksum offloading.
PR: 223835
MFC after: 1 week
the module interface.
This is the more common approach and the syscall_helper interface is
easier to understand.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14251
This is a first part of the change. It makes the drivers to calculate
the required number of chain frames to satisfy worst case scenarios, but
it does not change existing overly strict limits on them. The next step
will be to rewrite the allocator to not require megabytes of physically
contiguous address space, that may be problematic if done after boot,
after doing which the limits can be removed. Until that this code can
just correct user set limits, if they are set too high.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14261
r328113 and affecting SMP systems.
The way the time is set on PowerMacs is racy and relies on all the
CPUs in the system setting a register simultaneously in a rendezvous. A
few-cycle delay can result in out-of-sync times, which can break the
scheduler and result in calls like mtx_sleep() and pause() never timing out
if the thread is migrated while sleeping. r328113 added a call to a no-op
function between the beginning of the rendezvous and setting the time that
was only called on APs and added enough cycles to cause a problematic offset.
For some reason, the fan-management code was the first place this appeared.
Clue from: andreast
Reported by: many
to make the order of operations clearer to avoid the race condition
that was fixed in r328914. In particular, this commit corrects a
similar race that existed in the soft updates callback.
Doing some sleuthing through the SVN repository, it appears that
bufdone_finish() was added to support XFS:
------------------------------------------------------------------------
r153192 | rodrigc | 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) | 13 lines
Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
- 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
- b_pin_count, count of pinned buffer
- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions
Patches provided by: kan
Reviewed by: phk
Silence on: arch@
------------------------------------------------------------------------
It does not appear to ever have been used for anything else. XFS was
disconnected in r241607:
------------------------------------------------------------------------
r241607 | attilio | 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) | 5 lines
Disconnect non-MPSAFE XFS from the build in preparation for dropping
GIANT from VFS.
This is not targeted for MFC.
------------------------------------------------------------------------
and removed entirely in r247631:
------------------------------------------------------------------------
r247631 | attilio | 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) | 5 lines
Garbage collect XFS bits which are now already completely disconnected
from the tree since few months.
This is not targeted for MFC.
------------------------------------------------------------------------
Since XFS support is gone, there is no reason to retain biodone_finish().
Suggested by: Warner Losh (imp)
Discussed with: cem, kib
Tested by: Peter Holm (pho)
mappings for the pages used for the kernel and some initial allocations
used for the page table. It maps the kernel and the blocks used for
these initial allocations using 2MB pages.
However, if the kernel does not end on a 2MB boundary, it still maps the
last portion using a 2MB page, but reports that the unused 4K blocks
within this 2MB allocation are free physical blocks. This means that
these same physical blocks could also be mapped elsewhere - for example,
into a user process. Given the proximity to the kernel text and data
area, it seems wise to avoid allowing someone to write data to physical
blocks also mapped into these virtual addresses.
(Note that this isn't a security vulnerability: the direct map makes
most/all memory on the system mapped into kernel space. And, nothing
in the kernel should be trying to access these pages, as the virtual
addresses are unused. It simply seems wise to avoid reusing these
physical blocks while they are mapped to virtual addresses so close
to the kernel text and data area.)
Consequently, let's reserve the physical blocks covered by the
page-table mappings for these initial allocations.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D14268
size of UMA zone allocation is greater than page size. In this case zone
of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch
off of this zone from startup_alloc() until full launch of VM.
o Always supply number of VM zones to uma_startup_count(). On machines
with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over
a page. In the latter case account VM zones for number of allocations
from the zone of zones.
o Rewrite startup_alloc() so that it will immediately switch off from
itself any zone that is already capable of running real alloc.
In worst case scenario we may leak a single page here. See comment
in uma_startup_count().
o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some
extra SYSINITs, e.g. vm_page_init() may sneak in before.
o While here, remove uma_boot_pages_mtx. With recent changes to boot
pages calculation, we are guaranteed to use all of the boot_pages
in the early single threaded stage.
Reported & tested by: mav
icmp6_redirect_input() validates that a redirect packet came from the
current gateway for the respective destination. To do this, it compares
the source address, which has an embedded scope zone id, to the next-hop
address, which does not. If the address is link-local, which should be
the case, the comparison fails and the redirect is ignored.
Insert the scope zone id into the next-hop address so the comparison
is accurate.
Unsurprisingly, this fixes 35 UNH IPv6 conformance test cases.
Submitted by: Farrell Woods <Farrell_Woods@Dell.com> (initial revision)
Reviewed by: ae melifaro dab
MFC after: 1 week
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14254
folks running filesystems created on check-hash enabled kernels
(which I will call "new") on a non-check-hash enabled kernels (which
I will call "old). The idea here is to detect when a filesystem is
run on an old kernel and flag the filesystem so that when it gets
moved back to a new kernel, it will not start getting a slew of
check-hash errors.
Back when the UFS version 2 filesystem was created, it added a file
flag FS_INDEXDIRS that was to be set on any filesystem that kept
some sort of on-disk indexing for directories. The idea was precisely
to solve the issue we have today. Specifically that a newer kernel
that supported indexing would be able to tell that the filesystem
had been run on an older non-indexing kernel and that the indexes
should not be used until they had been rebuilt. Since we have never
implemented on-disk directory indicies, the FS_INDEXDIRS flag is
cleared every time any UFS version 2 filesystem ever created is
mounted for writing.
This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH
flag. Thus, the FS_METACKHASH is definitively known to have always
been cleared. The FS_INDEXDIRS flag has been moved to a new block
of flags that will always be cleared starting with this commit
(until they get used to implement some future feature which needs
to detect that the filesystem was mounted on a kernel that predates
the new feature).
If a filesystem with check-hashes enabled is mounted on an old
kernel the FS_METACKHASH flag is cleared. When that filesystem is
mounted on a new kernel it will see that the FS_METACKHASH has been
cleared and clears all of the fs_metackhash flags. To get them
re-enabled the user must run fsck (in interactive mode without the
-y flag) which will ask for each supported check hash whether it
should be rebuilt and enabled. When fsck is run in its default preen
mode, it will just ignore the check hashes so they will remain
disabled.
The kernel has always disabled any check hash functions that it
does not support, so as more types of check hashes are added, we
will get a non-surprising result. Specifically if filesystems get
moved to kernels supporting fewer of the check hashes, those that
are not supported will be disabled. If the filesystem is moved back
to a kernel with more of the check-hashes available and fsck is run
interactively to rebuild them, then their checking will resume.
Otherwise just the smaller subset will be checked.
A side effect of this commit is that filesystems running with
cylinder-group check hashes will stop having them checked until
fsck is run to re-enable them (since none of them currently have
the FS_METACKHASH flag set). So, if you want check hashes enabled
on your filesystems after booting a kernel with these changes, you
need to run fsck to enable them. Any newly created filesystems will
have check hashes enabled. If in doubt as to whether you have check
hashes emabled, run dumpfs and look at the list of enabled flags
at the end of the superblock details.
Fix for #31362 - ms_abi is implemented incorrectly for values >=16
bytes.
Summary:
This patch is a fix for following issue:
https://bugs.llvm.org/show_bug.cgi?id=31362 The problem was caused by
front end lowering C calling conventions without taking into account
calling conventions enforced by attribute. In this case win64cc was
no correctly lowered on targets other than Windows.
Reviewed By: rnk (Reid Kleckner)
Differential Revision: https://reviews.llvm.org/D43016
Author: belickim <mateusz.belicki@intel.com>
This fixes clang 6.0.0 assertions when building the emulators/wine and
emulators/wine-devel ports, and should also make it use the correct
Windows calling conventions. Bump __FreeBSD_version to make the fix
easy to detect.
PR: 224863
MFC after: 3 months
X-MFC-With: r327952
Use syscall_helper_register() to register syscalls and do it through the
module interface rather than sysinit.
This pattern is more common and easier to understand.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14232
While the bug itself was serious, as we could either pass a non-busied
page to vm_pager_get_pages() or leak a busy page, it could only be
triggered under a very rare condition where the page is already inserted
into the object, but it is not valid yet.
Reviewed by: kib
MFC after: 2 weeks
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.
Obtained from: Yandex LLC
MFC after: 42 days
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685