Process core notes for a 32-bit process running on a 64-bit host need to
use 32-bit structures so that the note layout matches the layout of notes
of a core dump of a 32-bit process under a 32-bit kernel.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D11407
struct g_kevent_args.
On some architectures, e.g. PowerPC, there is additional padding in uap.
Reported and tested by: andreast
Sponsored by: The FreeBSD Foundation
The implementation is using fixed size array allocated in asm module,
need to use proper array declaration for C source.
CID: 1376405
Reported by: Coverity, cem
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D11321
which is able to manipulate the clock and data lines directly.
When an i2c bus is hung by a slave device stuck in the middle of a
transaction that didn't complete properly, this function manipulates the
clock and data lines in a sequence known to reliably reset slave devices.
The most common cause of a hung i2c bus is a system reboot in the middle of
an i2c transfer (so it doesnt' happen often, but now there is a way other
than power cycling to recover from it).
nostop option is set, if a start was issued.
The nostop option doesn't mean "never issue a stop" it means "only issue
a stop after the last in a series of transfers". If the transfer ends
due to error, then that was the last transfer in the series, and a stop
is required.
Before this change, any error during a transfer when nostop is set would
effectively hang the bus, because sc->started would never get cleared,
and that caused all future calls to iicbus_start() to return an error
because it looked like the bus was already active. (Unrelated errors in
handling the nostop option, to be addressed separately, could lead to
this bus hang condition even on busses that don't set the nostop option.)
If an NFSv3 server were to reply with weak cache consistency attributes,
but not post operation attributes, the client would use garbage attributes
from memory. This was spotted during work on the code for the NFSv4.1 client.
I have never seen evidence that this happens and it wouldn't make sense
for an NFSv3 server to do this, so this patch is basically "theoretical",
but does fix the problem if a server were to do the above.
PR: 219552
MFC after: 2 weeks
When a pin is set for input the value in the DR will be the same as the PSR.
When a pin is set for output the value in the DR is the value output to the
pad, and the value in the PSR is the actual electrical level sensed on the
pad, and they can be different if the pad is configured for open-drain mode
and some other entity on the board is driving the line low.
group. Change all code points that open-coded this functionality
to use the new function. This commit is a refactoring with no
change in functionality.
In the future this change allows more robust checking of cylinder
group reads along the lines discussed in the hardening UFS session
at BSDCan (retry I/O, add checksums, etc). For more detail see the
session notes at https://wiki.freebsd.org/DevSummit/201706/HardeningUFS
Reviewed by: kib
The implementation of ZFS refcount_t uses the emulated illumos mutex
(the sx lock) and the waiting memory allocation when ZFS_DEBUG is
enabled. This makes refcount_t unsuitable for use in GEOM g_up
thread where sleeping is prohibited.
When importing the ABD change I modified vdev_geom using illumos
vdev_disk as an example. As a result, I added a call to abd_return_buf
in vdev_geom_io_intr. The latter is called on g_up thread while the
former uses refcount_t.
This change fixes the problem by deferring the abd_return_buf call to
the previously unused vdev_geom_io_done that is called on a ZFS zio
taskqueue thread where sleeping is allowed.
A side bonus of this change is that now a vdev zio has a pointer
to its corresponding bio while the zio is active.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
Tested by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 1 week
X-MFC with: r320156
This finishes what r245164 started and makes open(..., O_APPEND) work again
after r299753.
- Pass ioflags, incl. IO_APPEND, down to the direct write backend (r245164
added it to only the bio backend).
- (r299753 changed the WRONLY backend from bio to direct.)
PR: 220185
Reported by: Ben RUBSON <ben.rubson at gmail.com>
Reviewed by: bapt@, rmacklem@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11348
a hint.
Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted
as the absolute minimal address of the range where the mapping is
created. The VA allocator only allocates in the range [addr,
VM_MAXUSER_ADDRESS]. This is too restrictive, the mmap(2) call might
unduly fail if there is no free addresses above addr but a lot of
usable space below it.
Lift this implementation limitation by allocating VA in two passes.
First, try to allocate above addr, as before. If that fails, do the
second pass with less restrictive constraints for the start of
allocation by specifying minimal allocation address at the max bss
end, if this limit is less than addr.
One important case where this change makes a difference is the
allocation of the stacks for new threads in libthr. Under some
configuration conditions, libthr tries to hint kernel to reuse the
main thread stack grow area for the new stacks. This cannot work by
design now after grow area is converted to stack, and there is no
unallocated VA above the main stack. Interpreting requested stack
base address as the hint provides compatibility with old libthr and
with (mis-)configured current libthr.
Reviewed by: alc
Tested by: dim (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
the "effective MSS" for the connection. The chip expects it this way.
Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 3 days
Sponsored by: Chelsio Communications
If a peripheral driver (e.g. da, sa, cd) is added or removed from the
peripheral driver list while an unrelated peripheral driver instance (e.g.
da0, sa5, cd2) is going away and is inside camperiphfree(), we could
dereference an invalid pointer.
When peripheral drivers are added or removed (see periphdriver_register()
and periphdriver_unregister()), the peripheral driver array is resized
and existing entries are moved.
Although we hold the topology lock while we traverse the peripheral driver
list, we retain a pointer to the location of the peripheral driver pointer
and then drop the topology lock. So we are still vulnerable to the list
getting moved around while the lock is dropped.
To solve the problem, cache a copy of the peripheral driver pointer. If
its storage location in the list changes while we have the lock dropped, it
won't have any effect.
This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
to individual instances like "da0", "cd0") are not generally part of a
reference counting scheme to guard against deregistering them while there
are instances active. The caller (generally the person unloading a module)
has to be aware of active drivers and not unload something that is in use.
sys/cam/cam_periph.c:
In camperiphfree(), cache a pointer to the peripheral driver
instance to avoid holding a pointer to an invalid memory location
in the event that the peripheral driver list changes while we have
the topology lock dropped.
PR: kern/219701
Submitted by: avg
MFC after: 3 days
Sponsored by: Spectra Logic
Without the allocation length set, the target will either reject
the command or complete it without transferring any data.
This fixes the REPORT ZONES command for SCSI ZBC protocol devices,
as well as ATA ZAC protocol devices that are behind a SCSI to ATA
translation layer. (LSI/Broadcom's 12Gb SAS adapters translate ZBC
commands to ZAC commands.) Those are Host Aware and Host Managed SMR
drives.
This will fix REPORT ZONE commands sent to the da(4) driver via the
GEOM bio interface and zonectl, and REPORT ZONE commands sent from
camcontrol(8).
Note that in the case of camcontrol(8), we currently only send
SCSI ZBC commands to native SCSI protocol devices, not ATA devices
behind a SAT layer.
sys/cam/scsi/scsi_da.c:
Fill in the length field in scsi_zbc_in().
MFC after: 3 days
Sponsored by: Spectra Logic
and "next_skip" variables. The "skip" value in struct blist has long been
a 64-bit quantity but various functions have implicitly truncated this
value to 32 bits. Now, all arithmetic involving the "skip" value is 64
bits wide. (This should allow us to relax the size limit on a swap device
in the swap pager.)
Maintain the ability to test this allocator as a user-space application by
including <stdbool.h>.
Remove an unused variable from blst_radix_print().
Reviewed by: kib, markj
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D11358
changed from %d to a long-width type.
Use uintmax_t casting and %ju to futureproof the format string against
potential changes with either the #define or the implementation-specific
definition for offsetof(..).
The fields exist on all versions of the filesystem and using them is a mount
option on linux. For FreeBSD, the corresponding i_uid and i_gid are always
long enough so use them by default.
Reviewed by: Fedor Uporov
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D11354
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.
MFC after: 2 weeks
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t. Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.
Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
RPI1-B, Alix and APU2 boards as well as NanoBSD with the following message:
vnode_pager_generic_getpages_done: I/O read error 5
Seems the breakage was because it was missed to include acr in glabel update.
Reported by: Peter Blok <pblok@bsd4all.org>,
madpilot, imp and trasz.
Reviewed by: trasz
Tested by: Peter Blok and madpilot.
MFC after: 3 days.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11365
After r307132 the sbuf buffer is malloc()ed, but corresponding
sbuf_delete() call was missing.
Fix a nearby whitespace bug.
MFC after: 3 days
Sponsored by: Dell EMC Isilon
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.
The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
Reviewed by: rmacklem (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11326
3306 zdb should be able to issue reads in parallel
illumos/illumos-gate/31d7e8fa33fae995f558673adb22641b5aa8b6e1
https://www.illumos.org/issues/3306
The upstream change was made before we started to import upstream commits
individually. It was imported into the illumos vendor area as r242733.
That commit was MFV-ed in r260138, but as the commit message says
vdev_file.c was left intact.
This commit actually implements the parallel I/O for vdev_file using a
taskqueue with multiple thread. This implementation does not depend on
the illumos or FreeBSD bio interface at all, but uses zio_t to pass
around all the relevent data. So, the code looks a bit different from
the upstream.
This commit also incorporates ZoL commit
zfsonlinux/zfs/bc25c9325b0e5ced897b9820dad239539d561ec9 that fixed
https://github.com/zfsonlinux/zfs/issues/2270
We need to use a dedicated taskqueue for exactly the same reason as ZoL
as we do not implement TASKQ_DYNAMIC.
Obtained from: illumos, ZFS on Linux
MFC after: 2 weeks
AKA Make time_t 64 bits on powerpc(32).
PowerPC currently (until now) was one of two architectures with a 32-bit time_t
on 32-bit archs (the other being i386). This is an ABI breakage, so all ports,
and all local binaries, *must* be recompiled.
Tested by: andreast, others
MFC after: Never
Relnotes: Yes
A NFSv4.1/pNFS server using File Layout can specify that Commit operations
are to be done against the DS instead of MDS. Since no extant pNFS
server did this, the code was untested and "#ifdef notyet".
The FreeBSD pNFS server I am developing does specify that Commits be done
through the DS, so the code has been enabled/tested.
This patch should only affect the case of a pNFS server that specfies
Commits through the DS.
PR: 219551
MFC after: 2 weeks
the requested protection.
The syscall returns success without changing the protection of the
guard. This is consistent with the current mprotect(2) behaviour on
the unmapped ranges. More important, the calls performed by libc and
libthr to allow execution of stacks, if requested by the loaded ELF
objects, do the expected change instead of failing on the grow space
guard.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If mmap(2) is called with the MAP_STACK flag and the size which is
less or equal to the initial stack mapping size plus guard,
calculation of the mapping layout created zero-sized guard. Attempt
to create such entry failed in vm_map_insert(), causing the whole
mmap(2) call to fail.
Fix it by adjusting the initial mapping size to have space for
non-empty guard. Reject MAP_STACK requests which are shorter or equal
to the configured guard pages size.
Reported and tested by: Manfred Antar <null@pozo.com>
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
fixed size for the name buffer PFS_NAMELEN.
As r318736 was commited (ino64 project) the size of the permanent part
of the struct dirent was changed, so calulate PFS_DELEN properly.
We set the shareability attributes in TCR_EL1 on boot. These tell the
hardware the pagetables are in cached memory so there is no need to flush
the entries from the cache to memory.
This has about 4.2% improvement in system time and 2.7% improvement in
user time for a buildkernel -j48 on a ThunderX.
Keep the old code for now to allow for further comparisons.
It distinguishes between data flow sockets and listening sockets, and
in case of the latter doesn't change resource limits, since listening
sockets don't hold any buffers, they only carry values to be inherited
by their children.
When the NFSv4.1 client is doing pNFS, it needs to get an Open and
a Layout for every file it will be doing I/O on. The current code
does two separate RPCs to get these. This patch adds two new compounds
that do the both the Open and LayoutGet in the same RPC, reducing the
RPC count.
It also factors out the code that sets up and parses the LayoutGet operation
into separate functions, so that the code doesn't get duplicated for
these new RPCs.
This patch is fairly large, but should only affect the NFSv4.1 client
when the "pnfs" option is specified.
PR: 219550
MFC after: 2 weeks
Decouple the pageout cluster size from the size of the hash table entry
used by the swap pager for mapping (object, pindex) to a block on the
swap device(s), and keep the size of a hash table entry at its current
size.
Eliminate a pointless macro.
Reviewed by: kib, markj (an earlier version)
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D11305
Guard, requested by the MAP_GUARD mmap(2) flag, prevents the reuse of
the allocated address space, but does not allow instantiation of the
pages in the range. It is useful for more explicit support for usual
two-stage reserve then commit allocators, since it prevents accidental
instantiation of the mapping, e.g. by mprotect(2).
Use guards to reimplement stack grow code. Explicitely track stack
grow area with the guard, including the stack guard page. On stack
grow, trivial shift of the guard map entry and stack map entry limits
makes the stack expansion. Move the code to detect stack grow and
call vm_map_growstack(), from vm_fault() into vm_map_lookup().
As result, it is impossible to get random mapping to occur in the
stack grow area, or to overlap the stack guard page.
Enable stack guard page by default.
Reviewed by: alc, markj
Man page update reviewed by: alc, bjk, emaste, markj, pho
Tested by: pho, Qualys
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11306 (man pages)
The issue is catched by "vm_map_wire: alien wire" KASSERT at the end
of the vm_map_wire(). We currently check for MAP_ENTRY_WIRE_SKIPPED
flag before ensuring that the wiring_thread is curthread. For HOLESOK
wiring, this means that we might see WIRE_SKIPPED entry from different
wiring.
The fix it by only checking WIRE_SKIPPED if the entry is put
IN_TRANSITION by us. Also fixed a typo in the comment explaining the
situation.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
H2+ SoC is a stripped down version of H3 without gigabit ethernet and 4K HDMI.
Also add sun8i-h2-plus-orangepi-zero.dts to the build as we run on this board.
The smbios code does a lot of unaligned access, since we don't really
care about smbios info on ARM (not all board expose information and those
who does don't expose useful ones) disable smbios for this arch (at least
for now).
dim@ compared clang IAS-built and GNU as-built boot0 and found them
equivalent. IAS encoded one instruction using two bytes where GNU as
used three, and another instruction using three bytes where GNU as used
two. The net result is equivalent and tested, so there is no need to
force IAS off for boot0.
A Build-ID is an identifier generated at link time to uniquely identify
ELF binaries. It allows efficient confirmation that an executable or
shared library and a corresponding standalone debuginfo file match.
(Otherwise, a checksum of the debuginfo file must be calculated when
opening it in a debugger.)
The FreeBSD base system includes GNU bfd ld 2.17.50 as the linker for
architectures other than arm64. Build-ID support was added to bfd ld
shortly after that version, so was not previously available to us.
We can now start making use of Build-ID as we migrate to using lld or
bfd ld from ports, conditionally enabled based on the LINKER_TYPE and
LINKER_VERSION make variables added in r320244 and subsequent commits.
Reviewed by: dim
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11314
This allows them to be sent in a non truncated way and addresses a warning
given by newver versions of gcc.
Thanks to Anselm Jonas Scholl for reporting it and providing a patch.
FreeBSD note: the actual change has been in FreeBSD since r297848. This
commit accounts for integration of that change with subsequent changes,
especially r320156 (MFV of r318946) and r314274.
illumos/illumos-gate@403a8da73c403a8da73chttps://www.illumos.org/issues/5220
There are disk devices that have logical sector size larger than 512B, for
example 4KB. That is, their physical sector size is larger than 512B and they
do not provide emulation for 512B sector sizes. For such devices both a data
offset and a data size must be properly aligned. L2ARC should arrange that
because it uses physical I/O.
zio_vdev_io_start() performs a necessary transformation if io_size is not
aligned to vdev_ashift, but that is done only for logical I/O. Something
similar should be done in L2ARC code.
* a temporary write buffer should be allocated if the original buffer is
not going to be compressed and its size is not aligned
* size of a temporary compression buffer should be ashift aligned
* for the reads, if a size of a target buffer is not sufficiently large and
it is not aligned then a temporary read buffer should be allocated
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 3 weeks
illumos/illumos-gate@0255edcc850255edcc85https://www.illumos.org/issues/8056
The send size estimate for a zvol can be too low, if the size of the record
headers (dmu_replay_record_t's) is a significant portion of the size.
This is typically the case when the data is highly compressible, especially
with embedded blocks.
The problem is that dmu_adjust_send_estimate_for_indirects() assumes that
blocks are the size of the "recordsize" property (128KB).
However, for zvols, the blocks are the size of the "volblocksize" property
(8KB). Therefore, we estimate that there will be 16x less record headers than
there really will be.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
MFC after: 3 weeks
FreeBSD note: this commit removes small differences between what mav
committed to FreeBSD in r308782 and what ended up committed to illumos
after addressing all review comments.
illumos/illumos-gate@c5ee46810fc5ee46810fhttps://www.illumos.org/issues/7578
After some ZIL changes 6 years ago zil_slog_limit got partially broken
due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
Actually because of other changes about that time zl_itx_list_sz is not
really required to implement the functionality, so this patch removes
some unneeded broken code and variables.
Original idea of zil_slog_limit was to reduce chance of SLOG abuse by
single heavy logger, that increased latency for other (more latency critical)
loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
huge latency increase for heavy writers, this implementation caused double
write of all data, since the log records were explicitly prepared for SLOG.
Since we now have I/O scheduler, I've found it can be much more efficient
to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
Existing ZIL implementation had problem with space efficiency when it
has to write large chunks of data into log blocks of limited size. In some
cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
spinning rust, that also reduced log write speed in half, since head had to
uselessly fly over allocated but not written areas. This change improves
the situation by offloading problematic operations from z*_log_write() to
zil_lwb_commit(), which knows real situation of log blocks allocation and
can split large requests into pieces much more efficiently. Also as side
effect it removes one of two data copy operations done by ZIL code WR_COPIED
case.
While there, untangle and unify code of z*_log_write() functions.
Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
block boundary, that may also improve efficiency if ZPL is made to do that.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alexander Motin <mav@FreeBSD.org>
MFC after: 3 weeks
Relocatable linking in aarch64 ld from binutils 2.25.1 does not work.
The linker corrupts the references to the external symbols which are
defined by other object in the linking set and should therefore lose
the GOT entry.
The problem is fixed in later versions of GNU ld and does not exist in
the in-tree lld linker that we now use by default for arm64, so the
workaround can be removed.
Reviewed by: kib
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11302
The EFI memory descriptor 64-bit aligns PhysicalStart on both 32- and
64-bit platforms. Make the padding explicit for i386 EFI.
Submitted by: Siva Mahadevan <smahadevan@freebsdfoundation.org>
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11301
- Rename _SKIP_READ_DEPEND to _SKIP_DEPEND since it also avoids writing.
- This now uses .NOMETA to avoid reading any .meta files related to
DEPENDOBJS. Objects not in OBJS/DEPENDOBJS may still have their .meta
files read in if they are in the dependency graph.
- This also avoids statting .meta and .depend files in the META_MODE +
-DNO_FILEMON case.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
ext4 on linux has always supported more than 32000 directories through
the dir_nlink feature, but FreeBSD was unable to catch up on this feature.
As part of the 64 bit inode changes nlink_t has been extended and this
feature is now possible.
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11210
initialized.
bdrewery@ has reported panics "newnfs_copycred: negative nfsc_ngroups".
The only way I can see that this occurs is that the credentials field of
the open structure gets used before being filled in.
I am not sure quite how this happens, but for the file create case, the
code is serialized via the vnode lock on the directory. If, somehow, a
link to the same file gets created just after file creation, this might
occur.
This patch ensures that the credentials field is initialized to a reasonable
set of credentials before the structure is linked into any list, so I
this should ensure it is initialized before use.
I am committing the patch now, since bdrewery@ notes that the panics
are intermittent and it may be months before he knows if the patch fixes
his problem.
Reported by: bdrewery
MFC after: 2 weeks
instantiated.
Calling pmap_copy() on non-faulted anonymous memory entries is useless.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This patch disables outer cache sync in PL310 driver
by adding "arm,io-coherent" property. In addition to
the previous patches it was the last bit needed
for enabling proper operation of Armada 38x SoCs
with the IO cache coherency.
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: mmel
Differential revision: https://reviews.freebsd.org/D11204
Armada 38x SoCs, in order to work properly in IO-coherent mode,
requires an update of the MBUS windows attributesd.
This patch also configures nexus coherent dma tag, because all
busses and children devices have to inherit this setting in runtime.
The latter has to be executed as a sysinit (SI_SUB_DRIVERS type),
so that bus_dma_tag_create() can be executed properly.
Submitted by: Michal Mazur <mkm@semihalf.com>
Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11203
Allow to set the dma tag for nexus in the platform init code,
so that all busses and devices would be able to inherit it.
This change is useful e.g. for setting coherent dma tag for
the platforms with hardware IO cache coherency.
Submitted by: ian
Michal Mazur <mkm@semihalf.com>
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11202
- Inherit BUS_DMA_COHERENT flag from parent buses
- Use cacheable memory attributes on dma coherent platform
- Disable cache synchronization on coherent platform
Changes are based on ARMv8 busdma code and commit r299683.
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: ian
Differential revision: https://reviews.freebsd.org/D11201
Add io_mapping_init_wc() and add a third (unused) parameter to
io_mapping_map_wc().
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11286
memory map request. When the VM fault handler is NULL a return code of
VM_PAGER_BAD is returned from the character device's pager populate
handler. This fixes compatibility with Linux.
MFC after: 1 week
Sponsored by: Mellanox Technologies
All of the problems were related to the FreeBSD-only features.
One was caused by a mismerge in the zfsbootcfg support code.
All others were in the TRIM support code.
MFC after: 1 week
X-MFC with: r320156
All of the problems were related to the FreeBSD-only features.
One was caused by a mismerge in the zfsbootcfg support code.
All others were in the TRIM support code.
Reported by: ken,
O. Hartmann <ohartmann@walstatt.org>,
Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
MFC after: 1 week
X-MFC with: r320156
On some windows hosts TEST_UNIT_READY command will return
SRB_STATUS_ERROR and sense data "NOT READY asc:3a,1 (Medium
not present - tray closed)", this occurs periodically, and
not hurt anything else. So, we prefer to ignore this kind
of errors.
PR: 219973
Submitted by: Hongjiang Zhang <hongzhan microsoft com>
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11271
ARM kernel modules require .text relocations (DT_TEXTREL) in shared
object ouptut, which is not allowed by default by lld. Add the -znotext
option to enable this. For simplicity add it unconditionally: it is
already default and thus either redundant (GNU BFD ld and gold from
ports) or ignored as an unknown option (GNU BFD ld 2.17.50 in the base
system).
Reviewed by: kib
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11250
being overwritten, they are set only bits (cleared by hardware).
Disable the Acknowledge of the controller slave address. The slave mode is
not supported.
Make sure the interrupt flag bit is being cleared as recommended, add a
delay() _after_ clear the interrupt bit.
Sponsored by: Rubicon Communications, LLC (Netgate)
illumos/illumos-gate@770499e185770499e185https://www.illumos.org/issues/8021
The ARC buf data project (known simply as "ABD" since its genesis in the ZoL
community) changes the way the ARC allocates `b_pdata` memory from using linear
`void *` buffers to using scatter/gather lists of fixed-size 1KB chunks. This
improves ZFS's performance by helping to defragment the address space occupied
by the ARC, in particular for cases where compressed ARC is enabled. It could
also ease future work to allocate pages directly from `segkpm` for minimal-
overhead memory allocations, bypassing the `kmem` subsystem.
This is essentially the same change as the one which recently landed in ZFS on
Linux, although they made some platform-specific changes while adapting this
work to their codebase:
1. Implemented the equivalent of the `segkpm` suggestion for future work
mentioned above to bypass issues that they've had with the Linux kernel memory
allocator.
2. Changed the internal representation of the ABD's scatter/gather list so it
could be used to pass I/O directly into Linux block device drivers. (This
feature is not available in the illumos block device interface yet.)
FreeBSD notes:
- the actual (default) chunk size is 4KB (despite the text above saying 1KB)
- we can try to reimplement ABDs, so that they are not permanently
mapped into the KVA unless explicitly requested, especially on
platforms with scarce KVA
- we can try to use unmapped I/O and avoid intermediate allocation of a
linear, virtual memory mapped buffer
- we can try to avoid extra data copying by referring to chunks / pages
in the original ABD
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Dan Kimmel <dan.kimmel@delphix.com>
MFC after: 3 weeks
I think that the change is still good, but reconciling it with a planned
merge of the ARC buf data scatter-ization is a bit more tedious
than I can handle.
MFC after: 17 days
From the linux tune2fs(8) manpage:
"Allow the kernel to initialize bitmaps and inode tables and keep a high
watermark for the unused inodes in a filesystem, to reduce e2fsck(8) time.
This first e2fsck run after enabling this feature will take the full time,
but subsequent e2fsck runs will take only a fraction of the original time,
depending on how full the file system is."
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11211
When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
or Crypto controllers and the Cortex-A9.
To avoid this, this commit introduces a new Device Tree property
'arm,io-coherent' for the L2 cache controller node, valid only for the
PL310 cache. It identifies the usage of the PL310 cache in an I/O
coherent configuration. Internally, it makes the driver disable the
outer cache sync operation.
Note, that other outer-cache operations are not removed, as they may
be needed for certain situations, such as booting secondary CPUs.
Moreover, in order to enable IO coherent operation, the decision
whether to use L2 cache maintenance callbacks is done in busdma
layer, which was enabled in one of the previous commits.
Submitted by: Michal Mazur <mkm@semihalf.com>
Marcin Wojtas <mw@semihalf.com>
Reviewed by: mmel
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D11245
There is a hardware problem between Cortex-A9 CPUs and on-chip devices
in Armada 38X SoCs that may cause hang on heavy load. This can be
however worked around by mapping all registers and PCI IO
as strongly ordered instead of device memory.
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Reviewed by: mmel
Tested by: mw_semihalf.com
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D10218
r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.
Suggested by: kib
MFC after: 2 weeks
Do not queue dmar_map_entries with zeroed gseq to
dmar_qi_invalidate_locked(). Zero gseq stops the processing in the qi
task. Do not assign possibly uninitialized on-stack gseq to map
entries when requeuing them on unit tlb_flush queue. Random garbage
in gsec is interpreted as too high invalidation sequence number and
again stop the processing in the task.
Make the sequence numbers generation completely contained in
dmar_qi_invalidate_locked() and dmar_qi_emit_wait_seq(). Upper code
directly passes boolean requesting emiting wait command instead of
trying to provide hint to avoid it by passing NULL gseq pointer.
Microoptimize the requeueing to tlb_flush queue by doing it for the
whole queue.
Diagnosed and tested by: Brett Gutstein <bgutstein@rice.edu>
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Most of the lock slowpaths assert that the calling thread isn't an idle
thread. However, this may not be true if the system has panicked, and in
some cases the assertion appears before a SCHEDULER_STOPPED() check.
MFC after: 3 days
Sponsored by: Dell EMC Isilon
If the user issues a MTIOCEXTGET ioctl, and the tape drive in question has
a serial number that is longer than 80 characters, we malloc a buffer in
saextget() to hold the output of cam_strvis().
Since a mutex is held in that codepath, doing a M_WAITOK malloc could lead
to sleeping while holding a mutex. Change it to a M_NOWAIT malloc and bail
out if we fail to allocate the memory. Devices with serial numbers longer
than 80 bytes are very rare (I don't recall seeing one), so this
should be a very unusual case to hit. But it is a bug that should be fixed.
sys/cam/scsi/scsi_sa.c:
In saextget(), if we need to malloc a buffer to hold the output of
cam_strvis(), don't wait for the memory. Fail and return an error
if we can't allocate the memory immediately.
PR: kern/220094
Submitted by: Jia-Ju Bai <baijiaju1990@163.com>
MFC after: 3 days
Sponsored by: Spectra Logic
Since buildenv exports SYSROOT all of these uses will now look in
WORLDTMP by default.
sys/boot/efi/loader/Makefile
A LIBSTAND hack is no longer required for buildenv.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
VM_MAP_WIRE_SYSTEM mode when wiring the newly grown stack.
System maps do not create auto-grown stack. Any stack we handled,
even for P_SYSTEM, must be for user address space. P_SYSTEM processes
with mapped user space is either init(8) or an aio worker attached to
other user process with aio buffer pointing into stack area. In either
case, VM_MAP_WIRE_USER mode should be used.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
device nodes.
Otherwise, the current check of aio_offset == -1LL makes it possible
to pass negative file offsets down to the filesystems. This trips
assertions and is even unsafe for e.g. FFS which keeps metadata at
negative offsets.
Reported and tested by: pho
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11266
Starting with DTS from Linux 4.11, the pins list, function, drive and pull
are no longer prefixed with "allwinner,".
Allow the pinctrl driver to handle both case.
The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.
Reviewed by: kib
use the armv6 busdma interface. This interface uses more memory than
the armv4 one, but bounces more data more often so may be more correct
than the armv4 one. It is intended for debugging purposes only at the
moment.
load and unload it all the time since the buffer never changes. In
addition, we were loading it with a hardware spin lock held, which
makes the sleepable lock in busdma (for the bounce pages) trigger a
witness warning, as well as ipend being called with it held by uart,
which made it impossible to unload.
These differences don't matter with the v4 busdma implementation, but
they do with the v6 implementation since the latter likes to bounce
transactions more, and will always do so for Atmel's driver.
It's more efficient as well as being more correct.
We can have support for reading ext4 "huge" files but we can't write
(anything) on ext4. and some filesystem. Formally enable the feature so
that we can mount such filesystems.
Submitted by: Fedor Uponov
Differential Revision: https://reviews.freebsd.org/D11209
that disk writes are more likely to be sequential. This change is
beneficial on both the solid state and mechanical disks that I've
tested. (A similar change in allocation policy was made by DragonFly
BSD in 2013 to speed up Poudriere with "stressful memory parameters".)
Increase the width of blst_meta_alloc()'s parameter "skip" and the local
variables whose values are derived from it to 64 bits. (This matches the
width of the field "skip" that is stored in the structure "blist" and
passed to blst_meta_alloc().)
Eliminate a pointless check for a NULL blist_t.
Simplify blst_meta_alloc()'s handling of the ALL-FREE case.
Address nearby style errors.
Reviewed by: kib, markj
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D11247
timecounter instead of the GPT timer, freeing up the more flexible GPT
hardware for other uses. The EPIT driver is a standard (always in the
kernel) driver, and the existing GPT driver is now optional and included
only if you ask for device imx_gpt.
global timer was successful, since the implementation tries to read it.
Notably, if the platform has a variable-frequency global timer (because
of dynamic frequency scaling), it doesn't set up the global timer for use
as a system timecounter, and in that case it also can't use it for DELAY.
Such platforms use different timer hardware for both timecounter and DELAY.
list.h includes a number of FreeBSD headers as a workaround for the
LIST_HEAD name collision. To reduce pollution, avoid including list.h
in commonly used headers when it is not explicitly needed.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11249
arm build.
In the arm build, elf_note.S includes sys/param.h and then does an
elf macro called ELFNOTE(). Although the compile error doesn't make
sense to me, I believe it just means that an "extern ..." can't exist
in param.h for this inclusion case.
I suspect adding #if !defined(LOCORE) might fix the build, but this
commit just takes the definition out.
I will ask freebsd-current@ what is the best was to deal with this
and do a subsequent commit after that.
Reported by: melounmichal@gmail.com
Clang 4.0 accepts the smc instruction with or without specifying
.arch_extension sec, but Clang 5.0 produces an error without it.
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Add a make.conf DTC variable that control which DTC (Device Tree Compiler)
to use.
Reviewed by: bdrewery, imp
Differential Revision: https://reviews.freebsd.org/D9577
By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10991
This generates startup LORs and panics when adding elements to bridge
devices. I will document further in https://reviews.freebsd.org/D10681
PR: 220073
Submitted by: dchagin
Reported by: db
The arm kernel linker scripts place the .init_pagetable section in .bss,
but .init_pagetable had no section flags set, and so did not match the
expected flags for .bss.
GNU ld silently ignores this case, but lld reports an error:
ld: error: incompatible section flags for .bss
>>> locore.o:(.init_pagetable): 0x0
>>> output section .bss: 0x3
PR: 220055
Submitted by: mmel, Rafael Espíndola
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
for the rtl8188eu chipset
- Rename struct r92c_rom member names: s/channel_plan/reserved5/,
s/xtal_calib/channel_plan to be compliant with definitions of the efuse
in vendor hal_pg.h
dirty. Assert that they are fully dirty rather than redundantly calling
vm_page_dirty() on them.
Reviewed by: kib, markj
MFC after: 1 week
X-MFC after: r319932
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.
To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.
The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).
Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.
Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025
Implement simple chain loader in loader; this update does add chain command,
taking device or file as argument to load and start new boot loader.
In case of BIOS, the chain will read the boot block to address 0000:7c00 and
jumps on it. In case of UEFI, the chain command is to be used with efi
application, typically stored in EFI System Partition.
The update also does add simple menu entry, if the variable chain_disk is set.
The value of the variable chain_disk is used as argument for chain loading.
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D5992
Currently some ARM platforms implement their own platform_probe_and_attach()
function and other use common routine that calls platform's PLATFORM_ATTACH
method.
Keep the old description to match the preferred way of naming things.
Pointed out by: andrew
This brings the default configurations (drivers, net80211 settings, etc) and some
of the shared configuration into std.AR_MIPS_BASE. I haven't yet moved the
-current settings (witness, memguard, etc) into it.
This should simplify building a lot of the same test images for my MIPS AP board
development and testing.
This is a work in progress; it's not designed to be perfect!
iflib - Handle out of order packet delivery from hardware in support of LRO
Out of order updates to rxd's is fixed in r315217. However, it is not
completely fixed. While refilling the buffers, iflib is not considering
the out of order descriptors. Hence, it is refilling sequentially.
"idx" variable in _iflib_fl_refill routine is incremented sequentially.
By doing refilling sequentially, it will override the SGEs that
are *IN USE* by other connections. Fix is to maintain a bitmap of
rx descriptors and differentiate the used one with unused one and
refill only at the unused indices. This patch also fixes a
few bugs in bnxt, related to the same feature.
Submitted by: bhargava.marreddy@broadcom.com
Reviewed by: shurd@
Differential Revision: https://reviews.freebsd.org/D10681
Do not attempt to initialize netmap queues that are already initialized
or aren't supposed to be initialized. Similarly, do not free queues
that are not initialized or aren't supposed to be freed.
PR: 217156
Sponsored by: Chelsio Communications
- Add asserts that the pages to write are dirty. The last page, if
partially written, is only required to be dirty, while completely
written pages should have all dirty bit set.
- Use uintmax_t to print vm_page pindexes.
- Use NULL instead of casted zero.
- Remove if () test which duplicated the loop ending condition.
- Miscellaneous style fixes.
Reviewed by: alc, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
mlx4en network interfaces. This prevents infinite unit number growth
typically when the mlx4en driver is used inside virtual machines which
support runtime PCI attach and detach.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Display the mbuf/cluster count for a sockbuf and fix a couple whitespace
issues in the output.
Reviewed by: jhb, markj (both previous version)
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11062
illumos/illumos-gate@2889ec41c02889ec41c0https://www.illumos.org/issues/8311
Description:
There was a misunderstanding about the enforcement details of the "Read-only"
flag introduced for SMB/CIFS compatibility, way back in 2007 in the Sun PSARC
2007/315 case.
The original authors thought enforcement of the READONLY flag should work
similarly as the IMMUTABLE flag. Unfortunately, that enforcement is
incompatible with the expectations of Windows applications using this feature
through the SMB service. Applications assume (and the MS File System Algorithms
MS-FSA confirms they should) that an SMB client can:
(a) Open an SMB handle on a file with read/write access,
(b) Set the DOS attributes to include the READONLY flag,
(c) continue to have write access via that handle.
This access model is essentially the same as a Unix/POSIX application that
creates a file (with read/write access), uses fchmod() to change the file mode
to something not granting write access (i.e. 0444), and then continues to write
that file using the open handle it got before the mode change.
Currently, the SMB server works-around this problem in a way that will become
difficult to maintain as we implement support for SMB3 persistent handles, so
SMB depends on this fix.
I've written a test program that can be used to demonstrate this problem, and
added it to zfs-tests (tests/functional/acl/cifs/cifs_attr_004_pos).
It currently fails, but will pass when this problem fixed.
Steps to Reproduce:
Run the test program on a ZFS file system.
Expected Results:
Pass
Actual Results:
Fail.
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Prakash Surya <prakash.surya@delphix.com>
Author: Gordon Ross <gwr@nexenta.com>
MFC after: 2 weeks
illumos/illumos-gate@4585130b254585130b25https://www.illumos.org/issues/5428
Most of the upstream change is not applicable to FreeBSD.
Only the renaming of strtonum to zfs_strtonum is relevant to us.
And we already had it partially done.
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
MFC after: 1 week
illumos/illumos-gate@a4b8c9aa65a4b8c9aa65https://www.illumos.org/issues/8264
Oddly there is a lzc_clone function, but no lzc_promote function.
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@kebe.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Andrew Stormont <astormont@racktopsystems.com>
MFC after: 1 week
All interrupts are routed to the sole CPU in that case implicitly.
This is a regression in EARLY_AP_STARTUP. Previously the 'assign_cpu'
variable was only set when a multi-CPU system finished booting, so
it's value both meant that interrupts could be assigned and that
there was more than one CPU.
PR: 219882
Reported by: ota@j.email.ne.jp
MFC after: 3 days
This makes ddb show files more descriptive and also adjusts the
whitespace to align the columns for non-32-bit architectures.
Reviewed by: cem (previous version), jhb
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D11061
Follow up r319935 by actually committing the mpc85xx_get_platform_clock()
function. This function was created to facilitate other development, and I
thought I had committed it earlier.
Some blocks depend on the platform clock rather than the system clock.
The System clock is derived from the platform clock as one-half the
platform clock. Rewrite mpc85xx_get_system_clock() to use the new
function.
Pointy-hat to: jhibbits
The swap space backing a clean page is released when it is first dirtied,
so there's no need to attempt to release swap space when the page is
already dirty.
Reviewed by: alc
MFC after: 1 week
After such a failure, the page is invalid, so there's point in keeping it
around. Moreover, such pages were not being inserted into the active queue,
making them unreclaimable until a subsequent write or delete made them
valid.
Reported by: alc
Reviewed by: alc (previous revision)
MFC after: 1 week
Such requests would previously mark the entire page as valid, which was
incorrect since nothing guaranteed that the page's contents had been
initialized. This change also modifies subpage BIO_DELETEs so that the
entire page is marked dirty, rather than only a subrange. There is no
benefit to creating partially dirty swap pages.
Reviewed by: alc, kib (previous version)
MFC after: 3 days
This commit enables usage of HWPMC interrupts for the
Marvell SoCs, which use MPIC (Armada38x and ArmadaXP).
Those interrupts require extra unmasking, comparing to
others. Also, in order to process counters per-CPU,
they are masked/unmasked using separate registers' sets
for each core.
Submitted by: Michal Mazur <mkm@semihalf.com>
Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10913
When HWPMC stops sampling, ps_pmc may be freed before samples
are processed. In such situation treat PMC as stopped.
Add "ifdef" to fix build without INVARIANTS code.
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10912
Additionally:
- Fix support for Cycle Counter (evsel == 0xFF)
- Stop and mask interrupts from all counters on init and finish
Submitted by: Michal Mazur <mkm@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield, Netgate
Differential revision: https://reviews.freebsd.org/D10910
This patch adds in-band link management over SGMII of the
SFP transceiver on Armada-388-Clearfog board.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Netgate
Reviewed by: loos
Differential revision: https://reviews.freebsd.org/D10708
This patch contains a new driver for the network unit of Marvell
Armada 38x/XP SoCs, called NETA. This support was thoroughly tested
and optimised in terms of stability and performance. Additional
hardware features, like Buffer Management (BM) or Parser and Classifier
(PnC) will be progressively supported as needed.
Submitted by: Fabien Thomas <fabien.thomas@stormshield.eu>
Arnaud Ysmal <arnaud.ysmal@stormshield.eu>
Zbigniew Bodek <zbb@semihalf.com>
Michal Mazur <mkm@semihalf.com>
Bartosz Szczepanek <bsz@semihalf.com>
Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield (main development)
Netgate (cleanup and upstreaming)
Differential revision: https://reviews.freebsd.org/D10706
r319886 ("Add the initial support for the Marvell 88E6141
and 88E6341 switches.") unveiled a problem with possible
multiple lock creation. Move its initialization
to the driver attach and for obtaining the switch ID
create a temprorary one, which is immediately destroyed
after the check.
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
additional allocation overhead. Previously, blst_meta_alloc() updated the
hint after every successful allocation. However, these "eager" hint
updates are of no actual benefit if, instead, the "lazy" hint update at
the start of blst_meta_alloc() is generalized to handle all cases where
the number of available blocks is less than the requested allocation.
Previously, the lazy hint update at the start of blst_meta_alloc() only
handled the ALL-FULL case. (I would also note that this change provides
consistency between blist_alloc() and blist_fill() in that their hint
maintenance is now entirely lazy.)
Eliminate unnecessary checks for terminators in blst_meta_alloc() and
blst_meta_fill() when handling ALL-FREE meta nodes.
Eliminate the field "bl_free" from struct blist. It is redundant. Unless
the entire radix tree is a single leaf, the count of free blocks is stored
in the root node. Instead, provide a function blist_avail() for obtaining
the number of free blocks.
In blst_meta_alloc(), perform a sanity check on the allocation once rather
than repeating it in a loop over the meta node's children.
In blst_leaf_fill(), use the optimized bitcount*() function instead of a
loop to count the blocks being allocated.
Add or improve several comments.
Address some nearby style errors.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D11146
These quirks are intended for optimizing CPU performance, not for
applying errata workarounds. Nobody can expect that CPU with unfixed
errata is stable enough to execute the kernel until quirks are applied.
MFC after: 3 weeks
with acquired RIB lock.
This fixes a possible panic due to trying to acquire RIB rlock when it is
already exclusive locked.
PR: 215963, 215122
MFC after: 1 week
Sponsored by: Yandex LLC
Right now the driver only supports port VLANs, so make sure
etherswitch_getinfo() return the proper switch capabilities.
Handle the cases where not all ports are in use (that will also require
etherswitch cooperation).
Sponsored by: Rubicon Communications, LLC (Netgate)
This definition is a part of the maxiotune2 patch that will be
committed soon. It is being committed separately to ease merging
with the pNFS projects subversion trees.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D10991
version.
This commit contains mostly refactoring, a few fixes and minor added
functionality.
Submitted by: Vincenzo Maffione <v.maffione at gmail.com>
Requested by: many
Sponsored by: Rubicon Communications, LLC (Netgate)
struct thread.
For all architectures, the syscall trap handlers have to allocate the
structure on the stack. The structure takes 88 bytes on 64bit arches
which is not negligible. Also, it cannot be easily found by other
code, which e.g. caused duplication of some members of the structure
to struct thread already. The change removes td_dbg_sc_code and
td_dbg_sc_nargs which were directly copied from syscall_args.
The structure is put into the copied on fork part of the struct thread
to make the syscall arguments information correct in the child after
fork.
This move will also allow several more uses shortly.
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080
from machine/proc.h, consistently on all architectures.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
X-Differential revision: https://reviews.freebsd.org/D11080
Some people may want to drop UFS-style ACLs for slimmer kernels.
Let's just not assume everyone needs ACLs.
Reported by: bde
Submitted by: Fedor Uporov
Differential Revision: https://reviews.freebsd.org/D11145
Do not try to set LMA bit while CPU is still in legacy mode.
Apparently Intel CPUs ignore non-id writes to LMA, while AMD's
(over-)react with #GP.
Reported and tested by: danfe
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
system retrieve its config data from the fdt data.
The properties that are common to all phys are decoded and returned in a
structure. The fdt node handles for the mac and phy devices are also
returned in the config data struct, so a driver can easily obtain additional
hardware-specific config values from the fdt data.
While the initial need for this is to help support phy drivers which are
configured with FDT data, there is nothing devicetree-specific about the
concept or the names, so they are available for use even on non-FDT systems.
The initial list of connection types comes from the current devicetree
bindings documentation, but values not documented there can be added to
the list in the future as needed, the values could be sorted into a
different order without perturbing FDT code, etc. The only invariant
is that MII_CONTYPE_UNKNOWN should be first (so it has a value of zero,
so that a con-type variable in a softc, for example, is initialized to
MII_CONTYPE_UNKNOWN by default).
After harvesting the hardware statistics counters and summing them into the
interface stats, properly clear the hardware counters back to zero. On imx5
and earlier hardware it is necessary to disable collection of stats while
writing zeroes to all the registers. On imx6 and newer it turns out it's
not even possible to write zeroes, instead you have to toggle a special
"zero everything" control bit in a register.
Count incoming packets with a bad start frame delim as input errors, and
incoming packets dropped due to no fifo space as input drops.
Remove all code related to harvesting the hardware stats less often than
once per second. It turns out the 32-bit stats registers are backed by
16-bit counters under the hood, and they can easily roll over if you only
harvest them once every 3 seconds like the old code was doing. Now we just
read all the regs once a second.
The combination of not properly zeroing the stats registers and 16-bit
counters sometimes wrapping between harvest calls resulted in basically
unusable statistics before these changes.
in /boot/defaults/loader.conf to something that's actually commonly used,
"mdroot". It's arbitrary, but it's easier to find this way.
MFC after: 2 weeks
Description from Brett:
"The busdma tags used to create mappings for the tx and rx rings did not have
the device's tag as parents, meaning that they did not respect the device's
busdma properties. The other tags used in the driver had their parents set
appropriately.
Also, the dma maps for each buffer in ixl_txeof() were being NULLed after
being unloaded, which is an error because those maps are then reused without
being recreated (I believe this also leaked resources since the maps were not
destroyed). Simply removing the line that sets the maps to NULL gives the
desired behavior. There does not seem to be a similar problem with ixl_rxeof().
Functions to free the tx and rx rings also NULL out the dma maps for each
buffer, but this seems okay because the maps are destroyed and not reused in
this case.
With these fixes, my ixl card seems to be working with the IOMMU enabled."
Submitted by: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: erj
Approved by: Alan Cox <alc@rice.edu>
MFC after: 1 week
what this field represented was also inaccurate.) Suggested by: kib
In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified. However, blist_create() was not modified to handle the
possibility that a malloc() call failed. Address this omission.
Increase the width of the local variable "radix" to 64 bits. (This
matches the width of the corresponding field in struct blist.)
Reviewed by: kib
MFC after: 6 weeks
Close a potential race in reading the CPU dtrace flags, where a thread can
start on one CPU, and partway through retrieving the flags be swapped out,
while another thread traps and sets the CPU_DTRACE_NOFAULT. This could
cause the first thread to return without handling the fault.
Discussed with: markj@
In particular:
- Don't evaluate event conditions with a sleepqueue lock held, since such
code may attempt to acquire arbitrary locks.
- Fix the return value for wait_event_interruptible() in the case that the
wait is interrupted by a signal.
- Implement wait_on_bit_timeout() and wait_on_atomic_t().
- Implement some functions used to test for pending signals.
- Implement a number of wait_event_*() variants and unify the existing
implementations.
- Unify the mechanism used by wait_event_*() and schedule() to put the
calling thread to sleep.
This is required to support updated DRM drivers. Thanks to hselasky for
finding and fixing a number of bugs in the original revision.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D10986
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range. This
revision corrects that mismatch. Currently, swaponsomething() limits
the size of a swap area to prevent arithmetic arithmetic overflow in
other parts of the blist allocator. That limit has also prevented this
type mismatch from causing problems.
Reviewed by: kib, markj
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D11096
illumos/illumos-gate@dbfd9f9300dbfd9f9300https://www.illumos.org/issues/8156
dbuf_evict_notify() holds the dbuf_evict_lock while checking if it should do
the eviction itself (because the evict thread is not able to keep up).
This can result in massive lock contention.
It isn't necessary to hold the lock, because if we make the wrong choice
occasionally, nothing bad will happen.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 1 week
illumos/illumos-gate@5b062782535b06278253https://www.illumos.org/issues/8005
RAID-Z requires that space be allocated in multiples of P+1 sectors,
because this is the minimum size block that can have the required amount
of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2
sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector
is a unit of 2^ashift bytes, typically 512B or 4KB.
To satisfy this constraint, the allocation size is rounded up to the
proper multiple, resulting in up to 3 "pad sectors" at the end of some
blocks. The contents of these pad sectors are not used, so we do not
need to read or write these sectors. However, some storage hardware
performs much worse (around 1/2 as fast) on mostly-contiguous writes
when there are small gaps of non-overwritten data between the writes.
Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that
include pad sectors. If writing a pad sector will fill the gap between
two (required) writes, we will issue the optional zio, thus doubling
performance. The gap-filling performance improvement was introduced in
July 2009.
Writing the optional zio is done by the io aggregation code in
vdev_queue.c. The problem is that it is also subject to the limit on
the size of aggregate writes, zfs_vdev_aggregation_limit, which is by
default 128KB. For a given block, if the amount of data plus padding
written to a leaf device exceeds zfs_vdev_aggregation_limit, the
optional zio will not be written, resulting in a ~2x performance
degradation.
The problem occurs only for certain values of ashift, compressed block
size, and RAID-Z configuration (number of parity and data disks). It
cannot occur with the default recordsize=128KB. If compression is
enabled, all configurations with recordsize=1MB or larger will be
impacted to some degree.
The problem notably occurs with recordsize=1MB, compression=off, with 10
disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
illumos/illumos-gate@adaec86ad2adaec86ad2https://www.illumos.org/issues/8155
When writing pre-compressed buffers, arc_write() requires that the compression
algorithm used to compress the buffer matches the compression algorithm
requested by the zio_prop_t, which is set by dmu_write_policy().
This makes dmu_write_policy() and its callers a bit more complicated.
We can simplify this by making arc_write() trust the caller to supply the type
of pre-compressed buffer that it wants to write, and override the compression
setting in the zio_prop_t.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 10 days
illumos/illumos-gate@79809f9cf479809f9cf4https://www.illumos.org/issues/8269
It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor. But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
FreeBSD note: the actual change was committed in r316853, this commit
adds the test files and record merge information.
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 1 week
Sponsored by: Panzura
Its purpose was to translate the values for msdosfs inode numbers,
which is calculated from the msdosfs structures describing the file,
into the range representable by 32bit ino_t. The translation acted
for filesystems larger than 128Gb, it reserved the range 0xf0000000
(FILENO_FIRST_DYN) to UINT32_MAX and remembered some arbitrary
translation of ino >= FILENO_FIRST_DYN into this range. It consumed
memory that could be only freed by unmount, and the translation was
not stable across remounts.
With ino_t type extended to 64 bit, there is no such issue and values
can be returned without compaction to 32bit. That is, for the native
environments, the translation layer is not necessary and adds
significant undeserved code complexity. For compat ABIs which use
32bit ino_t, the vfs.ino64_trunc_error sysctl provides some measures
to soften the failure mode when inode numbers truncation is not safe.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
Provide a new mode "2" which returns a special overflow indicator in
the non-representable field instead of the silent truncation (mode
"0") or EOVERFLOW (mode "1").
In particular, the typical use of st_ino to detect hard links with
mode "2" reports false positives, which might be more suitable for
some uses.
Discussed with: bde
Sponsored by: The FreeBSD Foundation