rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes
zeroed before hitting the end-of-file. After this change rmsr.r_offset
no longer contains the EOF when the requested operation range is
completely beyond the end-of-file. Instead in such case rmsr.r_offset is
equal to rqsr.r_offset. Callers can obtain the number of bytes zeroed
by subtracting rqsr.r_offset from rmsr.r_offset.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31677
The removed text claimed that memcpy is implemented using bcopy and thus
strings may overlap. Use of bcopy is an implementation detail that is
no longer true, even if the implementation (on some archs) does allow
overlap.
In any case behaviour is undefined per the C standard if memcpy is
called with overlapping objects, and this man page already claimed that
src and dst may not overlap.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31192
rmacklem@ spotted two things in the system call:
- Upon returning from a successful operation, vop_stddeallocate can
update rmsr.r_offset to a value greater than file size. This behavior,
although being harmless, can be confusing.
- The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is
undocumented.
This commit has the following changes:
- vop_stddeallocate and shm_deallocate to bound the the affected area
further by the file size.
- The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is
documented.
- The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return
len is explicitly documented the be the value 0, and the return offset
is restricted to be the smallest of off + len and current file size
suggested by kib@. This semantic allows callers to interact better
with potential file size growth after the call.
Sponsored by: The FreeBSD Foundation
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D31604
Add missing wrapper code to librt for these new functions so that
SIGEV_THREAD works. Without machinery to convert it to SIGEV_THREAD_ID,
you got EINVAL.
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31618
Allow multiple vector IOs to be started with one system call.
aio_readv() and aio_writev() already used these opcodes under the
covers. This commit makes them available to user space.
Being non-standard extensions, they're only visible if __BSD_VISIBLE is
defined, like the functions.
Reviewed by: asomers, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31627
Variant I architectures use off and Variant II ones use size + off.
Define TLS_VARIANT_I/TLS_VARIANT_II symbols similarly to how libc
handles it.
Reviewed by: kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31539
Differential revision: https://reviews.freebsd.org/D31541
Add support for 'VDSO_TH_ALGO_X86_PVCLK'; add vDSO-based timekeeping for
devices that support the KVM/XEN paravirtual clock API.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
Remove a useless note about unlinking temporary files, they are unlinked
in tmpfile(3) [1]. Add a note about __cxa_atexit().
Explain exactly what are the FreeBSD implementation differences between
exit() and _Exit().
Noted by: markj [1]
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425
Add fflush(stdout) as the common idiom. Explain the need to use exit()
but advise against it.
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D31425
_PC_MIN_HOLE_SIZE and _PC_DEALLOC_PRESENT were mixed somehow before this
fix.
Sponsored by: The FreeBSD Foundation
Reviewed by: delphij
Differential Revision: https://reviews.freebsd.org/D31436
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28347
Current POSIX standard requires fork() to be async-signal safe. Neither
our implementation, nor implementations in other operating systems are,
and practically it is impossible to make fork() async-signal safe without
too much efforts. Also, that would put undue requirement that all atfork
handlers should be async-signal safe as well, which contradicts its main
use.
As result, Austin Group dropped the requirement, and added a new function
_Fork() that should be async-signal safe, but it does not call atfork
handlers. Basically, _Fork() can be implemented as a raw syscall.
Release of glibc 2.34 added _Fork(), do the same for FreeBSD.
Clarify threading behavior for fork() in the manpage.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31378
They deliberately read out-of-bounds values to avoid byte-by-byte
loads and check multiple bytes at once. While this will work on x86,
it is flagged as an out-of-bounds read with ASAN, so we have to
disable instrumentation here. This also causes bounds errors for CHERI,
so in CheriBSD we use implementations that avoid OOB reads.
Differential Revision: https://reviews.freebsd.org/D31045
The ifunc resolver is called before the sanitizer runtime is initialized,
so any instrumentation results in an immediate crash.
Reviewed By: kib
Differential Revision: https://reviews.freebsd.org/D31046
This is needed to bootstrap llvm-tblgen on Linux since LLVM calls
`::open(...)` which does not work if open is a statement macro.
Also stop defining O_SHLOCK/O_EXLOCK and update the only bootstrap tools
user of those flags to deal with missing definitions.
Reviewed By: jrtc27
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31226
Linux standardized what we call CLOCK_{REALTIME,MONOTONIC}_FAST as
CLOCK_{REALTIME,MONOTONIC}_COARSE. In addition, Linux spells
CLOCK_UPTIME as CLOCK_BOOTTIME.
Add aliases to time.h and document these new aliases in
clock_gettime(2).
Reviewed by: vangyzen, kib (prior), dchagin (prior)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30988
of the /dev/hpet and /dev/hv_tsc devices, to not leak internal libc
filedescriptors on exec.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31344
The left side of the MIN() expression is the (signed) result of pointer
subtraction (ptrdiff_t). The right hand side is the also the (signed)
result of pointer subtraction, additionally subtracting the element size
('es'), which is unsigned size_t. This coerces the right-hand
expression into an unsigned value. MIN(signed, unsigned) triggers
-Wsign-compare.
Sorting elements of size greater than SSIZE_MAX is nonsensical, so we
can instead treat the element size as ssize_t, leaving the right-hand
result the same signedness as the left.
Reviewed by: arichardson, kib
Differential Revision: https://reviews.freebsd.org/D31292
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652
Before this patch there was a chance for thread that called rand(3)
slightly later to see rand3_state already allocated, but not yet
initialized. While this API is not expected to be thread-safe, it
is not expected to crash. ztest on 64-thread system reproduced it
reliably for me.
Submitted by: avg@
MFC after: 1 month
Before this patch there was a chance for thread that called rand(3)
slightly later to see rand3_state already allocated, but not yet
initialized. While this API is not expected to be thread-safe, it
is not expected to crash. ztest on 64-thread system reproduced it
reliably for me.
MFC after: 1 month
The early environment is typically cleared, so these new options
need the PRESERVE_EARLY_KENV kernel config(8) option. These environments
are reported as missing by kenv(1) if the option is not present in the
running kernel.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D30835
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
This permits more efficient accesses of thread-local variables, which
are heavily used at least by jemalloc and locale-aware code. Note that
on amd64 and i386, jemalloc's thread-local variables already have their
TLS model overridden by defining JEMALLOC_TLS_MODEL.
For now the change is applied only to tested platforms, but should in
principle be enabled everywhere.
PR: 255840
Suggested by: jrtc27
Reviewed by: kib
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31070
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.
The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
Finally, we have the correct function definition for strmode. NetBSD/OpenBSD
did this many years ago. This code is weird sign extension safe.
Reviewed by: imp@
Pull Request: https://github.com/freebsd/freebsd-src/pull/493
When debugging POSIX shared memory issues, it's really
useful to learn that there is a command line tool now
to manipulate shared memory segments.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D30896
so that libc vdso and kernel syscall give closer results.
Reported by: dchagin
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30873
Call binuptime inside switch statement, instead of pre-calculating
the abs argument.
Change the type of the abs argument to bool.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30873
We can use the buffer passed to fread(3) directly in the FILE *.
The buffer needs to be reset before each call to __srefill().
This preserves the expected behavior in all cases.
The change was found originally in OpenBSD and later adopted by NetBSD.
MFC after: 2 weeks
Obtained from: OpenBSD (CVS 1.18)
Differential Revision: https://reviews.freebsd.org/D30548
Previously, a negative change list length would be treated the same as
an empty change list. A negative event list length would result in
bogus copyouts. Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.
Reviewed by: imp, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30480
Document that LOG_PID is ignored and can not be disabled.
This change was made along with the move from RFC 3164 to RFC 5424 log messages.
PR: 255664
Reported by: des.gaufres@gmail.com
Reviewed by: gbe, jilles
Approved by: gbe (mentor, manpages), jilles
There are still references to timed(8) and timedc(8) in the base system,
which were removed in 2018.
PR: 255425
Reported by: Ceri Davies <ceri at submonkey dot net>
Reviewed by: ygy, gbe
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30232
It reopens the passed file descriptor, checking the file backing vnode'
current access rights against open mode. In particular, this flag allows
to convert file descriptor opened with O_PATH, into operable file
descriptor, assuming permissions allow that.
Reviewed by: markj
Tested by: Andrew Walker <awalker@ixsystems.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30148
It writes the core of live stopped process to the file descriptor
provided as an argument.
Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955