Commit graph

404 commits

Author SHA1 Message Date
Warner Losh
602b575a88 syscall.master: Remove stray 4.2
Back in 4.3BSD, the system call table wasn't generated, and there was an
entry:
        "4.2 sigreturn",        /* 139 = old 4.2 sigreturn */
This got converted to
139     OBSOL   0 4.2 sigreturn
in 4.3 RENO. Since it was obsolete, nothing bad happened. In fact,
there was code in makeyscalls.sh to cope:
        {       comment = $4
                for (i = 5; i <= NF; i++)
                        comment = comment " " $i
                if (NF < 5)
                        $5 = $4
        }
so the generated comment in syscalls.c was almost correct:
        "obs_4.2",                      /* 139 = obsolete 4.2 sigreturn */
a bug that we have to this very day, despite makesyscalls.sh being
rewritten in lua.

However, this historical wart is the only place in our current
syscalls.master file where we have an extra field for the 'not
generated' class of system calls. Remove the historical wart so that the
re-write of makesyscalls.lua can be simpler (so, I hope, qemu's bsd-user
can large swathes of code automatically generated too). This should help
make things more understandable (changes to simplify makesyscalls.lue
aren't quite debugged, so have to wait for another day).

There's 3 different obsolete sigreturns (but only 1 that was ever in
FreeBSD 2.x and newer).

Sponsored by:		Netflix
2023-04-20 23:39:23 -06:00
Warner Losh
559b94a122 syscall.master: Fix comments
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Konstantin Belousov
dac3102488 Rename kqueue1(2) to kqueuex(2) to avoid compat issues with NetBSD
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39377
2023-04-04 16:19:08 +03:00
Konstantin Belousov
61194e9852 Add kqueue1() syscall
It takes the flags argument.  Immediate use is to provide the KQUEUE_CLOEXEC
flag for kqueue(2).

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39271
2023-03-28 02:39:26 +03:00
Ed Maste
52a1d90c8b Allow posix_fadvise in capability mode
posix_fadvise operates only on a provided fd.  Noted by
Mathieu <sigsys@gmail.com> in review D34761.

No new CAP_ rights are added for posix_fadvise(), as 'advice' in
general only influences when I/O happens; the fd must have existing
CAP_ rights for actual data access.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34903
2022-04-14 15:11:21 -04:00
Ed Maste
e5821a2156 syscalls.master: remove obsolete comment about compatibility tables
Compatibility ABIs no longer use a separate syscalls.master.

Fixes:		be67ea40c5 ("freebsd32: generate from ...")
Sponsored by:	The FreeBSD Foundation
2022-03-30 11:07:00 -04:00
Konstantin Belousov
5346570276 swapoff: add one more variant of the syscall
Requested and reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:46 +02:00
Konstantin Belousov
c1a8472793 syscalls: add COMPAT13
Reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33343
2021-12-09 02:48:32 +02:00
Brooks Davis
6d37a1670b syscalls: mprotect does not take a const
The mprotect syscall decleration is not const.  I added this one
incorrectly in a944d28d0e.

Reported by:	kib
Reviewed by:	kib, imp
2021-11-29 22:04:47 +00:00
Brooks Davis
a8efd4d1b3 syscalls: make syscall and __syscall SYSMUX
Rather than combining the declearation of nosys with the registration
of SYS_syscall, declare syscall(2) and __syscall(2) with the new
SYSMUX type in syscalls.master and declare nosys directly.  This
eliminates the last use of syscall aliases in the tree.

Reviewed by:	kib, imp
2021-11-29 22:04:44 +00:00
Brooks Davis
d7f306c5be makesyscalls: add a new SYSMUX type
This type is for system call multiplexers (syscall(2), __syscall(2))
that don't have a normal handler and instead are handled in the
machine-dependent syscall code.

Reviewed by:	kib, imp
2021-11-29 22:04:43 +00:00
Brooks Davis
cffb55f0f3 syscalls: normalize exit
Declare the exit system call normally.  This results in the
implementation being named sys_exit rather than sys_sys_exit and
being decalred as returning an int.  Infact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.

Reviewed by:	kib, imp
2021-11-29 22:04:43 +00:00
Brooks Davis
638c5fa8df syscalls: normalize (get|set)rlimit
Declare normal <foo>_args structs rather than going out of the way
to declare __<foo>_args.

Reviewed by:	kib, imp
2021-11-29 22:04:42 +00:00
Brooks Davis
ba4e5253a3 syscalls: normalize orecvfrom and ogetsockname
Declare o<foo>_args rather than reusing the equivalent <foo>_args
structs.  Avoiding the addition of a new type isn't worth the
gratutious differences.

Reviewed by:	kib, imp
2021-11-29 22:04:42 +00:00
Brooks Davis
3660e76a22 syscalls: correct a couple style issues
Reviewed by:	kib, imp
2021-11-29 22:04:41 +00:00
Brooks Davis
33f9ea209e syscalls: add missing SAL annotations
freebsd7_shmctl was missing an annotation

Reviewed by:	kib, imp
2021-11-29 22:04:41 +00:00
Brooks Davis
be67ea40c5 freebsd32: generate from sys/kern/syscalls.master
This avoids the need to keep a freebsd32-specific syscalls.master
in sync with the default ABI.  As evidenced by the number of commits
required to sync the two, it is extremely easy for them to get out
of sync due to misunderstandings and user errors.

Reviewed by:	kevans, kib
2021-11-22 22:36:58 +00:00
Brooks Davis
799ce8b8d2 syscalls: annotate args pointing to long, pointer, or time_t
Add _Contains_ annotations indicating that the data pointed to by a
pointer argument contains types that vary between FreeBSD ABIs. The
supported set is long (including size_t), pointer (including
intptr_t), and time_t.  The first two vary between 32- and 64-bit
ABIs.  The laste betwen i386 and everything else.

These will be used to detect which syscalls require handling on
particular ABIs.

Reviewed by:	kevans, kib
2021-11-22 22:36:58 +00:00
Brooks Davis
00e0a4c0d7 syscalls: abort2 doesn't return so declare as void
Reviewed by:	kib
2021-11-22 22:36:54 +00:00
Brooks Davis
4b2e1f1480 syscalls: umask returns a mode_t
Reviewed by:	kib
2021-11-22 22:36:54 +00:00
Brooks Davis
27f5b514a0 syscalls: update a few return types to ssize_t
Reviewed by:	kib
2021-11-22 22:36:54 +00:00
Brooks Davis
717e7fb27a syscalls: struct ucontext4 -> struct freebsd4_ucontext
This aligns with struct freebsd4_ucontext32 in freebsd32.

Reviewed by:	kib
2021-11-22 22:36:54 +00:00
Brooks Davis
d8bd949beb sys___sysctl: regularize argument struct
Let makesyscalls generate the normal struct __sysctl_args structure.
It works fine.

Reviewed by:	kib
2021-11-22 22:36:54 +00:00
Brooks Davis
88dfcfa2a0 sys_sigaltstack: use struct sigaltstack arg
This is idential to stack_t and more amenable to prepending "32" to
for freebsd32.

Reviewed by:	kib
2021-11-22 22:36:53 +00:00
Brooks Davis
85d1d2a675 syscalls: use struct siginfo rather than siginfo_t
This allows freebsd32 to use struct siginfo32 with an automatable
conversion.

Reviewed by:	kevans
2021-11-17 20:12:22 +00:00
Brooks Davis
f503288262 syscalls: fix type of osendmsg
osendmsg takes an struct omsghdr * not a void *.

Reviewed by:	kevans
2021-11-17 20:12:22 +00:00
Brooks Davis
2385f4d172 syscalls: use __socklen_t as appropriate
No functional change as __socklen_t is an int.

Obtained from:	CheriBSD

Reviewed by:	kevans
2021-11-17 20:12:22 +00:00
Brooks Davis
b64f3dc26c syscalls: [gs]etitimer takes an int which
Match the function decleration which takes an int not a signed int.
No functional change as the range of valid values is 0-2.

Obtained from: CheriBSD

Reviewed by:	kevans
2021-11-17 20:12:21 +00:00
Brooks Davis
b7fd86118f syscalls: sprinkle in const values
Add missing const qualifiers to a number of syscall arguments.

Obtained from:	CheriBSD

Reviewed by:	kevans
2021-11-17 20:12:21 +00:00
Brooks Davis
8e4a3add99 struct kevent_freebsd11 -> struct freebsd11_kevent
Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.

While here, make the kevent changelist argument const.

Reviewed by:	kib
2021-11-15 18:34:27 +00:00
Brooks Davis
f0da2a1467 syscalls: unwrap a long line
Style dictates that each variable is on a single line

Reviewed by:	kib
2021-11-15 18:34:27 +00:00
Konstantin Belousov
77b2c2f814 Add sched_getcpu()
for compatibility with Linux.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32901
2021-11-10 21:18:54 +02:00
Brooks Davis
6bc90e8acf syscalls.master: correct formatting issues
Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31351
2021-09-01 21:58:22 +01:00
Brooks Davis
df501bac69 syscalls.master: switch to CAPENABLED flags
Switch the main syscall table to use CAPENABLED flags rather than
capabilities.conf.  This avoid synchronization issues between
syscalls.master and capabilities.conf (e.g. when renaming a syscall
during development).

For now, move capabilities.conf to sys/compat/freebsd32 and use it
there.  Use of sys/compat/freebsd32/syscalls.master should be replaced
by makesyscalls.lua enhancements to allow the main one to be used.

This change results in no changes to generated files after running
`make sysent`.

Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31350
2021-09-01 21:58:16 +01:00
Brooks Davis
6945df3fff makesyscalls.lua: add a CAPENABLED flag
The CAPENABLED flag indicates that the syscall can be used in capsicum
capability mode.  It is intended to replace capabilities.conf.

Reviewed by:	kevans, emaste
MFC after:	1 week
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D31349
2021-09-01 21:58:06 +01:00
Ka Ho Ng
0dc332bff2 Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D28347
2021-08-05 23:20:42 +08:00
Konstantin Belousov
9b6b793bd7 Revert most of ce42e79310
to restore ABI compatibility for pre-10.x binaries.

It restores _umtx_lock() and _umtx_unlock() syscalls, and UMTX_OP_LOCK/
UMTX_OP_UNLOCK umtx_op(2) operations. UMUTEX_ERROR_CHECK flag is left
out for now, I do not think it makes a difference.

PR:	218571
Reviewed by:	brooks (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31220
2021-07-28 13:21:12 +03:00
Moritz Buhl
4bc2174a1b kern: fail getgroup and setgroup with negative int
Found using
https://github.com/NetBSD/src/blob/trunk/tests/lib/libc/sys/t_getgroups.c

getgroups/setgroups want an int and therefore casting it to u_int
resulted in `getgroups(-1, ...)` not returning -1 / errno = EINVAL.

imp@ updated syscall.master and made changes markj@ suggested

PR:			189941
Tested by:		imp@
Reviewed by:		markj@
Pull Request:		https://github.com/freebsd/freebsd-src/pull/407
Differential Revision:	https://reviews.freebsd.org/D30617
2021-06-02 13:22:57 -06:00
Brooks Davis
d89c1c461c Reserve gaps in syscall numbers for local use
It is best for auditing of syscalls.master if we only append to the
file.  Reserving unimplemented system call numbers for local use makes
this policy and provides a large set of syscall numbers FreeBSD
derivatives can use without risk of conflict.

Reviewed by:	jhb, kevans, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:45 +00:00
Brooks Davis
119fa6ee8a syscalls.master: Add a new syscall type: RESERVED
RESERVED syscall number are reserved for local/vendor use.  RESERVED is
identical to UNIMPL except that comments are ignored.

Reviewed by:	kevans
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:44 +00:00
Brooks Davis
65a524b499 Remove documentation of unimplemented syscalls
We have not been able to run binaries from other BSDs well over a
decade.  There is no need to document their allocation decisions here.

We also don't need to reserve syscall numbers of never-implemented
syscalls.

Reviewed by:	jhb, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D27988
2021-01-26 18:27:44 +00:00
Alan Somers
b3286afae3 Reallocate syscall numbers for aio_writev and aio_readv
The originally chosen numbers interfere with downstream projects'
syscalls.  Move them to the end of the syscall table instead.

Reported by:	jrtc27
Reviewed by:	brooks
MFC-With:	022ca2fc7f
Differential Revision:	022ca2fc7f
2021-01-07 19:49:27 -07:00
Alan Somers
022ca2fc7f Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit
fixes that shortcoming by adding aio_writev and aio_readv. They aren't
part of the standard, but they're an obvious extension. They work just
like their synchronous equivalents pwritev and preadv.

It isn't yet possible to use vectored aiocbs with lio_listio, but that
could be added in the future.

Reviewed by:    jhb, kib, bcr
Relnotes:       yes
Differential Revision: https://reviews.freebsd.org/D27743
2021-01-02 19:57:58 -07:00
Konstantin Belousov
7a202823aa Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues.  Unfortunately, kqueues
are not passable between processes.  And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow.  This came up when debugging strange HW video decode
failures in Firefox.  A native implementation would avoid these problems
and help with porting Linux software.

Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.

Submitted by:   greg@unrelenting.technology
Reviewed by:    markj (previous version)
MFC after:      2 weeks
Differential Revision:  https://reviews.freebsd.org/D26668
2020-12-27 12:57:26 +02:00
Rick Macklem
d9021e389a Add a syscall for the nfs-over-tls daemons to use.
The nfs-over-tls daemons need a system call to perform operations such as
associate a file descriptor with a krpc socket.
The daemons will not be in head for some time, but it will make it
easier for testers of nfs-over-tls to do testing if the system call
is in head (basically the stub for libc which will be commited soon).

Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D24949
2020-05-28 21:06:10 +00:00
Kyle Evans
3e6b82913d close_range(2): use newly assigned AUE_CLOSERANGE 2020-04-24 01:30:00 +00:00
Kyle Evans
7d03e08112 Mark closefrom(2) COMPAT12, reimplement in libc to wrap close_range
Include a temporarily compatibility shim as well for kernels predating
close_range, since closefrom is used in some critical areas.

Reviewed by:	markj (previous version), kib
Differential Revision:	https://reviews.freebsd.org/D24399
2020-04-14 18:07:42 +00:00
Kyle Evans
472ced39ef Implement a close_range(2) syscall
close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.

sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.

The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.

This patch is based on a syscall of the same design that is expected to be
merged into Linux.

Reviewed by:	kib, markj, vangyzen (all slightly earlier revisions)
Differential Revision:	https://reviews.freebsd.org/D21627
2020-04-12 21:23:19 +00:00
Mateusz Guzik
0573d0a9b8 vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.

This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory we can resolve it using
usual means. Otherwise we can use the name saved by lookup and resolve the
parent.

See the review for sample syscall counts.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D23574
2020-02-20 16:58:19 +00:00
Konstantin Belousov
146fc63fce Add a way to manage thread signal mask using shared word, instead of syscall.
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery.  Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.

The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).

With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.

The syscall is not exported from the stable libc version namespace on
purpose.  It is intended to be used only by our C runtime
implementation internals.

Tested by:	pho
Disscussed with:	cem, emaste, jilles
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D12773
2020-02-09 11:53:12 +00:00