opnsense-src/lib/libsys
Mark Johnston a1da7dc1cd socket: Implement SO_SPLICE
This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket.  This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.

The interface is copied from OpenBSD, and this implementation aims to be
compatible.  Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket.  In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets.  More concretely, when
setting the option one passes the following struct:

    struct splice {
	    int fd;
	    off_t max;
	    struct timeval idle;
    };

where "fd" refers to the socket to which the first socket is to be
spliced, and "max" and "idle" optionally set the byte limit and the time
period after which the socket is unspliced.
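
As a rough userspace sketch of the setup described above, assuming the
installed <sys/socket.h> declares the struct with OpenBSD's sp_-prefixed
member names (sp_fd, sp_max, sp_idle) rather than the shortened names
shown in this message, and assuming a zero "max" and a zero "idle" mean
"no limit".  The helper names splice_one_way() and splice_both_ways() are
hypothetical, and the #else branch only exists so that the sketch still
compiles on systems without SO_SPLICE:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Splice src into dst, so that data arriving on src is forwarded to dst.
 * Assumption: max == 0 and idle == NULL both mean "no limit".
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
splice_one_way(int src, int dst, off_t max, const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = dst;		/* assumed member names, see the note above */
	sp.sp_max = max;
	if (idle != NULL)
		sp.sp_idle = *idle;
	return (setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp)));
#else
	/* No SO_SPLICE on this system: fail cleanly. */
	(void)src; (void)dst; (void)max; (void)idle;
	errno = ENOTSUP;
	return (-1);
#endif
}

/* A splice is unidirectional, so a full proxy needs one call per direction. */
static int
splice_both_ways(int a, int b)
{
	if (splice_one_way(a, b, 0, NULL) == -1)
		return (-1);
	return (splice_one_way(b, a, 0, NULL));
}
```

A proxy would typically call splice_both_ways(client_fd, server_fd) once
both TCP connections are established, and fall back to a userspace copy
loop if the call fails.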

select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer.  Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.

A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period.  Once unspliced,
the socket behaves normally from userspace's perspective.  The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again.  Userspace can also manually trigger
unsplicing by splicing to -1.
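
A minimal sketch of these two operations, under the same assumptions as
the interface description above: the struct members are taken to be
named sp_fd, sp_max and sp_idle as on OpenBSD, and the byte count is
taken to be returned as an off_t.  The helper names splice_bytes() and
unsplice() are hypothetical, and the #else branches only keep the sketch
compilable where SO_SPLICE is absent:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Fetch the number of bytes moved through the splice on socket s.
 * Assumption: the kernel returns the count as an off_t.
 */
static int
splice_bytes(int s, off_t *bytes)
{
#ifdef SO_SPLICE
	socklen_t len = sizeof(*bytes);

	return (getsockopt(s, SOL_SOCKET, SO_SPLICE, bytes, &len));
#else
	(void)s; (void)bytes;
	errno = ENOTSUP;
	return (-1);
#endif
}

/* Dissolve the splice on socket s by splicing it to descriptor -1. */
static int
unsplice(int s)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = -1;		/* assumed member name, see the note above */
	return (setsockopt(s, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp)));
#else
	(void)s;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

Since the count remains available after unsplicing, calling unsplice()
first and splice_bytes() second should yield a final total for
accounting purposes.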

Splicing work is handled by dedicated threads, similar to KTLS.  A
worker thread is assigned at splice creation time.  At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.

Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket.  so_splice_xfer() does the hard
work of moving data between socket buffers.

Co-authored by:	gallatin
Reviewed by:	brooks (interface bits)
MFC after:	3 months
Sponsored by:	Klara, Inc.
Sponsored by:	Stormshield
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D46411
2024-09-10 16:51:37 +00:00
aarch64 libsys/aarch: Remove pointless MD syscall(2) 2024-03-07 00:59:01 +00:00
amd64 lib{c,sys}: return wrapped syscall APIs to libc 2024-03-13 18:36:02 +00:00
arm lib: Remove __ARM_ARCH checks that are always true 2024-06-12 11:49:05 +00:00
i386 man filesystems: fix xrefs after move to section 4 2024-05-16 10:25:29 -06:00
powerpc lib{c,sys}: return wrapped syscall APIs to libc 2024-03-13 18:36:02 +00:00
powerpc64 lib{c,sys}: return wrapped syscall APIs to libc 2024-03-13 18:36:02 +00:00
powerpcspe lib{c,sys}: fix powerpcspe build 2024-03-13 20:09:41 +00:00
riscv libsys/riscv: Remove pointless MD syscall(2) 2024-03-07 00:58:44 +00:00
x86 libsys: add guards to MD manpages 2024-02-09 17:22:13 +00:00
__error.c libsys: move errno to libsys 2024-02-21 02:26:11 +02:00
__getosreldate.c libc: move __getosreldate to libsys 2024-02-05 20:34:56 +00:00
__vdso_gettimeofday.c
_exit.2
_libsys.h syscalls.master: correct return type of {read,write}v 2024-04-24 20:48:46 +01:00
_umtx_op.2
_umtx_op_err.c libthr: move _umtx_op_err() to libsys 2024-02-05 20:34:56 +00:00
abort2.2
accept.2
access.2 faccessat(2): Honor AT_SYMLINK_NOFOLLOW 2024-08-11 17:49:06 +02:00
acct.2
adjtime.2
aio_cancel.2
aio_error.2
aio_fsync.2
aio_mlock.2
aio_read.2 Document aio_read2/aio_write2 2024-02-11 03:54:16 +02:00
aio_return.2
aio_suspend.2
aio_waitcomplete.2
aio_write.2 Document aio_read2/aio_write2 2024-02-11 03:54:16 +02:00
auxv.3 libc: elf auxiliary vector handling to libsys 2024-02-05 20:34:56 +00:00
auxv.c libsys auxv.c: add fences needed to ensure that flag works 2024-02-21 16:18:10 +02:00
bind.2
bindat.2
brk.2
cap_enter.2
cap_fcntls_limit.2
cap_ioctls_limit.2
cap_rights_limit.2 rights.4: various corrections on capability rights 2024-04-28 22:48:31 -06:00
chdir.2
chflags.2
chmod.2
chown.2
chroot.2
clock_gettime.2 clock_gettime: Catch up with the CLOCK_BOOTTIME changes 2024-07-02 11:27:35 -06:00
clock_gettime.c
close.2
closefrom.2
connect.2
connectat.2
copy_file_range.2
cpuset.2
cpuset_getaffinity.2
cpuset_getdomain.2
creat.2 libsys: move __libsys_interposer consumers 2024-02-05 20:34:55 +00:00
dup.2
eventfd.2
execve.2 man filesystems: fix xrefs after move to section 4 2024-05-16 10:25:29 -06:00
extattr_get_file.2
fcntl.2
ffclock.2
fhlink.2
fhopen.2
fhreadlink.2
flock.2
fork.2
fspacectl.2
fsync.2
getdirentries.2
getdtablesize.2
getfh.2
getfsstat.2
getgid.2
getgroups.2
getitimer.2
getlogin.2
getloginclass.2
getpagesize.3 libc: move getpagesize(s) to libsys 2024-02-05 20:34:56 +00:00
getpagesize.c libc: move getpagesize(s) to libsys 2024-02-05 20:34:56 +00:00
getpagesizes.3 libc: move getpagesize(s) to libsys 2024-02-05 20:34:56 +00:00
getpagesizes.c libc: move getpagesize(s) to libsys 2024-02-05 20:34:56 +00:00
getpeername.2
getpgrp.2
getpid.2
getpriority.2
getrandom.2
getrlimit.2 vm: Remove kernel stack swapping support, part 11 2024-07-29 01:43:59 +00:00
getrusage.2
getsid.2
getsockname.2
getsockopt.2 socket: Implement SO_SPLICE 2024-09-10 16:51:37 +00:00
gettimeofday.2 gettimeofday.2: Do mention improbable future removal 2024-04-28 20:11:22 +02:00
gettimeofday.c
getuid.2
interposing_table.c libsys: make __libsys_interposing static 2024-03-13 17:31:48 +00:00
intro.2 intro.2 as errno.2: Use the name macro for errno 2024-05-04 08:56:10 -06:00
ioctl.2
issetugid.2
jail.2
kcmp.2
kenv.2
kill.2
kldfind.2
kldfirstmod.2
kldload.2
kldnext.2
kldstat.2
kldsym.2
kldunload.2
kqueue.2
ktrace.2
libc_stubs.c lib{c,sys}: move auxargs more firmly into libsys 2024-02-19 22:44:08 +00:00
libsys.h libsys: add a libsys.h 2024-04-16 17:48:07 +01:00
libsys_sigwait.c libsys: don't expose sigwait wrapper 2024-03-13 17:04:07 +00:00
link.2
lio_listio.2 lio_listio(2): add LIO_FOFFSET flag to ignore aiocb aio_offset 2024-02-11 03:53:50 +02:00
listen.2
lockf.3 libsys: move some missed manpages 2024-02-08 19:50:32 +00:00
lseek.2
madvise.2
Makefile Make __libsys_interposing_slot libsys only 2024-04-22 21:28:26 +01:00
Makefile.sys libsys: Add MLINKs for recvmmsg.2 and sendmmsg.2 2024-08-03 10:57:57 -04:00
mincore.2
minherit.2
mkdir.2
mkfifo.2
mknod.2
mlock.2
mlockall.2
mmap.2
modfind.2
modnext.2
modstat.2
mount.2
mprotect.2
mq_close.2
mq_getattr.2
mq_notify.2
mq_open.2 mqueuefs: Relax restriction that path must begin with a slash 2024-05-23 13:40:46 -06:00
mq_receive.2
mq_send.2
mq_setattr.2
mq_unlink.2 man filesystems: fix xrefs after move to section 4 2024-05-16 10:25:29 -06:00
msgctl.2
msgget.2
msgrcv.2
msgsnd.2
msync.2
munmap.2
nanosleep.2
nfssvc.2
ntp_adjtime.2
open.2 open(2): devfs is in section 4 on HEAD 2024-08-28 01:23:20 +03:00
pathconf.2
pdfork.2
pipe.2
poll.2 ppoll(2) was actually added in 10.2 2024-06-23 16:13:28 -07:00
posix_fadvise.2
posix_fallocate.2
posix_openpt.2
procctl.2 procctl(2) actually appeared in 9.3 2024-06-23 16:13:28 -07:00
profil.2
pselect.2
ptrace.2
ptrace.c
quotactl.2
rctl_add_rule.2
read.2 read(2): Add write cross reference 2024-03-01 20:36:39 -07:00
readlink.2
reboot.2
recv.2
recvmmsg.c include: ssp: fortify <sys/socket.h> 2024-07-13 00:16:26 -05:00
rename.2 rename(2): Extend EINVAL's description 2024-08-28 01:09:33 +03:00
revoke.2
rfork.2
rfork_thread.3 libc: move rfork_thread(3) to libsys 2024-02-05 20:34:56 +00:00
rmdir.2
rtprio.2
sched_get_priority_max.2
sched_getcpu_gen.c libc: libc/gen/sched_getcpu_gen.c -> libsys/ 2024-02-05 20:34:55 +00:00
sched_setparam.2
sched_setscheduler.2
sched_yield.2
sctp_generic_recvmsg.2
sctp_generic_sendmsg.2
sctp_peeloff.2
select.2
semctl.2
semget.2
semop.2
send.2
sendfile.2 man filesystems: fix xrefs after move to section 4 2024-05-16 10:25:29 -06:00
sendmmsg.c libsys: move __libsys_interposer consumers 2024-02-05 20:34:55 +00:00
setfib.2
setgroups.2
setpgid.2
setregid.2
setresuid.2
setreuid.2
setsid.2
setuid.2
shm_open.2
shmat.2
shmctl.2
shmget.2
shutdown.2
sigaction.2
sigaltstack.2
sigfastblock.2
sigpending.2
sigprocmask.2
sigqueue.2 sigqueue(2): Document __SIGQUEUE_TID 2024-04-23 19:51:10 +03:00
sigreturn.2
sigstack.2
sigsuspend.2
sigwait.2
sigwaitinfo.2
sleep.3 libsys: move some missed manpages 2024-02-08 19:50:32 +00:00
socket.2
socketpair.2
stat.2
statfs.2 man filesystems: fix xrefs after move to section 4 2024-05-16 10:25:29 -06:00
swapon.2
Symbol.map libsys: sort Symbol.map 2024-06-19 23:08:05 +01:00
Symbol.sys.map Make __libsys_interposing_slot libsys only 2024-04-22 21:28:26 +01:00
Symbol.thr.map libthr: move _umtx_op_err() to libsys 2024-02-05 20:34:56 +00:00
symlink.2
sync.2
sysarch.2
syscall.2
syscalls.map libsys: don't try to expose yield 2024-03-07 01:01:36 +00:00
thr_exit.2
thr_kill.2
thr_new.2
thr_self.2
thr_set_name.2
thr_suspend.2
thr_wake.2
timer_create.2
timer_delete.2
timer_settime.2
timerfd.2 timerfd.2: Add documentation for CLOCK_UPTIME and CLOCK_BOOTTIME 2024-07-02 10:40:04 -06:00
truncate.2
umask.2
undelete.2
unlink.2
usleep.3 libsys: move some missed manpages 2024-02-08 19:50:32 +00:00
utimensat.2
utimes.2
utrace.2
uuidgen.2
vfork.2
wait.2 capsicum: allow subset of wait4(2) functionality 2024-08-27 17:22:12 +02:00
write.2