This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label. Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges). This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones. For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials. Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them. It
could be started and ended by a special system call. Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).
We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often. It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves. The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible. MAC/do is
going to leverage these in a subsequent commit. A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone
inclusion of <sys/ucred.h> both from kernel and userspace (for the
latter, it has been working thanks to <bsm/audit.h> already including
<sys/types.h>).
Reviewed by: brooks
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47618
(cherry picked from commit ddb3eb4efe55e57c206f3534263c77b837aff1dc)
Needed by the upcoming setcred() system call. More generally, is a step
on the way to support 32-bit compatibility for MAC-related system calls.
Reviewed by: brooks
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47878
(cherry picked from commit 3bdc5ba2ac760634056c66c3c98b6b3452258a5b)
This is in preparation for enabling the new setcred() system call to set
a process' MAC label.
No functional change (intended).
MFC after: 2 weeks
Approved by: markj (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46905
(cherry picked from commit 8a4d24a39098ed8170a37ca2aa83bf1da1976de1)
Besides simplifying existing code, this will later enable the new
setcred() system call to copy MAC labels.
MFC after: 2 weeks
Approved by: markj (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46904
(cherry picked from commit 2e593dd3b5e1c515d57b3d3f929e544a6622b04a)
Do this only when the headers for these functionalities were included
prior to this one. Indeed, if they need to be included, style(9)
mandates they should have been so before this one.
Remove the common MAC sysctl declaration from
<security/mac/mac_internal.h>, as it is now redundant (all its includers
also include <security/mac/mac_policy.h>).
Remove local such declarations from all policies' files.
Reviewed by: jamie
Approved by: markj (mentor)
MFC after: 5 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46903
(cherry picked from commit db33c6f3ae9d1231087710068ee4ea5398aacca7)
The original changes in 'sys/security/mac_grantbylabel/mac_grantbylabel.c' were
removed as MAC/grantbylabel has not been MFCed.
To be used by MAC/do.
Reviewed by: jamie
Approved by: markj (mentor)
MFC after: 5 days
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46899
(cherry picked from commit 5041b20503dbb442cc9ebd0a6e4db26905102c72)
so_peerlabel can only be used when the socket is not listening.
Reviewed by: markj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46755
(cherry picked from commit 2fb778fab893b4a8a86ecfa20acf2e23bb2cdae8)
Whenever mac_syncache_init() returns an error, ensure that
*label = NULL. This simplifies the error handling by the caller.
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46701
(cherry picked from commit 3f2792166aeed4baf07d351bcb12a9d196c443eb)
The mac_ipacl policy module enables fine-grained control over IP address
configuration within VNET jails from the base system.
It allows the root user to define rules governing IP addresses for
jails and their interfaces using the sysctl interface.
Requested by: multiple
Sponsored by: Google, Inc. (GSoC 2019)
MFC after: 2 months
Reviewed by: bz, dch (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D20967
Ensure MAC modules are inserted in order that they are registered.
Reviewed by: markj
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39589
Currently, sysctls which enable KDB in some way are flagged with
CTLFLAG_SECURE, meaning that you can't modify them if securelevel > 0.
This is so that KDB cannot be used to lower a running system's
securelevel, see commit 3d7618d8bf. However, the newer mac_ddb(4)
restricts DDB operations which could be abused to lower securelevel
while retaining some ability to gather useful debugging information.
To enable the use of KDB (specifically, DDB) on systems with a raised
securelevel, change the KDB sysctl policy: rather than relying on
CTLFLAG_SECURE, add a check of the current securelevel to kdb_trap().
If the securelevel is raised, only pass control to the backend if MAC
specifically grants access; otherwise simply check to see if mac_ddb
vetoes the request, as before.
Add a new secure sysctl, debug.kdb.enter_securelevel, to override this
behaviour. That is, the sysctl lets one enter a KDB backend even with a
raised securelevel, so long as it is set before the securelevel is
raised.
Reviewed by: mhorne, stevek
MFC after: 1 month
Sponsored by: Juniper Networks
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37122
Summary:
Port the MAC modules to use the IfAPI APIs as part of this.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38197
2449b9e5fe introduced API changes
that require ensuring that loadable MAC modules use the matching API.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d1), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232
Add three simple hooks to the debugger allowing for a loaded MAC policy
to intervene if desired:
1. Before invoking the kdb backend
2. Before ddb command registration
3. Before ddb command execution
We extend struct db_command with a private pointer and two flag bits
reserved for policy use.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35370
When packet is a SYN packet, we don't need to modify any existing PCB.
Normally SYN arrives on a listening socket, we either create a syncache
entry or generate syncookie, but we don't modify anything with the
listening socket or associated PCB. Thus create a new PCB lookup
mode - rlock if listening. This removes the primary contention point
under SYN flood - the listening socket PCB.
Sidenote: when SYN arrives on a synchronized connection, we still
don't need write access to PCB to send a challenge ACK or just to
drop. There is only one exclusion - tcptw recycling. However,
existing entanglement of tcp_input + stacks doesn't allow to make
this change small. Consider this patch as first approach to the problem.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D29576
place -- in the VOP rather than vn_setexttr() -- and that it is for historic
reasons. We might wish to relocate it in due course, but this way at least
we document the asymmetry.
pipes get stated all thet time and this avoidably contributed to contention.
The pipe lock is only held to accomodate MAC and to check the type.
Since normally there is no probe for pipe stat depessimize this by having the
flag.
The pipe_state field gets modified with locks held all the time and it's not
feasible to convert them to use atomic store. Move the type flag away to a
separate variable as a simple cleanup and to provide stable field to read.
Use short for both fields to avoid growing the struct.
While here short-circuit MAC for pipe_poll as well.
I have such a custom kernel configuration and its build failed with:
linking kernel.full
ld: error: undefined symbol: mac_vnode_assert_locked
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>> tmpfs_vnops.o:(mac_vnode_check_stat)
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>> vfs_default.o:(mac_vnode_check_stat)
>>> referenced by mac_framework.h:556 (/usr/devel/git/apu2c4/sys/security/mac/mac_framework.h:556)
>>> ufs_vnops.o:(mac_vnode_check_stat)
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.
Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.
This in particular fixes poll2 which passes 128 fds.
$ ./poll2_processes -s 10
before: 134411
after: 271572
This lock was made unnecessary by the addition of mac_policy_rms in r356120.
Reviewed by: mjg, kib
Differential Revision: https://reviews.freebsd.org/D24283
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
All checking routines walk a linked list of all modules in order to determine
if given hook is installed. This became a significant problem after mac_ntpd
started being loaded by default.
Implement a way perform checks for select hooks by testing a boolean.
Use it for priv_check and priv_grant, which are constantly called from priv_check.
The real fix would use hotpatching, but the above provides a way to know when
to do it.
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
If any non-static modules are loaded (and mac_ntpd tends to be), the lock is
taken all the time al over the kernel. On platforms like arm64 this results in
an avoidable significant performance degradation. Since write-locking is almost
never needed, use a primitive optimized towards read-locking.
Sample result of building the kernel on tmpfs 11 times:
stock 11142.80s user 6704.44s system 4924% cpu 6:02.42 total
patched 11118.95s user 2374.94s system 4547% cpu 4:56.71 total
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
Approved by: alc, markj (earlier version), kib (earlier version)
Differential Revision: https://reviews.freebsd.org/D22348
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.
Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
neighbors, and is used in a way so that if entries a and b cannot be
merged, we consider them twice, first not-merging a with its successor
b, and then not-merging b with its predecessor a. This change replaces
vm_map_simplify_entry with vm_map_try_merge_entries, which compares
two adjacent entries only, and uses it to avoid duplicated
merge-checks.
Tested by: pho
Reviewed by: alc
Approved by: markj (implicit)
Differential Revision: https://reviews.freebsd.org/D20814