interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).
Approved by: bz (mentor)
The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:
- Improved driver model:
The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.
If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.
- Improved hotplugging:
With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).
The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.
- Improved performance:
One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.
Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.
Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
syscalls, unless WITHOUT_SYSCALL_COMPAT is defined. The default case
will have the .c wrappers still. If you define WITHOUT_SYSCALL_COMPAT,
the .c wrappers will go away and libc will make direct syscalls.
After 7-stable starts, the direct syscall method will be default.
Approved by: re (kensmith)
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.
Reviewed by: rwatson, tjr
The uuidgen command, by means of the uuidgen syscall, generates one
or more Universally Unique Identifiers compatible with OSF/DCE 1.1
version 1 UUIDs.
From the Perforce logs (change 11995):
Round of cleanups:
o Give uuidgen() the correct prototype in syscalls.master
o Define struct uuid according to DCE 1.1 in sys/uuid.h
o Use struct uuid instead of uuid_t. The latter is defined
in sys/uuid.h but should not be used in kernel land.
o Add snprintf_uuid(), printf_uuid() and sbuf_printf_uuid()
to kern_uuid.c for use in the kernel (currently geom_gpt.c).
o Rename the non-standard struct uuid in kern/kern_uuid.c
to struct uuid_private and give it a slightly better definition
for better byte-order handling. See below.
o In sys/gpt.h, fix the broken uuid definitions to match the now
compliant struct uuid definition. See below.
o In usr.bin/uuidgen/uuidgen.c catch up with struct uuid change.
A note about byte-order:
The standard failed to provide a non-conflicting and
unambiguous definition for the binary representation. My initial
implementation always wrote the timestamp as a 64-bit little-endian
(2s-complement) integral. The clock sequence was always written
as a 16-bit big-endian (2s-complement) integral. After a good
nights sleep and couple of Pan Galactic Gargle Blasters (not
necessarily in that order :-) I reread the spec and came to the
conclusion that the time fields are always written in the native
by order, provided the the low, mid and hi chopping still occurs.
The spec mentions that you "might need to swap bytes if you talk
to a machine that has a different byte-order". The clock sequence
is always written in big-endian order (as is the IEEE 802 address)
because its division is resulting in bytes, making the ordering
unambiguous.