out of cdregister() and daregister(), which are run from interrupt context.
The sysctl code does blocking mallocs (M_WAITOK), which causes problems
if malloc(9) actually needs to sleep.
The eventual fix for this issue will involve moving the CAM probe process
inside a kernel thread. For now, though, I have fixed the issue by moving
dynamic sysctl variable creation for these two drivers to a task queue
running in a kernel thread.
The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in
software interrupt handlers, which wouldn't fix the problem at hand. So I
have created a new task queue, taskqueue_thread, that runs inside a kernel
thread. (It also runs outside of Giant -- clients must explicitly acquire
and release Giant in their taskqueue functions.)
scsi_cd.c: Remove sysctl variable creation code from cdregister(), and
move it to a new function, cdsysctlinit(). Queue
cdsysctlinit() to the taskqueue_thread taskqueue once we
have fully registered the cd(4) driver instance.
scsi_da.c: Remove sysctl variable creation code from daregister(), and
move it to move it to a new function, dasysctlinit().
Queue dasysctlinit() to the taskqueue_thread taskqueue once
we have fully registered the da(4) instance.
taskqueue.h: Declare the new taskqueue_thread taskqueue, update some
comments.
subr_taskqueue.c:
Create the new kernel thread taskqueue. This taskqueue
runs outside of Giant, so any functions queued to it would
need to explicitly acquire/release Giant if they need it.
cd.4: Update the cd(4) man page to talk about the minimum command
size sysctl/loader tunable. Also note that the changer
variables are available as loader tunables as well.
da.4: Update the da(4) man page to cover the retry_count,
default_timeout and minimum_cmd_size sysctl variables/loader
tunables. Remove references to /dev/r???, they aren't used
any longer.
cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY
quirk.
taskqueue.9: Update the taskqueue(9) man page to describe the new thread
task queue, and the taskqueue_swi_giant queue.
MFC after: 3 days
Changes from the original implementation:
- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed. Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.
- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments. While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.
- Add two modes of random fragmentation. Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
bail out if the buffer is not already present.
- The buffer returned by incore() is not locked and should not be sent to
brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the
desired semantics.
line up the function names in an earlier generation of the API when
some of the functions returned structure pointers.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
- Surround all accesses of the BKGRD{WAIT,INPROG} flags with the vnode
interlock.
- Don't use the B_LOCKED flag and QUEUE_LOCKED for background write
buffers. Check for the BKGRDINPROG flag before recycling or throwing
away a buffer. We do this instead because it is not safe for us to move
the original buffer to a new queue from the callback on the background
write buffer.
- Remove the B_LOCKED flag and the locked buffer queue. They are no longer
used.
- The vnode interlock is used around checks for BKGRDINPROG where it may
not be strictly necessary. If we hold the buf lock the a back-ground
write will not be started without our knowledge, one may only be
completed while we're not looking. Rather than remove the code, Document
two of the places where this extra locking is done. A pass should be
done to verify and minimize the locking later.
Restructure the way ATA/ATAPI commands are processed, use a common
ata_request structure for both. This centralises the way requests
are handled so locking is much easier to handle.
The driver is now layered much more cleanly to seperate the lowlevel
HW access so it can be tailored to specific controllers without touching
the upper layers. This is needed to support some of the newer
semi-intelligent ATA controllers showing up.
The top level drivers (disk, ATAPI devices) are more or less still
the same with just corrections to use the new interface.
Pull ATA out from under Gaint now that locking can be done in a sane way.
Add support for a the National Geode SC1100. Thanks to Soekris engineering
for sponsoring a Soekris 4801 to make this support.
Fixed alot of small bugs in the chipset code for various chips now
we are around in that corner anyways.
mac_reflect_mbuf_icmp()
mac_reflect_mbuf_tcp()
These entry points permit MAC policies to do "update in place"
changes to the labels on ICMP and TCP mbuf headers when an ICMP or
TCP response is generated to a packet outside of the context of
an existing socket. For example, in respond to a ping or a RST
packet to a SYN on a closed port.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks. This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.
Obtained from: TrustedBSD PRoject
Sponsored by: DARPA, Network Associates Laboratories
A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.
Add a sysctl to report all available timecounters and their qualities.
Give the dummy timecounter a solid negative quality of minus a million.
Give the i8254 zero and the ACPI 1000.
The TSC gets 800, unless APM or SMP forces it negative.
Other timecounters default to zero quality and thereby retain current
selection behaviour.
check for permissions, do it for all requests, not the known requests.
Later when we actually service the request we deal with the invalid
requests we previously caught earlier.
This commit changes the behaviour of the ptrace(2) interface for
boundary cases such as an unknown request without proper permissions.
Previously we would return EINVAL. Now we return EBUSY or EPERM.
Platforms need to define __HAVE_PTRACE_MACHDEP when they have MD
requests. This makes the prototype of cpu_ptrace() visible and
introduces a call to this function for all requests greater or
equal to PT_FIRSTMACH.
Silence on: audit
to walk the list and remove the current item and destroy/free it.
Alexander Kabaev will likely do the equivalent for the other list
types, but I just happened to have this one sitting in a local
non-FreeBSD tree already.
cpu_switch() where both the old and new threads are passed in as
arguments. Only powerpc uses the old conventions now.
- Update comments in the Alpha swtch.s to reflect KSE changes.
Tested by: obrien, marcel
- All those diffs to syscalls.master for each architecture *are*
necessary. This needed clarification; the stub code generation for
mlockall() was disabled, which would prevent applications from
linking to this API (suggested by mux)
- Giant has been quoshed. It is no longer held by the code, as
the required locking has been pushed down within vm_map.c.
- Callers must specify VM_MAP_WIRE_HOLESOK or VM_MAP_WIRE_NOHOLES
to express their intention explicitly.
- Inspected at the vmstat, top and vm pager sysctl stats level.
Paging-in activity is occurring correctly, using a test harness.
- The RES size for a process may appear to be greater than its SIZE.
This is believed to be due to mappings of the same shared library
page being wired twice. Further exploration is needed.
- Believed to back out of allocations and locks correctly
(tested with WITNESS, MUTEX_PROFILING, INVARIANTS and DIAGNOSTIC).
PR: kern/43426, standards/54223
Reviewed by: jake, alc
Approved by: jake (mentor)
MFC after: 2 weeks
From alc:
Move pageable pipe memory to a seperate kernel submap to avoid awkward
vm map interlocking issues. (Bad explanation provided by me.)
From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.
Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having two many pipes.)
than i386 or AMD64, TP register points to thread mailbox, and they can not
atomically clear km_curthread in kse mailbox, in this case, thread retrieves
its thread pointer from TP register and sets flag TMF_NOUPCALL in its thread
mailbox to indicate a critical region.
support routines in kern_acl.c:
- Define ACL_OVERRIDE_MASK and ACL_PRESERVE_MASK centrally in acl.h: the
mode bits that are (and aren't) stored in the ACL.
- Add acl_posix1e_acl_to_mode(): given a POSIX.1e extended ACL, generate
a compatibility mode (only the bits supported by the POSIX.1e ACL).
- acl_posix1e_newfilemode(): Given a requested creation mode and default
ACL, calculate the mode for the new file system object (only the bits
supported by the POSIX.1e ACL).
PR: 50148
Reported by: Ritz, Bruno <bruno_ritz@gmx.ch>
Obtained from: TrustedBSD Project
another thread. We use the td_oncpu member of the other field to locate
it's associated CPU and then search the that CPU's list of spin locks
contained in its per-CPU data. This is not always safe and may in fact
panic or just not work, but it is useful in at least one case.
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.
the "do-nothing" versions of __RCSID(), __RCSID_SOURCE(), __SCCSID(),
and __COPYRIGHT() were not strictly correct. They should not expand
into [nothing], because the ';' which follows them would then cause
a syntax error (in a strict C compiler, if not gcc...).
So, change the do-nothing versions of those macros to use the
'struct __hack' tactic, as was already used with __FBSDID().
Approved by: discussions with bde
MFC after: 1 week
__attribute__((__always_inline__)) by adding an __always_inline macro
(used like __dead2 etc). __inline_damnit has also been suggested but we
have a precedent of keeping the names similar so they are easier to find.
that an argument is not a NULL pointer.
Apply various obvious places.
I belive __printf*() implies __nonnull() so it is not needed on functions
already tagged that way.