to size_t *, which is incorrect because they may have different widths.
This caused some subtle forms of corruption, the mostly frequently
reported one being that the last character of a filename was sometimes
duplicated on amd64.
stopped returning events. Don't disable the event when removing
the handler because it still needs to be enabled for the other
handler. Also, remove duplicate AcpiEnableEvent calls since the
install function now does this for us.
into its own file:
- All of the $PIR interrupt routing is now done in a link-centric fashion.
When a host-PCI bridge that uses the $PIR attaches, it calls pir_parse()
to parse the table. This scans for link devices and merges all the masks
for each link device from the table entries. It then looks at the intline
register of PCI devices connected to a link to figure out if the BIOS has
routed this link and if so to which IRQ.
- The IRQ for any given link can be overridden via a hint like so:
'hw.pci.link.0x62.irq=10' Any IRQ set in this matter is treated as if it
were set that way by the BIOS.
- We only call the BIOS to route each link device once.
- When a PCI device wants to route an interrupt, we look it up in the $PIR
to find the associated link. If the link is routed, we simply return the
IRQ it is using. If it is not routed, we have to pick one. This uses a
different algorithm from the old code. First off, when we try to pick
an interrupt from a mask of possible interrupts, we try to pick the one
that is least loaded as far as PCI devices. We maintain this weight based
on the number of devices attached to each link device. When choosing an
IRQ, we first attempt to route using any PCI only interrupts (the old
code did this as well). If that doesn't work, we try to use the list of
IRQs that the BIOS has used. This is a new step that the new code didn't
do and avoids using IRQ 3 or 4 for every virgin interrupt routing. If
none of the IRQs that the BIOS used worked, then we fall back to trying
anything.
- The fallback mask for !PC98 was fixed to include IRQ 3 and not allow IRQ
2.
- We don't use the $PIR to route interrupts on a PCI-PCI bridge unless it
has already been used to route on at least one Host-PCI bridge. This
helps to avoid mixing and matching x86 firmware PCI interrupt routing
methods (which is a Bad Thing(tm)).
Silence on: current@
Previously the "struct disk" were owned by the device driver and this
gave us problems when the device disappared and the users of that device
were not immediately disappearing.
Now the struct disk is allocate with a new call, disk_alloc() and owned
by geom_disk and just abandonned by the device driver when disk_create()
is called.
Unfortunately, this results in a ton of "s/\./->/" changes to device
drivers.
Since I'm doing the sweep anyway, a couple of other API improvements
have been carried out at the same time:
The Giant awareness flag has been flipped from DISKFLAG_NOGIANT to
DISKFLAG_NEEDSGIANT
A version number have been added to disk_create() so that we can detect,
report and ignore binary drivers with old ABI in the future.
Manual page update to follow shortly.
support is partial in that it will refuse to create large files on
filesystems that haven't been upgraded to EXT2_DYN_REV or that don't
have the EXT2_FEATURE_RO_COMPAT_LARGE_FILE flag set in the superblock.
MFC after: 2 weeks
kernel. I'm not happy with it yet - refinements are to come.
This hack allows the kern.ps_strings and kern.usrstack sysctls to respond
to a 32 bit request, such as those coming from emulated i386 binaries.
value for MSGBUF_SIZE is configured. MSGBUF_SIZE =
(32768 * bootverbose ? 2 : 1) is always 1 or 2, so there is not enough space
in the buffer for metadata, and blindly using the nonexistent space tends
to cause fatal pagefaults. I think
MSGBUF_SIZE = (32768 * (bootverbose ? 2 : 1)) would be always 32768 since
bootverbose is only statically initialized to 0 early when MSGBUF_SIZE is
used. MSGBUF_SIZE = (32768 * ((boothowto & RB_VERBOSE) ? 2 : 1)) should
work, but this belongs in <sys/msgbuf.h> even less than previous versions.
MSGBUF_SIZE shouldn't be a macro.
it means that the correct value is unknown. Since this value is just
a hint to improve performance, initially assume that the first non-reserved
cluster is free, then correct this assumption if necessary before writing
the FSInfo block back to disk.
PR: 62826
MFC after: 2 weeks
result in a panic "vm_page_cache: caching a dirty page, ...": Access to the
page must be restricted or removed before calling vm_page_cache(). This
race condition is identical in nature to that which was addressed by
vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.
MFC after: 7 days
- When adding new waiting threads to the waitlist for an object,
use INSERT_LIST_TAIL() instead of INSERT_LIST_HEAD() so that new
waiters go at the end of the list instead of the beginning. When we
wake up a synchronization object, only the first waiter is awakened,
and this needs to be the first thread that actually waited on the object.
- Correct missing semicolon in INSERT_LIST_TAIL() macro.
- Implement lookaside lists correctly. Note that the Am1771 driver
uses lookaside lists to manage shared memory (i.e. DMAable) buffers
by specifying its own alloc and free routines. The Microsoft documentation
says you should avoid doing this, but apparently this did not deter
the developers at AMD from doing it anyway.
With these changes (which are the result of two straight days of almost
non-stop debugging), I think I finally have the object/thread handling
semantics implemented correctly. The Am1771 driver no longer crashes
unexpectedly during association or bringing the interface up.
resources. (Note that the correct range is 0x3f7,0x3f0-0x3f5.) Such
devices will be detected as follows:
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> port
0x3f7,0x3f4-0x3f5,0x3f2-0x3f3,0x3f0-0x3f1 irq 6 drq 2 on acpi0
To do this, we find the minimum and maximum start addresses for the
resources and use them as the base for the IO and control ports.
Help from: jhb
missing parentheses). Use default handling (trap to debugger) for
udev2dev(x, 1) since it is an error and doesn't happen anywhere in
the sys tree except in bogusly commented out code in coda.
and given a value, but never used. This has no effect on the
resulting binaries, since gcc optimizes the variable away anyway.
PR: kern/62684
Approved by: rwatson (mentor)
panic "vm_page_cache: caching a dirty page, ...": Access to the page must
be restricted or removed before calling vm_page_cache(). This race
condition is identical in nature to that which was addressed by
vm_pageout.c's revision 1.251 and vm_page.c's revision 1.275.
Reviewed by: tegge
MFC after: 7 days
The Am1771 driver will sometimes do the following:
- Some thread-> NdisScheduleWorkItem(some work)
- Worker thread -> do some work, KeWaitForSingleObject(some event)
- Some other thread -> NdisScheduleWorkItem(some other work)
When the second call to NdisScheduleWorkItem() occurs, the NDIS worker
thread (in our case ndis taskqueue) is suspended in KeWaitForSingleObject()
and waiting for an event to be signaled. This is different from when
the worker thread is idle and waiting on NdisScheduleWorkItem() to
send it more jobs. However, the ndis_sched() function in kern_ndis.c
always calls kthread_resume() when queueing a new job. Normally this
would be ok, but here this causes KeWaitForSingleObject() to return
prematurely, which is not what we want.
To fix this, the NDIS threads created by kern_ndis.c maintain a state
variable to indicate whether they are running (scanning the job list
and executing jobs) or sleeping (blocked on kthread_suspend() in
ndis_runq()), and ndis_sched() will only call kthread_resume() if
the thread is in the sleeping state.
Note that we can't just check to see if the thread is on the run queue:
in both cases, the thread is sleeping, but it's sleeping for different
reasons.
This stops the Am1771 driver from emitting various "NDIS ERROR" messages
and fixes some cases where it crashes.
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0. Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.
called until DEVFS had a chance to initialize. Since DEVFS is mandatory
and things over in that department coincidentally works from without
any initialization now, this is safe.
could result in a panic "vm_page_cache: caching a dirty page, ...":
Access to the page must be restricted or removed before calling
vm_page_cache(). This race condition is identical in nature to that
which was addressed by vm_pageout.c's revision 1.251.
- Simplify the code surrounding the fix to this same race condition
in vm_pageout.c's revision 1.251. There should be no behavioral
change. Reviewed by: tegge
MFC after: 7 days
- don't unlock the vnode after vinvalbuf() only to have to relock it
almost immediately.
- don't refer to devices classified by vn_isdisk() as block devices.
is reserved by the loader, and thus any tunable name with that suffix will
be silently discarded.
Document this in the header and man page so that other developers do not
develop so many bumps on the head after banging it against the wall.
Detective work by: Mark Santcroos, grehan
the vnode around calls to vinvalbuf()). Apparently no one has tested
ext2fs with DEBUG_VOP_LOCKS. Vnode locking for vinvalbuf() might not
be required in non-soft-updates cases, but it is now asserted.
MFffs (uncommitted related and nearby cleanups: don't unlock the vnode
after vinvalbuf() only to have to relock it almost immediately; don't
refer to devices classified by vn_isdisk() as block devices).
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).
This is (mostly) work from: sam
Silence from: -arch
Approved by: bms(mentor), sam, rwatson
ISA. npx has few isa dependencies, but it does unconditional outb()'s to
the isa bus in the !SMP case, and it attaches to isa if "device isa" is
configured in order to support PNP-ISA. The ifdef for the latter was
misplaced.
PR: 62595
dishonored in rev.1.1 by commenting out the code that honored it. This
gave the worst disadvantages of async mounts in an uncontrollable way.
Honoring the flag costs about 50% in real time in worst cases on a new
but not very fast ATA drive with write caching (probably more on drives
without write caching). The old misbehavior can be recovered using
async mounts after implementing them in mount_ext2fs(8) (just put the
MNT_ASYNC flag in mount_ext2fs's table of supported options like it
is in mount's table).
- rejects IPv6 packet toward IPv4-mapped address if its source address
is not an IPv4-mapped IPv6 address, since the converted IPv4 packets
would have an unexpected IPv4 source address.
- when V6ONLY socket option is set, discard packets destined to a
v4/ipv4 mapped ipv6 address.
- have PULLDOWN_TEST codepath.
- get rid of in6_mcmatch().
Obtained from: KAME
to one, DEBUG_FLAGS, which is also compatible with <bsd.prog.mk>.
Previously one had to set both DEBUG and DEBUG_FLAGS to build the
.ko.debug with debugging symbols which was boring when doing this
manually.
versions of the modules, and unconditionally putting -g in CFLAGS
has negative impact on the size of the resulting .ko object, even
now that debugging symbols are always stripped.
shown that it is not useful.
Rename the relative count g_access_rel() function to g_access(), only
the name has changed.
Change all g_access_rel() calls in our CVS tree to call g_access() instead.
Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.
operators) in and near revs.1.169-1.170 (open mode bandaid). This
(or better a proper fix) should have been done before cloning the
bandaid to many other file systems.
set to SIGCHLD. This avoids the creation of orphaned Linux-threaded
zombies that init is unable to reap. This can occur when the parent
process sets its SIGCHLD to SIG_IGN. Fix a similar situation in the
PT_DETACH code.
Tested by: "Steven Hartland" <killing AT multiplay.co.uk>
- rev.1.42 of ffs_readwrite.c added a special case in ffs_read() for reads
that are initially at EOF, and rev.1.62 of ufs_readwrite.c fixed
timestamp bugs in it. Removal of most of vfs_ioopt made it just and
optimization, and removal of the vm object reference calls made it less
than an optimization. It was cloned in rev.1.94 of ufs_readwrite.c as
part of cloning ffs_extwrite() although it was always less than an
optimization in ffs_extwrite().
- some comments, compound statements and vertical whitespace were vestiges
of dead code.
- culled long-dead #define's
- segment register defs moved to sr.h
- NPMAPS moved to pmap.h
- KERNBASE moved to vmparam.h
- removed include of <machine/cpu.h> and fixed src files that
relied on this.
Modifying segment register code no longer causes gcc rebuilds :-)
This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.
For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.
Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.
There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.
Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.
This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.
Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.
Sponsored by: sentex.net
systems define power/sleep buttons in both places but only deliver
notifies to the ones defined in the AML.
Also, reduce length of various function handler names.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
pci-hi/med/lo + node 'interrupts' property. This worked by
accident until recent notebooks required correct operation.
Tested by: Suleiman Souhlal <refugee@segfaulted.com>
routines to do anything except return error if the miniport adapter context
is not set (meaning we either having init'ed the driver yet, or the
initialization failed).
Also, be sure to NULL out the adapter context along with the
miniport characteristics pointers if calling the MiniportInitialize()
method fails.
the added comment for low-level details.) The effect of this race
condition is a panic "vm_page_cache: caching a dirty page, ..."
Reviewed by: tegge
MFC after: 7 days
created with the same name, and vice versa:
- Immediately recycle vnodes of files & directories that have been deleted
or renamed.
- When looking an entry in the VFS name cache or smbfs's private
cache, make sure the vnode type is consistent with the type of file
the server thinks it is, and re-create the vnode if it isn't.
The alternative to this is to recycle vnodes unconditionally when their
use count drops to 0, but this would make all the caching we do
mostly useless.
PR: 62342
MFC after: 2 weeks
- Factor out common settings and put them in an upper level Makefile.inc.
- Properly use PROG for real programs, not their products.
- Further reduce diffs to i386 versions.
Tested on: sparc64 (panther)
- Now that bsd.prog.mk deals with programs linked with -nostdlib
better, and has a notion of an "internal" program, use PROG
where possible. This has a good impact on the contents of
.depend files and causes programs to be linked with cc(1).
XXX: boot2 couldn't be converted as it's actually two programs.
Tested on: i386, amd64
the "disappearing subdisks" problem when new subdisks can't be created
due to some errors.
This is in fact an ugly hack, but a more elegant solution would probably
require a redesign of vinum in several places.
Approved by: joerg (mentor)
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.
Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver. Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.
Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.
Idea by: Yaoping Ruan & Vivek Pai
nodes, or if they did, they're now locked away on the Kurt Gdel
memorial home for the numerically confused:
Don't cast a kernel pointer (from makedev(9)) to an integer (maj+minor combo).
on the card, unmap it first. This allows it to be picked up properly when
the queue gets kicked again. This was the root problem for the lost command
(i.e. stuck in getblk/vinvalb) problem. While here, panic if commands don't
map correctly instead of just silently ignoring the problem and dropping
command. Also slow down the dynamic allocation of new commands.
It should be safe to go back into the aac waters. Thanks to everyone who
suffered through this and provided good feedback.
802.11b chipset work. This chip is present on the SMC2602W version 3
NIC, which is what was used for testing. This driver creates kernel
threads (12 of them!) for various purposes, and required the following
routines:
PsCreateSystemThread()
PsTerminateSystemThread()
KeInitializeEvent()
KeSetEvent()
KeResetEvent()
KeInitializeMutex()
KeReleaseMutex()
KeWaitForSingleObject()
KeWaitForMultipleObjects()
IoGetDeviceProperty()
and several more. Also, this driver abuses the fact that NDIS events
and timers are actually Windows events and timers, and uses NDIS events
with KeWaitForSingleObject(). The NDIS event routines have been rewritten
to interface with the ntoskrnl module. Many routines with incorrect
prototypes have been cleaned up.
Also, this driver puts jobs on the NDIS taskqueue (via NdisScheduleWorkItem())
which block on events, and this interferes with the operation of
NdisMAllocateSharedMemoryAsync(), which was also being put on the
NDIS taskqueue. To avoid the deadlock, NdisMAllocateSharedMemoryAsync()
is now performed in the NDIS SWI thread instead.
There's still room for some cleanups here, and I really should implement
KeInitializeTimer() and friends.
seems to work well in RELENG_4. However, 5.X locking foo means that I'll
have to do some quick redesign.
Add ioctl handlers for ISP_GETROLE and ISP_SETROLE ioctls.
handling of resources shortages. The driver is now so fast that it can
completely fill all 512 slots on the card, but for some reason only 511
slots are being allocated. Anything that tries to go into the 512th
slot gets silently lost. Both bugs need to be fixed at a later date,
but this should fix the reports of hangs in getblk and vinvalb.
edge cases in the loop.
- Try to grab a command before dequeueing the bio from the bioq. The old
behaviour of requeuing deferred bios to the end of the bioq is arguably
wrong. This should be fixed in the future to check the bioq head without
automatically dequeueing the bio.
- do not use PROG for what's not a real C program,
- use sys.mk transformation rules where possible,
- only create the "machine" symlink on AMD64,
- removed MAINTAINER lines in individual makefiles,
- added the LIBSTAND defitinion to <bsd.libnames.mk>,
- somewhat better contents in .depend files.
Tested on: i386, amd64
Prodded by: bde
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.
Prompted by: bde (the first two items anyway)
RLIM_INFINITY case for ogetrlimit().
- Use %jd and intmax_t to output negative time in usec in calcru().
- Rework getrusage() to make a copy of the rusage struct into a local
variable while holding Giant and then do the copyout from the local
variable to avoid having to have the original process rusage struct
locked while doing the copyout (which would not be safe). This also
includes a few style fixes from Bruce to getrusage().
Submitted by: bde (1, parts of 3)
Suggested by: bde (2)
delete it each time its run and have it regenerated each time by make.
I used a quick hackish script rather than putting it in the files file
and used the before-depend rule to avoid the depend/no-depend hacks.
failed, the reference count for the virtual memory object referenced
by the specified shared memory segment would have been erroneously
incremented.
Reported by: Joost Pol <joost@pine.nl>
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64
TIOCMSET before FreeBSD existed and have never been implemented by any
FreeBSD serial driver (not even as aliases).
Moved the TIOCM bit definitions to be with TIOCMGET.
Added more comments gaps between ioctl numbers. There are now a large
number of conflicts with ppp, slip, tap and tun ioctls (especially ppp
ones) from closing gaps that weren't there. This mainly breaks decoding
of ioctl numbers in kdump, but there are some serious conflicts where
the interpretation of a tty ioctl depends on the line discipline.
aic79xx.seq:
Convert the COMPLETE_DMA_SCB list to an "stailq". This allows us to
safely keep the SCB that is currently being DMA'ed back the host on
the head of the list while processing completions off of the bus. The
newly completed SCBs are appended to the tail of the queue. In the
past, we just dequeued the SCB that was in flight from the list, but
this could result in a lost completion should the host perform certain
types of error recovery that must cancel all in-flight SCB DMA operations.
Switch from using a 16bit completion entry, holding just the tag and the
completion valid bit, to a 64bit completion entry that also contains a
"status packet valid" indicator. This solves two problems:
o The SCB DMA engine on at least Rev B. silicon does not properly deal
with a PCI disconnect that occurs at a non-64bit aligned offset in the
chips "source buffer". When the transfer is resumed, the DMA engine
continues at the correct offset, but may wrap to the head of the buffer
causing duplicate completions to be reported to the host. By using a
completion buffer in host memory that is 64bit aligned and using 64bit
completion entries, such disconnects should only occur at aligned addresses.
This assumes that the host bridge will only disconnect on cache-line
boundaries and that cache-lines are multpiles of 64bits.
o By embedding the status information in the completion entry we can avoid
an extra memory reference to the HSCB for commands that complete without
error.
Use the comparison of a "host freeze count" and a "sequencer freeze count"
to allow the host to process most SCBs that complete with non-zero status
without having to clear critical sections. Instead the host can just pause the
sequencer, performs any necessary cleanup in the waiting for selection list,
increments its freeze count on the controller, and unpauses. This is only
possible because the sequencer defers completions of SCBs with bad status
until after all pending selections have completed. The sequencer then avoids
referencing any data structures the host may touch during completion of the
SCB until the freeze counts match.
aic79xx.c:
Change the strategy for allocating our sentinal HSCB for the QINFIFO. In
the past, this allocation was tacked onto the QOUTFIFO allocation. Now that
the qoutfifo has grown to accomodate larger completion entries, the old
approach will result in a 64byte allocation that costs an extra page of
coherent memory. We now do this extra allocation via ahd_alloc_scbs()
where the "unused space" can be used to allocate "normal" HSCBs.
In our packetized busfree handler, use the ENSELO bit to differentiate
between packetized and non-packetized unexpected busfree events that
occur just after selection, but before the sequencer has had the oportunity
to service the selection.
When cleaning out the waiting for selection list, use the SCSI mode
instead of the command channel mode. The SCB pointer in the command
channel mode may be referenced by the SCB dma engine even while the
sequencer is paused, whereas the SCSI mode SCB pointer is only accessed
by the sequencer.
Print the "complete on qfreeze" sequencer SCB completion list in
ahd_dump_card_state(). This list holds all SCB completions that are deferred
until a pending select-out qfreeze event has taken effect.
aic79xx.h:
Add definitions and structures to handle the new SCB completion scheme.
Add a controller flag that indicates if the controller is in HostRAID
mode.
aic79xx.reg:
Remove macros used for toggling from one data fifo mode to the other.
They have not been in use for some time.
Add scratch ram fields for our new qfreeze count scheme, converting
the complete dma list into an "stailq", and providing for the "complete
on qfreeze" SCB completion list. Some other fields were moved to retain
proper field alignment (alignment >= field size in bytes).
aic79xx.seq:
Add code to our idle loop to:
o Process deferred completions once a qfreeze event has taken full
effect.
o Thaw the queue once the sequencer and host qfreeze counts match.
Generate 64bit completion entries passing the SCB_SGPTR field as the
"good status" indicator. The first bit in this field is only set if
we have a valid status packet to send to the host.
Convert the COMPLETE_DMA_SCB list to an "stailq".
When using "setjmp" to register an idle loop handler, do not combine
the "ret" with the block move to pop the stack address in the same
instruction. At least on the A, this results in a return to the setjmp
caller, not to the new address at the top of the stack. Since we want
the latter (we want the newly registered handler to only be invoked from
the idle loop), we must use a separate ret instruction.
Add a few missing critical sections.
Close a race condition that can occur on Rev A. silicon. If both FIFOs
happen to be allocated before the sequencer has a chance to service the
FIFO that was allocated first, we must take special care to service the
FIFO that is not active on the SCSI bus first. This guarantees that a
FIFO will be freed to handle any snapshot requests for the FIFO that is
still on the bus. Chosing the incorrect FIFO will result in deadlock.
Update comments.
aic79xx_inline.h
Correct the offset calculation for the syncing of our qoutfifo.
Update ahd_check_cmdcmpltqueues() for the larger completion entries.
aic79xx_pci.c:
Attach to HostRAID controllers by default. In the future I may add a
sysctl to modify the behavior, but since FreeBSD does not have any
HostRAID drivers, failing to attach just results in more email and
bug reports for the author.
MFC After: 1week
namespace pollution in other headers. <sys/types.h> is now the only
prerequisite for <sys/sx.h>.
Fixed some style bugs:
- removed bogus LOCORE ifdef. Including this C header in assembler
sources is just nonsense.
- removed unused include of <sys/_mutex.h>. It finished rotting when
the mutex in struct sx became indirect in rev.1.15.
- removed most comments on #else and #endif's and cleaned up the others.
All were misindented...
- add an option for the output device in the hope that this can
be made non-blocking at some stage.
- define an alias for the disk device, required by dev/ofw/ofw_disk.c
- shift iobus to 0x9000000 so as not to clash with the OpenFirmware
entry point of 0x8000400 when address decoding.
- down-tone comments about the disk dev config :-)
using the direct-mapping of physmem to force PTE data structures
to be physically addressable so the interrupt-time real-mode
DSI trap handler could perform PTE spills. However, the memory
may have been > 256Mb, which would have caused a BAT spill and
double-interrupt.
The new trap code no longer handles PTE spills, so the requirement
that these pages be direct-mapped no longer applies. The irony is
UMA_MD_SMALL_ALLOC will return direct mappings for these structs :-)
- remove unused 601 and tlb exception code
- remove interrupt-time PTE spill code. The pmap code
will now take care of pinning kernel PTEs, and there
are no longer issues about physical mapping of PTE
data structures
- All segment registers are switched on kernel entry/exit,
allowing the kernel to have more virtual space and for
user virtual space to extend to 4G.
- The temporary register save area has been shifted from
unused exception vector space to the per-cpu data area.
This allows interrupts to be delivered to multiple CPUs
- ISI traps no longer spill to BAT tables. It is assumed
that all of kernel instruction memory is pinned.
- shift from 'ldmw/stmw' instructions to individual register
loads/stores when saving context. All PPC manuals indicate
this should be much faster.
- use '%r' for register names throughout.
TODO: need to test if DSI traps were the result of kernel stack
guard-page hits.
Reworked from: NetBSD
- Rename temporary variable names ("tmp", "tmp2") to more informative
names ("load", "pctcpu", "rss", ...)
- Unclutter indentation and return paths: rather than lots of nested
ifs, simply return earlier if it's not going to work out. Simplify
general structure and avoid "deep" code.
- Comment on the thread/process selection and locking.
- Correct handling of "running"/"runnable" states, avoid "unknown"
that people were seeing for running processes. This was due to
a misunderstanding of the more complex state machine / inhibitors
behavior of KSE.
- Do perform ttyinfo() printing on KSE (P_SA) processes, it seems
generally to work.
While I initially attempted to formulate this as two commits (one
layout, the other content), I concluded that the layout changes were
really structural changes.
Many elements submitted by: bde
Since we have a worker thread now, we can actually do the allocation
asynchronously in that thread's context. Also, we need to return a
status value: if we're unable to queue up the async allocation, we
return NDIS_STATUS_FAILURE, otherwise we return NDIS_STATUS_PENDING
to indicate the allocation has been queued and will occur later.
This replaces the kludge where we just invoked the callback routine
right away in the current context.
The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.
As part of this change, ifunit() is greatly simplified by testing
if_xname directly. if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.
Reviewed by: ru, sam (concept)
Vincent Jardin <vjardin AT free.fr>
Max Laier <max AT love2party.net>
that Asus provides on its CDs has both a MiniportSend() routine
and a MiniportSendPackets() function. The Microsoft NDIS docs say
that if a driver has both, only the MiniportSendPackets() routine
will be used. Although I think I implemented the support correctly,
calling the MiniportSend() routine seems to result in no packets going
out on the air, even though no error status is returned. The
MiniportSendPackets() function does work though, so at least in
this case it doesn't matter.
In if_ndis.c:ndis_getstate_80211(), if ndis_get_assoc() returns
an error, don't bother trying to obtain any other state since the
calls may fail, or worse cause the underlying driver to crash.
(The above two changes make the Asus-supplied Centrino work.)
Also, when calling the OID_802_11_CONFIGURATION OID, remember
to initialize the structure lengths correctly.
In subr_ndis.c:ndis_open_file(), set the current working directory
to rootvnode if we're in a thread that doesn't have a current
working directory set.
machdep.c fixed the missing early initialization of curpcb, so curpcb
is now always set together with curthread and it cannot be NULL except
before the IDT has been set up (so trap() is unreachable) or after a
memory error. In any case, it was often used without checking.
curcpb shouldn't exist anyway. It doesn't exist for most non-i386 arches.
It just caches curthread->td_pcb in a global. This was a better idea
before it was per-cpu. trap() and some other places can get at it more
efficiently using td->td_pcb instead of PCPU_GET(curpcb). The main
exception is support.s which mostly wants only curpcb->pcb_onfault.
instead, just dec/inc in the ctor/dtor. For now, increment/decrement
in two's, since we're now performing the operation once per pair,
not once per pipe. Not really any measurable performance change
in my micro-benchmarks, but doing less work is good, especially when
it comes to atomic operations.
Suggested by: alc
it is still above the critical temperature on the next poll cycle. This
is a 10 second advance notice by default. Document the private
(non-standard) notify we will be using with devd(8).
changes to jointly allocated pipe pairs. Replace these checks
with pipe_present checks. This avoids a NULL pointer dereference
when a pipe is half-closed.
Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>
used for the ICMP reply source in reponse to packets which are not
directly addressed to us. By default continue with with normal
source selection.
Reviewed by: bms
1. Root from inside a jail was able to unmount any file system
(except /).
2. Unprivileged root was able to unmount file systems mounted by
privileged root (execpt /).
3. User from inside a jail was able to mount file system when
sysctl vfs.usermount was set to 1.
4. User was able to mount file system when vfs.usermount was set to 1
(that's ok) and unmount it even if vfs.usermount was equal to 0
(that's not correct).
Possibility from point 1 was reported by: Dariusz Kowalski <darek@76.pl>
Only a part of this fix will be MFC'ed (if approved).
PR: kern/60149
Reviewed by: rwatson
Approved by: scottl (mentor)
MFC after: 3 days
the system. Also, decrease the poll interval to 10 seconds from 30
seconds. This is needed because some systems will report an invalid high
temperature for one poll cycle. It is suspected this is due to the
embedded controller timing out. A typical value is 138C for one cycle on a
system that is otherwise 65C. This prevents the system from prematurely
shutting down after one invalid reading. It will still shut down after 30
seconds of high temperature, which is the same as previous default
behavior.
Tested by: Scott Lambert <lambert AT lambertfam.org>
is for an 802.11 device or not. At least one driver I have does not
support the OID_802_11_NETWORK_TYPES_SUPPORTED OID.
Also, for now, don't do anything special in the ndis_suspend() method.
I originally wanted to shut down the NIC but leave the IFF_UP flag alone
since technically the interface is meant to remain up, but an interrupt
may be delivered to the ISR on suspend, and if this happens while the
NIC is halted, we will crash, since none of the miniport driver methods
will function.
This needs to be dealt with properly later, but for now this prevents
a panic, and the resume method properly re-inits the NIC.
the thread that calls pmap_pte_quick() and by virtue of the page queues
lock being held, we can manage PADDR1/PMAP1 as a CPU private mapping.
The most common effect of this change is to reduce the overhead of the page
daemon on multiprocessors.
In collaboration with: tegge
packet along with data, instead of in their own packet. When serving files
of size (packetsize - headersize) or smaller, this will result in one less
packet crossing the network. Quick testing with thttpd and http_load has
shown a noticeable performance improvement in this case (350 vs 330 fetches
per second.)
Included in this commit are two support routines, iov_to_uio, and m_uiotombuf;
these routines are used by sendfile to construct the header mbuf chain that
will be linked to the rest of the data in the socket buffer.
sense with sched_4bsd as it does with sched_ule.
- Use P_NOLOAD instead of the absence of td->td_ithd to determine whether or
not a thread should be accounted for in sched_tdcnt.
when uma_reclaim() was called. This was introduced when the zone
working-set algorithm was removed in favor of using the per cpu caches
as the working set.
would allocate two 'struct pipe's from the pipe zone, and malloc a
mutex.
- Create a new "struct pipepair" object holding the two 'struct
pipe' instances, struct mutex, and struct label reference. Pipe
structures now have a back-pointer to the pipe pair, and a
'pipe_present' flag to indicate whether the half has been
closed.
- Perform mutex init/destroy in zone init/destroy, avoiding
reallocating the mutex for each pipe. Perform most pipe structure
setup in zone constructor.
- VM memory mappings for pageable buffers are still done outside of
the UMA zone.
- Change MAC API to speak 'struct pipepair' instead of 'struct pipe',
update many policies. MAC labels are also handled outside of the
UMA zone for now. Label-only policy modules don't have to be
recompiled, but if a module is recompiled, its pipe entry points
will need to be updated. If a module actually reached into the
pipe structures (unlikely), that would also need to be modified.
These changes substantially simplify failure handling in the pipe
code as there are many fewer possible failure modes.
On half-close, pipes no longer free the 'struct pipe' for the closed
half until a full-close takes place. However, VM mapped buffers
are still released on half-close.
Some code refactoring is now possible to clean up some of the back
references, etc; this patch attempts not to change the structure
of most of the pipe implementation, only allocation/free code
paths, so as to avoid introducing bugs (hopefully).
This cuts about 8%-9% off the cost of sequential pipe allocation
and free in system call tests on UP and SMP in my micro-benchmarks.
May or may not make a difference in macro-benchmarks, but doing
less work is good.
Reviewed by: juli, tjr
Testing help: dwhite, fenestro, scottl, et al
track the load for the sched_load() function. In the SMP case this member
is not defined because it would be redundant with the ksg_load member
which already tracks the non ithd load.
- For sched_load() in the UP case simply return ksq_sysload. In the SMP
case traverse the list of kseq groups and sum up their ksg_load fields.
of sched_load(). This variable tracks the number of running and runnable
non ithd threads. This removes the need to traverse the proc table and
discover how many threads are runnable.
at packet arrival.
For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead. Simultaneous
use of the two options is possible and they will return consistent
timestamps.
This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
and ffs_write(). These calls trace their origins to the dead vfs_ioopt
code, first appearing in revision 1.39 of ufs_readwrite.c.
Observed by: bde
Discussed with: tegge
interrupt handler so that no locks are needed, and schedules the
command completion routine with a taskqueue_fast. This also corrects the
locking in the command thread and removes the need for operation flags.
Simple load tests show that this is now considerably faster than FreeBSD 4.x
in the SMP case when multiple i/o tasks are running.
"scheduler" here has very little to do with scheduling. It is actually
the swapper, and it really must be the last SYSINIT'ed item like its
comment says, since proc0 metamorphoses into swapper by calling
scheduler() last in mi_start(), and scheduler() never returns.. Rev.1.29
of subr_4bsd.c broke this by adding another SI_ORDER_FIRST item
(kproc_start() for schedcpu_thread() onto the SI_SUB_RUN_SCHEDULER_LIST.
The sorting of SYSINITs with identical orders (at all levels) is
apparently nondeterministic, so this resulted in schedule() sometimes
being called second last and schedcpu_thread() not being called at all.
This quick fix just changes the code to almost match the comment
(SI_ORDER_FIRST -> SI_ORDER_ANY). "LAST" is misspelled "ANY", and
there is no way to ensure that there is only 1 very lst SYSINIT.
A more complete fix would remove the SYSINIT obfuscation.
won't associate in BSS mode if you use AUTHMODE_SHARED. I probably don't
understand enough to know when SHARED should be used vs. OPEN or WPA.
For now, go back to what works.
bit for this being the last CTIO2. It didn't matter since it really was the
last CTIO2 and the resources recycled, but still....
Add in CTIO3 define for future DAC work.
for direct-mapped addresses. Assume that any address less than KVA
is one of these and return it. Also assert that an address is KVA
does have a valid mapping - callers of pmap_kextract don't check
the return value, since they assume that they have a valid virtual
address.
addressing of memory. Makes a substantial improvement for apps that
stress the limited amount of KVM on PPC (e.g. untarring the ports tree).
uma_machdep.c stolen from amd64/ia64.
updated for the regparm ABI on amd64.
Context switch debug regs.
Update for fpu simplification
Don't needlessly reload %cr3, in case the cpu has the tlb flush filter
turned off. Re-add LAZY_SWITCH stubs.
profiling buffers and hash table. This makes it a lot easier to
do multiple profiling runs without rebooting or performing
gratuitous arithmetic. Sysctl is named debug.mutex.prof.reset.
Reviewed by: jake
- witness_lock() is split into two pieces: witness_checkorder() and
witness_lock(). Witness_checkorder() determines if acquiring a specified
lock at the time it is called would result in a lock order. It
optionally adds a new lock order relationship as well. witness_lock()
updates witness's data structures to assume that a lock has been acquired
by stick a new lock instance in the appropriate lock instance list.
- The mutex and sx lock functions now call checkorder() prior to trying to
acquire a lock and continue to call witness_lock() after the acquire is
completed. This will let witness catch a deadlock before it happens
rather than trying to do so after the threads have deadlocked (i.e. never
actually report it).
- A new function witness_defineorder() has been added that adds a lock
order between two locks at runtime without having to acquire the locks.
If the lock order cannot be added it will return an error. This function
is available to programmers via the WITNESS_DEFINEORDER() macro which
accepts either two mutexes or two sx locks as its arguments.
- A few simple wrapper macros were added to allow developers to call
witness_checkorder() anywhere as a way of enforcing locking assertions
in code that might acquire a certain lock in some situations. The
macros are: witness_check_{mutex,shared_sx,exclusive_sx} and take an
appropriate lock as the sole argument.
- The code to remove a lock instance from a lock list in witness_unlock()
was unnested by using a goto to vastly improve the readability of this
function.
instead of taskqueue_swi. This shaves from 1 to 10% of the overhead.
Overhaul the locking once more, there was a few possible races that
are now closed.
panic() so that the buffer overflow just beyond this point is always
caught, even when the code is not compiled with INVARIANTS.
Change chn_setblocksize() buffer reallocation code to attempt to avoid
the feed_vchan16() buffer overflow by attempting to always keep the
bufsoft buffer at least as large as the bufhard buffer.
Print a diagnositic message
Danger! %s bufsoft size increasing from %d to %d after CHANNEL_SETBLOCKSIZE()
if our best attempts fail. If feed_vchan16() were to be called by
the interrupt handler while locks are dropped in chn_setblocksize()
to increase the size bufsoft to match the size of bufhard, the panic()
code in feed_vchan16() will be triggered. If the diagnostic message
is printed, it is a warning that a panic is possible if the system
were to see events in an "unlucky" order.
Change the locking code to avoid the need for MTX_RECURSIVE mutexes.
Add the MTX_DUPOK option to the channel mutexes and change the locking
sequence to always lock the parent channel before its children to avoid
the possibility of deadlock.
Actually implement locking assertions for the channel mutexes and fix
the problems found by the resulting assertion violations.
Clean up the locking code in dsp_ioctl().
Allocate the channel buffers using the malloc() M_WAITOK option instead
of M_NOWAIT so that buffer allocation won't fail. Drop locks across
the malloc() calls.
Add/modify KASSERTS() in attempt to detect problems early.
Abuse layering by adding a pointer to the snd_dbuf structure that points
back to the pcm_channel that owns it. This allows sndbuf_resize() to do
proper locking without having to change the its API, which is used by
the hardware drivers.
Don't dereference a NULL pointer when setting hw.snd.maxautovchans
if a hardware driver is not loaded. Noticed by Ryan Sommers
<ryans at gamersimpact.com>.
Tested by: Stefan Ehmann <shoesoft AT gmx.net>
Tested by: matk (Mathew Kanner)
Tested by: Gordon Bergling <gbergling AT 0xfce3.net>
debug.ddb_use_printf sysctl, output kernel debugger data to both the
console and kernel message buffer via printf. This fixes the case where
backtrace() went directly to the console and should help debugging greatly.
Thanks to Ian Dowse for the work, minor edits or any bugs are by myself.
Submitted by: iedowse
the proc lock only if we actually need to perform a stop. This
avoids two locks and unlocks of the process lock each system call,
and wins me about 20% on a simply system call test (getuid(),
which would otherwise require no locking). This also has a net
improvement of about 10MB/s on some of the SMP bandwidth tests
I'm running.
Reviewed by: jhb
- malloc() returns a void* and does not need a cast
- when called with M_WAITOK, malloc() can not return NULL so don't
check for that case. The result of the check was bogus anyway since
it would leave the interface broken.
o remove extraneous bzero
o add SYSINIT to properly initialize ip4_def_policy
Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
Submitted by: gnn@neville-neil.com
assure backward compatibility (conditional on !BURN_BRIDGES), look it up
by its old name first, and log a warning (but accept the setting) if it
was found. If both the old and new name are defined, the new name takes
precedence.
Also export vm.kmem_size as a read-only sysctl variable; I find it hard to
tune a parameter when I don't know its default value, especially when that
default value is computed at boot time.
kbd_attach() is called kbd[0-9]+, with a different unit number. This
makes it impossible to write a devd rule which will automatically
switch to a USB keyboard when one is attached, because there is no way
to guess the correct device node to pass to kbdcontrol.
Therefore, change kbd_attach() to create a device node using the
keyboard device's real name (atkbd0, ukbd0...), and create the
kbd[0-9]+ node as an alias for backward compatibility.
on an SIOCSIFADDR (by way of brain damage in net80211).
Also, avoid trying to set NDIS_80211_AUTHMODE_AUTO since the Microsoft
documentation I have recommends not using it, and the Centrino driver
seems to dislike being told to use it.
every system call, and that grabs and release the process lock each
time. Don't fix it (yet), but document it so we know to fix it.
Also should be a 5.3-RELEASE todo item.
and NdisCancelTimer(). NdisInitializeTimer() doesn't accept an NDIS
miniport context argument, so we have to derive it from the timer
function context (which is supposed to be the adapter private context).
NdisCancelTimer is now an alias for NdisMCancelTimer().
Also add stubs for NdisMRegisterDevice() and NdisMDeregisterDevice().
These are no-ops for now, but will likely get fleshed in once I start
working on the Am1771/Am1772 wireless driver.
can look at the ACPI tables. If the startup fails, we panic and tell the
user to try rebooting with ACPI disabled. Previously in this case we
would try to use $PIR interrupt routing which only works for the atpic
while using the apic to handle interrupts which would result in misrouted
interrupts and a hang at boot time with no error message.
- Read the SCI out of the FADT instead of hardcoding 9 when checking to see
if an interrupt override entry is for the SCI.
- Try to work around some BIOS brain damage for the SCI's programming by
forcing the SCI to be level triggered and active low if it is routed
to a non-ISA interrupt (greater than 15) or if it is identity mapped with
edge trigger and active high polarity. This should fix some of the hangs
with device apic and ACPI that some people see.
Reviewed by: njl
problem here still to be solved: the sockaddr_hci has still a 16 byte
field for the node name. The code currently does not correctly use the
length field in the sockaddr to handle the address length, so
node names get truncated to 15 characters when put into a sockaddr_hci.
introducing a START_NOW command. This command does not send
and GET_IFINDEX message downstream (to wait for the response from
the ETHERNET node), but directly starts the sending process. This allows
one to generate traffic as input for any hook on any node.
ifconfig(8) flag since header for version 2 is the same but IP payload
is prepended with additional 4-bytes field.
Inspired by: Roman Synyuk <roman@univ.kiev.ua>
MFC after: 2 weeks
attached when shutting down, kill our kthreads, but don't destroy
the mutex pool and uma zone resources since the driver shutdown
routine may need them later.
device that doesn't exists. I'm using my discretion and
committing without mentor approval since Seigo is away.
Noticed by: Maxime Henrion <mux@freebsd.org>
own file and make it opt-in, not mandatory, depending on CPU_ENABLE_LONGRUN
config(8) option.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
Discussed with: nate
MFC after: 2 weeks
than the switchin functions to guarantee that we're operating with the
correct tlb entry.
- Remove the post copy/zero tlb invalidations. It is faster to invalidate
an entry that is known to exist and so it is faster to invalidate after
use. However, some architectures implement speculative page table
prefetching so we can not be guaranteed that the invalidated entry is still
invalid when we re-enter any of these functions. As a result of this we
must always invalidate before use to be safe.
a deadlock in several years. Furthermore, the IPI code is currently
protected by a seperate spinlock. This only served to make IPIs twice as
expensive as they had to be which severely slowed down the IPI heavy ULE
scheduler.
SW_INVOL. Assert that one of these is set in mi_switch() and propery
adjust the rusage statistics. This is to simplify the large number of
users of this interface which were previously all required to adjust the
proper counter prior to calling mi_switch(). This also facilitates more
switch and locking optimizations.
- Change all callers of mi_switch() to pass the appropriate paramter and
remove direct references to the process statistics.
mutex profiling code. As with existing mutex profiling, measurement
is done with respect to mtx_lock() instances in the code, as opposed
to specific mutexes. In particular, measure two things:
(1) Lock contention. How often did this mtx_lock() call get made and
have to sleep (or almost sleep) waiting for the lock. This helps
identify the "victims" of contention.
(2) Hold contention. How often, while the lock was held by a thread
as a result of this mtx_lock(), did another thread try to acquire
the same mutex. This helps identify the causes of contention.
I'm currently exploring adding measurement of "time waited for the
lock", but the current implementation has proven useful to me so far
so I figured I'd commit it so others could try it out. Note that this
increases the size of mutexes when MUTEX_PROFILING is enabled, so you
might find you need to further bump UMA_BOOT_PAGES. Fixes welcome.
The once over: des, others
one which runs the actual update. This fixes a bug where there were
a delay in applying the frequency adjustment. In extreme cases this
could result in marginal stability of the kernel-pll.
full state. (When swap is added their state will change appropriately.)
2. Set swap_pager_full and swap_pager_almost_full to the full state when
the last swap device is removed.
Combined these changes eliminate nonsense messages from the kernel on swap-
less machines.
Item 2 submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Prodding by: phk
For some very unclear reason this device contains a FTDI 8U232AM USB->COM
adapter, but reports different device id than original 8U232AM. At the same
time, it reports vendor id of FTDI.
Sponsored by: Porta Software Ltd
MFC after: 2 weeks
Suggested by: nate
- get rid of "magick" values in code and make sysctl's reflecting reality
on processor versions which have one or another frequency "forbidden"
due to errata.
MFC after: 2 weeks
Suggested by: nate
- get rid of "magick" values in code and make sysctl's reflecting reality
on processor versions which have one or another frequency "forbidden"
due to errata.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after: 2 weeks
the user requests a read-only mount. This is necessary because we
don't do the VOP_OPEN again if they upgrade a read-only mount to
read-write.
Noticed by: bde
The uidinfo code appears to be MPSAFE, and is referenced without Giant
elsewhere. While this grab of Giant was only made in fairly rare
circumstances (actually GC'ing on refcount==0), grabbing Giant here
potentially introduces lock order issues with any locks held by the
caller. So this probably won't help performance much unless you change
credentials a lot in an application, and leave a lot of file descriptors
and cached credentials around. However, it simplifies locking down
consumers of the credential interfaces.
Bumped into by: sam
Appeased: tjr
to a new prison_complete() task run by a task queue. This removes
a requirement for grabbing Giant in crfree(). Embed the 'struct task'
in 'struct prison' so that we don't have to allocate memory from
prison_free() (which means we also defer the FREE()).
With this change, I believe grabbing Giant from crfree() can now be
removed, but need to check the uidinfo code paths.
To avoid header pollution, move the definition of 'struct task'
to _task.h, and recursively include from taskqueue.h and jail.h; much
preferably to all files including jail.h picking up a requirement to
include taskqueue.h.
Bumped into by: sam
Reviewed by: bde, tjr
Replace wrong check returned EFBIG with EOVERFLOW handling from POSIX:
36708 [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the
starting position is before the end-of-file, and the starting position is
greater than or equal to the offset maximum established in the open file
description associated with fildes.
ffs_write:
Replace u_int64_t cast with uoff_t cast which is more natural for types
used.
ffs_write & ffs_read:
Remove uio_offset and uio_resid checks for negative values, the caller
supposed to do it already. Add comments about it.
Reviewed by: bde
recwin and sendwin. This removes a big source of confusion and makes
following the code much easier.
Reviewed by: sam (mentor)
Obtained from: DragonFlyBSD rev 1.6 (hsu)
rid's and to deallocate resources if a failure occurs during attach. This
patch also fixes the driver to return failure if bus_alloc_resource() for
the IRQ fails rather than panic'ing on the next line by passing a NULL
resource to bus_setup_intr(). The other attachments already do all this.
Submitted by: Jun Su <csujun@263.net>
In case no real/physical IEEE 802 address is available, both the expired
"draft-leach-uuids-guids-01" (section "4. Node IDs when no IEEE 802
network card is available") and RFC 2518 (section "6.4.1 Node Field
Generation Without the IEEE 802 Address") recommend (quoted from RFC
2518):
"The ideal solution is to obtain a 47 bit cryptographic quality random
number, and use it as the low 47 bits of the node ID, with the _most_
significant bit of the first octet of the node ID set to 1. This bit
is the unicast/multicast bit, which will never be set in IEEE 802
addresses obtained from network cards; hence, there can never be a
conflict between UUIDs generated by machines with and without network
cards."
Unfortunately, this incorrectly explains how to implement this and
the FreeBSD UUID generator code inherited this generation bug from
the broken reference code in the standards draft. They should instead
specify the "_least_ significant bit of the first octet of the node ID"
as the multicast bit in a memory and hexadecimal string representation
of a 48-bit IEEE 802 MAC address.
This standards bug arised from a false interpretation, as the multicast
bit is actually the _most_ significant bit in IEEE 802.3 (Ethernet)
_transmission order_ of an IEEE 802 MAC address. The standards authors
forgot that the bitwise order of an _octet_ from a MAC address _memory_
and hexadecimal string representation is still always from left (MSB,
bit 7) to right (LSB, bit 0).
Fortunately, this UUID generation bug could have occurred on systems
without any Ethernet NICs only.
no-op on {i386/alpha/ia64/sparc64} where chars are signed by
default. Should help ARM and S390 which also suffer from this.
Tested on: ppc, i386, objdump disasm before/after diffs
Reviewed by: obrien, bde (a while back)
use a bounce buffer for the actual transfer to avoid crossing a 64k
boundary. To do this, we malloc a buffer twice as big as we need and then
find an aligned block within that buffer to do the transfer. The check
to see which part of the block we use used the wrong variable for part of
the condition meaning that in certain edge cases we would ask the BIOS to
cross a 64k boundary. The BIOS request would then fail resulting in file
transfers that just magically fail in the middle without any apparent
reason. Specifically, my tests for the splitfs boot floppies managed to
trigger this edge case.
MFC after: 1 week
X-MFC-info: along with fixes to libstand filesystems
Presumably, at some point, you had to include jail.h if you included
proc.h, but that is no longer required.
Result of: self injury involving adding something to struct prison
is NULL, otherwise ipsec4_process_packet() may try to m_freem() a
bad pointer.
In ipsec4_process_packet(), don't try to m_freem() 'm' twice; ipip_output()
already did it.
Obtained from: netbsd
o For traps, the cr.iip register points to the next instruction to
execute on interrupt return (modulo slot). Since we need to get
the bundle of the instruction that caused the FP fault/trap, make
sure we fetch the previous bundle if the next instruction is in
fact the first in a bundle.
o When we call the FPSWA handler, we need to tell it whether it's
a trap or a fault (first argument). This was hardcoded to mean a
fault.
Also, for FP faults, when a fault is converted to a trap, adjust the
cr.iip and cr.ipsr registers to point to the next instruction. This
makes sure that the SIGFPE handler gets a consistent state.
active scan is completed just as WI_RID_READ_APS.
This fixes wicontrol -L for ath(4) and awi(4) to have results even if
the driver cannot associate any APs.
rev.1.1040. It is a miscellaneous isa+pci driver, but came back
described as a pci-only driver and placed in an i4b pci subsection
after its migration to /sys/conf/NOTES. Put it back where it used to
be, fully unsorted in the `Miscellaneous hardware' section. Reduced
nearby disorder in this section by moving configuration of the digi
driver to where it was for the old digiboard drivers, so that the
order at least matches the order in the table of contents.
- references to removed math emulators for NPX_DEBUG
- header for the null set of mandatory devices
- reference to the removed (and bogus when it existed) sysctl
kern.timecounter.method.
FIxed some nearby disorder (descriptions of CPU_BLUELIGHTNING_3X,
CPU_DIRECT_MAPPED_CACHE, CPU_DISABLE_CMPXCHG, CPU_DISABLE_SSE,
CPU_ELAN_XTAL and CPU_SOEKRIS, and options for all of these except
CPU_DIRECT_MAPPED_CACHE).
problem with using taskqueue_swi is that some of the things we defer
into threads might block for up to several seconds. This is an unfriendly
thing to do to taskqueue_swi, since it is assumed the taskqueue threads
will execute fairly quickly once a task is submitted. Reorganized the
locking in if_ndis.c in the process.
Cleaned up ndis_write_cfg() and ndis_decode_parm() a little.
CPU_ENABLE_TCC enables Thermal Control Circuitry (TCC) found in some
Pentium(tm) 4 and (possibly) later CPUs. When enabled and detected,
TCC allows to restrict power consumption by using machdep.cpuperf*
sysctls. This operates independently of SpeedStep and is useful on
systems where other mechanisms such as apm(4) or acpi(4) don't work.
Given the fact that many, even modern, notebooks don't work properly
with Intel ACPI, this is indeed very useful option for notebook owners.
Obtained from: OpenBSD
MFC after: 2 weeks
CPU_ENABLE_TCC enables Thermal Control Circuitry (TCC) found in some
Pentium(tm) 4 and (possibly) later CPUs. When enabled and detected,
TCC allows to restrict power consumption by using machdep.cpuperf*
sysctls. This operates independently of SpeedStep and is useful on
systems where other mechanisms such as apm(4) or acpi(4) don't work.
Given the fact that many, even modern, notebooks don't work properly
with Intel ACPI, this is indeed very useful option for notebook owners.
Obtained from: OpenBSD
MFC after: 2 weeks
without IFCAP_VLAN_HWTAGGING. The previous version of the
leading comment in this file could lead to the opposite conclusion.
Fix some typos in the comment as well.
ubd_devinfo_vp() is getting an empty string from its usbd_get_string()
call on the vendor, instead of NULL. This means usb_knowndevs in not
consulted.
Add lines between grabbing those char *s and the USBVERBOSE ifdef to
set vendor to NULL if it is the empty string (similarly for product).
This causes vendor to be filled-out, although the product name read
overrules usb_knowndevs (this appears to be a conscience decision made
by the NetBSD folks):
PR: kern/56097
Submitted by: Hal Burch <hburch@lumeta.com>
MFC after: 1 week
pain and suffering. Attempt to back it out by removing the 'if the
requested range is larger than the window, clip to the window' code.
This is a band-aide until the issues are better understood and the
issues with the lazy allocation patches are resolved.
Makes it possible to have multiple packet aliasing instances in a
single process by moving all static and global variables into an
instance structure called "struct libalias".
Redefine a new API based on s/PacketAlias/LibAlias/g
Add new "instance" argument to all functions in the new API.
Implement old API in terms of the new API.
This takes us a lot closer to refcounting dev_t.
This patch originally by cg@ with a few minor changes by me.
It is largely untested, but has been HEADSUP'ed twice, so presumably
people have not found any issues with it.
Submitted by: cg@
This prevents xpt_bus_register() from dereferencing NULL.
- Assign pointer to NULL after cam_sim_free().
Submitted by: Paul Twohey <twohey@CS.Stanford.EDU>
file has been removed, it should be purged from the cache, but it need
not be removed from the directory stack causing corruption; instead,
it will simply be removed once the last references and holds on it
are dropped at the end of the unlink/rmdir system calls, and the
normal !UN_CACHED VOP_INACTIVE() handler for unionfs finishes it off.
This is easily reproduced by repeated "echo >file; rm file" on a
unionfs mount. Strangely, "echo -n >file; rm file" didn't make
it happen.
- Unify the conditional assignments section so that architectural
exclusions come first, sorted, then options and !options, sorted
by the option name, also in directory order, then architecture
specific sections, sorted by the architecture name, with i386
being a traditional exception.
Prodded by: bde
According to the Windows DDK header files, KSPIN_LOCK is defined like this:
typedef ULONG_PTR KSPIN_LOCK;
From basetsd.h (SDK, Feb. 2003):
typedef [public] unsigned __int3264 ULONG_PTR, *PULONG_PTR;
typedef unsigned __int64 ULONG_PTR, *PULONG_PTR;
typedef _W64 unsigned long ULONG_PTR, *PULONG_PTR;
The keyword __int3264 specifies an integral type that has the following
properties:
+ It is 32-bit on 32-bit platforms
+ It is 64-bit on 64-bit platforms
+ It is 32-bit on the wire for backward compatibility.
It gets truncated on the sending side and extended appropriately
(signed or unsigned) on the receiving side.
Thus register_t seems the proper mapping onto FreeBSD for spin locks.
the definitions for NDIS_BUS_SPACE_IO and NDIS_BUS_SPACE_MEM logically
belong in hal_var.h. At least, that's my story, and I'm sticking to it.
Also, remove definition of __stdcall from if_ndis.c now that it's pulled
in from pe_var.h.
in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed. It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().
Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).
of adding the code to lock and unlock the vnodes and taking care
to avoid deadlock, simplify linux_emul_convpath() by comparing the
vnode pointers directly instead of comparing their va_fsid and
va_fileid attributes. This allows the removal of the calls to
VOP_GETATTR().
This gives +10% performance on simple tests, so definitly worth it.
A few percent more could be had by not using M_ZERO'd alloc's, but
we then need to clear fields all over the place to be safe, and
that was deemed not worth the trouble (and it makes life dangerous).
be sure to increment the refcount of the argument so it is not
prematurely deleted. This is a workaround and may appear in a different
form in ACPI-CA. This fixes battery evaluation on Thinkpads that was
broken by fixing the Dell battery state.
Submitted by: Luming Yu <luming.yu@intel.com>
of the functions in libkern. Without this, parts of the kernel would
reference a non-existent (undeclared and undefined) ffs() function; the
only reason this didn't break the kernel build is that gcc happens to
have a built-in ffs() and incorrectly fails to warn about the lack of
prototypes for built-in functions.
ithread_remove_handler() may fail to remove the interrupt handler if
it decides to let the ithread do the removal. The problem is that during
boot "cold" is set, which causes msleep() to return immediately. This
will cause ithread_remove_handler() to fail to wait for the ithread
to do the removal from the handler TAILQ before freeing the handler
back to the heap. Bad things will happen when some other user of the
TAILQ, such as ithread_add_handler() or the actual ithread attempts to use
the freed handler. Fix the problem by forcing ithread_remove_handler()
to do the actual removal itself if the "cold" flag is set.
Reviewed by: jhb
kmem_free(). Note: The FreeBSD-specific code in this file has been
subsumed by the FreeBSD-specific header file, pdq_freebsd.h. That header
file already specifies the use of contigmalloc() and contigfree(). Thus,
the purpose of this change is to avoid having nonsensical examples of
FreeBSD-specific memory allocation in our source tree.
the MacIO chip and PSIM's IOBus. Bus-specific drivers should
use the identify method to attach themselves to nexus so
interrupt can be allocated before the h/w is probed. The
'early attach' routine in openpic is used for this stage
of boot. When h/w is probed, the openpic can be attached
properly. It will enable interrupts allocated prior to
this.
and add_child entry point to allow devices to use the identify
method to add themselves if need be (e.g. openpic, syscons).
Export interrupt-controller-add routine for extern int cntlr drivers.
Eliminate recursive OFW device-tree walk and only iterate the
top-level ala sparc64. Allow child devices to set the device
type with write_ivars.
Step 1 of many in removing the hard-dependency on OpenFirmware.
map ranges that are smaller than what our resource manager code knows
is available, rather than requiring that they match exactly. This
fixes a problem with the Intel PRO/1000 gigE driver: it wants to map
a range of 32 I/O ports, even though some chips appear set up to
decode a range of 64. With this fix, it loads and runs correctly.
unexpected interrupts. If an interrupt is triggered and we're not
finished initializing yet, bail. If we have finished initializing,
but IFF_UP isn't set yet, drain the interrupt with ndis_intr() or
ndis_disable_intr() as appropriate, then return _without_ scheduling
ndis_intrtask().
In kern_ndis.c:ndis_load_driver() only relocate/dynalink a given driver
image once. Trying to relocate an image that's already been relocated
will trash the image. We poison a part of the image header that we
don't otherwise need with a magic value to indicate it's already been
fixed up. This fixes the case where there are multiple units of the
same kind of device.
count.
- Fix the twiddle output so that it actually spins.
- Save %cx around BIOS calls to read in sectors from the disc as at least
one BIOS trashes %cx when called to read off of a USB CD-ROM drive.
Submitted by: Martin Nilsson <martin@gneto.com>
MFC after: 1 week