functions less noisy: We printf if a new function took longer than
the previous record holder, or of the previous record holder took
more than twice as long as the current record.
to have the kernel switch to a new thread, instead of doing it in
userland. It is in fact needed on ia64 where syscall restarts do not
return to userland first. It's completely handled inside the kernel.
As such, any context created by the kernel as part of an upcall and
caused by some syscall needs to be restored by the kernel.
Be sure to shift (long)1 << 33 and higher, not (int)1. Otherwise bad
things happen(TM). This is why beast.freebsd.org paniced with ULE.
Reviewed by: jeff
mutex to be locked. It is redundant since em_init() is called and this
correctly locks the mutex and calls em_stop().
5.2 release candidate since this can cause a panic if the watchdog
expires.
Tested by: kuriyama
violated the constness were corrected before the freeze. This was
suggested by mdodd@, I think, and sam@ and others have signed off on
this if I recall my conversations with them correctly.
system super block after fsck has repaired the file system. The value of
fs_ronly was getting overwritten, which caused ffs_update() to attempt to
update inode timestamps even though the file system was still mounted
read-only.
This fixes the "giving up on N buffers" error that is triggered by running
fsck on the root file system and then rebooting without mounting the file
system read-write.
lots of old interfaces, and digi now supports all cards that dgb
supported. The author of the driver says that this is no longer
necessary.
Approved by: babkin@
a long time: lmc The LAN Media Corp PCI WAN driver based on tulip.
This driver hasn't compiled for 3 years since the PCI compat shims
were removed, and Lan Media appears to have gone out of business.
These cards appear to be rare (a recent search of ebay had no hits).
Should someone wish to revive this driver, submitting patches to make
it compile plus a testing report will bring it back.
or whose drivers haven't even compiled for years.
The loran hardware was very unique, and only a few copies of it ever
existed. It used the old COMPAT_ISA_DRIVER and when the author was
contacted, he indicated that he had no intention of ever updating this
driver and it was no longer relevant to the FreeBSD world and can be
removed without impact to anybody.
Approved by: phk
of doing a loop and taking two 32 bit passes at the runqueue bits. All
the 64 bit platforms should probably do this since there are 64 run queues.
Approved by: re (scottl)
used on amd64, and were actually totally broken. They had the wrong
calling conventions. I believe the i386 versions are going away too.
Approved by: re (scottl)
i386 version. The curthread special case in pcpu.h solves my complaint
about the verbose macro expansion in this case. Note that the i386
version still has some OBE comments, I didn't re-add them back again.
Approved by: re (scottl)
and the mpo_create_cred() MAC policy entry point to
mpo_copy_cred_label(). This is more consistent with similar entry
points for creation and label copying, as mac_create_cred() was
called from crdup() as opposed to during process creation. For
a number of policies, this removes the requirement for special
handling when copying credential labels, and improves consistency.
Approved by: re (scottl)
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
between vm_map and vnode locks is that vm_map locks are acquired first. In
revision 1.150 mmap(2) was changed to pass a locked vnode into vm_mmap().
This creates a lock-order reversal when vm_mmap() calls one of the vm_map
routines that acquires a vm_map lock. The solution implemented herein is
to release the vnode lock in mmap() before calling vm_mmap() and reacquire
this lock if necessary in vm_mmap().
Approved by: re (scottl)
Reviewed by: jeff, kan, rwatson
Update notes to reflect that cx is no longer a counted device
Update options for new cx option
# commented out ELAN_PPS and ELAN_XTAL since they produced errors
Submitted by: rik@cronyx.ru
Approved by: re@ <scottl>
- Add a really, really, nasty hack to provide stub versions of all of
the 'device apic' functions used by the ACPI MADT APIC enumerator if
'device apic' is not compiled into the kernel. This is gross but is
the best we can do with the current kernel linker implementation.
Approved by: re (scottl / blanket)
SI_SUB_CPU - 1 and probe enumerators, probe CPUs, and setup the local
APIC programming all at SI_SUB_CPU / SI_ORDER_FIRST. This is needed to
help get the ACPI module working again as it moves the APIC enumeration
code after SI_SUB_KLD.
- In the MADT parser, use mp_maxid rather than MAXCPU to terminate a loop
when assigning per-cpu ACPI IDs to avoid a dependency on 'options SMP'.
- Allow the apic device to be disabled via 'hint.apic.0.disabled' from the
loader. Note that since this is done in the local APIC code, it works
for both the ACPI and non-ACPI cases.
Approved by: re (scott / blanket)
- Dynamically allocate the cpu_softc[] array based on mp_maxid instead of
using a statically sized array that depended on 'options SMP'.
- Use mp_maxid rather than MAXCPU when walking all the CPUs looking for a
match.
- Always call smp_rendezvous() since UP kernels now provide this.
- Use mp_ncpus rather than cpu_ndevices when determining if we need to
disable C3 for SMP machines.
Approved by: re (rwatson)
Reviewed by: njl
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
Approved by: re (scottl)
Tested on: i386, amd64, alpha
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function. This also means that CPU_ABSENT() is now
always implemented the same on all kernels.
Approved by: re (scottl)
This is the vastly updated cx drvier from Roman Kurakin <rik@cronyx.ru>
who has been patiently waiting for this update for sometime.
The driver is mostly a rewrite from the version we have in the tree.
While some similarities remain, losing the little history that the old
driver has is not a big loss, and the re@ felt it was easier this way (less
error prone).
The userland parts of this update will be committed shortly.
The driver is not connected to the build yet. I want to make sure I
don't break any platform at any time, so I want to test that with
these files in the tree before I continue (on the off chance I'm
forgetting a file).
I changed the DEBUG macro to CX_DEBUG from the code that was submitted
(to not break when we go to building with opt_global.h after the
release), as well adding $FreeBSD$.
Submitted by: Roman Kurakin
Approved by: re@ <scottl>
Before committing the initial tcp_hostcache I changed them from memcpy()
to conform with FreeBSD style without realizing the difference in argument
definition.
This fixes hostcache operation for IPv6 (in general and explicitly IPv6
path mtu discovery) and T/TCP (RFC1644).
Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Approved by: re (rwatson)
on. MCOUNT and FAKE_MCOUNT() may clobber all the call-used registers,
and one FAKE_MCOUNT() was placed so that an active %eax was clobbered.
The fix is to move this FAKE_MCOUNT() earlier where it should have
been anyway.
Fixed 3 layers of bitrot in the comment about why this FAKE_MCOUNT()
was where it was by removing the comment. (mcount() should be called
as early as possible after entering a new level, but an implementation
detail got in the way until 3 layers of changes ago.)
Kernel profiling still gives wrong results because the new interrupt
code rearranged object files too much. mcount() depends on trap,
syscall and interrupt handlers being between certain magic labels with
interrupt handlers last, and on nothing else being there. Splitting
up exception.o moved the magic labels to effectively random places
relative to what they are supposed to delimit. This mainly broke the
call graph; the flat profile is still usable.
code. Both the driver and the new code were wrong. Driver interrupt
handlers are supposed to take "void *vsc" arg, but some including all
COMPAT_ISA drivers and the pci part of the cy driver want an "int unit"
arg. They got this using bogus casts of function pointers which should
have kept working despite their bogusness. However, the new interrupt
code doesn't honor requests to pass an arg of ((void *)0), so things
are very broken if the arg is actually a representation of unit 0.
The fix is to use a normal "void *vsc" arg for the pci case and a
wrapper for the COMPAT_ISA case (of the cy driver). This cleans up
new-busification of the pci case but takes the COMPAT_ISA case a little
further from new-bus. The corresponding bug for the COMPAT_ISA case
has already been fixed similarly using a wrapper in compat_isa.c and
we need another wrapper just to undo that.
Fixed some directly related style bugs (mainly by removing compatibility
cruft).
cy.c:
Fixed an indirectly related old bug in cyattach_common(). A wrong status
was returned in the unlikely event that malloc() failed.
Approved by: re (scottl)
issues which they found and asked to be changed so 3ware can offcially
support the driver.
Summary of the most significant changes:
- TWE_OVERRIDE is no longer supported
- If twe_getparam failed, bogus data would be returned to the caller
- Cache the device unit in the twe_drive structure to aid debugging
- Add the 3ware driver version.
- Proper return error codes for many functions.
- Track the minimum queue length statistics
- 4.x compat: use the cached unit number from the twe_drive structure
instead of the the cached si_drv2. 3ware found that after many loads
and unloads that si_drv2 became corrupted. This did not happen in
-current.
Submitted by: Vinod Kashyap (with modifications by me)
Approved by: re (rwatson)
o Back out workaround for not resetting lucent cards more than once. With
these fixes, it appaers they are no longer necessary.
o Set wi_gone when the card goes awol: typically when we get 0xffff back from
the card. Also, don't interact with a card that's gone, so we fail in
seconds rather than minutes. Also reduce amount of time we wait to .5s
in wi_cmd.
o clear wi_gone on ifconfig down to give some cards a chance after they wedge
(this appears to unwedge one of my prism cards with old firmware). ifconfig
up will fail quickly enough if the card really is out to lunch.
o Add delay in wi_init of 100ms.
o wi_stop(ifp, 0->1) changes so that we clear sc_enabled so that we
exit out of the interrupt routine by just acking the interrupt
Submitted by: iedowse
Approved by: re@ (scottl)
# after the freeze I'll fix some of the minor style issues that reviewers
# of this patch have told me about.
code is compiled in to support the O_IPSEC operator. Previously no
support was included and ipsec rules were always matching. Note that
we do not return an error when an ipsec rule is added and the kernel
does not have IPsec support compiled in; this is done intentionally
but we may want to revisit this (document this in the man page).
PR: 58899
Submitted by: Bjoern A. Zeeb
Approved by: re (rwatson)
to sendfile(2) being erroneously automatically restarted after a signal
is delivered. Fixed by converting ERESTART to EINTR prior to exiting.
Updated manual page to indicate the potential EINTR error, its cause
and consequences.
Approved by: re@freebsd.org
using critcal_enter() and critical_exit() to attempt to replace spl*()
calls. The critical section was calling selrecord(), which locks an
MTX_DEF mutex, which is not legal in a critical section.
Tested by: Stefan Ehmann <shoesoft@gmx.net> and "make universe"
Approved by: re (scottl)
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.
The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.
Reported by: Erez Zadok <ezk@cs.sunysb.edu>
Approved by: re (scottl)
The altio resource magic no longer worked probably due to other changes
in the kernel. Redo that part so it also fits better into ATAng.
Fix detach so it doesn't panic the system when a pccard device is
yanked.
Approved by: re@
was equal to MAXCPU, we would overrun the pcpu_mtx array because maxcpu
was calculated incorrectly.
- Add some more debugging code so that memory leaks at the time of
uma_zdestroy() are more easily diagnosed.
Approved by: re (rwatson)
o fix race condition when processing rx descriptors: because we use
a self-linked descriptor at the end of the rx descriptor list to
avoid rx overruns (which can easily happen for 5212 parts that enable
PHY errors) we must carefully check that a descriptor is "done" by
looking ahead to the next descriptor before believing the done bit
in the current descriptor (this is all handled in the HAL since the
rx descriptor format is chip-specific so we need to pass in two
additional parameters--the physical address of the current descriptor
and the virtual address of the next descriptor in the list)
o check copyout return status for SIOCGATHSTATS ioctl
Approved by: re (scottl)
o support for 5112 and 2112 radios on 5212-based products
o revised interface for ah_procRxDesc needed to handle a race
condition created with the use of self-linked rx descriptors
o support for setting the MAC address
o remove some unused methods from the public API
o revised diagnostic API (replace dump* methods with getDiagState)
o const'ify set key cache method parameters
o support for optional 32khz sleep clock
o implement ah_setSlotTime for 5211 parts
o ANI improvements for 5212 parts
Approved by: re (scottl)
mpf are allocated on the stack, which causes this check to falsely trigger.
A new check which takes on-stack mbufs into account will be reintroduced
after 5.2 is out the door.
Approved by: re (watson)
Requested by: many
When the hostcache bucket limit is reached the last bucket wasn't
removed from the bucket row but inserted a few lines later at the
bucket row head again. This leads to infinite loop when the same
bucket row is accessed the next time for a lookup/insert or purge
action.
Tested by: imp, Matt Smith
Approved by: re (rwatson)
this problem put these lines back in. While they should be
unnecessary, they appear to be sometimes necessary.
Reviewed in concept: dfr
Approved by: re (scottl@)
caused crashes, typically during shutdown, because the second free
referenced a mutex that had been destroyed.
Tested by: several
Approved by: re (scottl)
Make it possible to configure GPIO pins as led(4) devices, PPS inputs
and PPS-echo outputs with a sysctl. Led(4) and PPS-echo can be configured
for active-high or active-low.
Be more complete in initialization of timecounter hardware.
Approved by: re@
them working (cache, automatic rebuild and hotswap) the FFDC
info (First Failure Data Capture) on the adapter must be
initialised.
Logical drives in critical/degraded states weren't added to
the drive list. FreeBSD was not able to see a degraded array
after a reboot. Degraded drives are now also added to the drivelist
and the state of the logical drive is given at boottime.
The adapter type is detected from informations in nvram page 5
and displayed at boottime.
Change IPS_OS_FREEBSD definition from 10 to 8 according to IBM
specs.
Submitted by: <Patrick Guelat> pgfb@imp.ch
Reviewed by: mbr, scottl
Approved by: re
zeroed. Doing a bzero on the entire struct route is not more
expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.
Reviewed by: sam (mentor)
Approved by: re (scottl)
idx'th present CPU with pc_acpi_id equal to *acpi_id. If *acpi_id
does not match that processor's pc_acpi_id, return the value for
ProcId derived from the MADT in *acpi_id. If pc_acpi_id is 0xffffffff,
always override it with the value of *acpi_id. Finally, return
pc_cpuid in *cpu_id and use that as our primary key.
* Use pc_cpuid as our unique key because we know it is valid since
MD code set it. The values for ProcId in the ASL and MADT don't
match up on some machines (!), forcing us to fall back to ordered
probing in that case.
* Remove some #ifdef SMP since the refcount doesn't hurt performance
and will be needed for dynamic _CST objects. Only one #ifdef SMP
(for smp_rendezvous) remains.
* Hook up SMP in the compile flags in the Makefile.
Tested by: marcel, truckman
Approved by: re (scottl)
uncovering some interesting problems. Be conservative and effecitvely
disable this by default. Interested parties may still define
KERNBUILDDIR by hand to achive the same effect.
I plan on referting this change after 5.2 is released, or sooner if
the issues with building releases are resolved and re@ approves.
Approved by: re@ (scottl, marcel)
transfer descriptors when a large request needs to be split into
more than one 8k chunk. The bug was that the calculation did not
take into account the offset of the chunk within the overall request.
This is reported to fix crashes and data corruption on ohci
controllers.
Submitted by: green
Approved by: re
for ipfw processing w/o an indication the packets were generated
by ipfw--and so should not be processed (this manifested itself
as a LOR.) The flag bit in the mbuf that was used to mark the
packets was not listed in M_COPYFLAGS so if a packet had a header
prepended (as done by IPsec) the flag was lost. Correct this by
defining a new M_PROTO6 flag and use it to mark packets that need
this processing.
Reviewed by: bms
Approved by: re (rwatson)
MFC after: 2 weeks
rtalloc_ign() in in_pcbconnect_setup() before it is filled out.
Otherwise, stack junk would be left in sin_zero, which could
cause host routes to be ignored because they failed the comparison
in rn_match().
This should fix the wrong source address selection for connect() to
127.0.0.1, among other things.
Reviewed by: sam
Approved by: re (rwatson)
and the nfs3 client. Also fix some bugs that happen to be causing crashes
in both v3 and v4 introduced by the v4 import.
Submitted by: Jim Rees <rees@umich.edu>
Approved by: re
the MTRR Base/Mask registers. If you use the documented algorithm in the
systems programming guide, you'll get a GPF. The only thing that has
prevented this so far is that the bios pre-sets some MTRR entries which
we mis-interpreted sufficiently to fool the memcontrol interface into
thinking all the address space was taken and therefore rejected XFree86's
requests. However, not all bioses do this.. You get an insta-panic in
that case. Grrr. A better fix (dynamic mask) will happen by 5.3/5-stable
so that we automatically adapt to more than 40 physical bits.
Approved by: re (scottl)
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.
Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)
delete of objects. Also revert our temporary workaround in dsmthdat.c
that always copied objects. This is the correct fix for errors
evaluating _BST (and GBST) on IBM Thinkpads where an argument (Arg3)
was returned to the caller and the object was freed while still in use.
This will be in a future ACPI-CA dist.
Thanks to: kochi@netbsd.org, shaohua.li@intel.com
fixes an interrupt storm for certain users. This is done on the vendor
branch since the code is already in the 20031029 ACPI-CA dist and will
be imported after 5.2R.
Tested by: sebastian ssmoller <sebastian.ssmoller@gmx.net>
PR: i386/57909
Approved by: re (jhb)
different kernel to boot with kernel="NAME" would load the kernel and
loader.conf-selected modules from /boot/NAME, but it would not change
module_path. So, for instance, the automatically loaded acpi.ko would come
from /boot/kernel/acpi.ko, *always*.
Mind you, this happened for unassisted boot. If you interrupted, typed
"unload" and then "boot NAME", it would Do The Right Thing.
The source of the problem is the double initialization with beastie's
loader.rc. One would happen inside "start", and would load the kernel. The
next one would happen later in the loader.rc script, resetting module_path.
Because module_path is set to the Right Value by the functions in support.4th
that actually load the kernel, when beastie.4th proceeded to boot
module_path would remain wrong, as the kernel was already loaded.
This can be corrected by removing either initialization, and also by changing
the command used by beastie.4th from "boot" to "boot-conf", which makes sure
you use the right kernel and modules.
I chose to remove the second initialization, since this let you interrupt
(or confirm) boot before beastie even comes up. I avoid also doing the
boot-conf change because that would simply cause the kernel and modules to
be loaded twice (in fact, that was my original patch, until, in writing this
very commit message, I saw the error of my ways).
This commit changes the semantics of module loading when using the beastie
menu. Now it does what one would expect it to, but not what it was actually
doing, so something may break for unusual setups depending on broken
behavior. As our japanese friends so nicely put it, shikata ga nakatta. :-)
Approved by: re (scottl)
known samples of broken chipsets that needed mixed mode in the first place
are so broken (ie: locks up) that we can't use IO APIC mode at all and it
needs to be turned off in the bios. So, the MIXED_MODE penalty on the
good chipsets gained nothing.
Approved by: re (scottl)
the compiler having to parse and optimize the PCPU_GET(curthread) so often.
__curthread() is an inline optimized version of PCPU_GET(curthread) that
knows that pc_curthread is at offset zero in the pcpu struct. Add a
CTASSERT() to catch any possible changes to this. This accounts for
just over a 1% wall clock speedup for total kernel compile/link time,
and 20% compile time speedup on some specific files depending on which
compile options are used.
Approved by: re (jhb)
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.
It removes significant locking requirements from the tcp stack with
regard to the routing table.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
boot-disabled devices instead of skipping the last interrupt. This is
especially important for devices that only have one interrupt as this
bug was keeping any interrupt from being tried at all.
Reviewed by: msmith
Approved by: re (scottl)
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.
It removes significant locking requirements from the tcp stack with
regard to the routing table.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
accordingly. The define is left intact for ABI compatibility
with userland.
This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
on SMP systems has a chance of working. This was a loose end of the
implementation of the ACPI Cx idle states. Since our logical CPU Id
is the ACPI processor Id, we do not need to jump through hoops to
obtain it.
Approved: re@ (jhb)
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.
1) is fixed by using spin locks instead.
2) is fixed by preallocating a FIFO (implemented with a STAILQ)
and using elements from this FIFO instead. This turns out
to be rather fast.
OK'ed by: re (scottl)
Thanks to: peter, jhb, rwatson, jake
Apologies to: *
an acpi_cpu method for shutdown that disables entry to acpi_cpu_idle
and then IPIs/waits for threads to exit. This fixes a panic late in
reboot in the SMP case.
* In the !SMP case, don't use the processor id filled out by the MADT
since there can only be one processor. This was causing a panic in
acpi_cpu_idle if the id was 1 since the data was being dereferenced from
cpu_softc[1] even though the actual data was in cpu_softc[0] (which is
correct).
* Rework the initialization functions so that cpu_idle_hook is written
late in the boot process.
* Make the P_BLK, P_BLK_LEN, and cpu_cx_count all softc-local variables.
This will help SMP boxes that have _CST or multiple P_BLKs. No such
boxes are known at this time.
* Always allocate the C1 state, even if the P_BLK is invalid. This means
we will always take over idling if enabled. Remove the value -1 as
valid for cx_lowest since this is redundant with machdep.cpu_idle_hlt.
* Reduce locking for the throttle initialization case to around the write
to the smi_cmd port. Add disabled code to write the CST_CNT. It will
be enabled once _CST re-evaluation is tested (post 5.2R).
Thank you: dfr, imp, jhb, marcel, peter
Tested by: rwatson, Harald Schmalzbauer <h@schmalzbauer.de>
Approved by: re (rwatson)
occurs when kmem_malloc() fails to allocate a sufficient number of vm
pages. Specifically, we avoid the lock-order reversal by not grabbing
Giant around pmap_remove() if the map is the kmem_map.
Approved by: re (jhb)
Reported by: Eugene <eugene3@web.de>
- turn on SMP in generic
- add 'device atpic' - this is unconditional on i386, but certain nvidia
based systems need to disable acpi because the reference bios seems to be
hosed. If acpi is disabled, we won't find the apic. amd64 has the
mptable code in a seperate compile option as well.
- turn sym back on, it doesn't fail to compile anymore.
Approved by: re
source count pointers at them so that intr_execute_handlers() won't
choke when it tries to handle an unregisterd ATPIC interrupt source.
- Install the low-level ATPIC interrupt handlers when we first program the
ATPIC in atpic_startup() rather than at SI_SUB_INTR. This is only
necessary to work around buggy code that enables interrupts too early
in the boot process (namely, the vm86 code).
Approved by: re (rwatson)
regocnized as such at the time. Now that the other bogons in the
tree have been fixed, we can remove this ugly kludge.
o Remove stale/bogus opt_foo.h files. These are left over from
by-gone resources. And they point to the need, yet again, to
improve the build system so meta information is only in one place.
Submitted by: ru
Reviewed by: bde
Approved by: re@ (jhb)
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
Reported by: truckman
Approved by: re@ (rwatson)
In practice it seems that in situations of high packet loss the ACK
timeout seems to hit this maximum (perhaps inappropriately, but the
estimation algorithm is not perfect, so apparently it happens). In
any case, 10 seconds is way too high a value so lower to 1 second.
MFC after: 3 days
the "old" SYSINIT. This makes sure things happen in the right order.
XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.
Approved by: re@
rather than right before and right after. This allows these routines
to manipulate the mesh.
KASSERT that nobody creates a geom on an alien class.
Assert topology in g_valid_obj().
Approved by: re@
TOC's for the same media!! that borks up GEOM.
Although this looks like bad HW the following patch removes the
chance for GEOM panic'ing.
Approved by: re@
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
o Each source gets its own queue, which is a FIFO, not a ring buffer.
The FIFOs are implemented with the sys/queue.h macros. The separation
is so that a low entropy/high rate source can't swamp the harvester
with low-grade entropy and destroy the reseeds.
o Each FIFO is limited to 256 (set as a macro, so adjustable) events
queueable. Full FIFOs are ignored by the harvester. This is to
prevent memory wastage, and helps to keep the kernel thread CPU
usage within reasonable limits.
o There is no need to break up the event harvesting into ${burst}
sized chunks, so retire that feature.
o Break the device away from its roots with the memory device, and
allow it to get its major number automagically.
to see_other_uids but with the logical conversion. This is based
on (but not identical to) the patch submitted by Samy Al Bahra.
Submitted by: Samy Al Bahra <samy@kerneled.com>
more than one sf_buf for one vm_page. To accomplish this, we add
a global hash table mapping vm_pages to sf_bufs and a reference
count to each sf_buf. (This is similar to the patches for RELENG_4
at http://www.cs.princeton.edu/~yruan/debox/.)
For the uninitiated, an sf_buf is nothing more than a kernel virtual
address that is used for temporary virtual-to-physical mappings by
sendfile(2) and zero-copy sockets. As such, there is no reason for
one vm_page to have several sf_bufs mapping it. In fact, using more
than one sf_buf for a single vm_page increases the likelihood that
sendfile(2) blocks, hurting throughput.
(See http://www.cs.princeton.edu/~yruan/debox/.)
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.
with multiple ports on a shared interrupt demultiplexed by the puc_intr()
handler.
siointr1() first read as much input as possible and then checked all
possibly-relevant status registers, partly for robustness and partly
for historical reasons. This is very bad if it is called for every
port sharing an interrupt like puc_intr() does. It can spend too long
reading all the input for some ports when the interrupt is for a more
urgent event on another, or just too long checking all the status
registers when there are lots of ports. The inter-character time is
too long for reading all the input even when the interrupt is for a
transmitter interrupt on the same port, and at 921600 bps the inter-char
time is 10.85 usec and was often exceeded with just 2 ports, leaving
the transmitters idle for about 6% of the time.
The tweak is to break out of the read loop after reading 1 char if
output can be done. This avoids most of the idle transmitter time for
2 active ports at 921600 bps bidirectional on the test system. It
also reduces overhead by about 20%. More complete fixes use the
programmable tx low watermark on 16950's and reduce overhead by another
65%.
do not have mh_nextpkt initialized. Somtimes what's there is "1", and the
ip_input() code pukes trying to m_free() it, rendering divert sockets and
such broken.
This really underscores the need to get rid of MT_TAG.
Reviewed by: rwatson
is the warning that points to the bug in `(char *)malloc(...)' where
malloc() is implicitly declared as returning int. We do similar things
here, but they work because u_int is the same as uintptr_t on i386's.)
system calls, and prefer these calls over getsockopt()/setsockopt()
for ABI reasons. When addressing UNIX domain sockets, these calls
retrieve and modify the socket label, not the label of the
rendezvous vnode.
- Create mac_copy_socket_label() entry point based on
mac_copy_pipe_label() entry point, intended to copy the socket
label into temporary storage that doesn't require a socket lock
to be held (currently Giant).
- Implement mac_copy_socket_label() for various policies.
- Expose socket label allocation, free, internalize, externalize
entry points as non-static from mac_net.c.
- Use mac_socket_label_set() in __mac_set_fd().
MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and
mac_get_peer() to retrieve and set various socket labels without
directly invoking the getsockopt() interface.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
will now need editing except for spot checks.
Changed this buffer from a circular one to a linear one. This is more
useful for some cases and the sysctl that prints it doesn't support
circular buffers.
Fixed (output) formatting bugs in this sysctl. An off by 1 error caused
a garbage byte to be returned after annotation of large deltas, and
a race with the writer sometimes caused premature string termination.
o when compiling lint, undefine certain things and redefine them so that the
driver doesn't #error out. Since lint kernels aren't supposed to be
bootable, I'm no troubled by this breakage.
This fixes the tinderbox
Suggested by: rwatson
Approved by: bms
SO_PEERLABEL. This provides an interface to query the label of a
socket peer without embedding implementation details of mac_t in
the application. Previously, sizeof(*mac_t) had to be specified
by an application when performing getsockopt().
Document mac_get_peer(3), and expand documentation of the other
mac_get(3) functions. Note that it's possible to get EINVAL back
from mac_get_fd(3) when pointing it at an inappropriate object.
NOTE: mac_get_fd() and mac_set_fd() support for sockets will
follow shortly, so the documentation is slightly ahead of the
code.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
mac_setsockopt_label() into mac_socket_label_set(); make it non-static
so that it can be invoked from kern_mac.c for mac_set_fd().
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
While we end up the same place, we end up with two different CS register
values after the jump and 0xf000 is compatible with the hardware reset
value.
This makes a difference if the BIOS does a near jump before a far jump.
Detective work and patch by: Adrian Steinmann <ast@marabu.ch>
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).
Obtained from: OpenBSD
MFC after: 2 weeks
Its restoration in rev.1.102 was mistranslated to the equivalent of
setsofttty() in rev.1.105. This increased overheads by causing a
context switch to the SWI handler after almost every interrupt. The
increase was approx. 50% on a Celeron 366 (from 23 usec to 34 usec
per interrupt).
I'm having bad luck with different parts of the sys tree being checked
out at slightly different times. Back it out, noting it doesn't cause
harm in any case. Tinderbox also makes these things more fun.
of newfs, to signify the newfs operation has not yet completed. Re-
write the superblock with the correct magic number once all of the
cylinder groups have been created to show the operation has finished.
Sponsored by: St. Bernard Software
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.
sure to sooptcopyin() the (struct mac) so that the MAC Framework
knows which label types are being requested. This fixes process
queries of socket labels.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
opt_ddb.h. These changes expand green's work of including
opt_global.h to prefer opt files in the kernel directory. Further
refinement might be needed, but I think this is good.
Note: While this is a step on the path to moving the meta information
about modules into the config files, it doesn't actually do that. It
just pulls in the opt files in a way that allows one to build
'generic' modules outside the tree.
disposing fifo resources in fifo_cleanup() instead using of
"vp->v_usecount == 1". There may be other references to the vnode, for
instance by nullfs, at the time fifo_open() or fifo_close() is called,
which could cause a resource leak.
Don't bother grabbing the vnode interlock in fifo_cleanup() since it no
longer accesses v_usecount.
the nfsv4 files. It is intended to be a short-term bridge while
alfred deals with the problem in a better way (eg, don't hesitate to
back this out when the real fix comes along). I've not heard back
from alfred in a few hours and other people are hitting this problem.
Approved by: markm, rwatson, grog, murray
problem, but the spamage of consoles is really bad. Until we can get
this to be less chatty, disable it so people can boot. The badness of
the spamage is worse than the badness that it reports :-(. Once the
underlying problems have been fixed, it can be reenabled.
Approved by: kken, markm, rwatson, grog, murray
* Use the cpu_idle_hook() to do idling for C1-C3.
* Use both _CST and the FADT to detect Cx states.
* Use both _PTC and P_CNT for controlling throttling.
* Add a notify handler to detect changes in _CST and _PSS
* Call the _INI function for each processor if present. This will be
done by ACPI-CA in the future.
* Fix a bug on SMP systems where CPUs will attach multiple times if the
bus is rescan.
* Document new sysctls for controlling idling.
and remove two unneccessary variable initializations.
Make the introduction comment more clear with regard which parts of
the packet are touched.
Requested by: luigi
value as reserved for internal use in boot blocks, because RB_PAUSE
broke binary compatibility by usurping the RB_DUAL flag. Probably no
one except me has boot blocks for which this matters, since most boot
blocks based on biosboot including pc98's boot2 can't boot elf kernels,
and /boot/loader doesn't properly pass flags set by the previous stage.
reboot.h:
Also mark the historical RB_PROBEKBD flag (0x80000) as reserved for
internal use in boot blocks.
boot2.c:
Added comments to inhibit usurping of other flags.
Approved by: guido, imp
MFC after: 1 week
on non-VCHR vnodes. This fixes a panic when reading data from files on a
filesystem with a small (less than a page) block size.
PR: 59271
Reviewed by: alc
kses from the run queues. Also, on SMP, we track the transferable
count here. Threads are transferable only as long as they are on the
run queue.
- Previously, we adjusted our load balancing based on the transferable count
minus the number of actual cpus. This was done to account for the threads
which were likely to be running. All of this logic is simpler now that
transferable accounts for only those threads which can actually be taken.
Updated various places in sched_add() and kseq_balance() to account for
this.
- Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
really doing. The load is accounted for seperately from the runq because
the load is accounted for even as the thread is running.
- Fix a bug in sched_class() where we weren't properly using the PRI_BASE()
version of the kg_pri_class.
- Add a large comment that describes the impact of a seemingly simple
conditional in sched_add().
- Also in sched_add() check the transferable count and KSE_CAN_MIGRATE()
prior to checking kseq_idle. This reduces the frequency of access for
kseq_idle which is a shared resource.
- missing parenthesization of some macro args
- point of do-while(0) hack defeated by putting a semicolon after while(0)
Fixed some style bugs in macros:
- not splitting the line when the macro value cannot be lined up in much
the same macros that didn't parenthesize their args
- braces around a 1-line statement
- do-while(0) hack not indented in the usual way in the same macros that
defeated its point.
complex locking and rework ip_rtaddr() to do its own rtlookup.
Adopt all its callers to this and make ip_output() callable
with NULL rt pointer.
Reviewed by: sam (mentor)
- For acpi_pci_link_entry_dump(), add a few helper functions to display
the trigger mode, polarity, and sharemode of an individual IRQ resource.
These functions are then called for both regular and extended IRQ
resources.
- In acpi_pci_link_set_irq(), use the same type of IRQ resource
(regular vs. extended) for the new current resource as the type of
the resources from _PRS.
- When routing an interrupt don't ignore extended IRQ resources. Also,
use the same type of IRQ resource (regular vs. extended) for the new
current resource when as the type of the resource from _PRS.
Tested by: peter
This fixes a dependency of mac_label.c on namespace pollution in
<vm/uma.h>.
Similarly for SYSCTL_DECL() although I had no problems with it. This
probably makes some includes of <sys/sysctl.h> bogus.
longer uses these interrupt vectors for its ISA interrupt pins, so these
entries will not be overwritten. If we get a spurious interrupt from the
ATPIC when using the APIC, it will be treated as a stray interrupt instead
of causing a panic.
Short description of ip_fastforward:
o adds full direct process-to-completion IPv4 forwarding code
o handles ip fragmentation incl. hw support (ip_flow did not)
o sends icmp needfrag to source if DF is set (ip_flow did not)
o supports ipfw and ipfilter (ip_flow did not)
o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
o returns anything it can't handle back to normal ip_input
Enable with sysctl -w net.inet.ip.fastforwarding=1
Reviewed by: sam (mentor)
of depending on namespace pollution 2 layers deep in <vm/uma.h>. Fixed
most nearby include messes (another like this, several the opposite of
this, and some formatting).
be printed, if the module were loaded into a kernel which had INET6 enabled.
The gre(4) driver does not use INET6, nor is it specified for IPv6. The
tunnel_status() function in ifconfig(8) is somewhat overzealous and assumes
that all tunnel interfaces speak KAME ifioctls.
This fix follows the path of least resistance, by teaching gre(4) about
the two KAME ifioctls concerned.
PR: bin/56341
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.
Requested by: peter (2)
the packets are immediately returned for sending (e.g. when bridging
or packet forwarding). There are more efficient ways to do this
but for now use the least intrusive approach.
Reviewed by: imp, rwatson
in exit1(), make sure the p_klist is empty after sending NOTE_EXIT.
The process won't report fork() or execve() and won't be able to handle
NOTE_SIGNAL knotes anyway.
This fixes some race conditions with do_tdsignal() calling knote() while
the process is exiting.
Reported by: Stefan Farfeleder <stefan@fafoe.narf.at>
MFC after: 1 week
- In the receive routine handle the case where last descriptor could have
less than 4 bytes of data.
- Handle race between detach/ioctl routine.
MFC after: 3 days
kernel build. This makes it possible for me not to get pissed off that
random.ko crashes the system trying to rdtsc() when the i386/cpu.h
support code decides it's okay to call that op when neither I386_CPU or
I486_CPU is defined. I guess it also makes WITNESS/INVARIANTS defines
get picked up by the modules.
vnode of the parent. However, this check should not be performed if
the lookup failed. This change should fix "union_lookup returning
. not same as startdir" panics people were seeing. The bug was
introduced by an incomplete import of a NetBSD delta in rev 1.38.
- Move the aforementioned check out from DIAGNOSTIC. Performance
is the least of our unionfs worries.
- Minor reorganization.
PR: 53004
MFC after: 1 week
- Return EBUSY if the region was wired by mlock(2) and MS_INVALIDATE
is specified to msync(2). This is required by the Open Group Base
Specifications Issue 6.
- vm_map_sync() doesn't return KERN_FAILURE. Thus, msync(2) can't
possibly return EIO.
- The second major loop in vm_map_sync() handles sub maps. Thus,
failing on sub maps in the first major loop isn't necessary.
any functions that call them. Calling proc_rwmem() with the proc lock
held is not safe. Currently, we're protected from any races by Giant.
Eventually proc_rwmem() should require the proc lock and not Giant.
parts of ptrace using proc_rwmem(). proc_rwmem() requires giant, and
giant must be acquired prior to the proc lock, so ptrace must require giant
still.
multicast hash are written. There are still two distinct algorithms used,
and there actually isn't any reason each driver should have its own copy
of this function as they could all share one copy of it (if it grew an
additional argument).
Give the HZ/overflow check a 10% margin.
Eliminate bogus newline.
If timecounters have equal quality, prefer higher frequency.
Some inspiration from: bde
5212-based devices because PHY errors are used to collect data
on environmental noise that and doesn't truly reflect the state
of the communications media. The result is confused users.
Folks that want to watch PHY errors can still get the statistics
through the device ioctl (used by athstats).
o reject scan requests for a device that isn't marked up
This fixes a problem where requesting a scan before marking the device
up would cause a panic because the current channel was set to "any" (0xffff).
o correct a read-lock assert in in_pcblookup_local that should be
a write-lock assert (since time wait close cleanups may alter state)
Supported by: FreeBSD Foundation
preemption two CPUs can be in the same function at the same time
and clobber each others variables. Remove register declaration
from local variables.
Reviewed by: sam (mentor)
and empty its turnstile while the blocking threads still pointed to the
turnstile. If the thread on the first CPU blocked on a lock owned by
one of the threads blocked on the turnstile just woken up, then the
first CPU could try to manipulate a bogus thread queue in the turnstile
during priority propagation.
- Update locking notes for ts_owner and always clear ts_owner, not just
under INVARIANTS.
Tested by: sam (1)
deleted in 1.81. Increase the initial timeout limit to 2ms to
eliminate spurious messages of excessive timeouts in the NFS
client code.
Requested by: Poul-Henning Kamp <phk@phk.freebsd.dk>
Requested by: Mike Silbersack <silby@silby.com>
Requested by: Sam Leffler <sam@errno.com>
Giant and is also MPSAFE.
Push Giant further down into __mac_get_fd() and __mac_set_fd(),
grabbing it only for constrained regions dealing with VFS, and
dropping it entirely for operations related to labeling of pipes.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Radeon IGP support (still lacking PCI IDs), and DRM interface 1.2 updates which
include finally tying the DRM instances to specific devices rather than relying
on the X Server.
This switch toggles between strict multicast delivery, and traditional
multicast delivery.
The traditional (default) behaviour is to deliver multicast datagrams to all
sockets which are members of that group, regardless of the network interface
where the datagrams were received.
The strict behaviour is to deliver multicast datagrams received on a
particular interface only to sockets whose membership is bound to that
interface.
Note that as a matter of course, multicast consumers specifying INADDR_ANY
for their interface get joined on the interface where the default route
happens to be bound. This switch has no effect if the interface which the
consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.
The original patch has been cleaned up somewhat from that submitted. It has
been tested on a multihomed machine with multiple QuickTime RTP streams
running over the local switch, which doesn't do IGMP snooping.
PR: kern/58359
Submitted by: William A. Carrel
Reviewed by: rwatson
MFC after: 1 week
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.
Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)
extra argument to the devfs MAC policy entry points was accidentally
merged from the MAC branch during my earlier commit to these policies,
and is not scheduled to be merged just yet.
defines for these constants that include the trailing NUL byte. These
new constants have SIZ in their name instead of LEN. As soon as all
consumers in the tree are converted to use the new defines the old
defines will be put under BURN_BRIDGES.
Reviewed by: archie, julian, ru
Approved by: re (in principle)
after the additions made for the new statfs structure (version
1.157). These must be updated in a separate checkin after
syscalls.master has been checked in so that they reflect its
new CVS identity. As these are purely derived files, it is not
clear to me why they are under CVS at all. I presume that it has
something to do with having `make world' operate properly.
accurate reporting of multi-terabyte filesystem sizes.
You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Tim Robbins <tjr@freebsd.org>
Reviewed by: Julian Elischer <julian@elischer.org>
Reviewed by: the hoards of <arch@freebsd.org>
Sponsored by: DARPA & NAI Labs.
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.
Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.
2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.
3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.
4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.
Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson
important change is in cpu_switch() where we disable the high FP
registers for the thread that we switch-out if the CPU currently
has its high FP registers. This avoids that the high FP registers
remain enabled for the thread even when the CPU has unloaded them
or the thread migrated to another processor.
Likewise, when we switch-in a thread of that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.
The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.
Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.
These changes make the lazy context switching of the high FP
registers in an UP kernel functional.
turnstiles to implement blocking isntead of implementing a thread queue
directly. These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.
Turnstiles do not come out of a fixed-sized pool. Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads. The queue associated with a given lock
is found by a lookup in a simple hash table. The turnstile itself is
protected by a lock associated with its entry in the hash table. This
means that sched_lock is no longer needed to contest on a mutex. Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.
Currently turnstiles only support mutexes. Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation. For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.
The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win). Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.
Tested on: i386 SMP, sparc64 SMP, alpha SMP
- Fail in agp_alloc_gatt if the aperture size is 0 instead of panicing in
contigmalloc.
Reported by: Bjoern Fischer <bfischer@Techfak.Uni-Bielefeld.DE>
Reviewed by: jhb
MFC after: 1 week
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.
Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.
Tested by: sos
This ensures that uart gets a higher console priority than syscons when
a serial console is being used. Testing against the "console" environment
variable doesn't make sense since we only have one loader console driver.
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.
that we currently do not keep track of whether the thread has
actually used the high FP registers before. If not, we should
not save them in the context which automaticly means that we
also would not restore them from the context. For now, do it
unconditionally so that we can reach functional completeness.
functions switched to using {g|s}et_mcontext(). The problem is that
sigreturn(), being a syscall, can be given an async. context (i.e.
one corresponding to an interrupt or trap). When this happens, we
try to return to user mode via epc_syscall_return with a trapframe
that can only be used to return to user mode via exception_restore.
To fix this, we check the frame's flags immediately prior to
epc_syscall_return and branch to exception_restore for non-syscall
frames. Modify the assertion in set_mcontext() to check that if
there's a mismatch, it's because of sigreturn().
ktr_resize_pool(); this eliminates a potential livelock.
Return ENOSPC only if we encountered an out-of-memory condition when
trying to increase the pool size.
Reviewed by: jhb, bde (style)
cache after a data access error we must discard all cache lines. When
disabled existing cache lines are not invalidated by stores to memory, so
we risk reading stale data that was cached before the data access error if
we don't flush them. This is especially fatal when the memory involved
is the active part of the kernel or user stack. For good measure we also
flush the instruction cache.
This fixes random crashes when the X server probes the PCI bus through
/dev/pci.
commit broke the world because it depended on namespace pollution that
was only in my version of <machine/bootinfo.h>. The include was removed
in rev.1.63 after the last reference to it went away in rev.1.61.
dsp_open: rearrange to only hold one lock at a time
dsp_close: ditto
mixer_hwvol_init: delete locking, the only consumer seems to
be the ess driver and it only call it a creation time, I
think the device will be stable across the sleepable malloc.
cmi interrupt routine: Release locks while caller chn_intr,
either this or do what emu10k1 does which is have no locks
at in the interrupt handler.
Submitted by: mat@cnd.mcgill.ca
consistency initialized. Consequently, a number of conditionals that
checked the validity of b_object before passing it to VM_OBJECT_LOCK()
and VM_OBJECT_UNLOCK() are no longer needed.
The reason this was done was to avoid a race to the root when an
NFS server went down. However a semi-recent change to the way that
the kernel's lookup() routine traverses mount points prevents this.
Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we
aquire a shared lock on the mount point being accessed and then drop
the directory vnode lock before requesting the target lock.
With that in place we no longer need shared locks for NFS to prevent
race to the root lockups.
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.
Reviewed by: sam
a resource leak. Move the resource deallocation code from fifo_close()
to a new function, fifo_cleanup(), and call fifo_cleanup() from
fifo_close() and the appropriate places in fifo_open().
Tested by: Lukas Ertl
Pointy hat to: truckman
because RFNOWAIT was being passed to kproc_create.
The result was that shutdown took quite a bit longer because this
errant "child" would not respond to termination signals from init
at system shutdown.
RFNOWAIT dissassociates itself from the caller by attaching to init
as a parent proc. We could have had the taskqueue proc listen for
SIGKILL, but being able to SIGKILL a potentially critical system
process doesn't seem like a good idea.
comment about this flag in rev.1.61. It is not historical like the
comment said; it is the flag that says that most of what is laboriously
put in the bootinfo struct is actually there. Newer kernels were
bootable by even the broken boot2 without losing anything except the
symbol table, but older kernels need at least the memory sizes.
Restoring the "|" with RB_BOOTINFO that was lost in rev.1.43 costs 5
bytes. The fix can be done in only 4 bytes by fixing some code that
was removed in rev.1.61 (put RB_BOOTINFO back in in the initial value
of "opts" and fix RBX_MASK to not clobber it.)
crashes that had sn0 as the irq that's being serviced, when there was
no sn0 in the system. This seems to prevent them. Also, we want to
wait until after we've registered with the network layer before we
turn on the interrupt spigot to avoid races.
- redo updating.
rijndael-api-fst.[ch]:
- switch to use new low level rijndael api.
- stop using u8, u16 and u32.
- space cleanup.
Tested by: gbde(8) and phk's test program
free one sem_undo with un_cnt == 0 instead of all of them. This is a
temporary workaround until the SLIST_FOREACH_PREVPTR loop gets fixed so
that it doesn't cause cycles in semu_list when removing multiple adjacent
items. It might be easier to just use (doubly-linked) LISTs here instead
of complicated SLIST code to achieve O(1) removals.
This bug manifested itself as a complete lockup under heavy semaphore use
by multiple processes with the SEM_UNDO flag set.
PR: 58984
Only update them in the newly created context to reflect the state
after copying the dirty registers onto the user stack. If we were to
update the trapframe, we lose the state at entry into the kernel. We
may need that after we create the context, such as for KSE upcalls.
We have to update the trapframe after writing the dirty registers to
the user stack for signal delivery to work. But this is best done in
sendsig() itself where it applies, not in get_mcontext() where it's
done unconditionally.
with debugger, so testing P_TRACED in SIGPENDING is useless. This test also
is the culprit which causes lots of 'failed to set signal flags properly for
ast()' to be printed on console which is just a false complaint.
must return EINVAL if size is zero. Submitted by: tegge
- In order to avoid a race condition in multithreaded applications, the
check and removal operations by munmap(2) must be in the same critical
section. To accomodate this, vm_map_check_protection() is modified to
require its caller to obtain at least a read lock on the map.
revision 1.142
date: 2003/10/11 03:04:26; author: toshii
Fix a done list handling bug which exhibits under high shared
interrupt rate and bus traffic. As the interrupt register is
read after checking hcca_done_head, there was a small chance
of dropping a done list. Ignore OHCI_WDH interrupt bit if
hcca_done_head is zero so that OHCI_WDH is processed later.
revision 1.141
date: 2003/09/10 20:08:29; author: mycroft;
Update actlen even in the case where a TD returns an error --
this is critical for the umass bulk-only STALL case.
revision 1.176
date: 2003/11/04 19:11:21; author: mycroft;
Ignore a CRCTO error on a SETUP transaction in combination with
STALLED or NAK. This fixes problems with the GL641.
- remove the unnecessary elm arg from SIMPLEQ_REMOVE_HEAD().
this mirrors the functionality of SLIST_REMOVE_HEAD() (the other
singly-linked list type) and FreeBSD's STAILQ_REMOVE_HEAD()
use set_mcontext() to restore the context in sigreturn(). Since we
put the syscall number and the syscall arguments in the trapframe
(we don't save the scratch registers for syscalls, which allows us
to reuse the space to our advantage), create a MD specific flag so
that we save the scratch registers even for syscalls. We would not
be able to restart a syscall otherwise.
The signal trampoline does not need to flush the regiters anymore,
because get_mcontext() already handles that. In fact, if we set up
the context correctly, we do not need to have a trampoline at all.
This change however only minimally changes the trampoline code. In
follow-up commits this can be further optimized.
Note that normally we preserve cfm and iip in the trapframe created
by the EPC syscall path when we restore a context in set_mcontext()
because those fields are not normally set for a synchronuous context.
The kernel puts the return address and frame info of the syscall
stub in there. By preserving these fields we hide this detail from
userland which allows us to use setcontext(2) for user created
contexts. However, sigreturn() is commonly called from the trampoline,
which means that if we preserve cfm and iip in all cases, we would
return to the trampoline after the sigreturn(), which means we hit
the safety net: we call exit(2). So, we do not preserve cfm and iip
when we have a synchronous context that also has scratch registers
(the uncommon context created by sendsig() only), under the assumption
that if such a context is created in userland, something special is
going on and the use of cfm and iip is then just another quirk. All
this is invisible in the common case.
if we drop into the pmap or vnode layers.
- Migrate the handling of zero-length msync(2)s into vm_map_sync() so that
multithread applications can't change the map between implementing the
zero-length hack in msync(2) and reacquiring the map lock in
vm_map_sync().
Reviewed by: tegge
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.
This change is triggered by a need on ia64 to have another knob for
get_mcontext().
o When we're resetting the board, make sure that we error out the pending
CCBs first. Otherwise the aha_cmd won't accept further commands, such
as those that are used to reset the card (AOP_INITIALIZE_MBOX). This
appears to cause a cascade failure where no more commands are possible
to the card.
o Reduce from 10s down to 1s the amount of time we're willing to tolerate
the card being awol. This helps the above case.
o Add some error checking to two commands issued in the probe.
I have a dim memory of gibbs@ trying to tell me about this problem a
few years ago, so pointy hat to imp@ for sitting on it so long.
in the log message for kern_sched.c 1.83 (which should have been
repo-copied to preserve history for this file), the (4BSD) scheduler
algorithm only works right if stathz is nearly 128 Hz. The old
commit lock said 64 Hz; the scheduler actually wants nearly 16 Hz
but there was a scale factor of 4 to give the requirement of 64 Hz,
and rev.1.83 changed the scale factor so that the requirement became
128 Hz. The change of the scale factor was incomplete in the SMP
case. Then scheduling ticks are provided by smp_ncpu CPUs, and the
scheduler cannot tell the difference between this and 1 CPU providing
scheduling ticks smp_ncpu times faster, so we need another scale
factor of smp_ncp or an algorithm change.
This quick fix uses the scale factor without even trying to optimize
the runtime divisions required for this as is done for the other
scale factor.
The main algorithmic problem is the clamp on the scheduling tick counts.
This was 295; it is now approximately 295 * smp_ncpu. When the limit
is reached, threads get free timeslices and scheduling becomes very
unfair to the threads that don't hit the limit. The limit can be
reached and maintained in the worst case if the load average is larger
than (limit / effective_stathz - 1) / 2 = 0.65 now (was just 0.08 with
2 CPUs before this change), so there are algorithmic problems even for
a load average of 1. Fortunately, the worst case isn't common enough
for the problem to be very noticeable (it is mainly for niced CPU hogs
competing with less nice CPU hogs).
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
that msync(2) is its only caller.
- Migrate the parts of the old vm_map_clean() that examined the internals
of a vm object to a new function vm_object_sync() that is implemented in
vm_object.c. At the same, introduce the necessary vm object locking so
that vm_map_sync() and vm_object_sync() can be called without Giant.
Reviewed by: tegge
are zx1 based machines and they don't particularly like it when we
poke at them with PC legacy code. The atkbd and psm devices were
disabled in the hints file so that one could enable them on machines
that support legacy devices, but that's not really something you can
expect from a first-time installer. This still leaves syscons (sc)
and the vga device, which were enabled by default and wrecking havoc
anyway. We could disable them by default like the atkbd and psm
devices, but there's really no point in pretending we're in a better
shape that way.
o pickup Giant in divert_packet to protect sbappendaddr since it
can be entered through MPSAFE callouts or through ip_input when
mpsafenet is 1
o add missing locking on output
o add locking to abort and shutdown
o add a ctlinput handler to invalidate held routing table references
on an ICMP redirect (may not be needed)
Supported by: FreeBSD Foundation
o add assertions in tcp_respond to validate inpcb locking assumptions
o use local variable instead of chasing pointers in tcp_respond
Supported by: FreeBSD Foundation
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
PALM_4 initialisation hack. I've not confirmed it myself, but
seeing as we already don't use it for the Sony Clie_41, let's drop
it from the Clie_40 also and see what happens.
(Question: What about the Clie_S360 and Clie_NX60 devices? Do we
need to drop Palm4 from those as well? Possibly, but I've not had
any reports about those so I don't know.)
PR: kern/56575
MFC after: 3 days
is highly MD in an emulation environment since it operates on the host
environment. Although the setregs functions are really for exec support
rather than signals, they deal with the same sorts of context and include
files. So I put it there rather than create yet another file.
since there is no direct association between M:N thread and kse,
sometimes, a thread does not have a kse, in that case, return a pctcpu
from its last kse, it is not perfect, but gives a good number to be
displayed.
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().
In collaboration with: tegge
has a LOR between IPFW inpcb locks but I'm committing it now as the
lesser of two evils (the other being unlocked use of in_pcblookup).
Supported by: FreeBSD Foundation
successfully initialized in the label as a socket peer label, not a
socket label. For current policy modules, this didn't make a
difference, but if a policy module had label data in the peer label
that was to be GC'd in a different way than the normal socket label,
it might have been a problem.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
that it was obsoleted. it is better to fail than just hiding use
of ipsec_gethist() at build.
Sugessted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Removed banal comments about ELAN*. Complain about ELAN* being misnamed
instead (so that these options are not obviously related to a CPU and
don't sort with CPU_ELAN).
Complain about CPU_DISABLE_CMPXCHG being in the wrong namespace.
to a routing table entry w/o bumping the reference count or locking
against the entry being free'd. This caused major havoc (for some
reason it appeared most frequently for folks running natd). Fix
is to bump the reference count whenever we copy the route cache
contents into a private copy so the entry cannot be reclaimed out
from under us. This is a short term fix as the forthcoming routing
table changes will eliminate this cache entirely.
Supported by: FreeBSD Foundation
Instead, let the vm objects be lazily instantiated at fault time. This
results in the allocation of fewer vm objects and vm map entries due to
aggregation in the vm system.
sure we handle stacked registers properly by taking into account
that:
1. bspstore points after the frame (due to cover),
2. we need to adjust for intermediate NaT collections.
not in scheduler specific data because eventually it will be required by
all schedulers.
- Implement sched_pin and unpin as an inline for now. If a scheduler needs
to do something more complicated than adjusting the pinned count we can
move this into a function later in an api compatible way.
o move it from subr_bus.c to netisr.c where it more properly belongs
o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to
grab Giant as needed when MPSAFE operation is enabled
Supported by: FreeBSD Foundation
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.
Reported by: kan
on ia64. The bug is present in i386 as well but didn't show up due to
more relaxed page protections. This fix has been submitted to the vendor.
Submitted by: marcel
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.
Reported by: Michal Mertl <mime@traveller.cz>
Don't put the name of the file in a comment. $FreeBSD$ gives more than
enough about the file's pathname.
Fixed misdescription of the file. It isn't the whole unified Makefile...
Moved the settings of WERROR and of the standard extra CFLAGS
-finline-limit and -fno-strict-aliasing to a less wrong place. They
were in the section for profiling.
of ffs_reload()'s mountp parameter to mp in rev.1.28 of ffs_vnops.c
had not been merged here.
ext2fs_reload() is still missing locking from not merging other changes
to ffs_reload(), but none of these is related to recent locking changes.
but can be enabled by setting hw.atm.hatmN.mpsafe in the kernel
environment to a non-zero value before loading the driver. When
the problems with network MPSAFEty have been sorted out this will
be removed and the driver will default to MPSAFE.
We put them directly onto the free list instead of calling the
external mbuf free routine (that routine would have cleaned the flag).
This fixes a bug which manifests itself in falsely reporting a lot of used
buffers when configuring the interface down.
conservative lock. The problem with the lock-less algorithm is that
it suffers from the ABA problem. Running an application with funnels
a couple of 100kpkts/s through the netgraph system on a dual CPU system
with MPSAFE drivers will panic almost immediatly with the old algorithm.
It may be possible to eliminate the contention between threads that insert
free items into the list and those that get free items by using the
Michael/Scott queue algorithm that has two locks.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.
Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.
Discussed with: jeff
1.36 +73 -60 src/sys/compat/linux/linux_ipc.c
1.83 +102 -48 src/sys/kern/sysv_shm.c
1.8 +4 -0 src/sys/sys/syscallsubr.h
That change was intended to support vmware3, but
wantrem parameter is useless because vmware3 uses SYSV shared memory
to talk with X server and X server is native application.
The patch worked because check for wantrem was not valid
(wantrem and SHMSEG_REMOVED was never checked for SHMSEG_ALLOCATED segments).
Add kern.ipc.shm_allow_removed (integer, rw) sysctl (default 0) which when set
to 1 allows to return removed segments in
shm_find_segment_by_shmid() and shm_find_segment_by_shmidx().
MFC after: 1 week
the amd64 implementation of the pcpu macros is even more verbose than on
i386 and that causes gcc to way overestimate the complexity of this
2-instruction macro. The other platforms can probably lower their
default values.
from the xe driver. Should probably be removed when current probe/attach
problems with the driver are fixed, but is useful now when requesting
diagnostic information from users.
Reviewed by: imp (mentor)
isa_device pointer as its argument and uses that to call the driver's
interrupt handler passing the unit number as its argument. This should
fix COMPAT_OLDISA devices with a unit number of 0.
Reviewed by: peter
Reported by: bde
- share policy-on-socket for listening socket.
- don't copy policy-on-socket at all. secpolicy no longer contain
spidx, which saves a lot of memory.
- deep-copy pcb policy if it is an ipsec policy. assign ID field to
all SPD entries. make it possible for racoon to grab SPD entry on
pcb.
- fixed the order of searching SA table for packets.
- fixed to get a security association header. a mode is always needed
to compare them.
- fixed that the incorrect time was set to
sadb_comb_{hard|soft}_usetime.
- disallow port spec for tunnel mode policy (as we don't reassemble).
- an user can define a policy-id.
- clear enc/auth key before freeing.
- fixed that the kernel crashed when key_spdacquire() was called
because key_spdacquire() had been implemented imcopletely.
- preparation for 64bit sequence number.
- maintain ordered list of SA, based on SA id.
- cleanup secasvar management; refcnt is key.c responsibility;
alloc/free is keydb.c responsibility.
- cleanup, avoid double-loop.
- use hash for spi-based lookup.
- mark persistent SP "persistent".
XXX in theory refcnt should do the right thing, however, we have
"spdflush" which would touch all SPs. another solution would be to
de-register persistent SPs from sptree.
- u_short -> u_int16_t
- reduce kernel stack usage by auto variable secasindex.
- clarify function name confusion. ipsec_*_policy ->
ipsec_*_pcbpolicy.
- avoid variable name confusion.
(struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct
secpolicy *)
- count number of ipsec encapsulations on ipsec4_output, so that we
can tell ip_output() how to handle the packet further.
- When the value of the ul_proto is ICMP or ICMPV6, the port field in
"src" of the spidx specifies ICMP type, and the port field in "dst"
of the spidx specifies ICMP code.
- avoid from applying IPsec transport mode to the packets when the
kernel forwards the packets.
Tested by: nork
Obtained from: KAME
- Add the following functions to the api: sched_bind(), sched_unbind(),
sched_pin(), and sched_unpin(). Bind/unbind are used for traditional
cpu binding. Pin and unpin are meant to allow the kernel to hold a thread
on a particular cpu so that it may cache per-cpu data without fear of
being migrated.
no matter where in the directory structure it may be. Use this and the "-k"
flag in the generated gdbinit files so that the "getsyms" function in gdb
requires no user intervention to run and will find every module if they're
in the kernel build's module directory. This is still quite useful for
cases where gdb knows that the path for some modules is /boot/kernel and
others are in the object directory for /usr/src/sys/$ARCH/compile/kernel.
Approved by: grog
waitrunningbufspace() calls so that they are always able to
proceed and clean up buffer space.
Submitted by: Brian Fundakowski Feldman <green@freebsd.org>
o Fix minor type problems
o Fix minor problem with a couple debug printfs.
o Default to a sane media type when none is reported.
o Minor style changes
The PR complains this will fix the IBM 300GL cards.
Submitted by: Max Gotlib
PR: 11462
Use zpfind() to see if the process became a zombie if pfind() doesn't find it
and if the caller wants to know about process death, so that the caller knows
the process died even if it happened before the kevent was actually registered.
MFC after: 1 week
This fixes a race condition (specifically with signal events) that could
lead to the kn being re-inserted into the list after it has been destroyed,
which is not something we want to happen.
PR: kern/58258
operating mode to HostAP, the card will lock up indefinitely (but
the wi(4) driver can recover if you eject the card). The problem is
that the card needs to be "reset" in a way before you even change the
media to hostap. In practice this isn't as noticeable because you
probably do some operation beforehand which prevents the lock-up
before you enable hostap mode.
e.g.:
"ifconfig wi0 up media autoselect mediaopt hostap" will lock up
(if you just inserted the card).
"ifconfig wi0 up ssid foo media autoselect mediaopt hostap" won't lock up.
- Compile 'device acpi' into GENERIC by default as well. Note that
the beastie loader menu item to disable ACPI still works if ACPI is
compiled into the kernel.
let the MD code choose whether or not to implement such a policy. The new
i386 interrupt code allows multiple FAST handlers for a given source for
example. However, the code does not allow FAST and non-FAST handlers to be
mixed.
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.
- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.
- Use intr_suspend/resume() callouts to the interrupt code layer which
suspends and resumes all the known interrupt sources instead of calling
icu_reinit() directly.
APIC Descriptor Table to enumerate both I/O APICs and local APICs. ACPI
does not embed PCI interrupt routing information in the MADT like the MP
Table does. Instead, ACPI stores the PCI interrupt routing information
in the _PRT object under each PCI bus device. The MADT table simply
provides hints about which interrupt vectors map to which I/O APICs. Thus
when using ACPI, the existing ACPI PCI bridge drivers are sufficient to
route PCI interrupts.
- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
default we provide 16 interrupt sources for IRQs 0 through 15. However,
if the I/O APIC driver has already registered sources for any of those IRQs
then we will silently fail to register our own source for that IRQ.
Note that i386/isa/icu.h is now specific to the 8259A and no longer
contains any info relevant to APICs. Also note that fast interrupts no
longer use a separate entry point. Instead, both fast and threaded
interrupts share the same entry point which merely looks up the appropriate
source and passes control to intr_execute_handlers().
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.
By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
the root path. This is reported to make non-PXE netbooting, such as
is used on sparc64 systems, work correctly when the TFTP server is
not the same as the root server.
PR: kern/57328
Submitted by: Per Kristian Hove <Per.Hove@math.ntnu.no>
header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
is really EtherExpress or EEPro or what, but it does appear in a
couple of ethernet cards that have appeared recently on ebay. Silicom
appears to make these cards, and they have the 82595TX chipset in
them, and sometimes uarts. The ex driver needs some work to support
these cards, but I thought I'd get the device into pccarddevs.
The hardware driver decides the name under /dev/led and provides
the function to turn the lamp on/off.
All leds are serviced by a single timeout which runs at a basic rate
of hz/10.
The LED is controlled by ascii strings as follows.
0 Turn off.
1 Turn on.
f Flash: _-
f2 Flash: __--
f3 Flash: ___---
f4...f9 etc.
d%d Digits. "d12": -__________-_-______________________________
s%s String, roll your own:
'a-j' gives on for (1...10)/10 sec.
'A-J' gives on for (1...10)/10 sec.
'sAaAbBa': _-_--__-
m%s Morse
'.' dot
'-' dash
' ' letter space
'\n' word space
My mdoc skills do not reach to express that.
Add a sysctl declaration for hw.ata.atapi_dma, which had gone MIA (though
setting it in loader.conf still worked, it was not visible at runtime)
Approved by: sos
to the pci attachment. Cardbus is a derived class of pci so all pci
drivers are automatically available for matching against cardbus devices.
Reviewed by: imp
message encoding and decoding stuff into the base module. All of this
is accessed by several of the NgATM modules and putting this into
atmbase reduceds the memory footprint.
cr.isr sanity check. We actually encounter insanities, which very
likely means that the insanity check itself is insane. Remove an empty
comment while I'm at it.
directly on the radix tree and does not hold any routing table refernces.
This fixes the reference counting problems that manifested itself as a
panic during unmount of filesystems that were mounted by NFS over an
interface that had been removed.
Supported by: FreeBSD Foundation
idle. They figure out that we're idle fast enough that the cache pollution
introduces by scanning their run queue is more expensive than waiting
a little longer.
- Add kseq_setidle() to mark us as being idle. Use this in place of
kseq_find().
- Remove kseq_load_highest(), kseq_find() was the only consumer of this
interface. kseq_balance() has it's own customized version that finds the
lowest and highest loads simultaneously.
Continuously told that this would be faster by: terry
GEOM was not designed to handle media that does not have
a size. Blank CD's are of that type, so cheat and set the
media size to -1. This allows burning to work, but makes
GEOM issue outofrange reads that makes the ATAPI subsystem
spew out a few warnings. GEOM should be tought about this.
GEOM was not designed to handle changing the sectorsize
between opens. Writing multitack CD's with both audio and
data tracks needs to change sector size on the fly. We
cheat here and stuff the current sectorsize into GEOM
private internals. GEOM should grow some clean way for this.
o Fix MFC cards. We were bogusly setting CCR_IOBASE[01] and CCR_IOLIMIT.
now when we activate the resource, we adjust these for MFC cards, per the
spec.
o Change type of pf_mfc_* to be bus_addr_t, which is more correct than
long.
This makes my 3C362D/3C363D and 3CXEM556 cards work! Woo Hoo!
o Remove redundant $FreeBSD$
o Better comments about ep_get_macaddr.
o remove one tab in a switch statement (style only)
o Recognize ID 0x0035 as the device ID for the 3CXEM556 that I have. This
makes the 3CXEM556 work for me. Not 100% sure this is the assigned ID,
as I don't have the datasheets for this part, but it does work and get
the correct ethrnet address.
o Comment about the whole fake IRQ 3 thing. some need it, some don't, all
work with it.
the total load, the timeshare load, and the number of threads that can
be migrated to another cpu. Account for these seperately.
- Introduce a KSE_CAN_MIGRATE() macro which determines whether or not a KSE
can be migrated to another CPU. Currently, this only checks to see if
we're an interrupt handler. Eventually this will also be used to support
CPU binding.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).
display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.
Removal if these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.
Tested with: buildworld, LINT
previously only considered the send sequence space. Unfortunately,
some OSes (windows) still use a random positive increments scheme for
their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase
was determined by testing with an XP machine.
wasn't curthread, i.e. when we receive a thread pointer to use
as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case there td != curthread known at the moment is
boot() calling sync with thread0 pointer.
This fixes the panic on shutdown people have reported.
slice assignment. Add a comment describing what it does.
- Remove a stale XXX comment, the nice should not impact the interactivity,
nice adjustments only effect non-interactive tasks in ULE.
- Don't allow nice -20 tasks to totally starve nice 0 tasks. Give them at
least SCHED_SLICE_MIN ticks. We still allow nice 0 tasks to starve nice
+20 tasks as intended.
- SCHED_PRI_NRESV does not have the off by one error in PRIO_TOTAL so we
do not have to account for it in the few places that we use it.
Requested by: bde
0 and SCHED_SLP_RUN_MAX * 2. This allows us to simplify the algorithm
quite a bit. Before, it dealt with arbitrary values which required us
to do nasty integer division tricks that didn't quite work out correctly.
- Chnage sched_wakeup() to detect conditions where the slp+runtime could
exceed SCHED_SLP_RUN_MAX * 2. This can happen if we go to sleep for
longer than 6 seconds. In this case, we'll just clear the runtime and
set the sleep time to the max.
- Define a new function, sched_interact_fork() which updates the slp+runtime
of a newly forked thread. We want to limit the amount of history retained
from the parent so that we learn the child's behavior quickly. We don't,
however want to decay it to nothing. Previously, we would simply divide
each parameter by 100 whenever we forked. After a few forks the values
would reach 0 and tasks would not be considered interactive.
- Add another KTR entry, cleanup some existing entries.
- Remove a useless sched_interact_update() from sched_priority(). This is
already done by the callers that require it.
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.
we will generate for a given ip/port tuple has advanced far enough
for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if
time_Wait sockets which are hogging local ports can be safely
freed.
This change preserves proper TIME_WAIT behavior under normal
circumstances while allowing for safe and fast recycling whenever
ephemeral port space is scarce.
o change os glue API to be compatible with Linux so hal.o's can
be used on any system
o add ABI version to catch driver-HAL mismatches
o move hal version information from ah_osdep.c to binary component
o remove ath_hal_wait os glue component
o assign constant values to all enums to avoid potential compiler
incompatibilities
o add support for 3Com badged cards (PCI vendor ID)
o add support for IBM mini-pci cards (PCI device ID)
o expose MAC, PHY, and radio hardware revisions
o support for big-endian platforms
o new method to set slot time in us
o bug fix for 5211: beacon timers not setup correctly
o bug fix for 5212: don't crash when handed a 5112 radio
Requested by: jhb
Initialize the real mode stack. This is needed at least for the return
address from the lcall.
Requested by: takawata
Fix style bugs in acpi_wakecode.S
Requested by: bde
Remove the kernel option now that we have the tunable.
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.
Reviewed by: peter
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.
Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
mbuf cluster, copy the data to a separate mbuf that do not use a
cluster. this change will reduce the possiblity of packet loss
in the socket layer.
Obtained from: KAME
256 raw receive buffers (96 byte each) fit into one page. This breaks the
limit imposed by the usage of an uint8_t for the buffer number. Restrict
the allocation size for buffers to a maximum of 8192.
- Add an IPI based mechanism for migrating kses. This mechanism is
broken down into several components. This is intended to reduce cache
thrashing by eliminating most cases where one cpu touches another's
run queues.
- kseq_notify() appends a kse to a lockless singly linked list and
conditionally sends an IPI to the target processor. Right now this is
protected by sched_lock but at some point I'd like to get rid of the
global lock. This is why I used something more complicated than a
standard queue.
- kseq_assign() processes our list of kses that have been assigned to us
by other processors. This simply calls sched_add() for each item on the
list after clearing the new KEF_ASSIGNED flag. This flag is used to
indicate that we have been appeneded to the assigned queue but not
added to the run queue yet.
- In sched_add(), instead of adding a KSE to another processor's queue we
use kse_notify() so that we don't touch their queue. Also in sched_add(),
if KEF_ASSIGNED is already set return immediately. This can happen if
a thread is removed and readded so that the priority is recorded properly.
- In sched_rem() return immediately if KEF_ASSIGNED is set. All callers
immediately readd simply to adjust priorites etc.
- In sched_choose(), if we're running an IDLE task or the per cpu idle thread
set our cpumask bit in 'kseq_idle' so that other processors may know that
we are idle. Before this, make a single pass through the run queues of
other processors so that we may find work more immediately if it is
available.
- In sched_runnable(), don't scan each processor's run queue, they will IPI
us if they have work for us to do.
- In sched_add(), if we're adding a thread that can be migrated and we have
plenty of work to do, try to migrate the thread to an idle kseq.
- Simplify the logic in sched_prio() and take the KEF_ASSIGNED flag into
consideration.
- No longer use kseq_choose() to steal threads, it can lose it's last
argument.
- Create a new function runq_steal() which operates like runq_choose() but
skips threads based on some criteria. Currently it will not steal
PRI_ITHD threads. In the future this will be used for CPU binding.
- Create a kseq_steal() that checks each run queue with runq_steal(), use
kseq_steal() in the places where we used kseq_choose() to steal with
before.
the rstack functionality:
1. Fix a KASSERT that tests for the address to be above the upward
growable stack. Typically for rstack, the faulting address can be
identical to the record end of the upward growable entry, and
very likely is on ia64. The KASSERT tested for greater than, not
greater equal, so whenever the register stack had to be grown
the assertion fired.
2. When we grow the upward growable stack entry and adjust the
unlying object, don't forget to adjust the size of the VM map.
Not doing so would trigger an assert in vm_mapzdtor().
Pointy hat: marcel (for not testing with INVARIANTS).
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.
Modify the calculation of minifree in the same way.
Decrease maxcontigdirs as the file system fills to decrease the
likelyhood that a cluster of directories will overflow the available
space in a cylinder group.
Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
that libtool-using packages seem to love using this flag.
/usr/include/sys/cdefs.h:184:5: warning: "__STDC_VERSION__" is not defined
/usr/include/sys/cdefs.h:372:5: warning: "_POSIX_C_SOURCE" is not defined
/usr/include/sys/cdefs.h:378:5: warning: "_POSIX_C_SOURCE" is not defined
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).
Supported by: FreeBSD Foundation
are now in the header of the external buffer itself which allows us
to manipulate them in the free routine without having to lock the softc
structure or the free list. To get space for these flags the chunk number
is reduced to 8 bit which amounts to a maximum of 256 chunks per allocated
page. This restriction is now enforced by a CTASSERT.
structures come out the right size.
Fix the ones that broke. stat32 had some missing fields from the end
and statfs32 was broken due to the strange definition of MNAMELEN
(which is dependent on sizeof(long))
I'm not sure if this fixes any actual problems or not.
the module. Previously we grabbed the mutex used by the callouts,
then stopped the callout with callout_stop, but if the callout
was already active and blocked by the mutex then it would continue
later and reference the mutex after it was destroyed. Instead
stop the callout first then lock.
Supported by: FreeBSD Foundation
o add some more debugging help for figuring out why folks are
getting complaints about releasing routing table entries with
a zero refcnt
o fix comment that talked about spl's
o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake`
happens w/o a context switch.
clobbers this variable. Long ago, when the idle loop wasn't in a
process, it set switchtime.tv_sec to zero to indicate that the time
needs to be read after the idle loop finishes. The special case for
this isn't needed now that there is an idle process (for each CPU).
The time is read in the normal way when the idle process is switched
away from. The seconds component of the time is only zero for the
first second after the uptime is set, and the mostly-dead code was only
executed during this time. (This was slightly broken by using uptimes
instead of times relative to the Epoch -- in the original version the
seconds component of the time was only 0 for the first second after
the Epoch.)
In mi_switch(), moved the setting of switchticks to just after the
first (and now only) setting of switchtime. This setting used to be
delayed since a late setting was needed for the idle case and an early
setting was not needed. Now the early setting is needed so that
fork_exit() doesn't need to set either switchtime or switchticks.
Removed now-completely-rotted comment attached to this. Most of the
code described by the comment had already moved to sched_switch().
very first cell in the mbuf should have a cell header word (of which
everything except the payload type and the CLP bit is ignored). All
other cells should be 48 byte and get the same header as the first cell.
This fixes a problem with sending more than 120000 raw cells/sec through
an HE155. The card seems to need 2 cell times to DMA the transmit buffer
ready queue entry and the transmit buffer descriptor so at 1/3 the
link rate the transmit buffer ready queue starts to fill up. Even with this
patch it's obviously impossible to send raw cells at link rate.
- implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c.
make ip{,6}_ecn_egress() return integer to tell the caller that
this packet should be dropped.
- handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
begin with sched_lock held but not recursed, so this variable was
always 0.
Removed fixup of sched_lock.mtx_recurse after context switches in
sched_switch(). Context switches always end with this variable in the
same state that it began in, so there is no need to fix it up. Only
sched_lock.mtx_lock really needs a fixup.
Replaced fixup of sched_lock.mtx_recurse in fork_exit() by an assertion
that sched_lock is owned and not recursed after it is fixed up. This
assertion much match the one in mi_switch(), and if sched_lock were
recursed then a non-null fixup of sched_lock.mtx_recurse would probably
be needed again, unlike in sched_switch(), since fork_exit() doesn't
return to its caller in the normal way.
Correct a bug when the number of pages for external mbufs was
very large. In this case the page number could overflow into the large
buffer flag. Make this more unlikley by move that flag further away.
is returned from the card to the driver. Add a counter that shows
how many times this allocation has failed. Note, that we could even
further delay the allocation of the mbuf until we know, that we need it
(there are no receive errors and the connection is open). This will be done
in a later commit.
Print the new statistics field in atmconfig.
atomic instructions instead. Remove the stuff used to track
whether an external mbuf travels through the system. This is
temporary only and will come back soon.
code has the typical branch prediction detour, which creates cross-
section branches. A LINT kernel is apparently large enough nowadays
that the .text and .text2 sections cannot always be layed-out so that
branches between them reach.
The fix is to stop using the alpha-specific bitops and instead use
the portable implementation used by all platforms other than alpha
and i386.
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".
Pointed out by: Jonathan Stone
Reviewed by: Robert Watson
reset in a child process after a fork(). Currently, however, only the
real timer is cleared while the virtual and profiling timers are inherited.
The realtimer is cleared because it lives directly in struct proc in
p_realtimer. It is in the zero'd section of struct proc. The other timers
live in the p_timer[] array in struct pstats. These timers are copied on
fork() rather than zero'd. The fix is to move p_timer[] to the zero'd
part of struct pstats so that they are zero'd instead of copied on fork().
Note: Since at least FreeBSD 2.0 (and possibly earlier) we've had storage
for two real interval timers. Now that the uarea is less important,
perhaps we could move all of p_timer[] over to struct proc and drop the
p_realtimer special case to fix that.
PR: kern/58647
Reported by: Dan Nelson <dnelson@allantgroup.com>
MFC after: 1 week
the RNAT bit index constant. The net effect of this is that there's
no discontinuity WRT NaT collections which greatly simplifies certain
operations. The cost of this is that there can be up to 504 bytes of
unused stack between the true base of the kernel stack and the start
of the RSE backing store. The cost of adjusting the backing store
pointer to keep the RNAT bit index constant, for each kernel entry,
is negligible.
The primary reasons for this change are:
1. Asynchronuous contexts in KSE processes have the disadvantage of
having to copy the dirty registers from the kernel stack onto the
user stack. The implementation we had so far copied the registers
one at a time without calculating NaT collection values. A process
that used speculation would not work. Now that the RNAT bit index
is constant, we can block-copy the registers from the kernel stack
to the user stack without having to worry about NaT collections.
They will be in the right place on the user stack.
2. The ndirty field in the trapframe is now also usable in userland.
This was previously not the case because ndirty also includes the
space occupied by NaT collections. The value could be off by 8,
depending on the discontinuity. Now that the RNAT bit index is
contants, we have exactly the same number of NaT collection points
on the kernel stack as we would have had on the user stack if we
didn't switch backing stores.
3. Debuggers and other applications that use ptrace(2) can now copy
the dirty registers from the kernel stack (using ptrace(2)) and
copy them whereever they want them (onto the user stack of the
inferior as might be the case for gdb) without having to worry
about NaT collections in the same way the kernel doesn't have to
worry about them.
There's a second order effect caused by the randomization of the
base of the backing store, for it depends on the number of dirty
registers the processor happened to have at the time of entry into
the kernel. The second order effect is that the RSE will have a
better cache utilization as compared to having the backing store
always aligned at page boundaries. This has not been measured and
may be in practice only minimally beneficial, if at all measurable.
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.
Pointed out by: das@
Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.
the maximum number of pages for buffers) return -1 instead of 0.
This fixes a panic under conditions when many mbufs are needed.
Update the head pointer of the receive buffer pool queue even when
we could not supply a buffer to the chip. Otherwise the chip will
not re-interrupt us for another try. A better strategy would probably
be to remember this condition and to supply buffers without an interrupt
as soon as buffers get available.
Contributed by: Thomaswuerfl@gmx.de
- In sched_prio(), adjust the run queue for threads which may need to move
to the current queue due to priority propagation .
- In sched_switch(), fix style bug introduced when the KSE support went in.
Columns are 80 chars wide, not 90.
- In sched_switch(), Fix the comparison in the idle case and explicitly
re-initialize the runq in the not propagated case.
- Remove dead code in sched_clock().
- In sched_clock(), If we're an IDLE class td set NEEDRESCHED so that threads
that have become runnable will get a chance to.
- In sched_runnable(), if we're not the IDLETD, we should not consider
curthread when examining the load. This mimics the 4BSD behavior of
returning 0 when the only runnable thread is running.
- In sched_userret(), remove the code for setting NEEDRESCHED entirely.
This is not necessary and is not implemented in 4BSD.
- Use the correct comparison in sched_add() when checking to see if an idle
prio task has had it's priority temporarily elevated.
instead of retrying them blindly.
This should fix some of the problems people have been having with cdrom
drives taking a long time to probe. This should also eliminate the need
for the initial TUR in cdsize().
cam_periph.c: Don't keep retrying if the error we get back is a fatal
error. This should help us detect the transition from
"Logical unit not ready, cause not reportable" to "Medium
not present" in the "TUR many" handler. (The TUR many
handler gets triggered for Logical unit not ready, cause
not reportable errors.)
scsi_cd.c: Remove the initial test unit ready in cdsize(). Hopefully
it isn't necessary after the above change.
Submitted by: gibbs (mostly)
Tested by: peter
MFC After: 2 weeks
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than though
/dev/mem (possibly).
So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.
of lock acquires and releases performed.
- Move an assertion from vm_object_collapse() to vm_object_zdtor()
because it applies to all cases of object destruction.
Make the pccard attachment work with NEWCARD
Start locking of the driver, but only the macros are defined right now
Tested on: Megahertz CC10BT/2
# (These cards are very popular on ebay these days, and run < $10 including
# shipping from some sellers).
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.
if we do acquire an advisory lock, great! We'll release it later.
However, if we fail to acquire a lock, we perform the coredump
anyway. This problem became particularly visible with NFS after
the introduction of rpc.lockd: if the lock manager isn't running,
then locking calls will fail, aborting the core dump (resulting in
a zero-byte dump file).
Reported by: Yogeshwar Shenoy <ynshenoy@alumni.cs.ucsb.edu>
type, rather than "object_label" as the first argument. This reduces
complexity a little for the consumer, and also makes it easier for
use to rename the underlying entry points in struct mac_policy_obj.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
in bufs_info sysctl handler. dev->dma and dev->dma_lock existence are
protected by DRM_LOCK(). Fixes panic on sysctl hw.dri when the device is
uninitialied (when you aren't in X).
- Add a DDB function to dump the contents of an ithread and optionally
details about each handler in that ithread. This function can be used
by MD code to implement DDB commands that display information about
interrupt sources and their registered handlers.
the ACPI timer and we shouldn't do that if ACPI is already around to do
that for us.
- Set a description and tweak the order of checks in the probe function
to more closely match other PCI drivers.
This should probably be moved to sys/dev/piix/piix.c at some point and
turned on for all i386 kernels rather than just SMP ones.
(aka RFC2292bis). Though I believe this commit doesn't break
backward compatibility againt existing binaries, it breaks
backward compatibility of API.
Now, the applications which use Advanced Sockets API such as
telnet, ping6, mld6query and traceroute6 use RFC3542 API.
Obtained from: KAME
dcons(4): very simple console and gdb port driver
dcons_crom(4): FireWire attachment
dconschat(8): User interface to dcons
Tested with: i386, i386-PAE, and sparc64.
giant, which implies that we need to take out giant it we're NOT
MPSAFE.
# I can't believe the number of people that looked at this failed to
# detect this.
vm_pageout_page_stats() from Giant.
- Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
vm object to be locked on entry. (All of the pager routines now expect
this.)
that at most 20% of sockets can be in time_wait at one time, ensuring
that time_wait sockets do not starve real connections from inpcb
structures.
No implementation change is needed, jlemon already implemented a nice
LRU-ish algorithm for tcp_tw structure recycling.
This should reduce the need for sysadmins to lower the default msl on
busy servers.
overlapping TR/TC entries (which results in a machine check). Note
that we don't look at the size of the memory descriptor, because
it doesn't guarantee non-overlap.
With this change, a UP kernel could boot on a Intel Tiger4 machine
with the following options:
options LOG2_ID_PAGE_SIZE=26 # 64M
options LOG2_PAGE_SIZE=14 # 16K
Approved by: marcel
setting. Make va_copy be an alias if __ISO_C_VISIBLE >= 1999.
Why? more than a few ports have an autoconf that looks for __va_copy
because it is available on glibc. It is critical that we use it if
at all possible on amd64. It generally isn't a problem for i386 and its
ilk because autoconf driven code tends to fall back to an assignment.
locking, and the apparently unnecessary locking for -stable has been removed.
This may fix issues with missed interrupts since April, which manifested
themselves as slowdowns or hangs in radeon, in particular. Many cleanups also
took place. In the shared code, there are improvements to r128 driver
stability.
when loaded as a module
o cleanup data structures on module unload when no application has
been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them
Supported by: FreeBSD Foundation
part of the support is that it still assumes one master and one target
where as AGP 3.0 actually supports multiple devices on the bus.
Submitted by: Keith Whitwell <keith@tungstengraphics.com>
Sponsored by: The Weather Channel
a minor conflict):
o Use ETHER_ADDR_LEN in preference to '6'.
o Remove two unnecessary (caddr_t) casts. One of them causes problems in
my tree where etherbroadcastaddr is const, and (caddr_t) casts the const
away.
we had were bogus.
While here, reassign the copyright to the Project. There's nothing
in this files that originates from NetBSD, especially now that the
FreeBSD/alpha bits have been removed, but even then the amount of
inherited code that we actually used was nil.
mcontext_t for the register values. Currently only ld8 and ldfd
instructions are handled as those are the ones we need now (a
misaligned ld8 occurs 4 times in ntpd(8) and a misaligned ldfd
occurs once in mozilla 1.4 and 1.5). Other instructions are added
when needed.
at the first address and spills it to the second address. This
allows unaligned_fixup() to update the context of the process in
a way that assures proper rounding.
Similar functions for single-and extended-precision are added when
needed.
in that it provides an abstract (intermediate) representation for
instructions. This significantly improves working with instructions
such as emulation of instructions that are not implemented by the
hardware (e.g. long branch) or enhancing implemented instructions
(e.g. handling of misaligned memory accesses). Not to mention that
it's much easier to print instructions.
Functions are included that provide a textual representation for
opcodes, completers and operands.
The disassembler supports all ia64 instructions defined by revision
2.1 of the SDM (Oct 2002).
implement i386 compat numbers where it makes sense. This would save a
syscall translation layer. Yes, this breaks the abi slightly again, but
fortunately its just a recompile rather than tweaking the source. I will
be fixing the libc stubs while I'm here.
to a multiple of the access byte width. This overcomes errors in the
AML often found in Toshiba laptops. These errors were allowed by
the Microsoft ASL compiler and interpreter. This will NOT be imported
by ACPI-CA so make the change on our local branch. File was already off
the vendor branch.
Submitted by: blaz
Original idea: Rick Richardson for Linux
enable strict checks of the AML. Our default behavior will be to relax
checks to work on as many platforms as possible. Also clean up and document
other ACPI options while I'm here.
Include src/sys/security/mac/mac_internal.h in kern_mac.c.
Remove redundant defines from the include: SYSCTL_DECL(), debug macros,
composition macros.
Unstaticize various bits now exposed to the remainder of the kernel:
mac_init_label(), mac_destroy_label().
Remove all the functions now implemented in mac_process/mac_vfs/mac_net/
mac_pipe. Also remove debug counters, sysctls exporting debug
counters, enforcement flags, sysctls exporting enforcement flags.
Leave module declaration, sysctl nodes, mactemp malloc type, system
calls.
This should conclude MAC/LINT/NOTES breakage from the break-out process,
but I'm running builds now to make sure I caught everything.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Unstaticize mac_late.
Remove ea_warn_once, now in mac_vfs.c.
Unstaticisize mac_policy_list, mac_static_policy_list, use
struct mac_policy_list_head instead of LIST_HEAD() directly.
Unstaticize and un-inline MAC policy locking functions so they can
be referenced from mac_*.c.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
security/mac/mac_net.c
security/mac/mac_pipe.c
security/mac/mac_process.c
security/mac/mac_system.c
security/mac/mac_vfs.c
Note: Here begins a period of NOTES/LINT build breakage due to duplicate
symbols that will shortly be removed from kern_mac.c.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Extended attribute transaction warning flag if transactions aren't
supported on the EA implementation being used.
Debug fallback flag to permit a less conservative fallback if reading
an on-disk label fails.
Enforce_fs toggle to enforce file systme access control.
Debugging counters for file system objects: mounts, vnodes, devfs_dirents.
Object initialization, destruction, copying, internalization,
externalization, relabeling for file system objects.
Life cycle operations for devfs entries.
Generic extended attribute label implementation for use by UFS, UFS2 in
multilabel mode.
Generic single-level label implementation for use by all file systems
when in singlelabel mode.
Exec-time transition based on file label entry points.
Vnode operation access control checks (many).
Mount operation access control checks (few).
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Pipe enforcement flag.
Pipe object debugging counters.
MALLOC type for MAC label storage.
Pipe MAC label management routines, externalize/internalization/change
routines.
Pipe MAC access control checks.
Un-staticize functions called from mac_set_fd() when operating on a
pipe. Abstraction improvements in this space seem likely.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Network and socket enforcement toggles.
Counters for network objects (mbufs, ifnets, bpfdecs, sockets, and ipqs).
Label management routines for network objects.
Life cycle events for network objects.
Label internalization/externalization/relabel for ifnets, sockets,
including ioctl implementations for sockets, ifnets.
Access control checks relating to network obejcts.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
in mac_internal.h:
Sysctl tree declarations.
Policy list structure definition.
Policy list variables (static, dynamic).
mac_late flag.
Enforcement flags for process, vm, which have checks in multiple files.
mac_labelmbufs variable to drive conditional mbuf labeling.
M_MACTEMP malloc type.
Debugging counter macros.
MAC Framework infrastructure primitives, including policy locking
primitives, kernel label initialization/destruction, userland
label consistency checks, policy slot allocation.
Per-object interfaces for objects that are internalized and externalized
using system calls that will remain centrally defined: credentials,
pipes, vnodes.
MAC policy composition macros: MAC_CHECK, MAC_BOOLEAN, MAC_EXTERNALIZE,
MAC_INTERNALIZE, MAC_PERFORM.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
vm_pageout_scan(). Rationale: I don't like leaving a busy page in the
cache queue with neither the vm object nor the vm page queues lock held.
- Assert that the page is active in vm_pageout_page_stats().
Until we can have perfect knowledge that all callers above us think it's okay
for us to sleep, releasing *our* locks of course, we don't dare try and sleep.
in connection with Marvell based SATA->PATA dongles.
The problem was caused by a combination of things working
together to make it hard to spot...
The ATA driver has always started the ATA command, then build
the SG list for DMA and then finally started the DMA engine.
While this is according to specs, it poses a potential
problem as some controllers apparently do not allow for unlimitted
time between starting the ATA command and starting the DMA engine.
At about the same time as ATAng was committed there were lots
of other changes applied, some of which was locking in parts
that causes the busdma load functions to take significantly
longer to load the SG list.
This pushed the time spent between starting the ATA command and
starting the DMA engine over the hill for some controllers
(especially the Silicon Image DS3112a) and caused what looked
like lost interrupts.
The solution is to get all the SG list work or rather all
busdma related stuff done before we even try to start anything.
This has the nice side effect of seperating busdma out the
way it should be, so the working of the ATA machinery is not
cluttered up with busdma droppings, making the code easier
to read and understand.
sysctl that a given variable is tunable.
Also added is CTLFLAG_RDTUN, which is CTLFLAG_RD|CTLFLAG_TUN; TUN does
not always imply read-only, so RDTUN should be used where RD was used
before.
for dev_strategy() use.
Retire bio_driver[12] (aliases for b_io.bio_driver[12]) these fields are
reserved for device driver use and can as such never have any interest
in the buf end of things.
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in svr4_emul_find(),
and clean_pipe().
- Avoid integer overflow on large nfds argument in svr4_sys_poll()
- Reject negative nbytes argument in svr4_sys_getdents()
- Don't copy out past the end of the struct componentname
pathname buffer in svr4_sys_resolvepath()
- Reject out-of-range signal numbers in svr4_sys_sigaction(),
svr4_sys_signal(), and svr4_sys_kill().
- Don't malloc() user-specified lengths in show_ioc() and
show_strbuf(), place arbitrary limits instead.
- Range-check lengths in si_listen(), ti_getinfo(), ti_bind(),
svr4_do_putmsg(), svr4_do_getmsg(), svr4_stream_ti_ioctl().
Some fixes obtain from OpenBSD.
pick up the DEVFS inode number from the dev_t and find our directory
entry from that, we don't need to scan the directory to find it.
This also solves an issue with on-demand devices in subdirectories.
Submitted by: cognet
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.
The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.
Port: lang/guile (depended of x11/gnome2)
chain passed into dc_encap, which dc_start was unaware of. This caused
the old (now invalid) mbuf to be passed to BPF_MTAP.
Spotted by: Kenjiro Cho <kjc@csl.sony.co.jp>
is non-free. (More checks can/should be added in the future.)
Use M_ASSERTVALID in BPF_MTAP so that we catch when freed mbufs are
passed in, even if no bpf listeners are active.
Inspired by a bug in if_dc caught by Kenjiro Cho.
rijndael_blockDecrypt() as both input and output.
This property is important because inside rijndael we can get away
with allocating just a 16 byte "work" buffer on the stack (which
is very cheap), whereas the calling code would need to allocate the
full sized buffer, and in all likelyhood would have to do so with
an expensive malloc(9).
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.
As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.
To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.
Reviewed by: iedowse
Idea run past: jhb
to the object's type field and the call to vm_pageout_flush() are
synchronized.
- The above change allows for the eliminaton of the last parameter
to vm_pageout_flush().
- Synchronize access to the page's valid field in vm_pageout_flush()
using the containing object's lock.
- Specifying VM_MAP_WIRE_HOLESOK should not assume that the start
address is the beginning of the map. Instead, move to the first
entry after the start address.
- The implementation of VM_MAP_WIRE_HOLESOK was incomplete. This
caused the failure of mlockall(2) in some circumstances.
Use EP_{READ,WRITE}{,_MULTI}_{1,2,4} instead. I've had several people
submit patches like this over the years of varying qualities, markm
being the last. The names were chosen in consulation with mdodd on
irc.
I've tested this with only PCMCIA cards: 3CCE589EC and 3CCSH572BT.
I've not tried with my more extensive ISA, EISA and cbus collection.
Reviewed by: mdodd
the point where it being a macro is no longer sensible, and it will
only be more so in days to come.
BIO_STRATEGY() is now only used from DEV_STRATEGY() and should not
be used directly anymore.
Put the contents of both in the new function dev_strategy() and
make DEV_STRATEGY() call that function.
In addition, this allows us to make the rather magic bufdonebio()
helper function static.
This alse saves hunderedandsome bytes of code in a typical kernel.
Though this is still incomplete and has some missing features such as
exclusive login and event notification, it may be enough for someone
who wants to play with it.
This driver is supposed to work with firewire(4), targ(4) of CAM(4)
and scsi_target(8) which can be found in /usr/share/example/scsi_target.
This driver doesn't require sbp(4) which implements initiator mode.
Sample configuration:
Kernel: (you can use modules as well)
device firewire
device scbus
device targ
device sbp_targ
After reboot:
# mdconfig -a -t malloc -s 10m
md0
# scsi_target 0:0:0 /dev/md0
(Assuming sbp_targ0 on scbus0)
You should find the 10MB HDD on FreeBSD/MacOS X/WinXP or whatever connected
to the target using FireWire.
Manpage is not finished yet.
- Change type of target->luns to allocate an array of LUNs dynamically.
This allows targets to change their number of LUNs after each bus reset.
- Serialize ORB POINTER command for each LUN.
- Improve debug messages.
definition structure. Define one flag, CN_FLAG_NODEBUG, which
indicates the console driver cannot be used in the context of the
debugger. This may be used, for example, if the console device
interacts with kernel services that cannot be used from the
debugger context, such as the network stack. These drivers are
skipped over for calls to cn_checkc() and cn_putc(), and the
calling function simply moves on to the next available console.
- Correct the logic for the AIF array index pointers so that correct slot is
always looked at.
- Copy the full FIB payload size when copying AIF's, not just the first 64
bytes.
Thanks to Mirapoint, Inc, for pointing these problems out and offering a
solution.
a fair bit of difference to the power consumption and lets my cpu cool
down enough for the temperature sensitive fan controller to completely
stop the cpu fan at times.
halt state that minimizes power consumption while still preserving
cache and TLB coherency. Halting the processor is not conditional at
this time. Tested with UP and SMP kernels.
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.
Submitted by: Pyun YongHyeon
Supported by: FreeBSD Foundation
address has been changed when PFIL_HOOKS is enabled and, if it has,
arrange for the proper action by ip*_forward.
Supported by: FreeBSD Foundation
Submitted by: Pyun YongHyeon
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.
This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.
classes and if a method is not found in a given class, its base classes
are searched (in the order they were declared). This search is recursive,
i.e. a method may be define in a base class of a base class.
* Change the kobj method lookup algorithm to one which is SMP-safe. This
relies only on the constraint that an observer of a sequence of writes
of pointer-sized values will see exactly one of those values, not a
mixture of two or more values. This assumption holds for all processors
which FreeBSD supports.
* Add locking to kobj class initialisation.
* Add a simpler form of 'inheritance' for devclasses. Each devclass can
have a parent devclass. Searches for drivers continue up the chain of
devclasses until either a matching driver is found or a devclass is
reached which has no parent. This can allow, for instance, pci drivers
to match cardbus devices (assuming that cardbus declares pci as its
parent devclass).
* Increment __FreeBSD_version.
This preserves the driver API entirely except for one minor feature used
by the ISA compatibility shims. A workaround for ISA compatibility will
be committed separately. The kobj and newbus ABI has changed - all modules
must be recompiled.
rounding errors. This was the source of the majority of the
interactivity problems. Reintroduce the old algorithm and its XXX.
- Up the interactivity threshold to 30. It really could stand to be even
a tiny bit higher.
- Let the sleep and run time accumulate up to 5 seconds of history rather
than two. This helps stop XFree86 from becoming non-interactive during
bursts of activity.
trashed after being freed. This has caused several panics including
kern/42277 related to soft updates. Jim Kuhn tracked the problem
down to ipfw limit rule processing. In the expiry of dynamic rules,
it is possible for an O_LIMIT_PARENT rule to be removed when it still
has live children. When the children eventually do expire, a pointer
to the (long gone) parent is dereferenced and a count decremented.
Since this memory can, and is, allocated for other purposes (in the
case of kern/42277 an inodedep structure), chaos ensues. The offset
in question in inodedep is the offset of the 16 bit count field in
the ipfw2 ipfw_dyn_rule.
Submitted by: Jim Kuhn <jkuhn@sandvine.com>
Reviewed by: "Evgueni V. Gavrilov" <aquatique@rusunix.org>
Reviewed by: Ben Pfountz <netprince@vt.edu>
MFC after: 1 week
passes the fdidx from VOP_OPEN down.
This is for all I know the final API for this functionality, but
the locking semantics for messing with the filedescriptor from
the device driver are not settled at this time.
Discussed in from [FreeBSD-tech-jp 3396] to [FreeBSD-tech-jp 3407]
at FreeBSD-tech-jp@jp.freebsd.org.
NOTE: We must put ed_probe_SIC() function into if_ed_isa.c because
this is a bus dependent code. But the ed driver code is not
separated explicitly whether it is bus dependent or independent
now.
Refer to: http://plaza17.mbn.or.jp/~chi/myprog/FreeBSD/sicat.html
Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)
every page. If the source entry was read-only, one or more wired pages
could be in backing objects.
- vm_fault_copy_entry() should not set the PG_WRITEABLE flag on the page
unless the destination entry is, in fact, writeable.
elevated either due to priority propagation or because we're in the
kernel in either case, put us on the current queue so that we dont
stop others from using important resources. At some point the priority
elevations from sleeping in the kernel should go away.
- Remove an optimization in sched_userret(). Before we would only set
NEEDRESCHED if there was something of a higher priority available. This
is a trivial optimization and it breaks priority propagation because it
doesn't take threads which we may be blocking into account. Notice that
the thread which is blocking others gets up to one tick of cpu time before
we honor this NEEDRESCHED in sched_clock().
lock around a call to the original function. Make the timeout
function in callout_reset() use the wrapped function to avoid a
lock assertion panic.
Reviewed by: sam
Reported by: cgiordano@ids.net
sigreturn() ABI and the signal context on the stack.
Make the trapframe (and its shadows in the ucontext and sigframe etc)
8 bytes larger in order to preserve 16 byte stack alignment for the
following C code calls. I could have done some padding after the
trapframe was saved, but some of the C code still expects an argument of
'struct trapframe'. Anyway, this gives me a spare field that can be used
to store things like 'partial trapframe' status or something else in
the future.
The runtime impact is fairly small, *except* for threaded apps and things
that decode contexts and the signal stack (eg: cvsup binary). Signal
delivery isn't too badly affected because the kernel generates the
sigframe that sigreturn uses after the handler has been called.
The size of mcontext_t and struct sigframe hasn't changed. Only
the last few fields (sc_eip etc) got moved a little and I eliminated
a spare field. mc_len/sc_len did change location though so the
sanity checks there will still trap it.
- Make multicast work
- Fix (some of) the watchdog timeouts after card reset
- Add support for CE2, CEM28 and CEM33 cards
- General code cleanup
Any card that worked previously should still work, as well as a lot that
didn't.
The driver is not yet style(9) compliant; those changes are forthcoming,
once the functional changes are done.
PR: kern/50644
Reviewed by: imp
Approved by: imp
I changed. That is never a good sign.
1) only map 1 page at address zero, not 4096 pages
2) page 1 starts at address 4096 (PAGE_SIZE) not 4095 (PAGE_MASK). I
don't even want to think what the pte's looked like.
3) subtract the r/o page group start address from the end before
converting it to a count. Otherwise an extra page is mapped.
If you were affected by this, the symptoms of this was a hang at boot
after the spinner. Sorry folks. :-(
"You broke my laptop!" by: sam
accesses softc after it is freed. Use a different malloc type for
softc than the rest of the bus code to make it more clear when these
things happen that it is the driver that's at fault, not the bus code.
Suggested by: sam and/or phk (I think)
timeout would continue to happen: boom! Fix this[*] by timing out earlier.
[*] almost fixes the race on unload: wi_inquire could be running when
untimeout is called, and there's no way to know when it has actually
returned. This race is very rare and hard to lose.
Submitted by: scottl
seeded with arc4random rather than calling arc4random for each
packet. Note this is the same algorithm used to select the IV when
doing WEP on the host.
o don't grab the mutex at the top of ath_detach; it does nothing
useful
o deal with entry to ath_ioctl during detach to disable promiscuous
mode as a result of calling bpfdetach2: cannot call ath_init when
the device is marked invalid as the code isn't prepared to deal
with it (in particular by that time the hal reference may have
been yanked)
change ath_rate_ctl_reset to handle transition from station
mode to adhoc mode; was not resetting the initial xmit rate
causing outbound frames to be dicarded
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.
In collaboration with: tegge
you on the current queue. In the future, it would be nice if priority
propagation could deterministicly pluck a thread off of the next queue
and put it on the current queue. Until then this hack stops us from
holding up our entire current queue, including interrupt handlers, while
a thread on the next queue is blocked while holding Giant.
- Inherit our pctcpu information from our parent.
- correct signedness mixups.
- log fix.
- preparation for 64bit sequence number.
introduce SA id (unique ID for SA - SPI is useless as duplicated
SPI is allowed)
- no need to malloc/free cksum buffer.
Obtained from: KAME
kqueue write events on a socket and you regularly create tons of pipes
which overwrites the structure causing a panic when removing the knote
from the list. If the peer has gone away (and it's a write knote), then
don't bother trying to remove the knote from the list.
Submitted by: Brian Buchanan and myself
Obtained from: nCircle
- Return NULL instead of returning memory outside of the stackgap
in stackgap_alloc() (FreeBSD-SA-00:42.linux)
- Check for stackgap_alloc() returning NULL in ibcs2_emul_find();
other calls to stackgap_alloc() have not been changed since they
are small fixed-size allocations.
- Replace use of strcpy() with strlcpy() in exec_coff_imgact()
to avoid buffer overflow
- Use strlcat() instead of strcat() to avoid a one byte buffer
overflow in ibcs2_setipdomainname()
- Use copyinstr() instead of copyin() in ibcs2_setipdomainname()
to ensure that the string is null-terminated
- Avoid integer overflow in ibcs2_setgroups() and ibcs2_setgroups()
by checking that gidsetsize argument is non-negative and
no larger than NGROUPS_MAX.
- Range-check signal numbers in ibcs2_wait(), ibcs2_sigaction(),
ibcs2_sigsys() and ibcs2_kill() to avoid accessing array past
the end (or before the start)
parameter in the read and write case dereferenced an unitialized
pointer and can't possibly ever have catched an actual invalid
argument.
This was apparently true for the read/write and getconf cases. The
latter does not even receive the paramter that is to be verified.
I'm surprised that this did not cause kernel panics, but it seems
that the uninitialized local variable happens to contain data that
may be used as a pointer to memory that satisfies the test condition.
Make the code work as intended by moving the test inside the switch
case where the pointer has been properly initialized.
Since the read and write case shared just about all code (except
for the single call to PCIB_READ_CONFIG resp. PCIB_WRITE_CONFIG) I
have merged both cases.
Noticed by: trhodes@FreeBSD.org (Tom Rhodes)
- Allocate storage for uap->msg always because it is copyin()'ed in
native sendmsg().
- Convert sockopt level from Linux to FreeBSD after native recvmsg() calling.
- Some cleanups.
Tested with: Oracle 9i shared server connection mode.
MFC after: 1 week
o correct recursive locking when polling and in em_82547_move_tail
o destroy mutex on detach
o add EM_LOCK_ASSERT and similar macros for creating+deleteing the mtx
Submitted by: Daniel Eischen <eischen@vigrid.com>
beasts which are reported to exist in both Atmel and Prism2 flavours. In
particular, Itronix branded laptops have the Atmel part with an Intersil
radio.
Obtained from: NetBSD
from UWX_REG_MUMBLE to UWX_REG_AR_MUMBLE. Compatibility defines are
present in libuwx. Change the names here so that we don't depend on
compatibility defines.
Note that there's now an UWX_REG_PFS and an UWX_REG_AR_PFS and the
former is not a compatibility define for the latter AFAICT. Change
to UWX_REG_AR_PFS as that seems to be the one we need to handle.
all the fixes locally applied and submitted to the author. Not
included in BETA 5, but part of this import are:
o FreeBSD specific ifdefs to make this compile within a kernel.
These are limited to include directives and defines.
o Removal of unused variables, proper casts and initializations
to allow building with -Werror. This happens in code so has a
higher chance of causing future import conflicts but not enough
to worry about it.
I'm especially thankful that the author accepted the change to
replace DISABLE_TRACE with UWX_TRACE_ENABLE so that we can use it
in kernel config files without nasty mappings or indirections as
that would make the integration less perfect. Thanks Cary!
an uninitialized sysctl_ctx, using flag DA_FLAG_SCTX_INIT. This
prevents a panic encoutered with some umass units that probe correctly
but fail to attach. Same problem, and same fix, as scsi_cd.c rev. 1.86.
Reviewed by: njl, ken
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address. Also, this change will facilitate locking access to the vm page's
valid field.
has been initialized.
(cdsysctlinit): Set flag CD_FLAG_SCTX_INIT after sysctl_ctx has been
initialized.
This resolves a panic encountered when a cd drive is sucessfully probed
but fails to attach.
Reviewed by: ken
o minor optimization of cardbus_cis processing. Remove a bunch of generic
entries that are handled by generic.
o no longer need the card_get_type stuff.
This MIB specifies how many bus resets should be observed before the
lost device entry is removed. The default value is 3.
You can set this value to 0 if you want a SBP device to be detached from CAM
layer as soon as the device is physically detached like USB.
routine of its own, and allows us to move the indentation back two
layers making the code more readable.
delete a prototype that should have been killed years ago in pccardvar.h.
# adding quirks here is way harder than it needs to be. :-(
In unodered excution case, we cannot detect link-chain end only
by prev == NULL if lastest ORB is executed earlyer than the former
ORBs. Use ORB_LINK_DEAD flag for this case.
- Don't reset agent for management ORB.
- Improve debug messages.
Spotted by: sbp target mode
a long-time bug: vm_pager_get_pages() assumes that m[reqpage] contains a
valid page upon return from pgo_getpages(). In the case of the device
pager this page has been freed and replaced by a fake page. The fake page
is properly inserted into the vm object but m[reqpage] is left pointing
to a freed page. For now, update m[reqpage] to point to the fake page.
Submitted by: tegge
caused snapshot related problems.
- The vp can not be NULL here or we would panic in vfs_bio_awrite(). Stop
confusing the logic by checking for it in several places.
Submitted by: kirk and then rototilled by me to remove vp == NULL checks.
for 21143 based cards which use SIA mode.
This fixes 10mbit mode for ZNYX ZX346Q cards and other
21143 based cards.
PR: 32118
Submitted by: Rene de Vries <rene@tunix.nl>
Geert Jan de Groot <GeertJan.deGroot@tunix.nl>
Obtained from: BSDI
MFC after: 2 weeks
VOP_INACTIVE routines need not worry about their vnode getting
recycled if they block. Remove the code from nfs_inactive() that
used vget() to get an extra vnode reference that was held during
the nfs_vinvalbuf() call.
so make the code slightly more uniform. The vnode lock is acquired in
all cases and now the only difference between VCHR and other is we
call UFS_UPDATE instead of VOP_FSYNC().
Use pre-emption detection to avoid the need for wiring a userland buffer
when copying opaque data structures.
sysctl_wire_old_buffer() is now a no-op. Other consumers of this
API should use pre-emption detection to notice update collisions.
vslock() and vsunlock() should no longer be called by any code
and should be retired in subsequent commits.
Discussed with: pete, phk
MFC after: 1 week
go away in due course. Involuntary pre-emption means that we can't count
on wiring of pages alone for consistency when performing a SYSCTL_OUT()
bigger than PAGE_SIZE.
Discussed with: pete, phk
- Slightly rewrite the fsync loop to be more lock friendly. We must
acquire the vnode interlock before dropping the mnt lock. We must
also check XLOCK to prevent vclean() races.
- Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
races.
- Use a local variable to store the results of the nvp == TAILQ_NEXT
test so that we do not access the vp after we've vrele()d it.
- Add an XXX comment about UFS_UPDATE() not being protected by any lock
here. I suspect that it should need the VOP lock.
LK_RETRY either, we don't want this vnode if it turns into another.
- Remove the code that checks the mount point after acquiring the lock
we are guaranteed to either fail or get the vnode that we wanted.
- In vtryrecycle() try to vgonel the vnode if all of the previous checks
passed. We won't vgonel if someone has either acquired a hold or usecount
or started the vgone process elsewhere. This is because we may have been
removed from the free list while we were inspecting the vnode for
recycling.
- The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling
the same vnode. To further reduce the likelyhood of this event, requeue
the vnode on the tail of the list prior to calling vtryrecycle(). We can
not actually remove the vnode from the list until we know that it's
going to be recycled because other interlock holders may see the VI_FREE
flag and try to remove it from the free list.
- Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it
regardless of MNT_WAIT because the vnode does not actually belong to
this filesystem.
purge, the purge in vclean, and the filesystems purge, we had 3 purges
per vnode.
- Move the insmntque(vp, 0) to vclean() so that we may remove it from the
two vgone() functions and reduce the number of lock operations required.
whether or not the sync failed. This could potentially get set between
the time that we VOP_UNLOCK and VI_LOCK() but the race would harmelssly
lead to the sync being delayed by an extra 30 seconds. If we do not move
the vnode it could cause an endless loop if it continues to fail to sync.
- Use vhold and vdrop to stop the vnode from changing identities while we
have it unlocked. Other internal vfs lists are likely to follow this
scheme.
- Create a new function, vgonechrl(), which performs vgone for an in-use
character device. Move the code from vflush() that did this into
vgonechrl().
- Hold the xlock across the entirety of vgonel() and vgonechrl() so that
at no point will an invalid vnode exist on any list without XLOCK set.
- Move the xlock code out of vclean() now that it is in the vgone*()
functions.
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.
This is so that we may grab the interlock while still holding the
sync_mtx. We have to VI_TRYLOCK() because in all other cases the lock
order runs the other way.
- If we don't meet any of the preconditions, reinsert the vp into the
list for the next second.
- We don't need to panic if we fail to sync here because each FSYNC
function handles this case. Removing this redundant code also
simplifies locking.
will not actually be set even though we're calling sosetopt. sosetopt
calls down to a single ctloutput function if the name or level is
implemented by a specific protocol.
Submitted by: pete@isilon.com
fail. Remove the panic from that case and document why it might fail.
- Document the reason for calling cache_purge() on a newly created vnode.
- In insmntque() order the operations so that we can call mtx_unlock()
one fewer times. This makes the code somewhat clearer as well.
- Add XXX comments in sched_sync() and vflush().
- In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is
set.
- In vclean() we don't need to acquire a lock around a single TAILQ_FIRST
call. It's ok if we race here, the vinvalbuf will just do nothing.
- Increase the scope of the lock in vgonel() to reduce the number of lock
operations that are performed.
we release the mntvnode_mtx.
- Call vgonel() directly instead of going through vrecycle() since we own
the interlock now.
- Remove a few cases where we locked the interlock just so that we could
call VOP_UNLOCK with interlock held.
mntvnode_mtx.
- Use a local variable to store the results of the test to see if the
next vnode on the mount list has changed. This is so that we no longer
acess the vnode after we vput() it.
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.
I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.
If anybody ever tried DISABLE_PSE before, apm would not work.
If your cpu did not have PSE, apm would not work there either (eg: 486).
This bug has been around for a Very Long Time.
The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.
This is bad though because it gives kernel mode code the opportunity
to accidently write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
been widely deploy and that's causing us a lot of pain. Back out the
last commit for a few weeks so that we can lessen the support load in
current@ asking why they can't build kernels anymore. Instructions in
UPDATING have been updated, but this should be more effective.
Revert the reverting: November 1st, 2003
quantities on every other architecture.) This change is required in order
to move pmap_prefault() out of the pmap and into the machine-independent
layer.
any queued packets for the isr, process those packets before the newly
submitted packet, maintaining ordering of all packets being delivered
to the netisr. Remove the bypass counter since we don't bypass anymore.
Leave the comment about possible problems and options since later
performance optimization may change the strategy for addressing ordering
problems here.
Specifically, this maintains the strong isr ordering guarantee; additional
parallelism and lower latency may be possible by moving to weaker
guarantees (per-interface, for example). We will probably at some point
also want to remove the one instance netisr dispatch limit currently
enforced by a mutex, but it's not clear that's 100% safe yet, even in
the netperf branch.
Reviewed by: sam, others
o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send
Support by: FreeBSD Foundation
RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed
when an RTF_GENMASK route exists in the table.
Add a more verbose comment about exactly what this code does.
Submitted by: ru
frame marker) and the syscall stub frame info in the trap frame.
Previously we stored the stub frame info in (rp,pfs) and the
caller frame info in (iip,cfm). This ends up being suboptimal
for the following reasons:
1. When we create a new context, such as for an execve(2), we had
to set the (rp,pfs) pair for the entry point when using the
syscall path out of the kernel but we need to set the (iip,cfm)
pair when we take the interrupt way out. This is mostly just
an inconsistency from the kernel's point of view, but an ugly
irregularity from gdb(1)'s point of view.
2. The getcontext(2) and setcontext(2) syscalls had to swap the
(rp,pfs) and (iip,cfm) pairs to make the context compatible
with one created purely in userland.
Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal
handlers that actually peek at the mcontext_t and to gdb(1).
Since this change is made for gdb(1) and we don't care about
signal handlers that peek at the mcontext_t because we're still
a tier 2 platform, this ABI breakage is academic at this moment
in time.
Note that there was no real reason to save the caller frame info
in (iip,cfm) and the stub frame info in (rp,pfs).
validating the offset within a given memory buffer before handing the
real work off to uiomove(9).
Use uiomove_frombuf in procfs to correct several issues with
integer arithmetic that could result in underflows/overflows. As a
side-effect, the code is significantly simplified.
Add additional sanity checks when computing a memory allocation size
in pfs_read.
Submitted by: rwatson (original uiomove_frombuf -- bugs are mine :-)
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows)
And many changes.
* all
- Major change of struct fw_xfer.
o {send,recv}.buf is splitted into hdr and payload.
o Remove unnecessary fields.
o spd is moved under send and recv.
- Remove unnecessary 'volatile' keyword.
- Add definition of rtcode and extcode.
* firewire.c
- Ignore FWDEVINVAL devices in fw_noderesolve_nodeid().
- Check the existance of the bind before call STAILQ_REMOVE().
- Fix bug in the fw_bindadd().
- Change element of struct fw_bind for simplicity.
- Check rtcode of response packet.
- Reduce split transaction timeout to 200 msec.
(100msec is the default value in the spec.)
- Set watchdog timer cycle to 10 Hz.
- Set xfer->tv just before calling fw_get_tlabel().
* fwohci.c
- Simplifies fwohci_get_plen().
* sbp.c
- Fix byte order of multibyte scsi_status informations.
- Split sbp.c and sbp.h.
- Unit number is not necessary for FIFO¤ address.
- Reduce LOGIN_DELAY and SCAN_DELAY to 1 sec.
- Add some constants defineded in SBP-2 spec.
* fwmem.c
- Introduce fwmem_strategy() and reduce memory copy.
fd_cmask field in the file descriptor structure for the first process
indirectly from CMASK, and when an fd structure is initialized before
being filled in, and instead just use CMASK. This appears to be an
artifact left over from the initial integration of quotas into BSD.
Suggested by: peter
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.
For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.
A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.
Obtained from: bmilekic
(direct dispatch) in interrupt threads when the netisr in question
isn't already active. If a netisr is already active, or direct
dispatch is already in progress, we queue the packet for later
delivery. Previously, this option was disabled by default. I have
measured 20%+ performance improvements in IP packet forwarding with
this enabled.
Please report any problems ASAP, especially relating to stack depth or
out-of-order packet processing.
Discussed with: jlemon, peter
Sponsored by: DARPA, Network Associates Laboratories
was that accessing the status reg could occour too fast, confusing
the logic in the flash part. Could not have been located without:
HW donated by: Jonas Bülow <jonas@servicefactory.se>
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.
Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot
evaluating them at compile time rather than at run time. As for x86
and amd64, this requires GCC and it's enabled only if __OPTIMIZE__ is
defined (ie, if at least -O is used).
Reviewed by: jake
Their purpose is to give explicit hints to the compiler to judge
the likelyhood of a test to succeed or fail. Not all architectures
have support for such optimizations, but for those who do, it can
give a nice performance improvement in hot loops.
Obviously, this should be used very rarely in very specific code.
Reviewed by: peter
Obtained from: OpenBSD
AcpiEnterSleepState() calling a long AcpiOsStall() with interrupts
disabled. This fix will instead be added to ACPI-CA.
PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
the TLB and ~1600 if it is not. Therefore, it is more effecient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truely x86 specific.
then the mbuf has been consumed by a hook; otherwise beware of a null
mbuf return (gack). In particular the bridge was doing the wrong thing.
While in the ipv6 code make it's handling of pfil_run_hooks identical
to netbsd.
Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>
change 38496
o add ipsec_osdep.h that holds os-specific definitions for portability
o s/KASSERT/IPSEC_ASSERT/ for portability
o s/SPLASSERT/IPSEC_SPLASSERT/ for portability
o remove function names from ASSERT strings since line#+file pinpints
the location
o use __func__ uniformly to reduce string storage
o convert some random #ifdef DIAGNOSTIC code to assertions
o remove some debuggging assertions no longer needed
change 38498
o replace numerous bogus panic's with equally bogus assertions
that at least go away on a production system
change 38502 + 38530
o change explicit mtx operations to #defines to simplify
future changes to a different lock type
change 38531
o hookup ipv4 ctlinput paths to a noop routine; we should be
handling path mtu changes at least
o correct potential null pointer deref in ipsec4_common_input_cb
chnage 38685
o fix locking for bundled SA's and for when key exchange is required
change 38770
o eliminate recursion on the SAHTREE lock
change 38804
o cleanup some types: long -> time_t
o remove refrence to dead #define
change 38805
o correct some types: long -> time_t
o add scan generation # to secpolicy to deal with locking issues
change 38806
o use LIST_FOREACH_SAFE instead of handrolled code
o change key_flush_spd to drop the sptree lock before purging
an entry to avoid lock recursion and to avoid holding the lock
over a long-running operation
o misc cleanups of tangled and twisty code
There is still much to do here but for now things look to be
working again.
Supported by: FreeBSD Foundation
file for vnode mappings. Note that this uses vn_fullpath() and may
be somewhat unreliable, although not too unreliable for shared
libraries. For non-vnode mappings, just print "-" for the field.
Obtained from: TrustedBSD Projects
Sponsored by: DARPA, AFRL, Network Associates Laboratories
make sure we return any allocated space to the drive. This should get
rid of a number of inconsistencies (hopefully all) that have been seen
after configuration errors.
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.
If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.
Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.
The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.
from fiddling with CS_TTGO since fiddling with CS_TTGO was removed in
rev.1.218 of the i386/isa version (which was merged with loss of history
in rev.1.223 of this version).
some symbols in X_db_search_symbol(). Reject the same symbols that
rev.1.13 did (all except STT_OBJECT and STT_FUNC), except don't reject
typeless symbols. This keeps the typeless symbols in non-verbosely
written assembler code visible, but makes file symbols invisible. ELF
file symbols have type STT_FILE and value 0, so this stops small values
and offsets sometimes being displayed in terms of the first file symbol
in the kernel (usually device_if.c). I think it rejects some other
unwanted symbols (small absolute symbols for things like struct offsets).
It may reject some wanted symbols (large absolute symbols for addresses
like PTmap).
about because we're still tier 2 and our current compiler, as well
as future compilers will not support varargs. This is mostly a
no-op in practice, because <sys/varargs.h> should already cause
compile failures.
use the ability on ia64 to map the register stack. The orientation of
the stack (i.e. its grow direction) is passed to vm_map_stack() in the
overloaded cow argument. Since the grow direction is represented by
bits, it is possible and allowed to create bi-directional stacks.
This is not an advertised feature, more of a side-effect.
Fix a bug in vm_map_growstack() that's specific to rstacks and which
we could only find by having the ability to create rstacks: when
the mapped stack ends at the faulting address, we have not actually
mapped the faulting address. we need to include or cover the faulting
address.
Note that at this time mmap(2) has not been extended to allow the
creation of rstacks by processes. If such a need arises, this can
be done.
Tested on: alpha, i386, ia64, sparc64
do exactly the same as vop_nopoll() for consistency and put a
comment in the two pointing at each other.
Retire seltrue() in favour of no_poll().
Create private default functions in kern_conf.c instead of public
ones.
Change default strategy to return the bio with ENODEV instead of
doing nothing which would lead the bio stranded.
Retire public nullopen() and nullclose() as well as the entire band
of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump}
funtions, they are the default actions now.
Move the final two trivial functions from subr_xxx.c to kern_conf.c
and retire the now empty subr_xxx.c
This is just a cleanup here (modulo rev.1.108 of kern/tty.c), since the
input speed can be different from to output speed and extra code to
handle both speeds naturally handled all cases.
provide no methods does not make any sense, and is not used by any
driver.
It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
does nothing.
Remove explicit initializations to these from the drivers which
already used them.
- Removed conversion of a zero input speed to the output speed. This
has been done better in ttioctl() since rev.1.108 of kern/tty.c
almost 5 years ago. comparam() did the conversion incompletely for
the case where the output speed is also zero. It had complications
to avoid using zero speeds, but would still have used a zero input
speed for setting watermarks if kern/tty.c had passed one.
- Never permit the input speed to be different from the output speed.
There was no validity check on the input speed for the case of a zero
output speed. Then we didn't change the physical speeds, but we used
the unvalidated input speed for setting watermarks and didn't return
an error, so ttioctl() stored the unvalidated input speed in the tty
struct where it could cause problems later.
- Removed complications that were to avoid using a divisor of 0. The
divisor is now always valid if the speed is accepted.
cd_setreg() were still using !(read_eflags() & PSL_I) as the condition
for the lock hidden by COM_LOCK() (if any) being held. This worked
when spin mutexes and/or critical_enter() used hard interrupt disablement,
but it has caused recursion on the non-recursive mutex com_mtx since
all relevant interrupt disablement became soft. The recursion is
harmless unless there are other bugs, but it breaks an invariant so
it is fatal if spinlocks are witnessed.
struct msdosfsmount so that this file has the same prerequisites as
it used to. The new prerequistite was a meta-style bug. It required
many style bugs (unsorted includes ...) elsewhere.
Formatted prototypes in KNF. Resisted urge to sort all the prototypes,
to minimise differences with NetBSD. (NetBSD has reformatted the
prototypes but has not sorted them and still uses __P(()).)
node lock while sending a management frame as this will potentially
result in a LOR with a driver lock. This doesn't happen for the
Atheros driver but does for the wi driver. Use a generation number
to help process each node once when scanning the node table and
drop the node lock if we need to timeout a node and send a frame.
AP has basic rates that we do not support then ignore them instead
of marking the rate set in error.
This fixes an 11b station associating with an 11g/b AP.
are allowed by Windows (ref: MS KB article 120138).
XXX From my reading of the CIFS specification, it's not clear that
clients need to validate filenames at all.
PR: 57123
Submitted by: Paul Coucher
MFC after: 1 month
in the loopback test in the probe. The delay was too short for consoles
at speeds lower than about 3200 bps. This shouldn't have caused many
problems, since such low speeds are rare and the probe is forced to
succeed for consoles.
consdev structure.
If the consdev name is not set and we have a cn_dev, set the name
from there. Try to issue a printf about this, even though it may
not have a place to go.
Modify the sysctl related code to pick up the name from the consdev
instead.
Most of the actual use of the cn_dev field is merely to get the name,
and most of the actual initializations are bogusly using makedev()
because the probe/attach has not been completed.
Instead we will migrate console drivers to fill in the name and if
the driver needs it: the unit number, thereby avoiding the bogus
calls to makedev().
and the Z8530 drivers used the I/O address as a quick and dirty way to
determine which channel they operated on, but formalizing this by
introducing iobase is not a solution. How for example would a driver
know which channel it controls for a multi-channel UART that only has a
single I/O range?
Instead, add an explicit field, called chan, to struct uart_bas that
holds the channel within a device, or 0 otherwise. The chan field is
initialized both by the system device probing (i.e. a system console)
or it is passed down to uart_bus_probe() by any of the bus front-ends.
As such, it impacts all platforms and bus drivers and makes it a rather
large commit.
Remove the use of iobase in uart_cpu_eqres() for pc98. It is expected
that platforms have the capability to compare tag and handle pairs for
equality; as to determine whether two pairs access the same device or
not. The use of iobase for pc98 makes it impossible to formalize this
and turn it into a real newbus function later. This commit reverts
uart_cpu_eqres() for pc98 to an unimplemented function. It has to be
reimplemented using only the tag and handle fields in struct uart_bas.
Rewrite the SAB82532 and Z8530 drivers to use the chan field in struct
uart_bas. Remove the IS_CHANNEL_A and IS_CHANNEL_B macros. We don't
need to abstract anything anymore.
Discussed with: nyan
Tested on: i386, ia64, sparc64
aic7xxx_pci.c:
When performing our register test, be careful
to avoid resetting the chip when pausing the
controller. The test reads the HCNTRL register
and then writes it back with the PAUSE bit
explicitly set. If the last write to the controller
before our probe is to reset it, the CHIPRST
bit will still be set, so we must mask it off
before the PAUSE operation. On some chip versions,
we cannot access registers for a few 100us after
a reset, so this inadvertant reset was causing PCI
errors to occur on the read to check for paused
status.
Submitted by: gibbs
Reimplement pmap_release() such that it uses the page table rather than
the pte object to locate the page table directory pages. (Temporarily,
retain an assertion on the emptiness of the pte object.)
systems where the data/stack/etc limits are too big for a 32 bit process.
Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.
Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.
Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.
Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.
Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.
Instead, use EXCA_MEMREG_WIN_SHIFT which is the amount we shift the
bus address by to write into upper memory (eg above 24MB). Use the
latter in this case.
be gone in FreeBSD 6, so put BURN_BRIDGES around it. The TRB also
felt that if something better comes along sooner, it can be used to
replace this code.
Delayed by: BSDcon and subsequent disk crash.
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules
Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)
attached network could exhaust kernel memory, and cause a system
panic, by sending a flood of spoofed ARP requests.
Approved by: jake (mentor)
Reported by: Apple Product Security <product-security@apple.com>
Skinny is the protocol used by Cisco IP phones to talk to Cisco Call
Managers. With this code, one can use a Cisco IP phone behind a FreeBSD
NAT gateway.
Currently, having the Call Manager behind the NAT gateway is not supported.
More information on enabling Skinny support in libalias, natd, and ppp
can be found in those applications' manpages.
PR: 55843
Reviewed by: ru
Approved by: ru
MFC after: 30 days
order to use "unmanaged" pages in the kmem object, vm_map_delete() must
unconditionally perform pmap_remove(). Otherwise, sparc64 has problems.
Tested by: jake
32 bit binary stuff. 32 bit binaries do not like it much when the kernel
tries hard to put things above the 8GB mark.
I have a work-in-progress to fix this properly, but I didn't want to burn
anybody with this yet.
initialize a TSC timecounter until we know if it is broke or not.
XXX I think there is a bug in the i386 code here. init_TSC_tc() comes
after:
if (statclock_disable)
return;
ie: if you turn off the statclock interrupt, you dont get the TSC either.
for breakpoint and trace traps from usermode. Although all the setidt
entries are interrupt gates on amd64, all but the trace and bpt trap
entry handlers reenable interrupts after the swapgs instruction in order
to simulate the trap/interrupt gate distinction. In other words, the
amd64 code behaves the same way that i386 does here.
for ddb input in some atkbd-based console drivers. ddb must not use any
normal locks but DELAY() normally calls getit() which needs clock_lock.
This also removes the need for recursion on clock_lock.
known constants at compile time rather than at run time. We have a number
of nasty hacks around the place to cache ntohl() of constants (eg: nfs).
This change allows the compiler to compile-time evaluate ntohl(1) as
0x01000000 rather than having to emit assembler code to do it. This
has other smaller flow-on effects because the compiler can see that
ntohl(constant) itself has a constant value now and can propagate the
compile time evaluation.
Obtained from: Ideas from NetBSD and Linux, and some code from NetBSD
1.186: onoe; Sony's PEGA-WL110 CF WLAN (which strangely has fujitsu's
vendor id)
1.185: ichiro; Quatech Inc, PCMCIA Enhanced Parallel Port Card
Also:
o update $NetBSD$
o minor tweaks to FUJITSU. We've tried to keep the CIS only entries seprate
from vendor id/product id.
freed belong to the kernel object.)
- Increase the granularity of the vm object locking in vm_hold_load_pages()
in order to reduce the number of times that we acquire and release the
same lock.
Fix to the messages output under CAM_DEBUG_CCB: the summary sense
information (error bits and sense key) is in the error field, not
in the result field, of struct ata_request. No other functional change.
completion of recovery is indicated by positioning the CAM_AUTOSNS_VALID
bit in the status field of the CCB, not in the flags field.
This fixes an endless loop of sense recovery actions.
Reviewed by: ken
function, startup_alloc(), that is used for single page allocations prior
to the VM starting up. If it is used after the VM startups up, it
replaces the zone's allocf pointer with either page_alloc() or
uma_small_alloc() where appropriate.
Pointy hat to: me
Tested by: phk/amd64, me/x86
Temporarily disable the UMA_MD_SMALL_ALLOC stuff since recent commits
break sparc64, amd64, ia64 and alpha. It appears only i386 and maybe
powerpc were not broken.
device to access 64-bit addresses from a 32-bit PCI bus. While the
RealTek manual says you can set this bit and the chip will perform
DAC only if you give it a DMA address with any of the upper 32
bits set, this appears not to be the case. If I turn on the DAC
bit, the chip sets the 'system error' bit in the status register
when I to do a DMA on my Athlon test box with 32-bit PCI bus (VIA
chipset) even though I only have 128MB of physical memory, and thus
can never give the chip a 64-bit address.
Obviously, I can't just set it and forget it, so until I figure
out the right rule for when it's safe/necessary to enable it, keep
it turned off.
not guaranteed that the RSE writes the NaT collection immediately,
sort of atomically, to the backing store when it writes the register
immediately prior to the NaT collection point. This means that we
cannot assume that the low 9 bits of the backingstore pointer do not
point to the NaT collection. This is rather a surprise and I don't
know at this time if it's a bug in the Merced or that it's actually
a valid condition of the architecture. A quick scan over the sources
does not indicate that we depend on the false assumption elsewhere,
but it's something to keep in mind.
The fix is to write the saved contents of the ar.rnat register to
the backingstore prior to entering the loop that copies the dirty
registers from the kernel stack to the user stack.
functions reference UMA internals from <vm/uma_int.h>, which makes
them highly unwanted in non-UMA specific files.
While here, prune the includes in pmap.c and use __FBSDID(). Move
the includes above the descriptive comment.
The copyright of uma_machdep.c is assigned to the project and can
be reassigned to the foundation if and when when such is preferrable.
Tested at 100Mbit only, using Asus P4P800 onboard 3C940.
The -stable version of this patch I have in use for ~2 weeks now, and works
just fine for me.
Based on: Nathan L. Binkert's patch for OpenBSD
Patch submitted by and thanks to: Jung-uk Kim <jkim@niksun.com>
MFC after: 2 weeks
Doing so creates a race where the buf is on neither list.
- Only vfree() in an error case in vclean() if VSHOULDFREE() thinks we
should.
- Convert the error case in vclean() to INVARIANTS from DIAGNOSTIC as this
really should not happen and is fast to check.
sufficient to guarantee that this race is not hit. The XLOCK will likely
have to be redesigned due to the way reference counting and mutexes work
in FreeBSD. We currently can not be guaranteed that xlock was not set
and cleared while we were blocked on the interlock while waiting to check
for XLOCK. This would lead us to reference a vnode which was not the
vnode we requested.
- Add a backtrace() call inside of INVARIANTS in the hopes of finding out if
this condition is ever hit. It should not, since we should be retaining
a reference to the vnode in these cases. The reference would be sufficient
to block recycling.
working set cache. This has several advantages. Firstly, we never touch
the per cpu queues now in the timeout handler. This removes one more
reason for having per cpu locks. Secondly, it reduces the size of the zone
by 8 bytes, bringing it under 200 bytes for a single proc x86 box. This
tidies up other logic as well.
- The 'destroy' flag no longer needs to be passed to zone_drain() since it
always frees everything in the zone's slabs.
- cache_drain() is now only called from zone_dtor() and so it destroys by
default. It also does not need the destroy parameter now.
broken consumers of the malloc interface who assume that the allocated
address will be an even multiple of the size.
- Remove disabled time delay code on uma_reclaim(). The comment there said
it all. It was not an effective strategy and it should not be left in
#if 0'd for all eternity.
restart instruction bits in the PSR. As such, we were returning
from interrupt to the instruction in the bundle that caused us
to enter the kernel, only now we're returning to a completely
different bundle.
While close here: add two KASSERTs to make sure that we restore
sync contexts only when entered the kernel through a syscall and
restore an async context only when entered the kernel through an
interrupt, trap or fault.
While not exactly here, but close enough: use suword64() when we
copy the dirty registers from the kernel stack to the user stack.
The code was intended to be be replaced shortly after being added,
but that was a couple of weeks ago. I might as well avoid that it
is a source for panics until it's replaced.
can get (or not) and what we do with them. This fixes the behaviour
for NaT consumption and speculation faults in that we now don't panic
for user faults.
Remove the dopanic label and move the code to a function. This makes
it easier in the simulator to set a breakpoint.
While here, remove the special handling of the old break-based syscall
path and move it to where we handle the break vector. While here,
reserve a new break immediate for KSE. We currently use the old break-
based syscall to deal with restoring async contexts. However, it has
the side-effect of also setting the signal mask and callong ast() on
the way out. The new break immediate simply restores the context and
returns without calling ast().
of "dumb" PCI-based serial/parallel boards get a hint how to enable
them.
I wasn't sure about the ia64, pc98, powerpc, and sparc64 archs whether
they'd support puc(4) or not.
extended irq lists. If the resource has a trailing byte but not the full
resource string, do not attempt to parse the resource string. This fixes
panics on transition to battery and shutdown for Larry. Patch has been
submitted to vendor and they will incorporate in next release.
Tested by: Larry Rosenman <ler@lerctr.org>
PR: kern/56254
page_alloc() function from the slab_zalloc() function. This allows us
to unconditionally call uz_allocf().
- In page_alloc() cleanup the boot_pages logic some. Previously memory from
this cache that was not used by the time the system started was left in
the cache and never used. Typically this wasn't more than a few pages,
but now we will use this cache so long as memory is available.
by accepting the user supplied flags directly. Previously this was not
done so that flags for the same field would not be defined in two
different files. Add comments in each header instructing future
developers on how now to shoot their feet.
- Fix a test for !OFFPAGE which should have been a test for HASH. This would
have caused a panic if we had ever destructed a malloc zone. This also
opens up the possibility that other zones could use the vsetobj() method
rather than a hash.
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).
don't cache as many items.
- Introduce the bucket_alloc(), bucket_free() functions to wrap bucket
allocation. These functions select the appropriate bucket zone to
allocate from or free to.
- Rename ub_ptr to ub_cnt to reflect a change in its use. ub_cnt now reflects
the count of free items in the bucket. This gets rid of many unnatural
subtractions by 1 throughout the code.
- Add ub_entries which reflects the number of entries possibly held in a
bucket.
IF_HANDOFF() does it for us behind the scenes. Remove the extra call
to re_start() otherwise we try to transmit twice.
In re_encap(), fix the code that guards against consuming too many
descriptors in the TX ring so that it actually works. With the
new 8169S chip, I was able to hit a corner case that drained the
free descriptor count all the way to 0. This is not supposed to
be possible.
reserved bits in the port that must be zero are 24:30, not 20:30. Bits
16:23 are used to set the bus number. This meant that when we tested for
config mechanism #1, if the previous PCI configuration transaction sent
used a bus number greater than 15, one of the bits in 20:23 would be
non-zero and we would fail to use config mechanism #1 and thus fail to see
that PCI existed on the machine at all.
Obtained from: Shanley's PCI System Architecture book
Tested by: des
Proxied through: njl
user mode. This goes with rev.1.468 of machdep.c which changed the gates
for these traps to interrupt gates. Having the interrupts disabled for
these traps from user mode is just an unwanted side effect.
This fixes at least 1 case of "panic: absolutely cannot call
smp_ipi_shootdown with interrupts already disabled". Too much code was
run with interrupts disabled, and it sometimes hit a sanity check.
Fix verified by: deischen
sending an ICMP packet doesn't cause a panic. A better solution is needed;
possibly defering the transmit to a dedicated thread.
Observed by: "Aaron Wohl" <freebsd@soith.com>
COM_NOFIFO() and COM_ESP cases are supposed to be a subsets of the
plain 16550A case, but 16650-related changes made the former fall into
the latter and then both fall into general code for printing the tx
fifo size. This mainly caused hard to parse boot messages like:
"sio0: type 16550A fifo disabled lookalike with 1 bytes FIFO".
COM_NOFIFO() on an ESP port gave a larger mess whose extent is not
clear.
Fixed some nearby style bugs.
defined values instead of hard-coded values. Don't repeat the register
access part of the code 4 times times or triple-space statements. This
fixes half of the style bugs in rev.1.172.
Hardware flow control of 16650As is still officially unsupported. I
was mistaken about it being broken. It is broken in 16650s but is
fixed in 16650As except for the maximum trigger level (which is no
longer used). Testing of the 16650's broken hardware flow control
watermarks by programming them on 16950s showed that their effects are
not too bad if the fifo size and trigger level are reasonably large
(16 is much better than 8).
an UART interface could get stuck when a new interrupt condition
arose while servicing a previous interrupt. Since an interrupt was
already pending, no new interrupt would be triggered.
Avoid infinite recursion by flushing the Rx FIFO and marking an
overrun condition when we could not move the data from the Rx
FIFO to the receive buffer in toto. Failure to flush the Rx FIFO
would leave the Rx ready condition pending.
Note that the SAB 82532 already did this due to the nature of the
chip.
from the sparc64 subtree, which breaks building non-sparc64 platforms
in the event the sparc64 subtree does not exist.
The problem is specific to the module, because non-module builds are
affected by the presence or absence of "device ebus" in the kernel
configuration.
PR: kern/56869
precisely where locking would be needed before adding it, but it
seems uart(4) draws slightly too much attention to have it without
locking for too long.
The lock added is a spinlock that protects access to the underlying
hardware. As a first and obvious stab at this, each method of the
hardware interface grabs the lock. Roughly speaking this serializes
the methods. Exceptions are the probe, attach and detach methods.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage
Sponsored by: FreeBSD Foundation
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable
by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose
Sponsored by: FreeBSD Foundation
o replace magic constants with #defines (e.g. ETHER_ADDR_LEN)
o move mib variables to net.link.ether.bridge with backwards compatible
entries for well-known items maintained under BURN_BRIDGES
o revamp debugging support so it is conditioanlly compiled with BRIDGE_DEBUG
(on currently) and runtime controlled by net.link.ether.bridge.debug
o change timeout to MPSAFE callout
o optimize lookup for common case of two interfaces
o optimize forwarding path to take IFNET lock only when needed
o make boot-time printf dependent on bootverbose
o sundry style changes (ANSI decls, extraneous spaces, etc.)
Sponsored by: FreeBSD Foundation
a problem for command responses since we rarely ever filled the queue.
However, adapter-initiated commands have a much smaller queue and could
tickle this bug. It's possible that this might fix the recently reported
problems with the aac-2120s, though I haven't been able to reproduce the
problem locally.
MFC-After: 1 day
and some of their bits (i.e., fifo trigger levels, frequency multipliers
and divisors, and bits to select the registers for these). This
attempts to completely describe the 16950's complicated register selects
for 16950-specific registers only.
the description of the data latch registers (they were described as
readonly).
Added some better and worse aliases for standard registers, mostly taken
from the 16950 data sheet. Define deprecated aliases in terms of the
preferred one.
Don't define com_efr in terms of com_fifo. It is unrelated (in a
different bank).
Merged comments to match (put them at the right of the #defines instead
of duplicating them).
Sorted the resulting sections on UART type and register bank. Added a
comment for each bank.
to ns16550.h. The organization of these files was sort of backwards.
The bits in the registers have no driver or bus dependencies but they
but the offsets of the registers in bus space are very bus-dependent.
However, it does no harm to keep the definitions of the register offsets
in ns16550.h provided they are thought of as internal ns*50 offsets.
that has been recorded earlier and overwrite it again later by
reading it directly from the EEPROM again.
Read the MAC address from the PAR0/PAR1 registers instead, which
are autoloaded on reboot.
Tested on AN985, AN983B. According to the datasheets, it should
also work for the AL981 (I don't have such a chip on a card at home)
PR: 52988
Submitted by: Andrew Gordon <arg-bsd@arg.me.uk>
MFC after: 2 weeks
calculate smoothed signal quality data for each node.
o add a 16-deep history buffer to each driver-private node storage that
holds rssi and antenna info for received frames
o override the default per-node "get rssi" method to return an average
rssi value based on samples collected over the last second
o enable beacon reception so even idle systems maintain a running history
of signal quality
This data may also be useful for improving the rate control algorithm.
Based on work by Tom Marshall <tommy@home.tig-grr.com> for MADWIFI.
things than record a single value.
o add a per-node method for returning the "current RSSI" for a node
o create a default method that returns ni_rssi which is the rssi for
the last received frame
o use the per-node "get rssi" method to return data for the RID's
submitted by wicontrol, et. al.
Loosely based on work by Tom Marshall <tommy@home.tig-grr.com> for MADWIFI.
to the 802.11 layer if they are at least IEEE80211_MIN_LEN
o mask off interrupt status bits that we don't care about so we don't do
the wrong thing; this fixes a problem where the beacon miss interrupt status
bit is delivered together with other status bits when operating in monitor
mode (we would post a beacon miss swi and then do the wrong thing)
In particular, let drivers send up control frames so we can dispatch
them to bpf in monitor mode.
This is the first (small) step to adding more functionality such as
power save mode.
was added to the fast path to support the COM_IIR_RXRDYBUG() case even
when that case is not configured. This increased the relative overhead
of sio input by almost 25% in the worst case and by 2-3% in the usual
case (usually only about 0.2% absolute per port at 115200 bps). The
quick fix is to significantly pessimize only the COM_IIR_RXRDYBUG()
case.
safe since the 802.11 layer does the right thing for 11a operation)
o select short preamble operation based on the negotiated capabilities; not
just the local state/capability
o fillin the duration field in the 802.11 header as appropriate
o remove detection of 11g support; no longer needed
Obtained from: MADWIFI (with modifications)
capabilities for outbound management frames. But beware of sending
this when operating on 5GHz channels; some 11a AP's reject association
requests if this bit is set in the capabilities listed.
Obtained from: MADWIFI (with modifications)
special signal-delivery protections for setugid processes. In the
event that a system is relying on "unusual" signal delivery to
processes that change their credentials, this can be used to work
around application problems.
Also, add SIGALRM to the set of signals permitted to be delivered to
setugid processes by unprivileged subjects.
Reported by: Joe Greco <jgreco@ns.sol.net>
we're on a 32-bit/64-bit bus or not. Use this to decide if we should
set the PCI dual-address cycle enable bit in the C+ command register.
(Enabling DAC on a 32-bit bus seems to do bad things.)
Also, initialize the C+ command register early in the re_init() routine.
The documentation says this register should be configured first.
in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);
The problem here is that udbinfo.listhead and ripcbinfo.listhead are
not initialized during the device probe/attach phase of the kernel
boot process. So if, for example, a network driver calls ether_ifattach()
in its foo_attach() routine and then decides that something is wrong
and calls ether_ifdetach() to reverse the process, we will panic trying
to dereference the uninitialized list head pointers. (Though the
same sequence of events performed after the kernel has come up works
file, i.e. doing kldload if_foo from multiuser.)
Change this to:
if (udbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(udbinfo.listhead), ifp);
if (ripcbinfo.listhead != NULL)
in6_pcbpurgeif0(LIST_FIRST(ripcbinfo.listhead), ifp);
to avoid the NULL pointer dereferences.
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().
count in _vm_object_allocate(). (Access to the generation count is
governed by the vm object's lock.) Note: the introduction of the
atomic increment in revision 1.238 appears to be an accident. The
purpose of that commit was to fix an Alpha-specific bug in UMA's
debugging code.
We simply use the detected FIFO size to determine whether we have
a post 16550 UART or not. The support lacks proper serialization of
hardware access for now.
pmap_extract_and_hold(). Note, however, that GIANT_REQUIRED should not be
removed until all platforms fully implement the "prot" parameter to
pmap_extract_and_hold().
Reviewed by: tegge
fixes a longstanding issue WRT resetting the chip after startup- it
would fail if we were connected as an F-port to a switch. If we
were connected as an F-port, we got assigned a hard loop ID of 255,
which is really a bogus loop id. Then when we turned around to
reset ourselves, the firmware would reject the ICB_INIT request
because the loop id was bogus. *sputter*
Minor fixlet from somebody in NetBSD with too much time on their
hands (dma -> DMA).
the "compatible" property too in the ns8250 case. This gets the serial
console to work on Blade 100s, where the device name is just "serial".
Reviewed by: marcel
Second (PPS) timing interface. The support is non-optional and by
default uses the DCD line signal as the pulse input. A compile-time
option (UART_PPS_ON_CTS) can be used to have uart(4) use the CTS line
signal.
Include <sys/timepps.h> in uart_bus.h to avoid having to add the
inclusion of that header in all source files.
Reviewed by: phk
This commit puts the relevant code snippets under #ifdef GONE_IN_5
(rather than #ifndef BURN_BRIDGES) thereby disabling the code now.
The code wil be entirely removed before 5.2 unless we find reasons
why this would be a bad idea.
Approach suggested by: imp
seems to be necessary for the 8139C+ under certain circumstances, and
doesn't appear to hurt the other chips. (In the failure case, the
packet would be sent through the TX DMA ring but not get echoed
back. I suspect this has something to do with the link state changing
unexpectedly.)
autoload and then copying the contends of the station address
registers. For some reason, reading the EEPROM on the 8169S doesn't
work right. This gets around the problem, and allows us to read
the station address correctly on the 8169S.
- Insert a delay after initiating packet transmition in re_diag() to
allow lots of time for the frame to echo back to the host, and wait
for both the 'RX complete' and 'timeout expired' bits in the ISR
register to be set.
- Deal more intelligently with the fact that the frame length
field in the RX descriptor is a different width on the 8139C+
than it is on the 8169/8169S/8110S
- For the 8169, you have to set bit 17 in the TX config register
to enter digital loopback mode, but for the 8139C+, you have to
set both bits 17 and 18. Take this into account so that re_diag()
works properly for both types of chips.
ethernet chips. This driver is pretty simple, however it contains
special DSP initialization code which is needed in order to get
the chip to negotiate a gigE link. (This special initialization
may not be needed in subsequent chip revs.) Also:
- Fix typo in if_rlreg.h (RL_GMEDIASTAT_1000MPS -> RL_GMEDIASTAT_1000MBPS)
- Deal with shared interrupts in re_intr(): if interface isn't up,
return.
- Fix another bug in re_gmii_writereg() (properly apply data field mask)
- Allow PHY driver to read the RL_GMEDIASTAT register via the
re_gmii_readreg() register (this is register needed to determine
real time link/media status).
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.
method. This is necessary on ia64 where it's known that serial interfaces
described in the ACPI namespace may not have the well-known IRQs assigned
to them. This confuses us in thinking they are PCI based interrupts and
wrongly program the APIC.
about interrupt trigger mode and interrupt polarity. This allows ACPI
for example to pass interrupt resource information up the hierarchy.
The default implementation of the method therefore is to pass the
request to the parent.
Reviewed by: jhb, njl
for the 8169S, according to my sample board. The RealTek Linux driver
mentions 0x00800000. I'm assigning this to the 8110S until I get
more info on it. (The (preliminary) RealTek docs only say that 8169S/8110S
chips will have some combination of those two bits set, but doesn't say
exactly what bit combination goes with which chip variant.)
intpin register is expressed in hardware where 0 means none, 1 means INTA,
2 INTB, etc. The other way is commonly used in loops where 0 means INTA,
1 means INTB, etc. The matchpin argument to pci_cfgintr_search() is
supposed to be the first form, but we passsed in a loop index of the
second. This fix adds one to the loop index to convert to the first form.
Reported by: Pavlin Radoslavov <pavlin@icir.org>
is not a size of 1. Since we already know there is a FIFO, we can
safely assume that it is at least 16 bytes. Note that all this is
mostly academic anyway. We don't use the size of the Rx FIFO
currently. If we add support for hardware flow control, we only
care about Rx FIFO sizes larger than 16.
on data structures on the kernel stack which are guaranteed to be 16 byte
aligned by gcc, the amd64 ABI and __aligned(16).
Ensire the tss_rsp0 initial stack pointer is 16 byte aligned in case
sizeof(pcb) becomes odd at some point. This is convenient for the
interrupt handler case because the ring crossing pushes cause the
required odd alignment before the call to the C code.
Have fast_syscall add an additional 8 bytes to ensure that the trapframe
has the correct odd alignment for the call to C code. Note that there are
no checks to make sure that the trapframe size is appropriate for this.
This makes get/setfpcontext work properly (finally). You get a GPF in
kernel mode if any of this is botched without the alignment fixup code
that is apparently needed on i386.
written by Stuart Walsh and Duncan Barclay (with some kibbitzing by
me). I'm checking it in on Stuart's behalf.
The BCM4401 is built into several x86 laptop and desktop systems. For the
moment, I have only enabled it in the x86 kernel config because although
it's a PCI device, I haven't heard of any standalone NICs that use it. If
somebody knows of one, we can easily add it to the other arches.
This driver uses register/structure data gleaned from the Linux
driver released by Broadcom, but does not contain any of the code
from the Linux driver itself. It uses busdma.
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.
The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.
specified directory is not found in the mount list. Before the
MNT_BYFSID changes, unmount(2) used to return ENOENT for a nonexistent
path and EINVAL for a non-mountpoint, but we can no longer distinguish
between these cases. Of the two error codes, EINVAL was more likely
to occur in practice, and it was the only one of the two that was
documented.
Update the manual page to match the current behaviour.
Suggested by: tjr
Reviewed by: tjr
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.
Reviewed by: tegge
rl(4) driver and put it in a new re(4) driver. The re(4) driver shares
the if_rlreg.h file with rl(4) but is a separate module. (Ultimately
I may change this. For now, it's convenient.)
rl(4) has been modified so that it will never attach to an 8139C+
chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to
match the 8169/8169S/8110S gigE chips. if_re.c contains the same
basic code that was originally bolted onto if_rl.c, with the
following updates:
- Added support for jumbo frames. Currently, there seems to be
a limit of approximately 6200 bytes for jumbo frames on transmit.
(This was determined via experimentation.) The 8169S/8110S chips
apparently are limited to 7.5K frames on transmit. This may require
some more work, though the framework to handle jumbo frames on RX
is in place: the re_rxeof() routine will gather up frames than span
multiple 2K clusters into a single mbuf list.
- Fixed bug in re_txeof(): if we reap some of the TX buffers,
but there are still some pending, re-arm the timer before exiting
re_txeof() so that another timeout interrupt will be generated, just
in case re_start() doesn't do it for us.
- Handle the 'link state changed' interrupt
- Fix a detach bug. If re(4) is loaded as a module, and you do
tcpdump -i re0, then you do 'kldunload if_re,' the system will
panic after a few seconds. This happens because ether_ifdetach()
ends up calling the BPF detach code, which notices the interface
is in promiscuous mode and tries to switch promisc mode off while
detaching the BPF listner. This ultimately results in a call
to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init()
to handle the IFF_PROMISC flag change. Unfortunately, calling re_init()
here turns the chip back on and restarts the 1-second timeout loop
that drives re_tick(). By the time the timeout fires, if_re.ko
has been unloaded, which results in a call to invalid code and
blows up the system.
To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(),
which stops the ioctl routine from trying to reset the chip.
- Modified comments in re_rxeof() relating to the difference in
RX descriptor status bit layout between the 8139C+ and the gigE
chips. The layout is different because the frame length field
was expanded from 12 bits to 13, and they got rid of one of the
status bits to make room.
- Add diagnostic code (re_diag()) to test for the case where a user
has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some
NICs have the REQ64# and ACK64# lines connected even though the
board is 32-bit only (in this case, they should be pulled high).
This fools the chip into doing 64-bit DMA transfers even though
there is no 64-bit data path. To detect this, re_diag() puts the
chip into digital loopback mode and sets the receiver to promiscuous
mode, then initiates a single 64-byte packet transmission. The
frame is echoed back to the host, and if the frame contents are
intact, we know DMA is working correctly, otherwise we complain
loudly on the console and abort the device attach. (At the moment,
I don't know of any way to work around the problem other than
physically modifying the board, so until/unless I can think of a
software workaround, this will have do to.)
- Created re(4) man page
- Modified rlphy.c to allow re(4) to attach as well as rl(4).
Note that this code works for the sample 8169/Marvell 88E1000 NIC
that I have, but probably won't work for the 8169S/8110S chips.
RealTek has sent me some sample NICs, but they haven't arrived yet.
I will probably need to add an rlgphy driver to handle the on-board
PHY in the 8169S/8110S (it needs special DSP initialization).
ia64_count_cpus() and ia64_probe_sapics() called a single function
to do the the actual work. The difference in behaviour was handled
in that function and was further complicated by adding bootverbose
related code. As such, even the simplest of changes was hard to
comprehend.
Untangling has been done by increasing code duplication and using
a more naive style of coding. FWIW, the object file is slightly
smaller than before, so things aren't as bad as it may seem.
Triggered by: a simple fix on the P4 branch that never got merged.
from the SAB82532 and the Z8530 hardware drivers by introducing
uart_cpu_busaddr(). The assumption is not true on pc98 where
bus_space_handle_t is a pointer to a structure.
The uart_cpu_busaddr() function will return the bus address
corresponding the tag and handle given to it by the BAS.
WARNING: the intend of the function is STRICTLY to allow hardware
drivers to determine which logical channel they control and is NOT
to be used for actual I/O. It is therefore EXPLICITLY allowed that
uart_cpu_busaddr() returns only the lower 8 bits of the address
and garbage in all other bits. No mistakes...
Quick fix for calling DELAY() for ddb input in some (atkbd-based)
console drivers. ddb must not use any normal locks, but DELAY()
normally calls getit() which needs clock_lock. One problem with using
normal locks in ddb is that deadlock is possible, but deadlock on
clock_lock is unlikely becaluse clock_lock is bogusly recursive,
apparently just to hide the problem of ddb using it. The i8254 clock
hardware has mostly write-only registers so it is important for it to
use a lock that gives exclusive access. (atkbd hardware is also
unfriendly to reentrant software but that problem is more local and
already solved.) I mostly saw the symptoms of the bug caused by
unlocking in getit() running cpu_unpend(). cpu_unpend() should not
be called while in ddb and Debugger() calls for failing assertions
about this caused a breakpoint within ddb.
ddb must also not call getit() because ddb may be being used to step
through clock initialization code that has stopped or otherwise mangled
the clock. If the clock is stopped, then getit() always returns the
same value and DELAY() takes forever if it trusts getit().
The quick fix is implement DELAY(n) as (n * timer_freq / 1000000)
inb(0x84)'s if ddb is active.
machdep.c:
Don't permit recursion on clock_lock.
kdb_trap(). Stopping the other CPUs acts like locking them out, but
it wasn't done early enough or held long enough to prevent concurrent
accesses to shared data. In particular, the saved regs could be
clobbered.
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.
Many more non-egregrious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.
FIDs to be 128-bits wide and adds support for realms.
Add a new CODA_COMPAT_5 option, which requests support for the old
Coda 5.x interface instead of the new one.
Create a new coda5.ko module that supports the 5.x interface, and make
the existing coda.ko module use the new 6.x interface. These modules
cannot both be loaded at the same time.
Obtained from: Jan Harkes & the coda-6.0.2 distribution,
NetBSD (drochner) (CODA_COMPAT_5 option).
building a module. Inclusion of option files (opt_ddb.h in this
case) is not possible for modules. The inclusion of opt_ddb.h
in this header is questionable.
(ns8250 copied and s/ns8250/i8251/g), but there for linkage purposes.
Real code to follow, once I get past some boot issues on my pc98 boxes
with recent current.
of what uart(4) is and/or is not see the initial commit log of one
of the files in sys/dev/uart (or see share/man/man4/uart.4).
Note that currently pc98 shares the MD file with i386. This needs
to change when pc98 support is fleshed-out to properly support the
various UARTs. A good example is sparc64 in this respect.
We build uart(4) as a module on all platforms. This may break
the ppc port. That depends on whether they do actually build
modules.
To use uart(4) on alpha, one must use the NO_SIO option.
It improves on sio(4) in the following areas:
o Fully newbusified to allow for memory mapped I/O. This is a must
for ia64 and sparc64,
o Machine dependent code to take full advantage of machine and firm-
ware specific ways to define serial consoles and/or debug ports.
o Hardware abstraction layer to allow the driver to be used with
various UARTs, such as the well-known ns8250 family of UARTs, the
Siemens sab82532 or the Zilog Z8530. This is especially important
for pc98 and sparc64 where it's common to have different UARTs,
o The notion of system devices to unkludge low-level consoles and
remote gdb ports and provides the mechanics necessary to support
the keyboard on sparc64 (which is UART based).
o The notion of a kernel interface so that a UART can be tied to
something other than the well-known TTY interface. This is needed
on sparc64 to present the user with a device and ioctl handling
suitable for a keyboard, but also allows us to cleanly hide an
UART when used as a debug port.
Following is a list of features and bugs/flaws specific to the ns8250
family of UARTs as compared to their support in sio(4):
o The uart(4) driver determines the FIFO size and automaticly takes
advantages of larger FIFOs and/or additional features. Note that
since I don't have sufficient access to 16[679]5x UARTs, hardware
flow control has not been enabled. This is almost trivial to do,
provided one can test. The downside of this is that broken UARTs
are more likely to not work correctly with uart(4). The need for
tunables or knobs may be large enough to warrant their creation.
o The uart(4) driver does not share the same bumpy history as sio(4)
and will therefore not provide the necessary hooks, tweaks, quirks
or work-arounds to deal with once common hardware. To that extend,
uart(4) supports a subset of the UARTs that sio(4) supports. The
question before us is whether the subset is sufficient for current
hardware.
o There is no support for multiport UARTs in uart(4). The decision
behind this is that uart(4) deals with one EIA RS232-C interface.
Packaging of multiple interfaces in a single chip or on a single
expansion board is beyond the scope of uart(4) and is now mostly
left for puc(4) to deal with. Lack of hardware made it impossible
to actually implement such a dependency other than is present for
the dual channel SAB82532 and Z8350 SCCs.
The current list of missing features is:
o No configuration capabilities. A set of tunables and sysctls is
being worked out. There are likely not going to be any or much
compile-time knobs. Such configuration does not fit well with
current hardware.
o No support for the PPS API. This is partly dependent on the
ability to configure uart(4) and partly dependent on having
sufficient information to implement it properly.
As usual, the manpage is present but lacks the attention the
software has gotten.
o Introduce PUC_PORT_TYPE_UART so that we can attach to uart(4),
o Introduce port sub-types (eg PUC_PORT_UART_NS8250, PUC_PORT_UART_Z8530)
to handle different hardware and determine resource sizes.
o Introduce two new IVARs: PUC_IVAR_SUBTYPE and PUC_IVAR_REGSHFT. Both
are used by uart(4) to get sufficient information to talk to the HW.
o Introduce PUC_FLAGS_ALTRES to tell puc(4) to try memory mapped I/O
if I/O port space cannot be allocated, or vice versa.
o Have ports of type PUC_PORT_TYPE_COM attach to uart(1) if attaching
to sio(4) fails (due to not having the sio driver).
o Put struct puc_device_description in struct puc_softc instead of
having a pointer to a device description in the softc. This allows
us to create device descriptions on the fly without having to use
malloc() or otherwise have them staticly defined.
o Move puc_find_description() from puc.c to puc_pci.c as it's specific
to PCI.
o Add EBUS and SBUS frontends for use on sparc64. Note that the P in
puc stands for PCI, so we kinda mess things up here. It's too soon
to worry about it though. We'll know what to do about it in time.
NOTE: This commit changes the behaviour of puc(4) to not quieten the
device probe and attach for child devices. The uart(4) driver provides
additional device description that is valuable to have.
we actually use. Originally, the code reserved 0x8000 to 0x80ff inclusive
which on my hardware conflicts with the acpi timer. This broke the amdpm
driver since it was actually given ports 0x800c to 0x810b (which should
not have happened, IMHO).
This also allows us to considerably simplify the handling of the nForce
smb driver, removing the need for a separate nfpm driver. With this, SMB
accesses appear to work on my Tyan Tiger MP board. Your mileage may vary.
In particular, the nForce changes have not been tested.
we can switch to 64M-sized identity mappings and not having to map the
first 64M. This is especially important because the first 1M contains
the VGA frame buffer and is otherwise a legacy memory range. Best to
make as little assumptions about it as possible. Switching to 64M-sized
mappings is important to avoid creating overlapping translations, which
have the side-effect of triggering machine checks. This is currently
what's preventing us to boot on an Intel Tiger 4.
Note that since we currently use 256M-sized identity mappings, we
would reduce the size of the mappings and consequently increase the
TLB pressure. The performance implications of this are minimal if
measurable at all because identify mappings are not our primary
means for memory management.
Also note that there's no guarantee that physical memory exists at
64M. Then again, we didn't had the guarantee when we were loading at
5M. We'll deal with this when it's a problem.
Discussed with: arun@
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and
fixing numerous problems.
Sponsored by: FreeBSD Foundation
Reviewed by: Pavlin Radoslavov <pavlin@icir.org>
and/or INTR_FAST. This belongs elsehwere and perhaps under bootverbose;
I'm committing it for now as it's uesful to know which drivers have
been converted and which have not.
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.
To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.
Run into this: davidxu
to protect the vlan state in each ifnet (e.g. vlan count). The latter is
probably better handled through an ifnet-centric means but since changes
are infrequent shouldn't matter for now.
Sponsored by: FreeBSD Foundation
For the floppy driver, use fdcontrol to manipulate density selection.
For the CD drivers, the 'a' and 'c' suffix is without actual effect and
any applications insisting on it can be satisfied with a symlink:
ln -s /dev/cd0 /dev/cd0a
Ongoing discussion may result in these pieces of code being removed before
the 5-stable branch as opposed to after.
such a card is ejected, we'd panic. Instead, just ignore it.
I should also add a sanity check in the FUNCID code as well, but this
isn't wrong since the check is cheap and happens infrequently.
into targreadfilt(). Unlock around calls to notify_user(). If an application
is sending CCBs while the endpoint is shutting down, this may result in
incomplete disable. A more complete solution will come with a "dying" flag.
Submitted by: simokawa
a correctable DMA error. Failing to do so can cause the error interrupt
to be triggered over and over again.
- Clean up the comments for UEAFSR_* constants, fix a typo (UEAFSR_BLK is
(1 << 23), not (1 << 22)), and add two more. Also, add similar constants
for the CE AFSR bits.
We can't update the device description in attach (why not ?), so
we device_print() what we find.
Conditionalize the short cable fix on this being older than rev 16A.
Call device_printf() when we apply short cable fix.
Include interrupt hold-off setting for rev 16+ under "#ifdef notyet"
The device_printf()'s will go under bootverbose once the various
issues have settled a bit.
out of cdregister() and daregister(), which are run from interrupt context.
The sysctl code does blocking mallocs (M_WAITOK), which causes problems
if malloc(9) actually needs to sleep.
The eventual fix for this issue will involve moving the CAM probe process
inside a kernel thread. For now, though, I have fixed the issue by moving
dynamic sysctl variable creation for these two drivers to a task queue
running in a kernel thread.
The existing task queues (taskqueue_swi and taskqueue_swi_giant) run in
software interrupt handlers, which wouldn't fix the problem at hand. So I
have created a new task queue, taskqueue_thread, that runs inside a kernel
thread. (It also runs outside of Giant -- clients must explicitly acquire
and release Giant in their taskqueue functions.)
scsi_cd.c: Remove sysctl variable creation code from cdregister(), and
move it to a new function, cdsysctlinit(). Queue
cdsysctlinit() to the taskqueue_thread taskqueue once we
have fully registered the cd(4) driver instance.
scsi_da.c: Remove sysctl variable creation code from daregister(), and
move it to move it to a new function, dasysctlinit().
Queue dasysctlinit() to the taskqueue_thread taskqueue once
we have fully registered the da(4) instance.
taskqueue.h: Declare the new taskqueue_thread taskqueue, update some
comments.
subr_taskqueue.c:
Create the new kernel thread taskqueue. This taskqueue
runs outside of Giant, so any functions queued to it would
need to explicitly acquire/release Giant if they need it.
cd.4: Update the cd(4) man page to talk about the minimum command
size sysctl/loader tunable. Also note that the changer
variables are available as loader tunables as well.
da.4: Update the da(4) man page to cover the retry_count,
default_timeout and minimum_cmd_size sysctl variables/loader
tunables. Remove references to /dev/r???, they aren't used
any longer.
cd.9: Update the cd(9) man page to describe the CD_Q_10_BYTE_ONLY
quirk.
taskqueue.9: Update the taskqueue(9) man page to describe the new thread
task queue, and the taskqueue_swi_giant queue.
MFC after: 3 days
in a list head instead of a pointer to the first element at the time of
the first call. These lists are subject to change, and getdirtybuf()
would refetch from the wrong list in some cases.
Spottedy by: tegge
Pointy hat to: me
address of the device identified by its phandle_t by traversing OFW's
device tree. The space and address returned by this function can
subsequently be passed to sparc64_fake_bustag() to construct a valid
tag and handle for use by the newbus I/O functions.
Use of this function is expected to be limited to pre-newbus access to
devices, such as consoles and keyboards.
Partially obtained from: tmm
Reviewed by: jake, jmg, tmm
SBus testing made possible by: jake
Tested with: LINT
and replace it with the more intuitive name PCIR_BARS.
- Add a PCIR_BAR(x) macro that returns the config space register offset of
the 32-bit BAR x.
MFC after: 3 days
This replaces the current ioctl processing with a direct call path
from geom_dev() where the ioctl arrives (from SPECFS) to any directly
connected GEOM class.
The inverse of the above is no longer supported. This is the
situation were you have one or more intervening GEOM classes, for
instance a BSDlabel on top of a MBR or PC98. If you want to issue
MBR or PC98 specific ioctls, you will need to issue them on a MBR
or PC98 providers.
This paves the way for inviting CD's, FD's and other special cases
inside GEOM.
it in the last chunk (phys_avail block). The last chunk very often is
not larger than one or two pages, resulting in a msgbuf that's too
small to hold a complete verbose boot.
Note that pmap_steal_memory() will bzero the memory it "allocates".
Consequently, ia64 will never preserve previous msgbufs. This is not
a noticable difference in practice. If the msgbuf could be reused,
it was invariably too small to have anything preserved anyway.
Changes from the original implementation:
- Fragmentation is handled by the function m_fragment, which can
be called from whereever fragmentation is needed. Note that this
function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing
use.
- m_fragment works slightly differently from the old fragmentation
code in that it allocates a seperate mbuf cluster for each fragment.
This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent
fragments. While that is a nice feature in practice, it nerfed the
usefulness of mbuf_stress_test.
- Add two modes of random fragmentation. Chains with fragments all of
the same random length and chains with fragments that are each uniquely
random in length may now be requested.
o add locking
o strip irrelevant spl's
o split malloc types to better account for memory use
o remove unused IPSEC_NONBLOCK_ACQUIRE code
o remove dead code
Sponsored by: FreeBSD Foundation
NB: There is a known LOR on the forwarding path; this needs to be resolved
together with a similar issue in the bridge. For the moment it is
believed to be benign.
Sponsored by: FreeBSD Fondation
o remove irrlevant spl
Notes:
1. We don't lock domain list traversals as this is safe until we start
removing domains.
2. The calculation of max_datalen in net_init_domain appears safe as
noone depends on max_hdr and max_datalen having consistent values.
3. Giant is still held for fast and slow timeouts; this must stay until
each timeout routine is properly locked (coming soon).
Sponsored by: FreeBSD Fondation
bail out if the buffer is not already present.
- The buffer returned by incore() is not locked and should not be sent to
brelse(). Use getblk() with the new GB_NOCREAT flag to preserve the
desired semantics.
caller to acquire it. This permits drain_output() to be done atomically
with other operations as well as reducing the number of lock operations.
- Assert that the proper locks are held in drain_output().
- Change getdirtybuf() to accept a mutex as an argument. This mutex is used
to protect the vnode's buf list and the BKGRDWAIT flag. This lock is
dropped when we successfully acquire a buffer and held on return
otherwise. These semantics reduce the number of cumbersome cases in
calling code.
- Pass the mtx from getdirtybuf() into interlocked_sleep() and allow this
mutex to be used as the interlock argument to BUF_LOCK() in the LOCKBUF
case of interlocked_sleep().
- Change the return value of getdirtybuf() to be the resulting locked buffer
or NULL otherwise. This is for callers who pass in a list head that
requires a lock. It is necessary since the lock that protects the list
head must be dropped in getdirtybuf() so that we don't have a lock order
reversal with the buf queues lock in bremfree().
- Adjust all callers of getdirtybuf() to match the new semantics.
- Add a comment in indir_trunc() that points at unlocked access to a buf.
This may also be one of the last instances of incore() in the tree.
growable (stack) entries that not only grow down, but also grow up.
Have vm_map_growstack() take these flags into account when growing
an entry.
This is the first step in adding support for upward growable stacks.
It is a required feature on ia64 to support the register stack (or
rstack as I like to call it -- it also means reverse stack). We do
not currently create rstacks, so the upward growing is not exercised
and the change should be a functional no-op.
Reviewed by: alc