Don't allow cpu entries in the MP table to contain APIC IDs out of range.
Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.
Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:
- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.
- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.
- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.
- IDs out of range is considered to be in conflict.
During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.
PR: 20312, 18919
errors were normally harmless because they were in unreachable code
and gcc apparently doesn't check the syntax inside asm statements
that it optimizes away.
Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.
IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED
Both IOAPICs had ID 0.
Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.
The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.
Submitted by: tegge
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.
Inspired by: John Dyson's 1998 patches.
Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).
Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).
Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.
Reviewed by: dillon
specifies the instruction's operation size. GCC will default to 32-bit
operands reguardless of the prototype (ie, formal parameters' type)
of an inline function.
- Add support for using the PCI BIOS functions for configuration space
accesses, and make this the default.
- Make PNPBIOS the default (obsoletes the PNPBIOS config option).
- Add two new boot-time tunables to disable each of the above.
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.
includes one of bus_at386.h and bus_pc98.h. Becuase only bus_pc98.h
supports indirect pio and bus_at386.h is identical to old bus.h, there
is no functional change in PC-AT's kernels. That is, it cannot cause
performance loss.
Submitted by: nyan
Reviewed by: imp
bde and luoqi provided useful comments for earlier version.
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).
A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.
Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.
This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.
This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.
Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
was using them exits.
Don't allow a user process to cause the kernel to take a TRCTRAP on a
user space address.
Reviewed by: jlemon, sef
Approved by: jkh
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.
Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.
Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
The UPAGES have not been there since Jan '96, but the hole was preserved
for BSD/OS binary compatability. This has been fixed other ways (%ebx
now has a pointer to PS_STRINGS), and the stack is nowhere near where
it used to be so this hack isn't required anymore.
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).
Agreed with in principle by: dufault
to use a locked cmpexg when unlocking a lock that we already hold, since
nobody else can touch the lock while we hold it. Second, it is not
necessary to use a locked cmpexg when locking a lock that we already
hold, for the same reason. These changes will allow MP locks to be used
recursively without impacting performance.
Modify two procedures that are called only by assembly and are already
NOPROF entries to pass a critical argument in %edx instead of on the
stack, removing a significant amount of code from the critical path
as a consequence.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>, Peter Wemm <peter@netplex.com.au>
Historically, the documentation of extended asm was lacking, namely you
should NOT specify the same register as an input, and a clobber.
If the register is clobbered, it should be specified as an output as well,
e.g., by linking input and output through the "number" notation.
(Beware of lvalues, some local variables needed...)
URL:http://egcs.cygnus.com/faq.html
In versions up to egcs-1.1.1, the compiler did not even warn about it,
but it was liable to output bad code. Newer egcs are pickier and simply
refuse to swallow such code.
Note, since *addr changes, it needs to be an output operand.
We might be excessive in saying that all memory has changed.
Obtained from: OpenBSD
w/extra thanks to Marc Espie <Marc.Espie@liafa.jussieu.fr>
the countdown register.
this should not be necessary but there are broken laptops that
do not restore the countdown register on resume.
when it happnes, it messes up the hardclock interval and system clock,
which leads to the infamous "calcru: negative time" problem.
Submitted by: kjc, iwasaki
Reviewed by: Steve O'Hara-Smith <steveo@eircom.net> and committers.
Obtained from: PAO3
* Change the hack used on the alpha for mapping devices into DENSE or
BWX memory spaces to a simpler one. Its still a hack and should be
a seperate api to explicitly map the resource.
* Add $FreeBSD$ as necessary.
can provide the correct context to each signal handler.
Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).
Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.
Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.
Reviewed by: marcel, jdp, bde
o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.
This commit contains a patch by bde.
Reviewed by: luoqi, jdp
struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.
For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.
NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.
This commit also fixes a lot of style bugs without hopefully
adding new ones.
NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.
Parts submitted by: bde
sigaltstack bug found by: bde
-----------------------------
By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two version of sendsig and sigreturn.
A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.
The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).
the struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that was caused by
the new sigframe.
NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.
the OS does FXSAVE/FXRESTOR instructions (fast FPU save/restore) during
context switching and also enables SIMD since this enables saving the
extra CPU context that isn't saved with normal FPU regs. The other
enables the SIMD instructions to use exception 16 (FPU) error reporting.
Note, this doesn't turn on SIMD, just defines the bits.
1. Move definitions of struct i386_*_args to the header file sysarch.h,
since they are part of the sysarch API. struct i386_get_ldt_args and
i386_set_ldt_args were identical, therefore make them into one
struct i386_ldt_args. Libc should use these definitions as well.
2. Return a more sensible EOPNOTSUPP for unknown operations.
Reviewed by: marcel
into two parts - one to do the bsfl and the other to convert the result
(base 0) to ffs()-like (base 1) in inline C. This enables the optimizer
to be a lot smarter in certain cases, like where it knows that the argument
is non-zero and we want ffs(known non zero arg) - 1. This appears to
produce identical code to the old inline when the argument is unknown.
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.
This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).
PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.
This change works around GPFs in some APM and PnP BIOS implementations.
Obtained from: Linux
- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code
Update APM to use this facility.
Submitted by: jlemon
equivalent to SYS_RES_MEMORY for x86 but for alpha, the rman_get_virtual()
address of the resource is initialised to point into either dense-mapped
or bwx-mapped space respectively, allowing direct memory pointers to be
used to device memory.
Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).
An rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.
This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.
Change "void *" to "volatile TYPE *", improving type safety
and eliminating some warnings (e.g., mp_machdep.c rev 1.106).
cpufunc.h:
Eliminate setbits. As defined, it's not precisely correct;
and it's redundant. (Use atomic_set_int instead.)
ipl_funcs.c:
Use atomic_set_int instead of setbits.
systm.h:
Include atomic.h.
Reviewed by: bde
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.
The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.
Reviewed by: peter (partially)
with respect to interrupts on UP systems. (The upgrade from gcc 2.7.x
to egcs 1.1.2 produced at least one non-atomic code sequence in
swap_pager_getpages.)
In addition, the primitives are now SMP-safe, but only on SMPs. (For
portability between SMPs and UPs, modules are compiled with the SMP-safe
versions.)
Submitted by: dillon and myself
Reviewed by: bde
than a review, this was a nice puzzle.
This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.
Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.
Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.
Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c
Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):
Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)
New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */
The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.
FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.
Reviewed by: BDE
behavior slightly.
If machine/bus.h is included, but neither bus_memio.h nor bus_pio.h
are included, then behave as if both were included.
This won't change existing drivers, all of which include one or more
of bus_{p,mem}io.h, but will allow drivers from other systems to come
over with fewer changes. I freely admit that this might not be
optimal for some drivers, but those drivers can be optimized for
FreeBSD after the initial bringup happens.
Without the change, there is a bug that preclude drivers from
compiling with strange warning/errors.
I've been running this here for a while now w/o ill effects.
Reviewed by: gibbs
Not objected to by: bde, arch@ list.
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.
Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>
implicitly LOCK'ed instruction), so there shouldn't be any harm in making
it volatile pointer compatable for one of the users of it. It seems to
generate the same code regardless.
for elf kernels (it is broken for all kernels due to lack of egcs support).
Renaming of many assembler labels is avoided by declaring by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.
PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)
range attributes after they have been extracted from the master.
Hook up the i686 MP code to do this for each AP.
Be more careful about printing the default memory type for the i686.
Suggestions from: luoqi
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.
Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.
Submitted by: Thomas Stephens <tas@stephens.org>
Reviewed by: jdp
In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)
Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.
Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>
bootblocks in order to boot the kernel after this! Also note that this
change breaks BSDI BSD/OS compatibility.
Also increased default NKPT to 17 so that FreeBSD can boot on machines
with >=2GB of RAM. Booting on machines with exactly 4GB requires other
patches, not included.
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.
is the preparation step for moving pmap storage out of vmspace proper.
Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>
put it, just like on the Alpha. It was wrong to load it at the
fixed address 0x08000000. That should only be done if the dynamic
linker is an executable (not a shared object) with a specific load
address encoded in the object file itself.
This fixes the recent breakage in the Linux emulator.
to an architecture-specific value defined in <machine/elf.h>. This
solves problems on large-memory systems that have a high value for
MAXDSIZ.
The load address is controlled by a new macro ELF_RTLD_ADDR(vmspace).
On the i386 it is hard-wired to 0x08000000, which is the standard
SVR4 location for the dynamic linker.
On the Alpha, the dynamic linker is loaded MAXDSIZ bytes beyond
the start of the program's data segment. This is the same place
a userland mmap(0, ...) call would put it, so it ends up just below
all the shared libraries. The rationale behind the calculation is
that it allows room for the data segment to grow to its maximum
possible size.
These changes have been tested on the i386 for several months
without problems. They have been tested on the Alpha as well,
though not for nearly as long. I would like to merge the changes
into 3.1 within a week if no problems have surfaced as a result of
them.
Sun implemented iBCS2 compatibility on Solaris >= 2.6: The emulator
runs in user-mode, patching the LDT so that client programs making
syscalls through the old iBCS2 call gate get handled by the emulator
process. Unemulated syscalls therefore need their own call-gate that
bypasses the emulator. Sun chose LDT entry 4 to implement this, which
is what we've been using as LUDATA_SEL, so we need to change LUDATA_SEL
if we want to run Solaris executables.
Discussed with: Mike Smith
installed.
Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).
Submitted by: Some ideas from Bruce Walter <walter@fortean.com>
The code was originaly contributed by Kelly Yancey
<kbyanc@freedomnet.com> in PR i386/6269 and revised by Akio Morita
<amorita@meadow.scphys.kyoto-u.ac.jp> and me. Test was performed by
Akio Morita and Toshiomi Moriki <moriki@db.is.kyushu-u.ac.jp>.
- Fix stylistic bug in identcpu.c.
- Update copyright in initcpu.c
- Fix typo in LINT.
PR: 6269 and 6270
The last consumer of this code (the old SCSI system) has left us and
the CAM code does it's own bouncing. The isa dma system has been
doing it's own bouncing for a while too.
Reviewed by: core
and set_regs() but for the floating point register state. The code
is stolen from procfs_machdep.c, and moved out of there into
machdep.c.
These functions are needed for generating ELF core dumps.
the relevant characteristics of the native machine, for building
and checking Elf_Ehdr structures.
Add structures to represent ELF "note" headers. Add defines for the
note types used in ELF core files.
the executable file, so it will work for both a.out and ELF format
files. I have split the object format specific code into separate
source files. It's cleaner than it was before, but it's still
pretty crufty.
Don't cheat on your make world for this update. A lot of things
have to be rebuilt for it to work, including the compiler and all
of the profiled libraries.
and use this when masking/unmasking interrupts.
Maintain a mapping from (iopaic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.
Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.
Don't let an AP enter _cpu_switch before all local apics are initialized.
- moved definition of MACHINE_ARCH from cpu.h to parm.h as alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. The hw.ispc98 is 1 in PC98 kernel and is 0 in
IBM-PC kernel.
Discussed with: John Birrell <jb@FreeBSD.ORG>
Add some overflow checks to read/write (from bde).
Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags
and vm_object::paging_in_progress to use operations which are not
interruptable.
Reviewed by: Bruce Evans <bde@zeta.org.au>
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.
The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.
PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
applications. Here's how it works.
The kernel should include <machine/elf.h> to get the definitions
for the native architecture. It can reference the various ELF
structures with generic names like Elf_Sym, Elf_Shdr, etc. A define
__ELF_WORD_SIZE is also available with the value 32 or 64 as
appropriate for the native architecture.
Generic applications should include <elf.h>, which is just a wrapper
for <machine/elf.h>.
Applications such as object file dumpers that need to deal with
foreign ELF files can include <sys/elf32.h> and/or <sys/elf64.h>.
Both can be included from the same source file if desired. The
structure names must be referenced using wordsize-specific names
like Elf32_Sym, Elf64_Shdr, etc.
I haven't change the alpha stuff, but I haven't broken it either.
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.
mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from integral
possible integral promotions, although this is too much work for
machine-dependent code.
vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.
suitable for holding object pointers (ptrint_t -> uintptr_t).
Added corresponding signed type (intptr_t). Changed/added
corresponding non-C9x types for function pointers to match. Don't
use nonstandard types to implement these types, and don't comment
on them in <machine/types.h>.
just to ensure 32-bit variables. Doing so broke and/or pessimized
i386's with 64-bit longs (unnecessary use of 64-bit variables
caused remarkably few problems in C code, but the inline asm here
tended to fail because there are no 64-bit registers). Since the
interfaces here are very machine-dependent and shouldn't be used
outside of the kernel, use a standard types of "known" width instead
of fixed-width types.
Changed all quad_t's to u_int64_t's. quad_t isn't standard, and
using signed types for 64-bit registers was bogus (but made no
difference).
least unsuitable for holding an object pointer. This should have been
used to fix warnings about casts between pointers and ints on alphas.
Moved corresponding existing general typedef (fptrint_t) for function
pointers from the i386 <machine/profile.h> to a kernel-only typedef
in <machine/types.h>. Kludged libc/gmon/mcount.c so that it can
still see this typedef.
Clean up (or if antipodic: down) some of the msgbuf stuff.
Use an inline function rather than a macro for timecounter delta.
Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.
Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()
This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.
WARNING: Programs which muck about with struct proc in userland
will have to be fixed.
Reviewed, but found imperfect by: bde
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.
Remove some unneeded manipulation of tpr register in mplock.s.
Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).
Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.
Remove some spurious forward_statclock/forward_hardclock warnings.
ints. Remove some no longer needed casts. Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.
- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.
- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.
With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
f00f_hack has run.
Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.
Submitted by: Partially from David A Adkins <adkin003@tc.umn.edu>
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.
Clock interrupts now have higher priority than other slow interrupts.
the signal handling latency for cpu-bound processes that performs very
few system calls.
The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.
All in all, SMP systems especially should show a significant
improvement in "snappyness."
in a way identically as before.) I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE. Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.
Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map. Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
Highlights:
* Simple model for underlying hardware.
* Hardware basis for timekeeping can be changed on the fly.
* Only one hardware clock responsible for TOD keeping.
* Provides a real nanotime() function.
* Time granularity: .232E-18 seconds.
* Frequency granularity: .238E-12 s/s
* Frequency adjustment is continuous in time.
* Less overhead for frequency adjustment.
* Improves xntpd performance.
Reviewed by: bde, bde, bde
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.
Move sigjmp_buf and jmp_buf structure definitions to machine/setjmp.h
so that i386 can continue to use int as the basic register type and
alpha can use long. Bruce was concerned about possible differing
alignment. I've left the definition of _JBLEN in machine/setjmp.h
even though Bruce's example used the number directly. I don't know if
any other code relies on _JBLEN, so I left it to avoid potential
breakage.
- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.
- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.
- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.
Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.
in <machine/cpu.h>. Moved the declarations to <machine/cputypes.h>.
Fixed style bugs in the moved code. Fixed everything that depended on
the nested include. Don't include <machine/cpu.h> (in the changed files)
unless something in it is used directly.
and fixed everything that dependended on it being declared in the old
place. It is used in "machine-independent" code in subr_prof.c.
Moved declaration of btext from subr_prof.c to <machine/cpu.h>. It
is machine-dependent.
in a P6 SMP system. Some MB bios'es don't set the registers up correctly
for the AP's. Additionally, set the memory between 0xa0000 and 0xbffff
as write combining.
PR: 4486
Submitted by: tegge@idi.ntnu.no (Tor Egge)
Implement a function is_adapter_memory() in order to determine what
should nto be dumped at all. Currently, only populated with the ``ISA
memory hole''. Adapter regions of other busses should be added.
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.
Controlled by smptests.h: SL_DEBUG, ON by default.
Some minor cleanup.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.
Help from: Bruce Evans <bde@zeta.org.au>
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.
- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)
More cleanup to follow, this is more of a checkpoint than a
'finished' thing.
Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.
Made NEW_STRATEGY default.
Removed misc. old cruft.
Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h
More cleanup in the direction of making splxx()/cpl MP-safe.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer checks ipending on way out of ISR.
sio made MP-safe (I hope).
We now tsleep() in kthread_init() between start_init()
and prepare_usermode() while waiting for ALL the idle_loop()
processes to come online.
Debugged & tested by: "Thomas D. Dean" <tomdean@ix.netcom.com>
Reviewed by: David Greenman <dg@root.com>
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>
This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsen latency for FAST_INTR() routines,
which can't be marked pending.
Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.
mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.
Macros to convert the Lite2 lock manager primitives to the names used
in the kernel proper. This allows us to hide them from the lock
manager till they can be turned on.
smp.h:
declarations for the new simplelock functions.
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()
Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock
Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.
It seems to work!!!
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.
We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)
The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default
- TEST_CPUSTOP adds stop_cpus()/restart_cpus(), OFF by default
- TEST_ALTTIMER new method for attaching 8259 PIC to APIC
this method avoids 'ExtInt' programming, ON by default
- TIMER_ALL sends 8259/8254 timer INTs to all CPUs, ON by default
- ASMPOSTCODExxx code to display bytes to POST hardware, OFF by default
General cleanup.
New functions to stop/start CPUs via IPIs:
- int stop_cpus( u_int map );
- int restart_cpus( u_int map );
Turned off by default, enabled via smptests.h:TEST_CPUSTOP.
Current version has a BUG, perhaps a deadlock?
Till now NMIs would be ignored. Now an NMI is caught by the BSP.
APs still ignore NMI, am working on code to allow a CPU to stop other CPUs
via an IPI.
available to the kernel (VM_KMEM_SIZE). The default (32 MB) is too low
when having 512 MB or more physical memory in a server environment. This is
relevant on systems where "panic: kmem_malloc: kmem_map too small" is a
problem.
This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])
There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.
This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.
Reviewed by: fsmp, dyson
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).
Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.
apic.h has defines like:
#define lapic__id lapic->id
Once private pages and "known virtual addr" mapping of the APICs is
ready all 'lapic__XXX' will be changed to 'lapic.XXX', and the defines
will be removed.
Changes to smp.h for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.
reality. There will be a new call interface, but for now the file
pci_compat.c (which is to be deleted, after all drivers are converted)
provides an emulation of the old PCI bus driver functions. The only
change that might be visible to drivers is, that the type pcici_t
(which had been meant to be just a handle, whose exact definition
should not be relied on), has been converted into a pcicfgregs* .
The Tekram AMD SCSI driver bogusly relied on the definition of pcici_t
and has been converted to just call the PCI drivers functions to access
configuration space register, instead of inventing its own ...
This code is by no means complete, but assumed to be fully operational,
and brings the official code base more in line with my development code.
A new generic device descriptor data type has to be agreed on. The PCI
code will then use that data type to provide new functionality:
1) userconfig support
2) "wired" PCI devices
3) conflicts checking against ISA/EISA
4) maps will depend on the command register enable bits
5) PCI to Anything bridges can be defined as devices,
and are probed like any "standard" PCI device.
The following features are currently missing, but will be added back,
soon:
1) unknown device probe message
2) suppression of "mirrored" devices caused by ancient, broken chip-sets
This code relies on generic shared interrupt support just commited to
kern_intr.c (plus the modifications of isa.c and isa_device.h).
This is now the default, it delays most of the MP startup to the function
machdep.c:cpu_startup(). It should be possible to move the 2 functions
found there (mp_start() & mp_announce()) even further down the path once
we know exactly where that should be...
Help from: Peter Wemm <peter@spinner.dialix.com.au>
- The 1st (preparse_mp_table()) counts the number of cpus, busses, etc. and
records the LOCAL and IO APIC addresses.
- The 2nd pass (parse_mp_table()) does the actual parsing of info and recording
into the incore MP table.
This will allow us to defer the 2nd pass untill malloc() & private pages
are available (but thats for another day!).
panic( "xxxxx\n" );
to:
printf( "xxxxx\n" );
panic( "\n" );
For some as yet undetermined reason the argument to panic() is often NOT
printed, and the system sometimes hangs before reaching the panic printout.
So we hopefully at least print some useful info before the hang, as oppossed to
leaving the user clueless as to what has happened.
all uses of it with the equivalent calls to setbits().
This change incidentally eliminates a problem building ELF kernels
that was caused by SETBITS.
Reviewed by: fsmp, peter
Submitted by: bde
simplifies some assumptions and stops some code compile problems.
This should fix the compile hiccup in PR#3491, but smp kernel profiling
isn't likely to be fixed by this.
- one-liners all become inline.
- multi-liners become functions.
- FAST_IPI defines go away.
re-worked APICIPI_BANDAID code.
- now refered to as DETECT_DEADLOCK.
- on by default.
This code re-numbers PCI busses in the MP table to match PCI semantics
when the MP BIOS fails to do it properly.
Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>
removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.
There are various options documented in i386/conf/LINT, there is more to
come over the next few days.
The kernel should run pretty much "as before" without the options to
activate SMP mode.
There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.
This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!
for syscalls, so one frame was lost in backtraces from syscalls.
This is handled better in the kernel by using a different mcount
entry point for profiling before the frame pointer is set up.
Expand RCSID().
Use .p2align instead of the ambiguous .align.
Added idempotency ifdef.
Removed unused macros ALTENTRY(), ALTASENTRY(), ASENTRY(), _MID_ENTRY.
Cleaned up formatting.
Reviewed by: jdp reviewed an old version
Obtained from: parts from NetBSD
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.
The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.
The ELF bootloader works, but it is not ready to commit quite yet.
Rename the PT* index KSTK* #defines to UMAX*, since we don't have a kernel
stack there any more..
These are used to calculate VM_MAXUSER_ADDRESS and USRSTACK, and really
do not want to be changed with UPAGES since BSD/OS 2.x binary compatability
depends on it.
space. (!)
Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.
Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).
Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().
The following parts came from John Dyson:
Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.
Move the upages object (upobj) from the vmspace to the proc structure.
Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.
Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.
This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.
Obtained from: bde
supports All Cyrix CPUs, IBM Blue Lightning CPU and NexGen (now AMD)
Nx586 CPU, and initialize special registers of Cyrix CPU and msr of
IBM Blue Lightning CPU.
If revision of Cyrix 6x86 CPU < 2.7, CPU cache is enabled in
write-through mode. This can be disabled by kernel configuration
options.
Reviewed by: Bruce Evans <bde@freebsd.org> and
Jordan K. Hubbard <jkh@freebsd.org>
at runtime.
etc/make.conf:
Nuked HAVE_FPU option.
lib/msun/Makefile:
Always build the i387 objects. Copy the i387 source files at build
time so that the i387 objects have different names. This is simpler
than renaming the files in the cvs repository or repeating half of
bsd.lib.mk to add explicit rules.
lib/msun/src/*.c:
Renamed all functions that have an i387-specific version by adding
`__generic_' to their names.
lib/msun/src/get_hw_float.c:
New file for getting machdep.hw_float from the kernel.
sys/i386/include/asmacros.h:
Abuse the ENTRY() macro to generate jump vectors and associated code.
This works much like PIC PLT dynamic initialization. The PIC case is
messy. The old i387 entry points are renamed. Renaming is easier
here because the names are given by macro expansions.
Changed it from 4 to 16 for i386's. It can be anything for i386's,
but compiler options limit it to a power of 2, and assembler and
linker deficiencies limit it to a small power of 2 (<= 16).
We use 16 in the kernel to get smaller tables (see Makefile.i386 and
<machine/asmacros.h>). We still use the default of 4 in user mode.
Use HISTCOUNTER instead of (*kcount) in the definition of KCOUNT()
for consistency with other macros.
complained so it cannot be entirely bad :-)
I include the email that probably explains it for people who already know:
> >Compiling with -O3 inlines functions. However the function that is being
> >inlined in makeinfo.c (add_word_args()) is a vararg function and must not be
> >inlined.
> >
> >The code in question is K&R style, and AFIK, there is no way for the compiler
> >to determine that the function uses vararg. Either change the code to use
> >prototypes, or use stdarg, or add a directive to prevent inlining.
>
> Not declaring a varargs function as varargs before it is used gives
> undefined behaviour.
>
> However, in practice the bug is probably in FreeBSD's <varargs.h>, which
> doesn't use gcc's __builtin_next_arg(). gcc should notice that it is
> used and not inline functions that have it. <stdarg.h.> uses it, but I
> think there's another gcc builtin that it should be using.
Patch attached. The ellipsis causes gcc to flag this as a varargs function,
and the name "__builtin_va_alist" is special cased in gcc to hide the last
argument in the arglist.
Reviewed by: bde & phk
Submitted by: jlemon@americantv.com (Jonathan Lemon)
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
also implies VM_PROT_EXEC. We support it that way for now,
since the break system call by default gives VM_PROT_ALL. Now
we have a better chance of coalesing map entries when mixing
mmap/break type operations. This was contributing to excessive
numbers of map entries on the modula-3 runtime system. The
problem is still not "solved", but the situation makes more
sense.
Eventually, when we work on architectures where VM_PROT_READ
is orthogonal to VM_PROT_EXEC, we will have to visit this
issue carefully (esp. regarding security issues.)
(1) deleted #if 0
pc98/pc98/mse.c
(2) hold per-unit I/O ports in ed_softc
pc98/pc98/if_ed.c
pc98/pc98/if_ed98.h
(3) merge more files by segregating changes into headers.
new file (moved from pc98/pc98):
i386/isa/aic_98.h
deleted:
well, it's already in the commit message so I won't repeat the
long list here ;)
Submitted by: The FreeBSD(98) Development Team
I decided to do this for every hardclock() call instead of lazily
in microtime(). The lazy method is simpler but has more overhead
if microtime() is called a lot.
CPU_THISTICKLEN() is now a no-op and should probably go away.
Previously it did nothing directly but had the side effect of
setting i586_last_tick for CPU_CLOCKUPDATE() and i586_avg_tick for
debugging. CPU_CLOCKUPDATE() now uses a better method and
i586_avg_tick is too much trouble to maintain.
Reduced nesting of #includes in the usual case.
Increased nesting of #includes when CLOCK_HAIR is defined. This
is a kludge to get typedefs for inline functions only when the
inline functions are used. Normally only kern_clock.c defines
this. kern_clock.c can't include the i386 headers directly.
Removed unused LOCORE support.
- use a more accurate and more efficient method of compensating for
overheads. The old method counted too much time against leaf
functions.
- normally use the Pentium timestamp counter if available.
On Pentiums, the times are now accurate to within a couple of cpu
clock cycles per function call in the (unlikely) event that there
are no cache misses in or caused by the profiling code.
- optionally use an arbitrary Pentium event counter if available.
- optionally regress to using the i8254 counter.
- scaled the i8254 counter by a factor of 128. Now the i8254 counters
overflow slightly faster than the TSC counters for a 150MHz Pentium :-)
(after about 16 seconds). This is to avoid fractional overheads.
files.i386:
permon.c temporarily has to be classified as a profiling-routine
because a couple of functions in it may be called from profiling code.
options.i386:
- I586_CTR_GUPROF is currently unused (oops).
- I586_PMC_GUPROF should be something like 0x70000 to enable (but not
use unless prof_machdep.c is changed) support for Pentium event
counters. 7 is a control mode and the counter number 0 is somewhere
in the 0000 bits (see perfmon.h for the encoding).
profile.h:
- added declarations.
- cleaned up separation of user mode declarations.
prof_machdep.c:
Mostly clock-select changes. The default clock can be changed by
editing kmem. There should be a sysctl for this.
subr_prof.c:
- added copyright.
- calibrate overheads for the new method.
- documented new method.
- fixed races and and machine dependencies in start/stop code.
mcount.c:
Use the new overhead compensation method.
gmon.h:
- changed GPROF4 counter type from unsigned to int. Oops, this should
be machine-dependent and/or int32_t.
- reorganized overhead counters.
Submitted by: Pentium event counter changes mostly by wollman
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in theire definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.
Additionally, fix the tlb management code for 386 machines.
with this quite a while ago when somebody reported a BSD/OS 2.1 binary
that wouldn't run. I'm pretty sure they tried it and I'm pretty sure
they mentioned to me that the patch worked.
comparisons in the inb() and outb() macros. I decided that int args
are OK here. Any type that can hold a u_int16_t without overflow
is correct, and 32-bit types are optimal.
Introduced a few tens of warnings (100 in LINT) for use of pessimized
(short) types for the port arg. Only a few drivers are affected by
this. u_short pessimizations aren't detected.
Added `__extension__' before the statement-expression in inb() so
that it can be compiled without warnings by gcc -pedantic.
(1) Add PC98 support to apm_bios.h and ns16550.h, remove pc98/pc98/ic
(2) Move PC98 specific code out of cpufunc.h (to pc98.h)
(3) Let the boot subtrees look more alike
Submitted by: The FreeBSD(98) Development Team
<freebsd98-hackers@jp.freebsd.org>
Changed i586_ctr_bias from long long to u_int. Only the low 32 bits
are used now that microtime uses a multiplication to do the scaling.
Previously the high 32 bits had to match those of rdtsc() to prevent
overflow traps and invalid timeval adjustments.
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
performance issues.
1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.
Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.
Removed vestigial support for PROFTIMER.
switch.s:
Removed addupc().
resourcevar.h:
Removed ADDUPC() and declarations of addupc().
cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.
Obtained from: mostly from NetBSD
All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.
Ok'd by: core
Submitted by: FreeBSD(98) development team
ansi and traditional cpp.
The nesting rules of macros are different, which required some changes.
Use __CONCAT(x,y) instead of /**/.
Redo some comments to use /* */ rather than "# comment" because the ansi
cpp cares about those, and also cares about quote matching.
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:
More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE
This concludes my spring cleaning, modulus any bug fixes for messes I
have made on the way.
(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)
time. The results are currently ignored unless certain temporary options
are used.
Added sysctls to support reading and writing the clock frequency variables
(not the frequencies themselves). Writing is supposed to atomically
adjust all related variables.
machdep.c:
Fixed spelling of a function name in a comment so that I can log this
message which should have been with the previous commit.
Initialize `cpu_class' earlier so that it can be used in startrtclock()
instead of in calibrate_cyclecounter() (which no longer exists).
Removed range checking of `cpu'. It is always initialized to CPU_XXX
so it is less likely to be out of bounds than most variables.
clock.h:
Removed I586_CYCLECTR(). Use rdtsc() instead.
clock.c:
TIMER_FREQ is now a variable timer_freq that defaults to the old value of
TIMER_FREQ. #define'ing TIMER_FREQ should still work and may be the best
way of setting the frequency.
Calibration involves counting cycles while watching the RTC for one second.
This gives values correct to within (a few ppm) + (the innaccuracy of the
RTC) on my systems.
page dir+table index.
pmap.h: remove NUPDE, it was wrong and not used. Sanitize KSTKPTEOFF.
vmparam.h: Calculate virtual addr from PDI+PTI from pmap.h rather than
using magic math. Remove UPDT, not used.
regarding apm to LINT
- Disabled the statistics clock on machines which have an APM BIOS and
have the options "APM_BROKEN_STATCLOCK" enabled (which is default
in GENERIC now)
- move around some of the code in clock.c dealing with the rtc to make
it more obvios the effects of disabling the statistics clock
Reviewed by: bde
in a suboptimal manner. I had also noticed some panics that appeared
to be at least superficially caused by this problem. Also, included
are some minor mods to support more general handling of page table page
faulting. More details in a future commit.
Always delay using one inb(0x84) after each i/o in rtcin() - don't
do this conditional on the bogus option DUMMY_NOPS not being defined.
If you want an optionally slightly faster rtcin() again, then inline
it and use a better named option or sysctl variable. It only needs
to be fast in rtcintr().
(This code is as yet untested; to come after man page is written.)
This also adds inlines to cpufunc.h for the RDTSC, RDMSR, WRMSR, and RDPMC
instructions. The user-mode interface is via a subdevice of mem.c;
there is also a kernel-size interface which might be used to aid
profiling.
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)
I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.
The main changes:
COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".
A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.
linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.
Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.
The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.
Supporting changes elsewhere in the kernel:
The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.
The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.
makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)
At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.
clock interrupts.
Keep a 1-in-16 smoothed average of the length of each tick. If the
CPU speed is correctly diagnosed, this should give experienced users
enough information to figure out a more suitable value for `tick'.
fd and wt drivers need bounce buffers, so this normally saves 32K-1K
of kernel memory.
Keep track of which DMA channels are busy. isa_dmadone() must now be
called when DMA has finished or been aborted.
Panic for unallocated and too-small (required) bounce buffers.
fd.c:
There will be new warnings about isa_dmadone() not being called after
DMA has been aborted.
sound/dmabuf.c:
isa_dmadone() needs more parameters than are available, so temporarily
use a new interface isa_dmadone_nobounce() to avoid having to worry
about panics for fake parameters. Untested.
looking at a high resolution clock for each of the following events:
function call, function return, interrupt entry, interrupt exit,
and interesting branches. The differences between the times of
these events are added at appropriate places in a ordinary histogram
(as if very fast statistical profiling sampled the pc at those
places) so that ordinary gprof can be used to analyze the times.
gmon.h:
Histogram counters need to be 4 bytes for microsecond resolutions.
They will need to be larger for the 586 clock.
The comments were vax-centric and wrong even on vaxes. Does anyone
disagree?
gprof4.c:
The standard gprof should support counters of all integral sizes
and the size of the counter should be in the gmon header. This
hack will do until then. (Use gprof4 -u to examine the results
of non-statistical profiling.)
config/*:
Non-statistical profiling is configured with `config -pp'.
`config -p' still gives ordinary profiling.
kgmon/*:
Non-statistical profiling is enabled with `kgmon -B'. `kgmon -b'
still enables ordinary profiling (and distables non-statistical
profiling) if non-statistical profiling is configured.
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.
The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.
overflows.
It sure would be nice if there was an unmapped page between the PCB and
the stack (and that the size of the stack was configurable!). With the
way things are now, the PCB will get clobbered before the double fault
handler gets control, making somewhat of a mess of things. Despite this,
it is still fairly easy to poke around in the overflowed stack to figure
out the cause.
Protected them with `#ifdef KERNEL' so that <sys/queue.h> is valid C++.
Added the necessary #includes of <sys/queue.h>.
These functions are bogus and should be replaced by the queue macros.
- Don't print out meaningless iCOMP numbers, those are for droids.
- Use a shorter wait to determine clock rate to avoid deficiencies
in DELAY().
- Use a fixed-point representation with 8 bits of fraction to store
the rate and rationalize the variable name. It would be
possible to use even more fraction if it turns out to be
worthwhile (I rather doubt it).
The question of source code arrangement remains unaddressed.
misplaced extern declarations (mostly prototypes of interrupt handlers)
that this exposed. The prototypes should be moved back to the driver
sources when the functions are staticalized.
Added idempotency guards to <machine/conf.h>. "ioconf.h" can't be
included when building LKMs so define a wart in bsd.kmod.mk to help
guard against including it.
free-run and doing a subtract in microtime() rather than resetting the
counter to zero at every clock tick. In combination with the changes to
kern_clock.c, this should eliminate all the immediately obvious sources
of systematic jitter in timekeeping on Pentium machines.
changes to allow devices that don't probe (e.g. /dev/mem)
to create devfs entries
this required giving 'configure' its own SYSINIT entry
so we could duck in just before it with a DEVFS init
and some device inits..
my devfs now looks like:
./misc
./misc/speaker
./misc/mem
./misc/kmem
./misc/null
./misc/zero
./misc/io
./misc/console
./misc/pcaudio
./misc/pcaudioctl
./disks
./disks/rfloppy
./disks/rfloppy/fd0.1440
./disks/rfloppy/fd1.1200
./disks/floppy
./disks/floppy/fd0.1440
./disks/floppy/fd1.1200
also some sligt cleanups.. DEVFS needs a lot of work
but I'm getting back to it..
didn't work are somewhat bogusly optimized away before the constraint
is checked. We still expect constants passed to inline functions to
remain constant, but if the compiler ever decides that they aren't
constant then it will just generate slightly slower code instead of
an error.
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
include/signal.h:
There was massive namespace pollution from including <sys/types.h>.
POSIX functions were declared even when _ANSI_SOURCE is defined.
sys.sys/signal.h:
NSIG was declared even if _ANSI_SOURCE or _POSIX_SOURCE is defined.
sig_atomic_t wasn't declared if _POSIX_SOURCE is defined.
Declare a typedef for signal handling functions and use it to
unobfuscate declarations and to avoid half-baked function types
that cause unwanted compiler warnings at certain warning levels.
Fix confusing comment about SA_RESTART.
sys/i386/include/signal.h:
This has to be included to get the declaration of sig_atomic_t even
when _ANSI_SOURCE is defined, so be more careful about polluting
the ANSI namespace.
Uniformize idempotency ifdefs.
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overrided in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network hang when the mbuf cluster pool runs out.
Reviewed by: John Dyson
attempted to check for insecure and fatal eflags and segment
selectors, but missed many cases and got the IOPL check back to
front. The other syscalls didn't check at all.
sys_process.c, machdep.c:
Only allow PT_WRITE_U to write to the registers (ordinary and FP).
psl.h, locore.s, machdep.c:
Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed
to assume anything about the reserved bits. Use PSL_USERCHANGE
and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER.
exception.s:
Define a private label for use by doreti when returning to user
mode fails.
machdep.c:
In syscalls, allow changing only the eflags that can be changed on
486's in user mode (no longer attempt to allow benign IOPL changes;
allow changing the nasty PSL_NT; don't allow changing the i586
bits).
Don't attempt to check all the cases involving invalid selectors
and %eip's. Just check for privilege violations and let the invalid
things cause a trap.
procfs_machdep.c:
Call the ptrace register functions to do all the work for reading
and writing ordinary registers and for single stepping.
trap.c:
Ignore traps caused by PSL_NT being set. Previously, users could
cause a fatal trap in user mode by setting PSL_NT and executing an
iret, and a fatal trap in kernel mode by setting PSL_NT and making
a syscall. PSL_NT was cleared too late and not in enough modes to
fix the problem.
Make all traps in user mode (except T_NMI) nonfatal.
Recover from traps caused by attempting to load invalid user
registers in doreti by restarting the traps so that they appear to
occur in user mode.
---
Fix bogons that I noticed while fixing the above:
psl.h:
Fix some comments.
Uniformize idempotency ifdef.
exception.s, machdep.c:
Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came
out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic
non-unique) trap numbers. Replace rsvd[1-14] by rsvd.
locore.s:
Enable alignment check flag on 486's and 586's.
machdep.c:
Use a better type for kstack[].
Use TFREGP() to find the registers.
Reformat ptrace functions from SEF to something closer to KNF.
procfs_machdep.c:
The wrong pointer to the registers got fixed as a side effect.
Implement reading and writing of FP registers.
/proc/*/*regs now work (only) for processes that are in memory.
Clean up comments.
trap.c, trap.h:
Remove unused trap types.
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.
vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.
vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.
vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.
vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.
machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.
machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.
Submitted by: John Dyson and David Greenman
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.
i386/genassym.c;
Remove support for the obsolete pcb_iml and pcb_cmap2.
Add support for pcb_inl.
i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.
Remove obsolete support (commented out and otherwise) for pcb_iml.
Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.
Remove obsolete function swtch_to_inactive().
include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLF_INTR() so that accounting for the interrupt state
works.
You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.
include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.
Use part of the new padding for pcb_inl.
isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.
Alphabetize.
Write all i/o functions in sleep so that we don't use anything from
NetBSD.
Restore the correct type of u_int for ports. This saves a whole cycle
per i/o on 486's.
Change `inline' back to __inline to avoid compiler warnings with
-Wreally-all.
Don't implement bdb() unless BDE_DEBUGGER is defined. Declare bdb_exists
outside the function to avoid hundreds of compiler warnings.
Let the compiler pick the register in asms if possible.
Implement ffs() using inline asm(). gcc provides a slightly different
one. It was broken in gcc-2.4.5 but works now. Declaring a correct
version inline ensures getting a correct version. FreeBSD-1.1.5 has
an slow inline version but FreeBSD-2.0 has a library version (which
probably never gets used).
Do inb() and outb() without using %edx for constant ports below 0x100.
Remove casts to the same type in queue functions.
Declare prototypes for everything implemented i386/*.s and also for
everything that is normally implemented as an inline here (I don't
like the current complete dependency on gcc). Ifdef out the prototypes
that are declared elsewhere. THere should be a separate header to
declare things implemented in i386/*.s, but then it would be harder
to override declarations with inlines.
${UII}
Partly support BDE_DEBUGGER. Still broken by conflict with APM. Does
nothing if BDE_DEBUGGER is not defined.
Clean up prototypes and data declarations. Declare most of the segment
functions that are implemented in support.s. Make data private in
machdep.c if possible.
Parenthesize expressions in macros properly!
${Uniformize idempotency ifdef}.
to avoid compiler warnings.
Clean up prototypes: alphabetize; don't use redundant `extern' or
meaningless `extern inline'.
Uniformize idempotency ifdef.
for it is incomplete and buggy. There is no problem unless Xintr0()
is reentered or should be reentered, but high clock interrupt
frequencies for pcaudio cause Xintr0() to be reentered (or clock
ticks to be lost when Xintr0() should have been reentered but
wasn't), and we lose little by delaying the call to softclock().
Move declarations related to the clock driver to clock.h.
Move declarations related to the npx driver to npx.h.
Clean up the remaining declarations.
I know that many of these entries are bogus and need to be revisited,
but let's get the tree working again for now and then do a pass through
looking at all the __FreeBSD__ entries, shall we?
cycles. While waiting there I added a lot of the extra ()'s I have, (I have
never used LISP to any extent). So I compiled the kernel with -Wall and
shut up a lot of "suggest you add ()'s", removed a bunch of unused var's
and added a couple of declarations here and there. Having a lap-top is
highly recommended. My kernel still runs, yell at me if you kernel breaks.
matter, but similar bogusness in npx.c causes compiling without -O to fail.
Use __volatile in all asms.
Parenthesize macro args.
Change the names of the macros to avoid namespace pollution.
Remove unnecessary "#ifdef __i386__".
Sort #defines.
Add comments.
This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:
1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.
2) mono_time does not participate in the PLL adjustments.
3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.
Removed inb function since it's more correctly in pio.h
Copied write_eflags and read_eflags over from npx.c
(Some changes to the macros suggested by Bruce were not made at this
time since his suggestions probably apply to all the macros and
these inlined/macro definitions need a lot of cleaning up at some
point in the future.)
Reviewed by: Bruce
if KERNEL is not defined. lib/msun/i387/*.S include asmacros.h to
get the definitions of ENTRY(), etc. This is bogus since asmacros.h
is only supposed to give definitions suitable for the kernel. The
current definitions for the kernel almost worked but are missing
the ".type" declarations. This caused the linker to print warnings
about doubtful relocations for almost anything linked to libm[sun].
Uniformize name and use of idempotence identifier.
the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach
version has slightly better names for many macros but is now out of
date and little used. It was originally used even less (for spelling
PSL_T as EFL_TF in <machine/db_machdep.h>).
negation whenever we access memory between 640k and 1M.
Original code from NetBSD 1.0-BETA. The exact origins are unclear but
Theo de Raadt, Charles, and Michael V. may have contributed to it.
Submitted by: pst
Changed u_int_inb to just inb and deleted define.
The code generated is identical to that generated with the cast so
the problem was obviously fixed at some point after gcc 1.4
Reviewed by:
Submitted by:
hosing syscons. Doesn anyone know anything about this
or can we just delete it now?
/*
* This roundabout method of returning a u_char helps stop gcc-1.40 from
* generating unnecessary movzbl's.
*/
#ifdef disable_for_gcc-2_6_0
#define inb(port) ((u_char) u_int_inb(port))
#endif
static inline u_int
u_int_inb(u_int port)
{
u_char data;
/*
* We use %%dx and not %1 here because i/o is done at %dx and
not at
* %edx, while gcc-2.2.2 generates inferior code (movw instead
of movl)
* if we tell it to load (u_short) port.
*/
__asm __volatile("inb %%dx,%0" : "=a" (data) : "d" (port));
return data;
}
Reviewed by:
Submitted by:
2. Hack.
Hack is to define RCSID() to null macro so that new msun stuff
will compile. This does NOT belong here, and I DON'T want it to
stay, I just need to put this here for now to enable msun and we need
to talk about what our RCSID story is supposed to be. We talked about
supporting RCSID() one day, and everyone seemed to like the idea
reasonably well of making it a macro you could just no-op this way,
but we never did anything. Now I see that JTCs code has it and I'm
loath to remove it or do anything until we've discussed it some more.
Well, so how about it? What's our story vis-a-vis RCSID() going to
be?
Submitted by: jkh
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.
Delete the ifdef GPL_EMULATE case here and made the padding work for
both types of emulators so that there is no longer a need to compile
ps and friends new if you are using the GPL math emulator instead the
normal one.
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.
This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:
rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make
Vastly improved trap.c from me. This rewritten version has a variety of
features, amoung them: higher performance and much higher code quality.
support.s, cpufunc.h:
No longer use gs override to enforce range limits - compare directly
against VM_MAXUSER_ADDRESS instead. The old way caused problems in
preserving the gs selector...and this method is just as fast or faster.
list of changes, I've made the following additional changes:
1) i386/include/ipl.h renamed to spl.h as the name conflicts with the
file of the same name in i386/isa/ipl.h.
2) changed all use of *mask (i.e. netmask, biomask, ttymask, etc) to
*_imask (net_imask, etc).
3) changed vestige of splnet use in if_is to splimp.
4) got rid of "impmask" completely (Bruce had gotten rid of netmask),
and are now using net_imask instead.
5) dozens of minor cruft to glue in Bruce's changes.
These require changes I made to config(8) as well, and thus it must
be rebuilt.
-DG
from Bruce Evans:
sio:
o No diff is supplied. Remove the define of setsofttty(). I hope
that is enough.
*.s:
o i386/isa/debug.h no longer exists. The event counters became too
much trouble to maintain. All function call entry and exception
entry counters can be recovered by using profiling kernel (the new
profiling supports all entry points; however, it is too slow to
leave enabled all the time; it also). Only BDBTRAP() from debug.h
is now used. That is moved to exception.s. It might be worth
preserving SHOW_BITS() and calling it from _mcount() (if enabled).
o T_ASTFLT is now only set just before calling trap().
o All exception handlers set SWI_AST_MASK in cpl as soon as possible
after entry and arrange for _doreti to restore it atomically with
exiting. It is not possible to set it atomically with entering
the kernel, so it must be checked against the user mode bits in
the trap frame before committing to using it. There is no place
to store the old value of cpl for syscalls or traps, so there are
some complications restoring it.
Profiling stuff (mostly in *.s):
o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet.
o All interesting labels `foo' are renamed `_foo' and all
uninteresting labels `_bar' are renamed `bar'. A small change
to gprof allows ignoring labels not starting with underscores.
o MCOUNT_LABEL() is to provide names for counters for times spent
in exception handlers.
o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception
handlers. Its arg is the pc where the exception occurred. The
new mcount() pretends that this was a call from that pc to a
suitable MCOUNT_LABEL().
o MEXITCOUNT is to turn off any timer started by MCOUNT().
/usr/src/sys/i386/i386/exception.s:
o The non-BDB BPTTRAP() macros were doing a sti even when interrupts
were disabled when the trap occurred. The sti (fixed) sti is
actually a no-op unless you have my changes to machdep.c that make
the debugger trap gates interrupt gates, but fixing that would
make the ifdefs messier. ddb seems to be unharmed by both
interrupts always disabled and always enabled (I had the branch in
the fix back to front for some time :-().
o There is no known pushal bug.
o tf_err can be left as garbage for syscalls.
/usr/src/sys/i386/i386/locore.s:
o Fix and update BDE_DEBUGGER support.
o ENTRY(btext) before initialization was dangerous.
o Warm boot shot was longer than intended.
/usr/src/sys/i386/i386/machdep.c:
o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require
other changes.
Use the following:
o Remove aston() and setsoftclock().
Maybe use the following:
o No netisr.h.
o Spelling fix.
o Delay to read the Rebooting message.
o Fix for vm system unmapping a reduced area of memory
after bounds_check_with_label() reduces the size of
a physical i/o for a partition boundary. A similar
fix is required in kern_physio.c.
o Correct use of __CONCAT. It never worked here for non-
ANSI cpp's. Is it time to drop support for non-ANSI?
o gdt_segs init. 0xffffffffUL is bogus because ssd_limit
is not 32 bits. The replacement may have the same
value :-), but is more natural.
o physmem was one page too low. Confusing variable names.
Don't use the following:
o Better numbers of buffers. Each 8K page requires up to
16 buffer headers. On my system, this results in 5576
buffers containing [up to] 2854912 bytes of memory.
The usual allocation of about 384 buffers only holds
192K of disk if you use it on an fs with a block size
of 512.
o gdt changes for bdb.
o *TGT -> *IDT changes for bdb.
o #ifdefed changes for bdb.
/usr/src/sys/i386/i386/microtime.s:
o Use the correct asm macros. I think asm.h was copied from Mach
just for microtime and isn't used now. It certainly doesn't
belong in <sys>. Various macros are also duplicated in
sys/i386/boot.h and libc/i386/*.h.
o Don't switch to and from the IRR; it is guaranteed to be selected
(default after ICU init and explicitly selected in isa.c too, and
never changed until the old microtime clobbered it).
/usr/src/sys/i386/i386/support.s:
o Non-essential changes (none related to spls or profiling).
o Removed slow loads of %gs again. The LDT support may require
not relying on %gs, but loading it is not the way to fix it!
Some places (copyin ...) forgot to load it. Loading it clobbers
the user %gs. trap() still loads it after certain types of
faults so that fuword() etc can rely on it without loading it
explicitly. Exception handlers don't restore it. If we want
to preserve the user %gs, then the fastest method is to not
touch it except for context switches. Comparing with
VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on
a 486, while loading %gs takes 9 cycles and using it takes
another.
o Fixed a signed branch to unsigned.
/usr/src/sys/i386/i386/swtch.s:
o Move spl0() outside of idle loop.
o Remove cli/sti from idle loop. sw1 does a cli, and in the
unlikely event of an interrupt occurring and whichqs becoming
zero, sw1 will just jump back to _idle.
o There's no spl0() function in asm any more, so use splz().
o swtch() doesn't need to be superaligned, at least with the
new mcounting.
o Fixed a signed branch to unsigned.
o Removed astoff().
/usr/src/sys/i386/i386/trap.c:
o The decentralized extern decls were inconsistent, of course.
o Fixed typo MATH_EMULTATE in comments. */
o Removed unused variables.
o Old netmask is now impmask; print it instead. Perhaps we
should print some of the new masks.
o BTW, trap() should not print anything for normal debugger
traps.
/usr/src/sys/i386/include/asmacros.h:
o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros
as necessary.
/usr/src/sys/i386/include/cpu.h:
o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal
while the kernel is running.
o Don't use var++ to set boolean variables. It fails after a mere
4G times :-) and is slower than storing a constant on [3-4]86s.
/usr/src/sys/i386/include/cpufunc.h:
o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of
<machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by
almost everything for the inlines.
/usr/src/sys/i386/include/ipl.h:
o New file. Defines spl inlines and SWI macros and declares most
variables related to hard and soft interrupt masks.
/usr/src/sys/i386/isa/icu.h:
o Moved definitions to <machine/ipl.h>
/usr/src/sys/i386/isa/icu.s:
o Software interrupts (SWIs) and delayed hardware interrupts (HWIs)
are now handled uniformally, and dispatching them from splx() is
more like dispatching them from _doreti. The dispatcher is
essentially *(handler[ffs(ipending & ~cpl)]().
o More care (not quite enough) is taken to avoid unbounded nesting
of interrupts.
o The interface to softclock() is changed so that a trap frame is
not required.
o Fast interrupt handlers are now handled more uniformally.
Configuration is still too early (new handlers would require
bits in <machine/ipl.h> and functions to vector.s).
o splnnn() and splx() are no longer here; they are inline functions
(could be macros for other compilers). splz() is the nontrivial
part of the old splx().
/usr/src/sys/i386/isa/ipl.h
o New file. Supposed to have only bus-dependent stuff. Perhaps
the h/w masks should be declared here.
/usr/src/sys/i386/isa/isa.c:
o DON'T APPLY ALL OF THIS DIFF. You need only things involving
*mask and *MASK and comments about them. netmask is now a pure
software mask. It works like the softclock mask.
/usr/src/sys/i386/isa/vector.s:
o Reorganize AUTO_EOI* macros.
o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust
fastintr handlers.
o fastintr handlers need to metamorphose into ordinary interrupt
handlers if their SWI bit has become set. Previously, sio had
unintended latency for handling output completions and input
of SLIP framing characters because this was not done.
/usr/src/sys/net/netisr.h:
o The machine-dependent stuff is now imported from <machine/ipl.h>.
/usr/src/sys/sys/systm.h
o DON'T APPLY ALL OF THIS DIFF. You need mainly the different
splx() prototype. The spl*() prototypes are duplicated as
inlines in <machine/ipl.h> but they need to be duplicated here
in case there are no inlines. I sent systm.h and cpufunc.h
to Garrett. We agree that spl0 should be replaced by splnone
and not the other way around like I've done.
/usr/src/sys/kern/kern_clock.c
o splsoftclock() now lowers cpl so the direct call to softclock()
works as intended.
o softclock() interface changed to avoid passing the whole frame
(some machines may need another change for profile_tick()).
o profiling renamed _profiling to avoid ANSI namespace pollution.
(I had to improve the mcount() interface and may as well fix it.)
The GUPROF variant doesn't actually reference profiling here,
but the 'U' in GUPROF should mean to select the microtimer
mcount() and not change the interface.
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.
(process startup time using in-memory segments *much* faster)
2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.
(generally more efficient pte management)
3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)
(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)
4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.
(less likely to pageout a recently used page)
5) Slight improvement to the page table page trap efficiency.
(generally faster system VM fault performance)
6) Defer creation of unnamed anonymous regions pager until needed.
(speeds up shared memory bss creation)
7) Remove possible deadlock from swap_pager initialization.
8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.
9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.
John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com
following is a summary:
1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.
NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.
The following patch adds the addr argument to signal handlers.
The kernel with the patch is no more and no less in compliance or in
violation of POSIX and ANSI C than the kernel before the patch.
The added functionality this addr argument provides is quite useful. It
enables an entire class of algorithms which use mprotect to trace memory
references. Beside garbage collectors, I have heard of this technique being
applied to debuggers and profilers. The only benchmarking I've performed is
using akcl to compile maxima: without the kernel patch, it takes 7 hours to
compile maxima, while with stratified garbage collection, it only takes 50
minutes.
Basically, I can't think of a reason not to add the addr argument and there
is a compelling need for it.
If you find the patch acceptable, please let me know so I can send my
FreeBSD akcl config files to wfs for inclusion in the core akcl release.
The old 386BSD config files there won't work on either NetBSD or FreeBSD.
when the machine panics.
i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.
i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr
i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.
i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c
i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.
i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s
...and various changes to various header files to make all of the
above happen.
Mark the fact that PGSHIFT and PDRSHIFT are really the same as
PG_SHIFT and PD_SHIFT, these should be collapsed some day soon.
Document that KERNBASE should really be KPTDPTDI << PDRSHIFT, for
now leave it as the constant 0xFE000000 until I make a seperate
common header file for this stuff (vmaddresses.h?)
Remove NKMEMCLUSTERS define, it was only being used to define
VM_KMEM_SIZE, so why have all the indirection. Besides who wants
to work in CLBYTE sizes chuncks.
pmap.h:
Fix $Id$ and some other minor format clean ups.
Remove the XXX comment about NKPDE, since it now has the correct value
of 7.
Remove unused LASTPTDI and move the APTD into the very end of memory to
free up 4MB of kernel virtual address space.
Remove unused RSVDPTDI and free up 12MB of kernel virtual address space.
vmparam.h
Fix $Id$.
Increase SHMMAXPGS to 512 (2MB) now that there is room for it to be
bigger. The XXX comment stays until the kernel moves down in memory
to free up enough space to use the proper default of 4MB.
VM_KMEM_SIZE is now a direct constant stating the size of the kernel
malloc region. Increased the value from 3MB to 16MB.
Cleaned up tabs vs spaces after #define to make file consistent.
Removed now unused definitions of I386_PAGE_SIZE and I386_PDR_SIZE
Note That these two where unused and had the wrong values anyway!
Changed I386_KPDES to NKPDE
Changed I386_UPDES to NUPDE
Redid constant assignments of *PTDI's to be sizeable and relative.