... and thus retire debug.kdb.stop_cpus tunable/sysctl.
The knob was to work around CPU stopping issues, which since have been
either fixed or greatly reduced. kdb should really operate in a special
environment with scheduler stopped and interrupts disabled to provide
deterministic debugging.
Discussed with: attilio, rwatson
X-MFC after: 2 months or never
... and also increase the timeout.
It's better to try to proceed somehow despite stuck CPUs than to hang
indefinitely. Especially so during shutdown and when entering kdb or panic.
Timeout value is still an aribitrary value.
Timeout diagnostic is just a printf; the work on something more
debuggable is planned by attilio. Need to be careful here as
stop_cpus_hard is called very early while enetering kdb and soon(-ish)
it may become called very early when entering panic.
Reviewed by: attilio
MFC after: 2 months
o Setting td_intr_frame to the XIVs trap frame because it's referenced
by the ET event handler.
o Signal EOI to the CPU before calling the registered XIV handlers.
This prevents lost ITC interrupts, which cause starvation in one-shot
mode.
o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o Have the APs call cpu_initclocks() so as to limited the scattering of
clock related initialization. cpu_initclocks() calls the <self>_bsp()
or <self>_ap() version accordingly.
o Uncomment the ET clock handling in cpu_idle().
o Update the DDB 'show pcpu' output for the new MD fields.
o Entirely rewritten ia64_ih_clock(). Note that we don't create as many
clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
We can only have 240 XIVs and we can have more CPUs than that. There's
a single intrcnt index for the cumulative clock ticks and we keep per
CPU counts in the PCPU stats structure.
o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).
Open issues:
o Clock interrupts can still be lost. Some tweaking is still necessary.
Thanks to: mav@ for his support, feedback and explanations.
ET stats while committing:
eris% sysctl machdep.cpu | grep nclks
machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock 108599 50
dynamically loaded device drivers get a chance to run their event hooks.
- Decouple the USB suspend and resume lock from witness. It produces some
false warnings due to reusing the lock name among multiple devices.
MFC after: 3 days
is now required by bus_autoconf.
- Allow interface class matching even if device class is vendor specific.
- Update bus_autoconf tool to not generate system and subsystem match lines
for the nomatch event.
PR: misc/157903
MFC after: 14 days
sorted according to the mode which they support:
host, device or dual mode
- Add generic tool to extract these data:
tools/bus_autoconf
Discussed with: imp
Suggested by: Robert Millan <rmh@debian.org>
PR: misc/157903
MFC after: 14 days
instead of a PCPU field for curthread. This averts a race on SMP systems
with a high interrupt rate where the thread looking up the value of
curthread could be preempted and migrated between obtaining the PCPU
pointer and reading the value of pc_curthread, resulting in curthread being
observed to be the current thread on the thread's original CPU. This played
merry havoc with the system, in particular with mutexes. Many thanks to
jhb for helping me work this one out.
Note that Book-E is in principle susceptible to the same problem, but has
not been modified yet due to lack of Book-E hardware.
MFC after: 2 weeks
some interesting bugs (mostly on SMP systems) with atomic operations
silently failing in interrupt heavy situations, especially when using
overflow pages.
switch the region registers. pmap_switch() returns the pmap for
which the region register are currently programmed, which needs
to be re-programmed on the CPU the ougoing thread gets switched
in. This change does not noticibly change anything or fix known
bugs, but does give me a warm fuzzy feeling by being more
correct.
unhappy (probably they don't handle crossing the 64k boundary, etc.).
Fix this by changing zfsldr to use a loop reading from the disk one
sector at a time. To avoid trashing the saved copy of the MBR which is
used for disk I/O, read zfsboot2 at address 0x9000. This has the
advantage that BTX no longer needs to be relocated as it is read into
the correct location. However, the loop to relocate zfsboot2.bin can
now cross a 64k boundary, so change it to use relative segments instead.
(This will need further work if zfsboot2.bin ever exceeds 64k.)
While here, stop storing a relocated copy of zfsldr at 0x700. This was
only used by the xread hack which has recently been removed (and even
that use was dubious). Also, include the BIOS error code as hex when
reporting read errors to aid in debugging.
Much thanks to Henri Hennebert for patiently testing various iterations
of the patch as well as fixing the zfsboot2.bin relocation to use
relative segments.
MFC after: 1 week
to do about the few cases where the HAL state isn't available (regdomain)
or isn't yet setup (probe/attach.)
The global ath_hal_debug now affects all instances of the HAL.
This also restores the ability for probe/attach debugging to work; as
the sysctl tree may not be attached at that point. Users can just set
the global "hw.ath.hal.debug" to a suitable value to enable probe/attach
related debugging.
(obvious in retrospect) in which interrupts on one CPU that are temporarily
masked can end up permanently masked when a handler on another CPU clobbers
the interrupt mask register with an old copy.
rather than global variables.
This specifically allows for debugging to be enabled per-NIC, rather
than globally.
Since the ath driver doesn't know about AH_DEBUG, and to keep the ABI
consistent regardless of whether AH_DEBUG is enabled or not, enable the
debug parameter always but only conditionally compile in the debug
methods if needed.
The ALQ support is currently still global pending some brainstorming.
Submitted by: ssgriffonuser@gmail.com
Reviewed by: adrian, bschmidt
This was previously done only for SCSI XPT in r223081, on which the change
in r223089 depended in order to respond to serial number requests. As a
result of r223089, da(4) and ada(4) devices register a d_getattr for geom to
use to obtain the information.
Reported by: ache
Reviewed by: ken
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.
MFC after: 2 weeks