watchdog timeout issues and the root cause seems to stem from
a silicon bug in the controller. Personally I couldn't reproduce it on
an RTL8169 controller, but it seems to depend on usage pattern.
For newer PCIe based controllers I have had no TSO complaints, but
turning off TSO is safer. Users who are sure that
their controller works with TSO can still re-enable TSO with
ifconfig(8).
Reported by: Oliver Lehmann (lehmann at ans-netz dot de), Eugene Butusov (ebutusov at gmail dot com)
11 bits. This limits the maximum interface MTU size in the TSO case,
as the upper stack should not generate TCP segments with an MSS greater
than the limit. Armed with this information, disable TSO if the
interface MTU is greater than the limit.
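A minimal sketch of the check described above. The capability bit, the function name, and the way the MTU limit is passed in are hypothetical; only the 11-bit field width comes from the text, and the real driver's names and exact limit may differ.

```c
#include <assert.h>
#include <stdint.h>

#define TSO_MSS_BITS	11				/* field width, from the text */
#define TSO_MSS_MAX	((1u << TSO_MSS_BITS) - 1)	/* largest MSS that fits: 2047 */
#define CAP_TSO4	0x1u				/* hypothetical capability bit */

/*
 * Return the enabled-capability mask with TSO cleared when the
 * interface MTU exceeds the limit imposed by the MSS field.
 */
static uint32_t
tso_adjust_caps(uint32_t capenable, uint32_t mtu, uint32_t mtu_limit)
{
	/* The limit itself can never exceed what the field can hold. */
	assert(mtu_limit <= TSO_MSS_MAX);
	if (mtu > mtu_limit)
		capenable &= ~CAP_TSO4;
	return (capenable);
}
```

With a jumbo MTU the TSO bit is dropped; with a standard 1500-byte MTU it is left alone.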
lstat(2) is called on symlinks -- this code appears never to have
worked. The PR this addresses suggests that the intended
original behavior is the right one, but as bde points out in the
PR comments, we do actually support storing a mode on symlinks,
so returning it seems reasonable.
This is consistent with Mac OS X, which despite documentation to
the contrary does return the mode set on a symlink, but not some
other platforms. The Single Unix Spec requires only that the
returned bits be "meaningful", which seems at best unhelpful as
advice goes.
PR: 25018
MFC after: 3 days
vnode lock may cause a LOR between kld_sx lock and vnode lock.
linker_load_dependencies() drops kld_sx, and another thread may attempt
to load the same kld.
Reported and tested by: pjd
MFC after: 1 week
As clearly mentioned on the mailing lists, there is a list of drivers
that have not been ported to the MPSAFE TTY layer yet. Remove them from
the kernel configuration files. This means people can now still use
these drivers if they explicitly put them in their kernel configuration
file, which is good.
People should keep in mind that after August 10, these drivers will not
work anymore. Even though owners of the hardware are capable of getting
these drivers working again, I will see if I can at least get them to a
compilable state (if time permits).
code interfered with Performant mode and legacy interrupts. Also
remove a register read operation on the Simplq code that was
effectively a time-wasting no-op.
As of r178766 this driver didn't compile anymore, because it missed a
switch()-statement. I'm getting tired of seeing this driver being broken
for two months already. When I run `make universe', everything passes,
except the BWCT kernel configuration file.
don't support the ioapic either, so remove that option too. These
were commented out, but could never be enabled, unlike the other
options in the file that are commented out.
pmap_change_attr() in order to use the direct map for any cache mode, not
just write-back mode.
It is worth noting that this change also eliminates a situation in which we
have two mappings to the same physical memory with different cache modes.
Submitted by: Magesh Dhasayyan (with some changes by me)
Discussed with: jhb
keyword. But it doesn't work. Two options.. make it no longer accept it,
or actually make it work.. I chose the 2nd..
Allow the tablearg to be used to specify a skipto destination.
This is actually a very powerful construct if used correctly, or a sink
of cpu cycles if used badly.
Changes to the man page will follow.
There is no way for the caller to tell us which direction this packet is
going. With the bpf_mtap{2} routines, we can check the interface pointer.
MFC after: 2 weeks
mode changes, and cache and TLB invalidation when some or all of the
specified range is already mapped with the specified cache mode.
Submitted by: Magesh Dhasayyan
return success if the passed vnode pointer is NULL (rather than
panicking). This can occur if either audit or accounting are
disabled while the policy is running.
Since the swapoff control has no real relevance to this policy,
which is concerned about intent to write rather than water under the
bridge, remove it.
PR: kern/126100
Reported by: Alan Amesbury <amesbury at umn dot edu>
MFC after: 3 days
processes are not producing absolute pathname tokens. It is required
that audited pathnames are generated relative to the global root mount
point. This modification changes our implementation of audit_canon_path(9)
and introduces a new function: vn_fullpath_global(9) which performs a
vnode -> pathname translation relative to the global mount point based
on the contents of the name cache. Much like vn_fullpath,
vn_fullpath_global is a wrapper function which calls vn_fullpath1.
Further, the string parsing routines have been converted to use the
sbuf(9) framework. This change also removes the conditional acquisition
of Giant, since the vn_fullpath1 method will not dip into file system
dependent code.
The vnode locking was modified to use vhold()/vdrop() instead of vref()
and vrele(). This will modify the hold count instead of modifying the
use count. This makes more sense since it's the kernel that requires
the reference to the vnode. This also makes sure that the vnode does not
get recycled while we hold the reference to it. [1]
Discussed with: rwatson
Reviewed by: kib [1]
MFC after: 2 weeks
proved to be necessary to make the static drivers work
in EITHER/OR or BOTH configurations. Modules will still
build in sys/modules/igb or em as before.
This also updates the igb driver for support for the 82576
adapter, adds shared code fixes, etc.
MFC after: ASAP
support for bpf(4) due to hacks in the Y! tree for a truss32 binary
(since superseded by native support for 32-bit binaries in truss itself).
MFC after: 1 week
to downgrade the exclusive lock to shared one when exclusive lock owner
requested shared lock. New lockmgr panics instead.
The vnode_pager_lock function requests shared lock on the vnode backing
the OBJT_VNODE, and can be called when the current thread already holds
an exclusive lock on the vnode. For instance, it happens when handling
page fault from the VOP_WRITE() uiomove that writes to the file, with
the faulted in page fetched from the vm object backed by the same file.
We then get the situation described above.
Verify whether the vnode is already exclusively locked by the curthread
and request recursed exclusive vnode lock instead of shared, if true.
Reported by: gallatin
Discussed with: attilio
well as the 15C since it seems to be required in practice. The Linux
natsemi.c driver mostly does this as well.
PR: kern/112179
Submitted by: Mark Willson mark - hydrus org uk
MFC after: 1 week
It seems we only use `lbolt' inside the VFS syncer and the TTY layer
now. Because I'm planning to replace the TTY layer next month, there's
no reason to keep `lbolt' if it's only used in a single thread inside
the kernel.
Because the syncer code wanted to wake up the syncer thread before the
timeout, it called sleepq_remove(). Because we now just use a condvar(9)
with a timeout value of `hz', we can wake it up using cv_broadcast()
without waking up any unrelated threads.
Reviewed by: phk
After the import of the new TTY layer, the TTY_QUOTE definition will not
be present anymore. To make sure clists will still work as expected,
introduce an internal definition called QUOTEMASK.
Maybe we can decide to remove the quote bits entirely, but we still have
to look into this. There may be drivers that still use the quote bits.
Obtained from: //depot/projects/mpsafetty
the 32bit images on amd64.
Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.
FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.
Reviewed by: peter
MFC after: 2 weeks
sockets for IPv6 raw sockets: separately lock the inpcb for determining
the destination address for a connect()'d raw socket at the rip6_send()
layer, and then re-acquire the inpcb lock in the rip6_output() layer to
query other options on the socket. Previously, the global raw IP socket
lock was used, which while correct and marginally more consistent, could
add significantly to global raw IP socket lock contention.
MFC after: 1 week
lock the inpcb and use a local stack variable to copy to/from userspace
so that sooptcopyin()/sooptcopyout() aren't called while holding an
rwlock.
While here, fix a bug in which a failed sooptcopyin() might lead to
partially consistent ICMPv6 filters on the socket by not ignoring the
error returned by sooptcopyin().
MFC after: 2 weeks
using the passed arguments explicitly and unconditionally rather than
testing them and calling panic(). The result is the same but easier
to read.
MFC after: 3 days
- When a cpuset is applied to a thread, walk the cpuset to see if it is a
"full" cpuset (includes all available CPUs). If not, set a new
TDS_AFFINITY flag to indicate that this thread can't run on all CPUs.
When inheriting a cpuset from another thread during thread creation, the
new thread also inherits this flag. It is in a new ts_flags field in
td_sched rather than using one of the TDF_SCHEDx flags because fork()
clears td_flags after invoking sched_fork().
- When placing a thread on a runqueue via sched_add(), if the thread is not
pinned or bound but has the TDS_AFFINITY flag set, then invoke a new
routine (sched_pickcpu()) to pick a CPU for the thread to run on next.
sched_pickcpu() walks the cpuset and picks the CPU with the shortest
per-CPU runqueue length. Note that the reason for the TDS_AFFINITY flag
is to avoid having to walk the cpuset and examine runq lengths in the
common case.
- To avoid walking the per-CPU runqueues in sched_pickcpu(), add an array
of counters to hold the length of the per-CPU runqueues and update them
when adding and removing threads to per-CPU runqueues.
MFC after: 2 weeks
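The CPU selection described in the list above can be sketched in userland as follows. The fixed CPU count, the bitmask representation of a cpuset, and all names are hypothetical simplifications; the scheduler's real data structures differ.

```c
#include <assert.h>
#include <stdint.h>

#define	NCPU		8			/* hypothetical CPU count */
#define	ALL_CPUS	((1u << NCPU) - 1)	/* mask with every CPU set */

/* Per-CPU runqueue lengths, updated whenever a thread is added or removed. */
static int runq_length[NCPU];

static void
runq_length_add(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	runq_length[cpu]++;
}

static void
runq_length_rem(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU && runq_length[cpu] > 0);
	runq_length[cpu]--;
}

/* A thread needs the affinity flag only if its set is not "full". */
static int
needs_affinity_flag(uint32_t mask)
{
	return ((mask & ALL_CPUS) != ALL_CPUS);
}

/* Pick the allowed CPU with the shortest runqueue, or -1 if none. */
static int
pick_cpu(uint32_t mask)
{
	int best = -1;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if ((mask & (1u << cpu)) == 0)
			continue;
		if (best == -1 || runq_length[cpu] < runq_length[best])
			best = cpu;
	}
	return (best);
}
```

Keeping the counters alongside the queues means pick_cpu() reads one integer per allowed CPU instead of walking each runqueue, and the flag check lets unrestricted threads skip the walk entirely.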
revision and (on Prism cards) the primary firmware revision via
sysctl. Move the printing of this information under bootverbose,
since it is relatively easy to get to it now.
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.
Submitted by: Magesh Dhasayyan
- Check whether panicstr is set; if it is, ignore the lock. This helps to avoid
confusion, because lockmgr is a no-op when panicstr isn't NULL, so
asserting anything at this point doesn't make sense and can just race with
another panic.
Discussed with: kib
(Other more specific related options will follow)
This allows one to set multiple p2p links to the same place
and select which to use by having each in different FIBS.
turns out some devices do this, and since we otherwise validate that the station
is associated and don't use the AID for anything, being lenient here allows
them to function.
Submitted by: Chris Zimmermann
MFC after: 2 weeks
This gives significant performance improvements when many raw sockets are used.
Benchmarks of mpd handling 1000 simultaneous PPTP connections show up to a 50%
performance boost. With a higher number of connections the benefit becomes even
bigger. PopTop and others should also get some benefit.
the beginning. There's a race in the shared interrupt case. If
another interrupt happens after the interrupt is setup, then we'd try
to lock an uninitialized mutex. In addition, if we bailed out due to
a too old version of firmware, we'd leave the interrupt enabled with
all the fun that ensues....
"If you don't get a review within a day or two, I would firmly recommend
backing out the changes"
Back out all my changes, as they have not yet been reviewed by secteam@.
The ttyinfo() routine generates the fancy output when pressing ^T. Right
now it is stored in tty.c. In the MPSAFE TTY code it is already stored
in tty_info.c. To make integration of the MPSAFE TTY code a little
easier, take the same approach.
This makes the TTY code a little bit more readable, because having the
proc_*/thread_* routines in tty.c is very distracting.
Approved by: philip (mentor)
The kbd, kbdmux, ugen and uhid drivers included <sys/tty.h>, because
they needed clists, which have been moved to <sys/clist.h> some time
ago. In the MPSAFE TTY branch, <sys/tty.h> does not include
<sys/clist.h>, which means we have to teach these drivers to include
this header file directly.
Approved by: philip (mentor, implicit)
We're very lucky, because the flags used by our TIOCPKT implementation
are the same as flags used by Linux. We can safely enable TIOCPKT,
assuming EXTPROC is not used.
TIOCSPTLCK is used by unlockpt(). Because we don't need unlockpt() in
our implementation, make this ioctl a no-op.
Approved by: philip (mentor, implicit), rdivacky
Obtained from: P4 (//depot/projects/mpsafetty/...)
but removed too much, breaking the build in other places instead. Now
that the ipfilter issue has been fixed (or hacked around), address the
second issue by restoring r180755, with one small change. I don't feel
comfortable using assert(3) in a header that will be included in userland
code that may or may not already have an assertion mechanism in place,
so KASSERT() evaluates to a no-op in the !_KERNEL case.
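A minimal sketch of the pattern described above, assuming a header that is shared between kernel and userland. The helper clamp_index() is hypothetical and exists only to show a call site; in a kernel build the real KASSERT() is assumed to come from the kernel headers.

```c
#include <assert.h>

/*
 * In a kernel build the real KASSERT() is used. In userland
 * (!_KERNEL) the macro expands to an empty statement, so the header
 * does not impose assert(3) on its consumers.
 */
#ifndef _KERNEL
#define	KASSERT(exp, msg)	do { } while (0)
#endif

/* Hypothetical inline helper that a shared header might carry. */
static int
clamp_index(int idx, int len)
{
	KASSERT(idx >= 0 && idx < len, ("index %d out of range", idx));
	if (idx < 0)
		return (0);
	return (idx >= len ? len - 1 : idx);
}
```

Because the userland expansion discards both arguments, the condition is not evaluated there and must not carry side effects.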
behavior. Specifically, probe Host-PCI bridges in the order they are
encountered in the tree. For CPUs, just use an order of 100000 and assume
that no Host-PCI bridges will be more than 10000 levels deep in the
namespace. This fixes an issue on some boxes where the HPET timer stopped
attaching.
vnode buffers locked at once. In particular, there are indirect buffers
among locked ones. The bdwrite() may start the flushing to keep dirty
buffer list at the bounds. If any buffer on the dirty list requires
translation from logical to physical block number, the code may end up
trying to lock an indirect buffer already locked in ffs_balloc_ufsX.
Prevent the bdflush() activity when several buffers are locked at once
by setting the TDP_INBDFLUSH for the problematic code blocks.
Reported and tested by: pho, Josef Buchsteiner at Juniper
In collaboration with: kan
MFC after: 1 month
when stack realignment is turned on (it is ALWAYS on for main), however
in a profiling build %ecx would be clobbered by mcount(), this would lead
to a segmentation fault when the code tries to reference any argument.
This fix changes mcount() to preserve %ecx.
PR: bin/119709
Reviewed by: bde
MFC after: 1 week
return NDIS_STATUS_PENDING. In this case, it's waiting for 5 secs to
get the response from drivers now. However, some NDIS drivers can send
the response before NDIS framework gets ready to receive it so we might
always be blocked for 5 secs in current implementation. NDIS framework
should reset the event before calling NDIS driver's callback not after.
MFC after: 1 month
used but MSI to HyperTransport IRQ mapping is enabled, and would act as
if MSI is turned on, resulting in interrupt loss.
This commit will,
1. enable MSI mapping on a device only when MSI is enabled for that
device and the MSI address matches the HT mapping window.
2. enable MSI mapping on a bridge only when a downstream device is
allocated an MSI address in the mapping window
PR: kern/118842
Reviewed by: jhb
MFC after: 1 week
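The two rules above can be sketched as a pair of predicates. The structure, the function names, and the window values used in any call are hypothetical; this is not the PCI code itself, only an illustration of the conditions listed in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the HT MSI mapping window. */
struct ht_window {
	uint64_t base;
	uint64_t size;
};

static bool
addr_in_window(const struct ht_window *w, uint64_t addr)
{
	return (addr >= w->base && addr - w->base < w->size);
}

/* Rule 1: a device gets mapping only if MSI is on and the address matches. */
static bool
device_wants_mapping(const struct ht_window *w, bool msi_enabled,
    uint64_t msi_addr)
{
	return (msi_enabled && addr_in_window(w, msi_addr));
}

/* Rule 2: a bridge gets mapping only if some downstream MSI address matches. */
static bool
bridge_wants_mapping(const struct ht_window *w, const uint64_t *child_addrs,
    size_t nchildren)
{
	for (size_t i = 0; i < nchildren; i++)
		if (addr_in_window(w, child_addrs[i]))
			return (true);
	return (false);
}
```

A device that has MSI disabled never enables the mapping, which is the case the original code got wrong.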
child process immediately after bulk bcopy() without dropping the
process lock.
Since the process is not single-threaded when forking, dropping and
reacquiring the lock allows another thread to change the process title
of the parent in between, and results in the hold being done on an invalid
pointer. The problem manifested itself as a double free of the old
p_args.
Reported by: kris
Reviewed by: jhb
MFC after: 1 week
The kernel has a special wchan called `lbolt', which is triggered each
second. It doesn't seem to be used a lot and it seems pretty redundant,
because we can specify a timeout value to the *sleep() routines. In an
attempt to eventually remove lbolt, make the NFS/RPC code use a timeout
of `hz' when trying to reconnect.
Only the TTY code (not MPSAFE TTY) and the VFS syncer seem to use lbolt
now.
Reviewed by: attilio, jhb
Approved by: philip (mentor), alfred, dfr
- removing 'const' qualifier from an input parameter to conform to the type
required by rw_assert();
- using in_addr->s_addr to retrieve the 32-bit address value.
Observed by: tinderbox
and there is no need to maintain it.
- Fix vn_get() in order to let it call vget(9) with a valid locking
request. vget(9) returns the vnode locked in order to prevent recycling,
but in this case internal XFS locks already prevent it from happening, so
it is safe to drop the vnode lock before returning from vn_get().
- Add a VNASSERT() in vget(9) in order to catch malformed locking requests.
Discussed with: kan, kib
Tested by: Lothar Braun <lothar at lobraun dot de>
kthread of the mpt(4) driver that hangs around for the entire lifetime of
the thread. Previously the driver would allocate a new CCB using M_WAITOK
with a lock held each time it updated its state. While here, use the
CAM API for allocating a CCB rather than raw malloc(9).
Reviewed by: scottl
MFC after: 1 week
This MAY be combined by a clever person with the 'key' code recently
added, however a cursory glance suggests that it would be safer to just keep
the patches as it is unlikely that the two modes would be used together
and the separate patch has been extensively tested.
Obtained from: here and there
MFC after: 1 week
interrupt-driven configuration handlers to complete, print out a
diagnostic message every 60 seconds indicating which handlers are
still running. Do this at most 5 times per run so as to avoid
scrolling out any useful information from the kernel message
buffer.
The interval of 60 seconds was selected based on a best guess as
to the nature of "long enough" and may want to be tuned higher
or lower depending on real-world tolerances.
MFC after: 3 days
Discussed with: scottl
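A minimal sketch of the rate limiting described above, assuming a wait loop that wakes up once per second. The function and constant names are hypothetical; only the 60 second interval and the cap of 5 messages come from the text.

```c
#include <stdbool.h>

#define	WARNING_INTERVAL_SECS	60	/* from the text */
#define	WARNING_MAX		5	/* from the text */

/*
 * Decide whether the wait loop should print its "still waiting"
 * diagnostic. waited_secs is the time spent waiting so far and
 * *warned counts the messages already printed during this run.
 */
static bool
should_warn(int waited_secs, int *warned)
{
	if (waited_secs <= 0 || waited_secs % WARNING_INTERVAL_SECS != 0)
		return (false);
	if (*warned >= WARNING_MAX)
		return (false);
	(*warned)++;
	return (true);
}
```

Capping the count keeps a permanently stuck handler from scrolling earlier messages out of the kernel message buffer.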
for completion in run_interrupt_driven_config_hooks(). This is
helpful when trying to figure out which device drivers have gone
into la-la land during boot-time autoconfiguration.
MFC after: 3 days
- When a tick occurs on a cpu, iterate from cc_softticks until ticks.
The per-cpu tick processing happens asynchronously with the actual
adjustment of the 'ticks' variable. Sometimes the results may
be visible before the local call and sometimes after. Previously this
could cause a one tick window where we didn't evaluate the bucket.
- In softclock fetch curticks before incrementing cc_softticks so we
don't skip insertions which were made for the current time.
Sponsored by: Nokia
sched_tick() to prevent multiple increments for one tick. This pushes
the value out of range and breaks priority calculation.
Reviewed by: kib
Found by: pho/nokia
Sponsored by: Nokia
MFC after: 3 days
information from rip_input() to rip_append(). Instead, pass the source
address for an IP datagram to rip_append() using a stack-allocated
sockaddr_in, similar to udp_input() and udp_append().
Prior to the move to rwlocks for inpcbinfo, this was not a problem, as
use of the global was synchronized using the ripcbinfo mutex, but with
read-locking there is the potential for a race during concurrent
receive.
This problem is not present in the IPv6 raw IP socket code, which
already used a stack variable for the address.
Spotted by: mav
MFC after: 1 week (before inpcbinfo rwlock changes)
and handle NIC hardware watchdog resets.
- remove buggy code at the top of mxge_tick() which tried
to detect a race which is already detected in the kernel's
callout code.
- move callout_stop() and callout_reset() into mxge_close()
mxge_open() rather than doing the callout manipulation
all over the place.
- use callout_drain(), rather than callout_stop() to prevent
a potential race between mxge_tick() and mxge_detach()
which could lead to softclock using a destroyed mutex
- restructure the mxge_tick() and mxge_watchdog_reset()
routines to avoid resetting a callout, and then
immediately stopping it if the watchdog reset routine
is called, and fails.
- enable the driver to handle NIC hardware watchdog
resets by restoring the NIC's PCI config space, which is
lost when the NIC hardware watchdog triggers.
Reviewed by: jhb (previous version)
The tcsetattr() routine already converts the TCSA* arguments to their
respective TIOCSETA* ioctl's in the C library. There is no need to have
these definitions inside the kernel.
Approved by: philip (mentor, implicit)
I think one of the reasons why we have so many conflicts in the TTY
ioctl category, is because the ioctl's aren't ordered logically. This
commit only sorts them by number. The comments may still be inaccurate.
Approved by: philip (mentor)
When I ported most applications away from <sgtty.h>, I noticed none of
them were actually using these definitions. I kept them in place,
because I didn't want to touch tools like pstat(8) and stty(1).
In preparation for the MPSAFE TTY layer, remove these definitions. This
doesn't have any impact with respect to binary compatibility (see
tty_conf.c).
We could now add an #error to <sys/ioctl_compat.h> when included
outside the kernel. Unfortunately, kdump's mkioctls includes this file
unconditionally.
Approved by: philip (mentor)
vr(4) overhauling (r177050).
It seems that filtering multicast addresses with multicast CAM
entries requires accessing the 'CAM enable bit' for each CAM entry.
Subsequent accesses to the multicast CAM control register without
toggling the 'CAM enable bit' seem to have no effect.
In order to fix that, separate CAM setup from CAM mask configuration
and CAM entry modification. While I'm here, add a VLAN CAM filtering
feature which will be enabled in the future (FreeBSD can now receive
VLAN id insertion/removal events from vlan(4) on the fly).
For VT6105M hardware, explicitly disable VLAN hardware tag
insertion/stripping and enable VLAN CAM filtering for VLAN id 0.
This shall make non-VLAN frames set VR_RXSTAT_VIDHIT bit in Rx
status word.
Added multicast/VLAN CAM address definition to header file.
PR: kern/125010, kern/125024
MFC after: 1 week
years. All datasheets I have indicate that bit 15 is
VR_RXSTAT_RX_OK. Bit 14 is reserved for all of the Rhine family
except VT6105M. VT6105M uses that bit to indicate a VLAN frame
with matching CAM VLAN id.
Use the VR_RXSTAT_RX_OK instead of VR_RXSTAT_RXERR when vr(4)
checks the validity of received frame.
This should fix occasional dropping frames on VT6105M.
Tested by: Goran Lowkrantz ( goran.lowkrantz at ismobile dot com )
MFC after: 1 week
completes the move to a fully parallel UDP transmit path by using
global read, rather than write, locking of inpcbinfo in further
semi-connected cases:
- Add macros to allow try-locking of inpcb and inpcbinfo.
- Always acquire an inpcb read lock in udp_output(), which stabilizes the
local inpcb address and port bindings in order to determine what further
locking is required:
- If the inpcb is currently not bound (at all) and we are implicitly
connecting, we require inpcbinfo and inpcb write locks, so drop the
read lock and re-acquire.
- If the inpcb is bound for at least one of the port or address, but an
explicit source or destination is requested, trylock the inpcbinfo
lock, and if that fails, drop the inpcb lock, lock the global lock,
and relock the inpcb lock.
- Otherwise, no further locking is required (common case).
- Update comments.
In practice, this means that the vast majority of consumers of UDP sockets
will not acquire any exclusive locks at the socket or UDP levels of the
network stack. This leads to a marked performance improvement in several
important workloads, including BIND, nsd, and memcached over UDP, as well
as significant improvements in pps microbenchmarks.
The plan is to MFC all of the rwlock changes to RELENG_7 once they have
settled for a few weeks in the tree.
Tested by: ps, kris (older revision), bde
MFC after: 3 weeks
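The locking decision listed for udp_output() above can be sketched as a small classification function. The enum, the function, and its boolean parameters are hypothetical stand-ins for state that the real code reads from the inpcb and the send arguments; no actual locks are taken here.

```c
#include <stdbool.h>

/* Hypothetical labels for the three cases in the list above. */
enum udp_lock_plan {
	UDP_PLAN_READ_ONLY,	/* common case: the inpcb read lock suffices */
	UDP_PLAN_TRY_GLOBAL,	/* trylock inpcbinfo, fall back to relocking */
	UDP_PLAN_WRITE_BOTH	/* drop read lock, take both write locks */
};

/*
 * Classify a send after the inpcb read lock has been acquired and the
 * local bindings are stable.
 */
static enum udp_lock_plan
udp_choose_lock_plan(bool addr_bound, bool port_bound,
    bool implicit_connect, bool explicit_src, bool explicit_dst)
{
	bool bound = addr_bound || port_bound;

	if (!bound && implicit_connect)
		return (UDP_PLAN_WRITE_BOTH);
	if (bound && (explicit_src || explicit_dst))
		return (UDP_PLAN_TRY_GLOBAL);
	return (UDP_PLAN_READ_ONLY);
}
```

Taking the read lock first is what makes the classification safe: the bindings cannot change while the decision is being made.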
The uart(4) driver has the advantage of supporting a wider variety of
hardware on a greater number of platforms. This driver has already been
the standard on platforms such as ia64, powerpc and sparc64.
I've decided not to change anything on pc98. I'd rather let people from
the pc98 team look at this.
Approved by: philip (mentor), marcel
set MNT_UPDATE in fsflags, and delete the
"update" option from the global mount options.
MNT_UPDATE is a command, and not a property of a mount
that should persist after the command is executed.
We need to do similar things for MNT_FORCE and MNT_RELOAD.
All mount flags are prefixed by MNT_..... it would
be nice if flags which were commands were named differently
from flags which are persistent properties of a mount.
This was not such a big deal in the pre-nmount() days,
but with nmount() it is more important.
Requested by: yar
MFC after: 2 weeks
1. The FreeBSD driver was setting an interrupt coalesce delay of 1000us
for reasons that I can only speculate on. This was hurting everything
from lame sequential I/O "benchmarks" to legitimate filesystem metadata
operations that relied on serialized barrier writes. One of my
filesystem tests went from 35s to complete down to 6s.
2. Implemented the Performant transport method. Without the fix in
(1), I saw almost no difference. With it, my filesystem tests showed
another 5-10% improvement in speed. It was hard to measure CPU
utilization in any meaningful way, so it's not clear if there was a
benefit there, though there should have been since the interrupt handler
was reduced from 2 or more PCI reads down to 1.
3. Implemented MSI-X. Without any docs on this, I was just taking a
guess, and it appears to only work with the Performant method. This
could be a programming or understanding mistake on my part. While this
by itself made almost no difference to performance since the Performant
method already eliminated most of the synchronous reads over the PCI
bus, it did allow the CISS hardware to stop sharing its interrupt with
the USB hardware, which in turn allowed the driver to become decoupled
from the Giant-locked USB driver stack. This increased performance by
almost 20%. The MSI-X setup was done with 4 vectors allocated, but only
1 vector used since the performant method was told to only use 1 of 4
queues. Fiddling with this might make it work with the simpleq method,
not sure. I did not implement MSI since I have no MSI-specific hardware
in my test lab.
4. Improved the locking in the driver, trimmed some data structures.
This didn't improve test times in any measurable way, but it does look
like it gave a minor improvement to CPU usage when many
processes/threads were doing I/O in parallel. Again, this was hard to
accurately test.
USB isochronous transfer support is required for Bluetooth SCO.
While I'm here, change u_int to uint and update TODO.
This should produce no visible changes unless the device is
broken (or really old).
MFC after: 3 months
for the bio for swapout write. It allows the page allocator to drain
free page list deeper. As a result, a deadlock where the pageout daemon sleeps
waiting for a bio to be allocated for swapout is no longer reproducible in
practice.
Alan said that M_USE_RESERVE shall be resurrected and used there, but
until this is implemented, M_NOWAIT does exactly what is needed.
Tested by: pho, kris
Reviewed by: alc
No objections from: phk
MFC after: 2 weeks (RELENG_7 only)
SI_ALIAS flag and initialization of the si_parent when alias is created.
Assert that supplied parent device is not NULL.
Both situations could cause NULL dereference in the
devfs_populate_loop() when creating a symlink for SI_ALIAS'ed device.
Namely, cdp->cdp_c.si_parent may be NULL.
Reported by: mav
MFC after: 2 weeks
As a result, those arguments must be recombined before calling the real
syscall implementation. This change fixes 32-bit compatibility for
cpuset_getid(), cpuset_setid(), cpuset_getaffinity(), and
cpuset_setaffinity().
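A minimal sketch of the recombination step, assuming the arguments in question are 64-bit values that the 32-bit ABI delivers as two 32-bit halves. The helper name is hypothetical, and which half arrives first depends on the ABI and byte order, so the caller is assumed to know which is which.

```c
#include <stdint.h>

/*
 * Rebuild a 64-bit syscall argument from the two 32-bit halves passed
 * by a 32-bit process.
 */
static uint64_t
combine_halves(uint32_t lo, uint32_t hi)
{
	return (((uint64_t)hi << 32) | lo);
}
```

The cast before the shift matters: shifting a 32-bit value by 32 bits is undefined in C, so the high half has to be widened first.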
udp_output() so that argument validation occurs before jail processing.
Add additional comments explaining what's going on when we process
addresses and binding during udp_output().
MFC after: 3 weeks
Initialize %ds, %es, and %fs during CPU startup. Otherwise a garbage
value could leak to a 32-bit process if a process migrated to a different
CPU after exec and the new CPU had never exec'd a 32-bit process.
A more complete fix is needed, but this mitigates the most frequent
manifestations.
Obtained from: ups
it's non-NULL, as all callers can and should already do the required
checking. Update comments a bit more to talk about rawcb allocation
for consumers.
Reviewed by: bz
MFC after: 3 weeks
it in detail.
When setting media, don't error out when a specific media is selected.
# Note: There may be some issues still here since the EtherJet PC Card doesn't
# conform to the datasheet. Many different kinds of dongles can be plugged in
# and it is unknown how to ask which one it is.
Also, add a /* bad! */ comment to a 1/2 second delay after we set the
DC/DC parameters. This should be a *sleep of some sort for !cold.
Fortunately it is the only one and is only used when setting media, so
the benefit from removing it is small. Unfortunately, it likely
serves as an exemplar of good programming techniques, which it isn't.
2) Adds some __UserSpace__ on some of the common defines that
the user space code needs
3) Fixes a bug when we send up data to a user that failed. We
need to a) trim off the data chunk headers, if present, and
b) make sure the frag bit is communicated properly for the
msgs coming off the stream queues... i.e. we see if some
of the msg has been taken.
Obtained from: jeli contributed the VIMAGE changes on this pass. Thanks Julian!
socket support. These utility routines are used only for routing and
pfkey sockets, neither of which have a notion of address, so were
required to mock up fake socket addresses to avoid connection
requirements for applications that did not specify their own fake
addresses (most of them).
Quite a bit of the removed code is #ifdef notdef, since raw sockets
don't support bind() or connect() in practice. Removing this
simplifies the raw socket implementation, and removes two (commented
out) uses of dtom(9).
Fake addresses passed to sendto(2) by applications are ignored for
compatibility reasons, but this is now done in a more consistent way
(and with a comment). Possibly, EINVAL could be returned here in
the future if it is determined that no applications depend on the
semantic inconsistency of specifying a destination address for a
protocol without address support, but this will require some amount
of careful surveying.
NB: This does not affect netinet, netinet6, or other wire protocol
raw sockets, which provide their own independent infrastructure with
control block address support specific to the protocol.
MFC after: 3 weeks
Reviewed by: bz
when it worked as generic IDE.
PR: 125422
Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru>
Approved by: imp (mentor, implicit)
MFC after: 1 week
generation of RTL810x PCIe fast ethernet controller. Note, Tx/Rx
descriptor format is different from that of first generation of
RTL8101E series. Jumbo frame is not supported for RTL810x
family.
Tested by: NAGATA Shinya ( maya AT negeta DOT com )
mutexes and replacing the obsolete if_watchdog interface. The ndis_ticktask
function calls into ieee80211_new_state under one condition with NDIS_LOCK
held. The ieee80211_new_state would call into ndis_start in some cases too,
resulting in the occasional case where ndis_start acquires NDIS_LOCK from
inside the NDIS_LOCK held by ndis_ticktask.
Obtained from: Paul B. Mahol <onemda@gmail.com>
MFC after: 1 week
page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the
kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one-
time initialization of kernel_vm_end, not in its main body. (I could not,
however, resist the urge to optimize the main body.)
Reduce the number of preallocated page directory pages to just those needed
to support NKPT page table pages. (In fact, this allows me to revert a
couple of my earlier changes to create_pagetables().)
page table pages have to be preallocated ...'', violates an assumption made
by minidumpsys(): kernel_vm_end is the highest virtual address that has ever
been used by the kernel. Now, however, the kernel code, data, and bss may
reside at addresses beyond kernel_vm_end. This revision modifies the upper
bound on minidumpsys()'s two page table traversals to account for this
possibility.
Use the new inline function in ia64_invalidate_icache().
While there, add proper synchronization so that we know
the fc.i instructions have taken effect when we return.
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.
While I'm here, teach pmap_growkernel() to request a prezeroed page.
MFC after: 1 week
inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire
only a read lock; we may still see write locks in udp_notify() as the
in_pcbnotifyall() routine is shared with TCP and always uses a write lock
on the inpcb being notified.
MFC after: 1 month
some code paths, global or inpcb write locks are required, but for other
code paths, read locks or no locking at all are sufficient for the data
structures.
MFC after: 1 month
source or a specific destination address is requested as part of a send
on a UDP socket, read lock the inpcb rather than write lock it. This
will allow fully parallel transmit down to the IP layer when sending
simultaneously from multiple threads on a connected UDP socket.
Parallel transmit for more complex cases, such as when sendto(2) is
invoked with an address and there's already a local binding, will
follow.
MFC after: 1 month
the syscall code and acquires various event subsystem locks as needed.
The handling of the NOTE_TRACK for EVFILT_PROC is currently done by
calling the kqueue_register() from filt_proc() filter, causing recursive
entrance of the kqueue code. This results in the LORs and recursive
acquisition of the locks.
Implement the variant of the knote() function designed to only handle
the fork() event. It mostly copies the knote() body, but also handles
the NOTE_TRACK, removing the handling from the filt_proc(), where it
causes problems described above. The function is called from the fork1()
instead of knote().
When encountering NOTE_TRACK knote, it marks the knote as influx
and drops the knlist and kqueue lock. In this context call to
kqueue_register is safe from the problems.
An error from the kqueue_register() is reported to the observer as
NOTE_TRACKERR fflag.
PR: 108201
Reviewed by: jhb, Pramod Srinivasan <pramod juniper net> (previous version)
Discussed with: jmg
Tested by: pho
MFC after: 2 weeks
just like BIOCSETF but it doesn't drop all the packets buffered on
the descriptor and reset the statistics.
Also, when setting the write filter, don't drop packets waiting to
be read or reset the statistics.
PR: 118486
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 1 month
KQ_FLUX_WAKEUP(). Since the latter macro clears the KQ_FLUXWAIT, the
kqueue_scan() thread may not be woken up.
Move the setting of KQ_FLUXWAIT after wakeup to correct the issue.
Reported and tested by: pho
MFC after: 3 days
level. The distinction was artificial. Some more movement around the
deck chairs is likely depending on the fallout from this one.
Paths were corrected after the svn mv. Hope that's OK.
appropriate (versions not appropriate to merge omitted):
o 1.226 imp nop, save for NetBSD string (minor merging the other way)
o 1.225 jnemeth Corega LAPCCTXD
o 1.224 martin (remove 3rd and 4th clauses)
o 1.223 kiyohara (TDK bluetooth PC Card)
o 1.222 kiyohara (Anycom BlueCard)
o 1.221 ichiro (NEC Infrontia AX420N)
o 1.219 jmcneill (EDIMAX EP-4101)
o 1.213 tsutsui (TEAC IDECARDII entry fix)
Also, while I'm here, fix some tab problems that have crept in.
Our hook creates the sysctl node before root is mounted, but after cpu
is probed. It seems that k8temp can be loaded before the cpu module and,
in those cases, dev.cpu.0.temperature was not created.
PR: 124939
is reclaimed by the kernel. This fixes a bug that resulted in the kernel
overwriting packet data while user-space was still processing it when
zerocopy is enabled (or a panic if INVARIANTS was enabled).
Discussed with: rwatson
- the protosw entries are used directly
- the usrreq functions are library routines, generally wrapped by
consumers rather than being used directly
- the usrreq structure entries are likewise typically wrapped
Remove the rather incorrect #if 0'd pr_input_t prototype for raw_input.
MFC after: 3 days
global symbols, such as raw_input and raw_output, to have lmc_ prefixes.
This doesn't affect actual functionality since the functions are static,
but will limit the opportunities for current confusion and future
difficulty.
MFC after: 3 days
into a single "__asm"-statement as GCC doesn't guarantee their
consecutive output even when using consecutive "__asm __volatile"-
statements for them. Remove the otherwise unnecessary "__volatile". [1]
- The inline assembler instructions used here alter the condition
codes so add them to the clobber list accordingly.
- The inline assembler instructions used here use output operands
before all input operands are consumed so add appropriate modifiers.
Pointed out by: bde [1]
MFC after: 2 weeks
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.
Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.
MFC after: 3 weeks
AcpiEvaluateObject() calls, otherwise, we are not able to bring devices
back up (NULL means 0, hence always off).
While there add missing WLAN on/off support.
MFC after: 3 days
Pointy hat to: rpaulo
MPSAFE patches on current@ and stable@. This driver also has a fundamental
issue in that it sleeps when sending commands to the card including in the
if_init/if_start routines (which can be called from interrupt context). As
such, the driver shouldn't be working reliably even on 4.x.
- Add a mutex to the softc and use it to protect the softc and device
hardware.
- Setup interrupt handler after ether_ifattach().
- Remove unused sbsh_watchdog() routine.
- Protect against concurrent attempts to load firmware.
possible to exhaust and garble the stack with a packet that contains a
couple hundred nested encapsulation levels.
Submitted by: Ming Fu <fming@borderware.com>
Reviewed by: rwatson
PR: kern/85320
- Add a mutex to the softc and use it to protect the softc and device
hardware.
- Setup interrupt handler after attaching device to network stack.
- Use device_set_desc() rather than device_quiet() plus a manual printf
that simulates the normal probe printf.
- Axe next_sbni_unit and instead just leave room for two sbni devices for
each bus attachment.
- Don't bzero the already-zero'd softc.
- Add a detach method to the PCI driver.
- Add a lock to protect the list of available devices used to chain
interrupt handlers for dual port ISA cards.
- Remove unused watchdog routine.
- If if_alloc() fails, make sbni_attach() return an error rather than
panic'ing.
- Consolidate code to free bus resources into sbni_release_resources().
- Clear IFF_DRV_RUNNING|OACTIVE in stop() routine instead of in callers.
- Let ether_ioctl() handle SIOCSIFMTU.
and stable@. It also is a driver for an older non-802.11 wireless PC card
that is quite slow in comparison to say, wi(4). I know Warner wants this
driver axed as well.
- Add a mutex to the softc and use it to lock the softc and device hardware.
- Use a private timer to replace if_watchdog/if_timer.
- Use if_printf() rather than if_xname.
- Setup interrupt handler after ether_ifattach().
current@ and stable@ for the locking patches. The driver can always be
revived if someone tests it.
This driver also sleeps in its if_init routine, so it likely doesn't really
work at all anyway in modern releases.
- Add a mutex to the softc and use it to protect the softc and device
hardware.
- Setup interrupt handler after interface attach.
- Retire 'unit' from softc and use if_printf() instead.
- Don't frob IFF_UP in the driver.
- Use callout_() rather than timeout() and untimeout().
- Add a mutex to the softc and use it to protect the softc and device
hardware.
- Setup interrupt handler after ether_ifattach().
- Use a private timer instead of if_timer/if_watchdog.
- Retire arl_unit from the softc and use if_printf() and device_printf()
instead.
Note that the unpatched driver in 6.x and later does not work with the
hardware, so the one person who had volunteered to test the patch wasn't
able to test it.
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred). Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.
Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired. This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acquisition of Giant and any calling
context.
It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.
Reviewed by: bz
MFC after: 1 month
X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
but the rest can go back.
soun->sun_path isn't a null-terminated string. As UNIX(4) states, "the
terminating NUL is not part of the address." Since strlcpy has to return
"the total length of the string [it] tried to create," it walks off the end
of soun->sun_path looking for a \0.
This reverts r105332.
Reported by: Ryan Stone
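The overrun described above can be avoided by copying with an explicit
length instead of scanning for a NUL. The following is a minimal userland
sketch, not the code touched by this commit; copy_sun_path() and its
parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: copy a path that is NOT
 * NUL-terminated (its length is carried separately, as with sun_path)
 * into a NUL-terminated buffer.  Unlike strlcpy(), this never reads
 * past src[srclen - 1].  Returns the number of bytes copied.
 */
size_t
copy_sun_path(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t n;

	assert(dst != NULL && dstsize > 0);
	/* Leave room for the terminator; truncate if the source is longer. */
	n = srclen < dstsize - 1 ? srclen : dstsize - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return (n);
}
```

The caller supplies the address length it already knows, so the source
buffer is never read beyond that length.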
disabled if one (or more) of the member interfaces does not support it. Always
turn off LRO since we can not bridge a combined frame.
Tested by: Stefan Lambrev
generating an RTM_MISS for every IP packet forwarded making user space
routing daemons unhappy.
PR: kern/123621, kern/124540, kern/122338
Reported by: Paul <paul gtcomm.net>, Mike Tancsa <mike sentex.net> on net@
Tested by: Paul and Mike
Reviewed by: andre
MFC after: 3 days
code is believed to be MPSAFE, and leaving aside the IPv6 route cache in
forwarding, Giant appears not to adequately synchronize the data structures
in the input or forwarding paths.
ceiling as a fraction of the kernel map's size rather than an absolute
quantity. Thus, scaling of the kmem map's size will be automatic with
changes to the kernel map's size.
datagram-only protocols, such as UDP. This version removes use of
sblock(), which is not required due to an inability to interlace data
improperly with datagrams, as well as avoiding some of the larger loops
and state management that don't apply on datagram sockets.
This is experimental code, so hook it up only for UDPv4 for testing; if
there are problems we may need to revise it or turn it off by default,
but it offers *significant* performance improvements for threaded UDP
applications such as BIND9, nsd, and memcached using UDP.
Tested by: kris, ps
there still being some well-known races in mld6 and nd6, running with
Giant over the netisr handler provides little or no additional
synchronization that might cause mld6 and nd6 to behave better.
already committed but with a wrong msleep variant and then
backed out. Note that this changes the semantics a little
as msleep_spin does not let us specify a priority after
wakeup.
Approved by: wkoszek, cognet
Approved by: kib (mentor)
At the moment, Tx/Rx checksum offload is supported but TSO and jumbo
frames are not yet supported. Because these newer controllers use different
descriptor formats, an RL_FLAG_DESCV2 flag was introduced to
handle that case in the Tx/Rx handlers. Also, newer controllers seem to
require not touching the 'enable Tx/Rx' bits in the RL_CPLUS_CMD register,
so don't blindly try to set those bits.
Note, it seems that there is still a power-saving related issue where
the driver fails to attach the PHY. Rebooting seems to fix that issue but
the number of required reboots varies.
Many thanks to users that helped during development. I really
appreciate their patience and testing/feedback.
a dedicated flag that represents controller capabilities/events.
This will simplify many parts of the code that require different
workarounds for each controller revision and will enhance
readability.
While I'm here move PHY wakeup code up before mii_phy_probe() which
seems to help to wake PHY in some cases.
RL_TXCFG register to identify a device in device probe. Reflect this
fact by modifying the device description to name the general ethernet
controller family.
Note, rl_basetype in struct rl_type is not used and the more
detailed information is provided with rl_hwrev structure.
Previously we reused the space in the request buffer after the request
header to hold config pages during a transaction. This does not work when
reading large pages however. Also, we were already malloc'ing a buffer to
do a copyin/copyout w/o holding locks that was then copied into/out of the
request buffer. Instead, go ahead and use bus dma to alloc a buffer for
each config page request (and RAID actions that have an associated
ActionSGE). This results in fewer data copies and allows for larger sized
requests. For now the maximum size of a request is arbitrarily limited to
16 MB.
MFC after: 2 weeks
locally configured. This is more in line with the behaviour of other popular
bridging implementations and makes bridges more predictable after reboots for
example.
Reviewed by: thompsa
MFC after: 1 week
rather than write locking: while we need to maintain a valid reference
to the inpcb and fix its state, no protocol layer state is modified
during an IPv4 UDP receive -- there are only changes at the socket
layer, which is separately protected by socket locking.
While parallel concurrent receive on a single UDP socket is currently
relatively unusual, introducing read locking in the transmit path,
allowing concurrent receive and transmit, will significantly improve
performance for loads such as BIND, memcached, etc.
MFC after: 2 months
Tested by: gnn, kris, ps
in practice, the error (currently) makes no difference because the computation
performed by KVADDR() hides the error. This revision fixes the error.
Also, eliminate a (now) unused definition.
maximum size of the kmem map can be greater than 4GB, there is little point
in making the kernel virtual address space larger than 6GB.
Tested by: kris@
Now that the pseudo-interface cloner has an internal list of instances,
there is no need to create a softc. The softc only contains a pointer to
the ifp, which means there is no valid reason to keep it. While there,
remove the corresponding malloc-pool.
Approved by: philip (mentor)
Adaptec RAID 2045
Adaptec RAID 2405
Adaptec RAID 2445
Adaptec RAID 2805
Without this change these devices are supported by the driver's family
support, but they then appear as "Adaptec RAID Controller" in boot
messages and the dev.aac.0.%desc sysctl.
This includes hotkeys support and sysctl variables to control camera
and card reader. These new sysctls don't have CTFLAG_ANYBODY set.
While there add entries to devd.conf related to the Eee volume keys.
Reviewed by: phillip
MFC after: 1 week
Also tested by: lme (previous version)
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.
Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem') of an unnamed semaphore (created via sem_init()) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the process
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.
XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.
Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores in nearly the same fashion as the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.
Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
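The umask item in the bug list above comes down to one masking step when
the semaphore is created. A minimal sketch, assuming a hypothetical
helper sketch_sem_mode() and an illustrative permission mask of 0777
(neither name is taken from the patch):

```c
#include <assert.h>
#include <sys/types.h>

/* Permission bits only; illustrative mask for this sketch. */
#define	SKETCH_ACCESSPERMS	0777

/*
 * Hypothetical helper: compute the mode stored on a new semaphore from
 * the caller's requested mode and its file creation mask (umask).
 * Bits set in cmask are cleared from the request.
 */
mode_t
sketch_sem_mode(mode_t requested, mode_t cmask)
{
	/* A umask only ever carries permission bits. */
	assert((cmask & ~SKETCH_ACCESSPERMS) == 0);
	return ((requested & SKETCH_ACCESSPERMS) & ~cmask);
}
```

With a umask of 077, a request for 0660 yields 0600, so group access
denied by the umask stays denied.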
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.
- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before dereferencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting or acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.
Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the problem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.
In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.
Reviewed and tested by: brooks
- Each log entry contains a text description in the "description" field of
the entry. The existing decode logic always ended up duplicating
information that was already in the description string. This made the
logs overly verbose. Now we just print out the description string.
- Add some simple parsing of the timestamp and event classes.
Reviewed by: ambrisko, scottl
MFC after: 2 weeks
- Fetch events from the controller in batches of 15 rather than a single
event at a time.
- When fetching events from the controller, honor the event class and
locale settings (via hw.mfi tunables). This also allows the firmware to
skip over unwanted log entries resulting in fewer requests to the
controller if there are many unwanted log entries since the last clean
shutdown.
- Don't drop the driver mutex while decoding an event.
- If we get an error other than MFI_STAT_NOT_FOUND (basically EOF for
hitting the end of the event log) then emit a warning and bail on
processing further log entries.
Reviewed by: ambrisko, scottl
MFC after: 2 weeks
to INT_MAX. Otherwise, a process could create a semaphore (or increase
its value via ksem_post()) beyond INT_MAX and sem_getvalue() would return
a negative value. sem_getvalue() is only supposed to return a negative
value if that is the number of waiters for that semaphore.
MFC after: 2 weeks
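The ceiling described above can be sketched as a bounded increment. This
is an illustration only: sketch_sem_post(), SKETCH_SEM_VALUE_MAX, and the
choice of EOVERFLOW as the error are assumptions of the sketch, not
details taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Assumed ceiling for this sketch; the commit text names INT_MAX. */
#define	SKETCH_SEM_VALUE_MAX	((unsigned int)INT_MAX)

/*
 * Hypothetical post operation: increment *value unless doing so would
 * exceed the ceiling.  Returns 0 on success or EOVERFLOW (the error
 * choice is an assumption of this sketch).
 */
int
sketch_sem_post(unsigned int *value)
{
	assert(value != NULL);
	if (*value >= SKETCH_SEM_VALUE_MAX)
		return (EOVERFLOW);
	(*value)++;
	return (0);
}
```

Because the stored value can never pass INT_MAX, converting it to int
for a getvalue-style call cannot produce a negative number.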
provides the correct semantics for flock(2) style locks which are used by the
lockf(1) command line tool and the pidfile(3) library. It also implements
recovery from server restarts and ensures that dirty cache blocks are written
to the server before obtaining locks (allowing multiple clients to use file
locking to safely share data).
Sponsored by: Isilon Systems
PR: 94256
MFC after: 2 weeks
Bonus: including kern.mk just to pick kernel warning flags
was an extremely bad idea anyway, because it also picked
up CFLAGS (it probably wasn't the case at the time of CVS
rev. 1.1, I haven't checked). Remove duplicate CWARNFLAGS
from CFLAGS.
so we cannot compile it with -fstack-protector[-all] flags (or
it will self-recurse); this is ensured in sys/conf/files. This
OTOH means that checking for defines __SSP__ and __SSP_ALL__ to
determine if we should be compiling the support is impossible
(which it was trying, resulting in an empty object file). Fix
this by always compiling the symbols in this file. It's good
because it allows us to always have SSP support, and then compile
with SSP selectively.
Reported by: tinderbox
- It is opt-out for now so as to give it maximum testing, but it may be
turned opt-in for stable branches depending on the consensus. You
can turn it off with WITHOUT_SSP.
- WITHOUT_SSP was previously used to disable the build of GNU libssp.
It is harmless to steal the knob as SSP symbols have been provided
by libc for a long time, GNU libssp should not have been much used.
- SSP is disabled in a few corners such as system bootstrap programs
(sys/boot), process bootstrap code (rtld, csu) and SSP symbols themselves.
- It should be safe to use -fstack-protector-all to build world, however
libc will be automatically downgraded to -fstack-protector because it
breaks rtld otherwise.
- This option is unavailable on ia64.
Enable GCC stack protection (aka Propolice) for kernel:
- It is opt-out for now so as to give it maximum testing.
- Do not compile your kernel with -fstack-protector-all, it won't work.
Submitted by: Jeremie Le Hen <jeremie@le-hen.org>
that modify condition codes (the carry bit, in this case). Without
"__volatile", the compiler might add the inline assembler instructions
between unrelated code which also uses condition codes, modifying the
latter.
This prevents the TCP pseudo header checksum calculation done in
tcp_output() from having effects on other conditions when compiled
with GCC 4.2.1 at "-O2" and "options INET6" left out. [1]
Reported & tested by: Boris Kochergin [1]
MFC after: 3 days
Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.
Approved by: philip (mentor)
in_ifaddrhashtbl in in_ifinit because error handler in in_control removes
entries only for AF_INET addresses. If in_ifinit is called for the cloned
interface that has just been created, its address family is not AF_INET and
therefore LIST_REMOVE is not called for the respective LIST_INSERT_HEAD and
freed entries remain in in_ifaddrhashtbl and lead to memory corruption.
PR: kern/124384
locked and unlocked completely in userland. Locking and unlocking the mutex
in userland reduces the total time a mutex is held by a thread. In some
application code, a mutex only protects a small piece of code whose
execution time is less than that of a simple system call. In the current
implementation, however, if lock contention happens the lock holder has to
extend its locking time and enter the kernel to unlock the mutex. This
change avoids that disadvantage: it first sets the mutex to the free state
and then enters the kernel to wake one waiter up. This improves performance
dramatically in some sysbench mutex tests.
Tested by: kris
Sounds great: jeff
problem where Adaptec's arcconf monitoring tool hangs after producing
its expected output.
Submitted by: Adaptec, via driver ver 15317
MFC after: 1 week
from the softc.
- Rework the watchdog timer to match other NIC drivers:
- Start a timer in fe_init() that runs once a second and checks a counter
in the softc that is identical to the deprecated 'if_timer'.
- Just adjust the softc tx timeout value when sending packets instead of
scheduling the timer.
- Use IFQ_SET_MAXLEN().
Tested by: WATANABE Kazuhiro
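The watchdog rework listed above follows a common pattern that can be
sketched in plain C. The structure and function names below
(sketch_softc, sketch_start(), sketch_txdone(), sketch_tick()) are
hypothetical and stand in for the driver's real softc and routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical softc fragment; field names are illustrative. */
struct sketch_softc {
	int	tx_timeout;	/* seconds until watchdog fires; 0 = idle */
	int	watchdog_fired;	/* count of watchdog expirations */
};

/* Called when a packet is handed to the hardware: just arm the counter. */
void
sketch_start(struct sketch_softc *sc, int seconds)
{
	assert(sc != NULL && seconds > 0);
	sc->tx_timeout = seconds;
}

/* Called on transmit completion: disarm the counter. */
void
sketch_txdone(struct sketch_softc *sc)
{
	assert(sc != NULL);
	sc->tx_timeout = 0;
}

/*
 * Called once a second by a driver-private timer.  Mirrors the old
 * if_timer semantics: count down while armed and fire on reaching zero.
 */
void
sketch_tick(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (sc->tx_timeout > 0 && --sc->tx_timeout == 0)
		sc->watchdog_fired++;
}
```

The send path only writes a counter; the periodic tick owns all timeout
handling, so no timer needs to be rescheduled per packet.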
FIFO, as required by SUSv3. No specific privilege check is performed
in this case, as FIFOs may be created by unprivileged processes
(subject to the normal file system name space restrictions that may be
in place).
Unlike the Apple implementation, we reject requests to create a FIFO
using mknod(2) if there is a non-zero dev argument to the system call,
which is permitted by the Open Group specification ("... undefined
..."). We might want to revise this if we find it causes
compatibility problems for applications in practice.
PR: kern/74242, kern/68459
Obtained from: Apple, Inc.
MFC after: 3 weeks
performed. Otherwise if ruleset is used by given mountpoint and is empty
it's freed by devfs_ruleset_reap and pointer becomes bogus.
Submitted by: Mateusz Guzik <mjguzik@gmail.com>
PR: kern/124853
some time now so collapse calls accordingly.
o Given that gem_load_txmbuf() is allowed to fail resulting in a packet
drop also for quite some time now implement the functionality of
gem_txcksum() by means of m_pullup(9), which de-obfuscates the code
and allows to always retrieve the correct length of the IP header.
o Add missing BUS_DMASYNC_PREREAD when syncing the control DMA maps in
gem_rint() and gem_start_locked().
o Correct some bus_barrier(9) calls to do a read/write barrier as we
do a read after a write. Add some missing ones in gem_mii_readreg()
and gem_mii_writereg().
o According to the Apple GMAC driver, the GEM ASIC specification and
the OpenSolaris eri(7D) the TX FIFO threshold has to be set to 0x4ff
for the Gigabit variants and 0x100 for the ERI in order to avoid TX
underruns.
o In gem_init_locked():
- be conservative and enable the RX and TX MACs,
- don't clear GEM_LINK otherwise we don't ever mark the link as up
again if gem_init_locked() is called from gem_watchdog(),
- remove superfluous setting of sc_ifflags.
o Don't bother to check whether the interface is running or whether its
queue is empty before calling gem_start_locked() in gem_tint(), the
former will check these anyway.
o Call gem_start_locked() in gem_watchdog() in order to try to get
some more packets going.
o In gem_mii_writereg() after resetting the PCS restore its configuration.
GMAC testing: grehan, marcel
MFC after: 2 weeks
on the amd64 architecture. The amd64 architecture requires kernel code and
global variables to reside in the highest 2GB of the 64-bit virtual address
space. Thus, the memory allocated during bootstrap, before the call to
kmem_init(), starts at KERNBASE, which is not necessarily the same as
VM_MIN_KERNEL_ADDRESS on amd64.
PowerPC/AIM. Consequently, it should not be used to determine the maximum
number of kernel map entries. Instead, use VM_MIN_KERNEL_ADDRESS, which marks
the start of the kernel map on all architectures.
Tested by: marcel@ (PowerPC/AIM)
KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical
memory allocated during bootstrap will be offset from the low-end of the
kernel's page table.
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
That said, kris@ has tested crash dumps under the full patch that
increases the kernel virtual address space on amd64 to 6GB.
Tested by: kris@
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
This is needed for correct behavior when packets are lost or reordered.
PR: kern/123950
Reviewed by: andre@, silby@
Reported by: Yahoo!, Wang Jin
MFC after: 1 week
needed to promote cdev to cdev_priv, the si_priv pointer was followed.
Use member2struct() to calculate address of the wrapping cdev_priv.
Rename si_priv to __si_reserved.
Tested by: pho
Reviewed by: ed
MFC after: 2 weeks
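Recovering a wrapping structure from a pointer to one of its members is
plain pointer arithmetic with offsetof(). The sketch below illustrates
the idea with a locally defined macro and stand-in structures; the names
SKETCH_MEMBER2STRUCT, sketch_cdev, sketch_cdev_priv, and
sketch_cdev2priv() are hypothetical and the field layout is not that of
the real cdev_priv.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative macro: given pointer x to member m embedded in a
 * struct s, recover the address of the enclosing struct s.
 */
#define	SKETCH_MEMBER2STRUCT(s, m, x)					\
	((struct s *)(void *)((char *)(x) - offsetof(struct s, m)))

/* Hypothetical stand-ins for the public and private device structures. */
struct sketch_cdev {
	int	si_flags;
};

struct sketch_cdev_priv {
	int			cdp_inode;
	struct sketch_cdev	cdp_c;	/* embedded public part */
};

/* Promote a public cdev pointer to its wrapping private structure. */
struct sketch_cdev_priv *
sketch_cdev2priv(struct sketch_cdev *c)
{
	assert(c != NULL);
	return (SKETCH_MEMBER2STRUCT(sketch_cdev_priv, cdp_c, c));
}
```

No back pointer is stored or followed, which is what frees the old
pointer field for other use.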
libi386's time(), caused by a qemu bug. The bug might
be present in other BIOSes, too.
qemu either does not simulate the AT RTC correctly or
has a broken BIOS 1A/02 implementation, and will return
an incorrect value if the RTC is read while it is being
updated.
The effect is worsened by the fact that qemu's INT 15/86
function ("wait" a.k.a. usleep) is unimplemented or
broken and returns immediately, causing beastie.4th to
spin in a tight loop calling the "read RTC" function
millions of times, triggering the problem quickly.
Therefore, we keep reading the BIOS value until we get
the same result twice. This change fixes beastie.4th's
countdown under qemu.
Approved by: des (mentor)
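The "read until two results agree" approach described above can be
sketched as follows. This is not the libi386 code: sketch_read_stable(),
the callback type, the max_tries bound, and the replaying test reader are
all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader callback standing in for the BIOS RTC call. */
typedef int (*sketch_read_fn)(void *arg);

/*
 * Re-read until two consecutive reads return the same value, then
 * return that value.  'max_tries' bounds the loop (a safety net added
 * for this sketch); on exhaustion the last value read is returned.
 */
int
sketch_read_stable(sketch_read_fn rd, void *arg, int max_tries)
{
	int prev, cur;

	assert(rd != NULL && max_tries > 0);
	prev = rd(arg);
	while (max_tries-- > 0) {
		cur = rd(arg);
		if (cur == prev)
			break;
		prev = cur;
	}
	return (prev);
}

/* Example reader: replays a fixed sequence, sticking at the last entry. */
struct sketch_seq {
	const int	*vals;
	size_t		 n;
	size_t		 pos;
};

int
sketch_seq_read(void *arg)
{
	struct sketch_seq *s = arg;
	int v = s->vals[s->pos];

	if (s->pos + 1 < s->n)
		s->pos++;
	return (v);
}
```

A value caught mid-update is unlikely to repeat on the next read, so
requiring agreement filters out the transient bad results.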
- only one function to destroy an SCTP stack sctp_finish()
- Make it so this function also arranges for any threads
created by the image to do a kthread_exit()
of whether NETATALKDEBUG is enabled, so make building it conditional on
NETATALK instead. This problem appears to have been present from the time
that the netatalk implementation was imported.
PR: 124456
Submitted by: Nathan Whitehorn <whitehorn at wisc dot edu>
MFC after: 3 days
sgtty was the original interface to configure terminal attributes on many
UNIX-like operating systems. It has been deprecated by the POSIX termios
interface, which is implemented in almost any modern system.
An advantage of turning this into a binary compatibility interface, is
that we can now eventually remove the COMPAT_43TTY switch from kernel
configurations. This removes many ioctl()'s from the TTY layer.
While there, increase the __FreeBSD_version, which may be useful for the
people working on the Ports tree.
Reviewed by: kib
Approved by: philip (mentor)
- Vimage prep - these are major restructures to move
all global variables to be accessed via a macro or two.
The variables all go into a single structure.
- Asconf address addition tweaks (add_or_del Interfaces)
- Fix rwnd calculation to be more conservative.
- Support SACK_IMMEDIATE flag to skip delayed sack
by demand of peer.
- Comment updates in the sack mapping calculations
- Invariants panic added.
- Pre-support for UDP tunneling (we can do this on
MAC but will need added support from UDP to
get a "pipe" of UDP packets in).
- clear trace buffer sysctl added when local tracing on.
Note the majority of this huge patch is all the vimage prep stuff :-)
same as the global variable defined in ip_input.c. Instead, adopt the name
'q' as found in about 1/2 of uses in ip_input.c, preventing a collision on
the name. This is non-harmful, but means that search and replace on the
global works less well (as in the virtualization work), as well as indexing
tools.
MFC after: 1 week
Reported by: julian
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.
- Add a mutex to the softc to protect the softc and device hardware.
- Use a private watchdog timer.
- Setup interrupt handler after ether_ifattach().
- Use bus_foo() rather than bus_space_foo() and remove bus space tag and
handle from softc.
Tested by: imp
Now that we got rid of the minor-to-unit conversion and the constraints
on device minor numbers, we can convert the functions that operate on
minor and unit numbers to simple macros. The unit2minor() and
minor2unit() macros are now no-ops.
The ZFS code also defined a macro named `minor'. Change the ZFS code to
use umajor() and uminor() here, as it is the correct approach to do
this. Also add $FreeBSD$ to keep SVN happy.
Approved by: philip (mentor), pjd
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't
been modified (PG_M). In general, if there are two or more such PTEs to
choose among, it is better to write protect the one nearer the high end of
the page table page rather than the low end. This is because most programs
access memory in an ascending direction. The net result of this change is a
sometimes significant reduction in the number of failed promotion attempts
and the number of pages that are write protected by pmap_promote_pde().
Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.
Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.
This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.
The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.
Approved by: philip (mentor)
mtx interface for NDIS_LOCK/UNLOCK. This should result in less
CPU utilization on behalf of the ndis driver. Additionally, this
commit also fixes a potential LOR in the ndis_tick code, by
not locking inside the ndis_tick function, but instead delegating
that work to the helpers called through IoQueueWorkItem. The
way that this is currently set up for NDIS prevents us from
simply implementing a callout_init_mtx mechanism.
However, the helper functions that handle the various timeout
cases implement fine-grained locking using the spinlocks provided
by the NDIS-compat layer, and using the mtx that is added with
this commit. This leaves the following ndis_softc members operated
on in ndis_tick in an unlocked context:
* ndis_hang_timer - Only modified outside of ndis_tick once, before
the first callout_reset to schedule ndis_tick
* ifp->if_oerrors - Only incremented in two places, which should be
an atomic op
* ndis_tx_timer - Assigned to 5 (when guaranteed to be 0) or 0
(in txeof), to indicate to ndis_tick what to
do. This is the only member of which I was
suspicious for needing the NDIS_LOCK here. My
testing (and another's) have been fine so far.
* ndis_stat_callout - Only uses a simple set of callout routines,
callout_reset only called by ndis_tick after
the initial reset, and then callout_drain is
used exactly once in shutdown code.
The benefit is that ndis_tick doesn't acquire NDIS_LOCK unless one of
the timeout conditions is flagged, and it still obeys the locking
order semantics that are dictated by the NDIS layer at the moment. I
have been investigating a more thorough s/spinlock/mtx/ of the NDIS
layer, but the simplest naive approach (replace KeAcquireSpinLock
with an mtx implementation) has anti-succeeded for me so far. This
is a good first step though.
Tested by: onemda@gmail.com
Reviewed by: current@, jhb, thompsa
Proposed by: jhb
The while loop that is expected to initialize uio_off later may
not be entered at all, causing an uninitialized value to be returned in
uio->uio_offset.
PR: 122925
Submitted by: Jaakko Heinonen <jh saunalahti fi>
MFC after: 1 week
We still use the interrupt filter due to performance problems that show up if
we don't. The main problem seen is that, due to the interrupt being edge
triggered, we occasionally miss interrupts which leads us to not notice that
we can transmit more packets. Using the new approach, which just schedules
a task on a taskqueue, we are guaranteed to have the task run even if the
interrupt arrived while we were already executing. If we were to use an
ithread the system would mask the interrupt while the handler is run and we'd
miss interrupts.
allocated semaphores, so it's wrong to increase it conditionally,
in this case for every over-the-limit semaphore nsegs is decreased
without being previously increased.
PR: kern/123685
Approved by: cognet (mentor)
- Add a mutex to the softc to protect the softc and device hardware.
- Use a private timer to implement a watchdog for tx timeouts and drive
the timer for auto negotiation.
- Use bus_foo() rather than bus_space_foo() and remove the bus space
tag & handle from the softc.
- Call bus_setup_intr() after ether_ifattach().
Tested by: Florian Smeets flo of kasimir.com
Remove the code which disables port status change interrupts for 1s
when one occurred -- this causes events to be lost or delayed until
the next change.
Obtained from: NetBSD
- Fixed a problem on i386 architecture when using split header/jumbo frame
firmware caused by hardware alignment requirements.
- Added #define BCE_USE_SPLIT_HEADER to allow the feature to be enabled/
disabled. Enabled by default.
PR: kern/123696
MFC after: 2 weeks
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.
Reported by: kris, pho, many
Based on the submission by: mohan
Tested by: pho
MFC after: 2 weeks
In the FreeBSD base system, there are only two utilities that use struct
tty, namely pstat and sicontrol. The sicontrol utility calls the
TCSI_TTY ioctl(), which copies struct tty back to userspace.
sicontrol should not have this functionality. The same data is already
provided by pstat. If we really want to be able to export these numbers
through a file descriptor to userspace, we can export struct xtty, which
should provide a better abstraction. The ttystat option was only used as
a debugging aid.
This makes sicontrol compile in the mpsafetty branch.
Reviewed by: peter
Approved by: philip (mentor)
newvers.sh is run pwd is actually the obj directory, so "../../.svn"
doesn't exist and the test always fails. The second is that buildkernel
is executed with a restrictive PATH, so unless you have svnversion in
/bin or /usr/bin it can't run.
Fix this by looking for svnversion in /bin, /usr/bin, and /usr/local/bin
in that order. If found, store the location and derive the value of the
source directory. Then run svnversion in the appropriate directory.
There is one possible refinement which would be to add a test for
LOCALBASE!=/usr/local if we don't find svnversion the first time, but
IMO that's not necessary at this time.
is in little endian form. Likewise setting DC_AL_PAR0/DC_AL_PAR1
register expect the address to be in little endian form. For big
endian architectures the address should be swapped to get correct
one.
Make setting/getting the ethernet hardware address big endian
architecture friendly.
Reported by: Robert Murillo ( billypilgrim782001 at yahoo dot com )
Tested by: Robert Murillo ( billypilgrim782001 at yahoo dot com )
some longstanding issues:
o pass the vap since it's now the "coin of the realm" and required
to do things like set initial tx parameters in private node
state for use prior to association
o pass the mac address as cards that maintain outboard station
tables require this to create an entry (e.g. in ibss mode)
o remove the node table reference, we only have one node table
and it's unlikely this will change so this is not needed to
find the com structure
entry in the SMAP is a 20 byte structure and they are queried from the
BIOS via successive BIOS calls. Due to an apparent bug in the R900's
BIOS, for some SMAP requests the BIOS overflows the 20 byte buffer
trashing a few bytes of memory immediately after the SMAP structure. As
a workaround, add 8 bytes of padding after the SMAP structure used in
the loader for SMAP queries.
PR: i386/122668
Submitted by: Mike Hibler mike flux.utah.edu, silby
MFC after: 3 days
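
The padding workaround described above can be pictured with the following sketch. The structure and field names are illustrative assumptions rather than the loader's actual declarations, and the packed attribute is a GCC/Clang extension used here only to pin the 20-byte layout.

```c
#include <stdint.h>

/* A 20-byte SMAP entry as returned by the BIOS (illustrative layout). */
struct toy_smap_entry {
	uint64_t base;
	uint64_t length;
	uint32_t type;
} __attribute__((packed));

/*
 * Query buffer with 8 bytes of slack after the entry, so that a BIOS
 * which writes a few bytes past the 20-byte structure scribbles on the
 * padding instead of on unrelated memory.
 */
struct toy_smap_query_buf {
	struct toy_smap_entry entry;
	uint8_t pad[8];
} __attribute__((packed));
```

Only the size of the buffer handed to the BIOS changes; code that consumes the entry keeps looking at the first 20 bytes.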
- Store the softc of the device in the 'si_drv1' of the cdev.
- Lookup the softc via 'si_drv1' in cdev methods rather than using the
minor number as a unit for devclass_get_softc().
- Lookup the device_t via the softc field in cdev methods rather than
using the minor number as a unit for devclass_get_device().
- Add a mutex to the softc to protect 'sc_opened'.
- Remove D_NEEDGIANT as all the smbus drivers are now MPSAFE and this driver
is now MPSAFE.
- Remove some checks for NULL softc pointers that can't happen and don't
bzero the softc during attach.
work. (Moreover, I don't believe that they have ever worked as intended.)
The explanation is fairly simple. Both MADV_DONTNEED and MADV_FREE perform
vm_page_dontneed() on each page within the range given to madvise(). This
function moves the page to the inactive queue. Specifically, if the page is
clean, it is moved to the head of the inactive queue where it is first in
line for processing by the page daemon. On the other hand, if it is dirty,
it is placed at the tail. Let's further examine the case in which the page
is clean. Recall that the page is at the head of the line for processing by
the page daemon. The expectation of vm_page_dontneed()'s author was that
the page would be transferred from the inactive queue to the cache queue by
the page daemon. (Once the page is in the cache queue, it is, in effect,
free, that is, it can be reallocated to a new vm object by vm_page_alloc()
if it isn't reactivated quickly enough by a user of the old vm object.) The
trouble is that nowhere in the execution of either MADV_DONTNEED or
MADV_FREE is either the machine-independent reference flag (PG_REFERENCED)
or the reference bit in any page table entry (PTE) mapping the page cleared.
Consequently, the immediate reaction of the page daemon is to reactivate the
page because it is referenced. In effect, the madvise() was for naught.
The case in which the page was dirty is not too different. Instead of being
laundered, the page is reactivated.
Note: The essential difference between MADV_DONTNEED and MADV_FREE is
that MADV_FREE clears a page's dirty field. So, MADV_FREE is always
executing the clean case above.
This revision changes vm_page_dontneed() to clear both the machine-
independent reference flag (PG_REFERENCED) and the reference bit in all PTEs
mapping the page.
MFC after: 6 weeks
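
The interaction described above can be reduced to a toy model. Everything in this sketch is hypothetical (the structure, the function names, and the three outcomes); it only mirrors the reasoning in the message, namely that a page still marked as referenced is reactivated by the page daemon no matter where madvise() queued it.

```c
#include <stdbool.h>

/* Toy model of the state the page daemon looks at; not kernel code. */
struct toy_page {
	bool referenced;	/* stands in for PG_REFERENCED and PTE reference bits */
	bool dirty;
};

enum toy_outcome { TOY_REACTIVATED, TOY_CACHED, TOY_LAUNDERED };

/* Old behaviour: the page is requeued but its reference state is kept. */
void
toy_dontneed_old(struct toy_page *p)
{
	(void)p;
}

/* New behaviour: the reference state is cleared as well. */
void
toy_dontneed_new(struct toy_page *p)
{
	p->referenced = false;
}

/* What a page daemon scan would do with the page in this model. */
enum toy_outcome
toy_pagedaemon_scan(const struct toy_page *p)
{
	if (p->referenced)
		return (TOY_REACTIVATED);
	return (p->dirty ? TOY_LAUNDERED : TOY_CACHED);
}
```

In this model a referenced clean page is reactivated after the old call and moved to the cache after the new one, which is the difference the revision is after.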
it. Bad imp. Removing us dips us under 10,000 in size too.
o Replace an unconditional 30ms DELAY (yes, busy wait) with a check of the
SIBUSY bit in the SelfST register before accessing the eeprom. This changes
the time to read the EEPROM from 2 * 20 * 30ms (1.2s) to < 20*25us (.0005s)
and make the attach of the card tolerable when ethernet media is present.
Include data from the datasheet about why this works. While this is a 2500x
speed increase, it doesn't really matter at all once the card is probed...
o set dev earlier in softc.
o remove unused fields from softc and args from cs_alloc_irq
o remove some commented code that will never be implemented.
o Don't try to send a packet and see if it worked. We don't
need this anymore, and it doesn't add any value.
o tweaks for BNC and AUI.
o limit possible time hung in the kernel to 4s rather than 40s.
boards. This is enough to net-boot to multiuser.
Also supported is the SMSC LAN91C111 parts used on the netCF, netDUO and netMMC
add-on boards.
I'll be putting some instructions on how to boot this on the Gumstix boards
online soon.
This is still fairly rough and will be refined over time but I felt it was
better to get this out there where other people can help out.
sn(4) driver and also looking at newer drivers. The reason for the rewrite is
to support MII and to try and resolve some performance issues found when trying
to use the sn(4) driver on the Gumstix network boards.
For reference, the SMSC LAN91C111 is a non-PCI ethernet part whose lineage
dates back to Ye Olde Days of ISA. It seems to get some use in the embedded
space these days on parts lacking on-board MACs or on-board PCI controllers,
such as the XScale PXA line of ARM CPUs.
This also includes a driver for the SMSC LAN83C183 10/100 PHY.
Man page to follow.
they can be re-added. Remove CS_NAME. Don't whine when there's an
ignored checksum error: User has said STFU, so we should S the FU.
(remove mandated properties).
The CTRL() macro seems to perform character to control-character
conversion (i.e. 'A' to 0x01) to lowercase characters. This is actually
not valid. If we use lowercase characters, conversions such as
CTRL('\\') and CTRL('?') will result in invalid conversions.
Because we must still support old source code that uses CTRL() (bad!),
we make CTRL() accept both forms. When the character is a lowercase
character, we perform the old style conversion.
Approved by: philip (mentor)
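
A conversion that accepts both forms might look like the sketch below. It is written as a function named ctrl_char(), a hypothetical name, and is not claimed to be the exact macro that was committed; it only shows one way to keep lowercase input working while handling characters such as '\\' and '?' through the uppercase rule.

```c
/*
 * Map a character to its control character.  Lowercase letters use the
 * old-style conversion; everything else is taken relative to 'A' and
 * masked to 7 bits, so '?' becomes DEL (0x7f) and '\\' becomes 0x1c.
 */
int
ctrl_char(int c)
{
	if (c >= 'a' && c <= 'z')
		return (c - 'a' + 1);
	return ((c - 'A' + 1) & 0x7f);
}
```

Under this rule 'A' and 'a' both map to 0x01, while '?' maps to 0x7f because ('?' - 'A' + 1) is -1 before masking.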
- Add a mutex to the softc to protect the softc and the device hardware.
- Add a private timer to manage transmit watchdogs rather than using
if_timer/if_watchdog.
- Setup the interrupt handler after ether_ifattach().
Tested by: imp
for this driver is called 'ie'. Otherwise, ifconfig(8) doesn't recognize
any of the modules as being the ie(4) driver and will always try to kldload
the driver even when it is already present in the kernel.
Reported by: Thierry Herbelot
the dm_lock is held while the newly allocated vnode is locked. Since no
other threads may try to lock the new vnode yet, the LOR there cannot
result in the deadlock.
Shut down the witness warning to note this fact.
Tested by: pho
Prodded by: attilio
10BaseT' since it required 10BaseT to have carrier to switch to it.
This chip makes it hard to do proper auto, so we don't do it. We
can't test carrier on things easily.
Don't insist on carrier when we set the media. Don't report failures.
Remove a 1s! delay that appears to not be needed.
With these patches, and John Baldwin's patches, I'm able to pass
packets on my IBM EtherJet card again.
MAC events.
- Use bus_*() rather than bus_space_*() and remove the bus space tag and
handle from the softc.
- Retire unused macros for examining CIS tuples.
o When forced to be 10baseT, don't require that the 10baseT interface
have link to succeed. Still require it for IFM_AUTO, however, since it
appears that there's no way to tell if a specific type of interface
worked. I'm doing a web search for a datasheet now to see if there's
anything obvious.
o Minor incidental formatting nits, including collapsing code of the form
	if (foo) {
		bar();
	} else {
		if (baz)
			bing();
	}
into:
	if (foo) {
		bar();
	} else if (baz) {
		bing();
	}
to save an indentation level.
o Remove stray reference to 3.x config file syntax.
# I believe John's patches still apply after this...
timer by keeping a once-a-second timer running that decrements a counter
similar to if_timer and reset the chip if it gets down to zero via the
decrement.
- Use IFQ_SET_MAXLEN().
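
The private watchdog pattern mentioned above (a once-a-second tick that decrements a counter and resets the chip when the counter reaches zero) can be sketched as follows. The structure and function names are hypothetical, and the reset is represented by a counter rather than real hardware access.

```c
/* Toy transmit watchdog; 'resets' stands in for resetting the chip. */
struct toy_watchdog {
	int timer;	/* seconds until expiry, 0 means disarmed */
	int resets;
};

/* Arm the watchdog when a transmit is started. */
void
toy_wd_arm(struct toy_watchdog *wd, int seconds)
{
	wd->timer = seconds;
}

/* Disarm it when the transmit completes in time. */
void
toy_wd_disarm(struct toy_watchdog *wd)
{
	wd->timer = 0;
}

/* Called once a second from the driver's periodic timer. */
void
toy_wd_tick(struct toy_watchdog *wd)
{
	if (wd->timer == 0)
		return;			/* not armed, nothing to do */
	if (--wd->timer == 0)
		wd->resets++;		/* reached zero via the decrement */
}
```

A disarmed watchdog never fires, and an armed one fires exactly once when the decrement brings it to zero.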
The Giant lock is acquired in two places in tty_tty.c. In both places,
it is unneeded.
There is no reason to specify D_NEEDGIANT on this device node. The
device node has only been designed to return ENXIO when opened. It
doesn't make any sense to lock/unlock Giant, just to return this error.
D_TTY is also unneeded. The unimplemented functions don't need to be
patched by devfs.
We don't need to lock Giant when we want to lookup the proper TTY vnode.
s_ttyvp is already protected by proctree_lock (see devfs_vnops.c).
Approved by: philip (mentor)
systems where the CardBus bridge was connected to an APIC. The case
where the probe routine is told to not setup the IRQ was mishandled
but the error was masked in the case where the IRQ was a valid one
for the card.
MFC after: 1 week
bring it more up to date. The watchdog timer, and its
associated code, is all collapsed into the ndis_tick function
that was implemented for the NDIS-subsystem watchdog. This
implementation is similar to what numerous other drivers use
to implement the watchdog.
Reviewed by: thompsa, jhb
MFC after: 2 weeks
- Add a mutex to the softc to protect the softc and device hardware.
- Don't leak bus resources if if_alloc() fails during attach.
- Setup the interrupt handler after calling ether_ifattach().
- Use a private timer to manage the transmit watchdog.
Tested by: WATANABE Kazuhiro CQG00620 of nifty.ne.jp
- Add a mutex to protect the softc and device hardware.
- Use a callout rather than a callout_handle for the media timer.
- Use a dedicated timer for managing the tx watchdog rather than if_timer.
- Fix some resource leaks if xe_attach() fails.
- Shutdown the device before detaching the driver.
- Setup the interrupt handler after ether_ifattach().
Tested by: Ian FREISLICH ianf of clue.co.za
- Add a mutex to the softc and use it to protect the softc and device.
- Setup the interrupt handler in the common code instead of in each front
end and do it after ether_ifattach().
- Use ie_stop() and ieinit_locked() in iereset() rather than frobbing IFF_UP
and invoking ieioctl().
- Use DELAY() to implement a spin loop on a register with a timeout rather
than scheduling a timeout and then doing a tight spin on the register.
In the non-MPSAFE case this would never have worked because the spinning
code held Giant and the timeout routine would have been blocked on Giant
forever. The same approach would not work in the MPSAFE case either for
the same reason, hence use a loop around DELAY().
- Clear IFF_DRV_(RUNNING|OACTIVE) in ie_stop() rather than in callers.
- Call ieinit_locked() directly rather than ieioctl(!) from ie_mc_reset().
- Don't leak the rx frame buffer on detach.
Tested by: Thierry Herbelot thierry of herbelot.com
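
The "loop around DELAY()" approach from the list above can be sketched in plain C. The callback types, wait_for_ready(), and the fake device are hypothetical; in the driver the readiness test would be a register read and the delay would be the kernel's DELAY().

```c
#include <stdbool.h>

/* Hypothetical callbacks standing in for a register poll and DELAY(). */
typedef bool (*ready_fn)(void *ctx);
typedef void (*delay_fn)(int usec);

/*
 * Poll until the device reports ready or the time budget is used up.
 * Returns 0 on success and -1 on timeout.  No second thread or timeout
 * routine is involved, so nothing can block on a lock held by the caller.
 */
int
wait_for_ready(ready_fn ready, void *ctx, delay_fn delay, int step_us,
    int timeout_us)
{
	int waited;

	for (waited = 0; waited <= timeout_us; waited += step_us) {
		if (ready(ctx))
			return (0);
		delay(step_us);
	}
	return (-1);
}

/* Illustrative fake device that becomes ready after a number of polls. */
struct fake_dev {
	int polls_until_ready;
};

bool
fake_dev_ready(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->polls_until_ready == 0)
		return (true);
	d->polls_until_ready--;
	return (false);
}

/* The fake delay does nothing; a real one would busy-wait. */
void
fake_delay(int usec)
{
	(void)usec;
}
```

Because the timeout is counted by the polling loop itself, the wait terminates even when the caller holds the only lock in the driver.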
a client reboot, do this check before performing the lock otherwise we
will trash the new lock along with any other old locks the client held
before rebooting.
Make sure nlm_check_idle always returns with nlm_global_lock held.
MFC after: 1 week
template, use an M_TEMP malloc(9) allocation rather than an mbuf
with mtod(9) and dtom(9). This eliminates the last use of
dtom(9) in TCP.
MFC after: 3 weeks
In the mpsafetty branch, Linux sshd seems to work properly inside a
jail. Some small modifications had to be made to the Linux compatibility
layer.
The Linux PTY routines always expect the device major number to be 136
or higher. Our code always set the major/minor number pair to 136:0.
This makes routines like ttyname() and ptsname() fail, because we'll end
up having ambiguous device numbers.
The conversion was not performed on all *stat() routines, which meant in
some cases the numbers didn't get transformed. By pushing the conversion
into linux_driver_get_major_minor(), the transformation will take place
on all calls.
Approved by: philip (mentor), rdivacky
to reduce performance degradation under heavy outgoing scan/flood.
Scalability is now much more important than several kilobytes of RAM.
Remove unneeded TCP-specific expiration handling. Before this, connected
TCP sessions could never expire. Now connected TCP sessions will expire
after 24 hours of inactivity.
Simplify HouseKeeping() to avoid several mul/div-s per packet. Taking into
account increased LINK_TABLE_OUT_SIZE, precision is still much more than
required.
- to increase performance do not reallocate mbuf when possible,
- to support up to 16K packets (was 2K max) use mbuf cluster of proper size.
This change depends on recent ng_nat and ip_fw_nat changes.
As discussed with Robert Watson and John Baldwin, it would be better if
PTY's are created with proper permissions, turning grantpt() into a
no-op.
Bypassing security frameworks like MAC by passing NOCRED to
VOP_SETATTR() will only make things more complex.
Approved by: philip (mentor)
pretend to be IntelliMouse (which have a few more features than generic mice)
causing the IntelliMouse probe to work and the Synaptics code never to be
called.
This should not break "real" IntelliMouse because the Synaptics detection code
is fairly specific.
PR: kern/120833
Submitted by: Eygene Ryabinkin <rea-fbsd -at- codelabs.ru>
MFC after: 1 week
This fixes packet fragmentation handling.
Pass the actually available buffer size to libalias instead of the MCLBYTES
constant. MCLBYTES was used in the belief that m_megapullup() always moves
data into a fresh cluster, which is not always the case.
promotion within the kernel's address space. Specifically,
pmap_promote_pde() is only called when the page table page (PTP) that
is referenced by the given PDE has a full "use count", i.e., its
wire_count is 512. Although this guarantees for a user address space
that all 512 PTEs in the PTP hold valid mappings, the same is not true
of the kernel's address space. A kernel PTP always has a use count of
512 regardless of the state of the PTEs. Therefore,
pmap_promote_pde() should not assume (or assert) that the first PTE in
the PTP is valid.
(Don't ask for a vendor import of this yet, we're in the early days of svn)
Instead of using cyclic timers to call the state clean and deadman callbacks,
use a callout on FreeBSD to avoid the deadlock on FreeBSD due to trying to
send interprocessor interrupts with interrupts disabled.
Reported by: ps, jhb, peter, thompsa
In the mpsafetty branch, PTY's are allocated through the posix_openpt()
system call. The controller side of a PTY now uses its own file
descriptor type (just like sockets, vnodes, pipes, etc).
To remain compatible with existing FreeBSD and Linux C libraries, we can
still create PTY's by opening /dev/ptmx or /dev/ptyXX. These nodes
implement d_fdopen(). Devfs has been slightly changed here, to allow
finit() to be called from d_fdopen().
The routine grantpt() has also been moved into the kernel. This routine
is a little odd, because it needs to bypass standard UNIX permissions.
It needs to change the owner/group/mode of the slave device node, which
may often not be possible. The old implementation solved this by
spawning a setuid utility.
When VOP_SETATTR() is called with NOCRED, devfs_setattr() dereferences
ap->a_cred, causing a kernel panic. Change the de_{uid,gid,mode} code to
allow changes when ap->a_cred is set to NOCRED.
Approved by: philip (mentor)
whatever frequency it started at instead of always picking the highest
frequency. The first version of this driver attempted to do this, but it
set the speed to the first frequency in the list rather than the value it
had saved.
MFC after: 1 week
Discussed with: rpaulo, phk
clients that have rebooted (or otherwise changed port numbers). If the
client is broken or has no active locks, it won't notify us. Fall back
on the two minute timeout logic used by the userland rpc.lockd code.
MFC after: 1 week
variations from normal 16x50 behaviour however is the use of a normally
unused bit of IER to control RX timeout interrupts independently of the
generally used RXRDY bit. If this bit is not enabled, we only ever get
interrupts when the FIFO is full, never before. This is not very useful when
the UART is being used as a console.
In order to support this without causing potential problems on more "normal"
16x50 variants, this change introduces two hints for the uart device, ier_mask
and ier_rxbits. These can be used to override which bits get set and cleared
when we're enabling and disabling RX interrupts.
Reviewed by: marcel
some cases, add explicit inpcb locking rather than relying on the global
lock, as we dereference inp_socket, but also allowing us to drop the
global lock more quickly.
MFC after: 1 week
Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.
It would eventually be nice to remove minor numbers entirely, but we
don't want to be too aggressive here.
Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.
The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.
minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macros. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.
Approved by: philip (mentor)
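
For readers unfamiliar with the layout the message refers to, the sketch below splits a userspace device number under the assumption that bits 8-15 hold the major number and all remaining bits hold the minor number. The function names and the mask constant are hypothetical and are not claimed to match the definitions in the tree.

```c
/* Assumed mask covering every bit except the major field (bits 8-15). */
#define TOY_UMINORMASK	0xffff00ffU

/* Extract the major number from bits 8-15. */
unsigned int
toy_umajor(unsigned int dev)
{
	return ((dev >> 8) & 0xff);
}

/* Keep the minor bits in place, with the major field cleared. */
unsigned int
toy_uminor(unsigned int dev)
{
	return (dev & TOY_UMINORMASK);
}
```

For a value such as 0x00123456 this yields major 0x34 and minor bits 0x00120056, which shows why the minor number is not a contiguous field and why a 1:1 mapping to unit numbers does not fall out naturally.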
monitoring UDP connections using sysctls. In some cases, add
previously missing locking of inpcbs, as inp_socket is followed,
which also allows us to drop global locks more quickly.
MFC after: 1 week
clocked at 10x normal speed. That is, when you set it for 9600
baud, it actually does 96000 baud. In order to make it plug and
play with other serial ports, it has to have its clock rate
reduced by a factor of 10.
Discussed with: Marcel Moolenaar
MFC after: 2 weeks
o do not put the chip into full sleep in ath_stop as it gains
nothing and causes many parts to hang in ath_detach because we
may touch the chip during vap teardown; this may also fix issues
with unloading the module
o add a note in ath_detach to explain ath_hal_detach puts the
chip in low power mode; this is useful to know as it means
unloading the module will place a pci device in the lowest
possible power state
o leave an #ifdef notyet marker for powering down the chip when
a device is marked down; we can't do that until we handle all
the ways the driver may be entered and touch the chip
o fix resume by reloading the h/w key cache as it's been clobbered
(for pci) by the socket being powered off; for station mode we
directly stop+init the chip and then simulate a beacon miss to
get the upper layers sync'd up; for other configs we must brute
force stop+start the vaps so they go through the state machine
address specified in the ioctl and for drivers that need the address
to locate a key (e.g. for delete).
Note this changes net80211-private api's but not the driver callback;
may want to change that in the future.
Reviewed by: sephe, thompsa
on amd64. Note the only difference is the iovec32 part so I use the
native structure for everything else.
Also I plan to MFC all the changes in -current to 7-stable and 6-stable
shortly since I've been running them. This does not include the cam
changes.
MFC after: 3 days
o construct a name for the com lock as done for other locks
o pass the device name to IEEE80211_LOCK_INIT so the mtx name
is constructed as foo_com_lock
o introduce *_LOCK_OBJ macros to hide the lock contents and
minimize redundant code
Right now we perform some of the checks inside the fcntl()'s F_DUPFD
operation twice. We first validate the `fd' argument. When finished,
we validate the `arg' argument. These checks are also performed inside
do_dup().
The reason we need to do this, is because fcntl() should return different
errno's when the `arg' argument is out of bounds (EINVAL instead of
EBADF). To prevent the redundant locking of the PROC_LOCK and
FILEDESC_SLOCK, patch do_dup() to support the error semantics required
by fcntl().
Approved by: philip (mentor)
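
The error semantics in question can be observed from userspace with a small helper. dupfd_errno() is a hypothetical name used for this sketch; it relies only on the POSIX fcntl() and close() calls, and the expected errno values are those POSIX specifies for F_DUPFD (EBADF for an invalid descriptor, EINVAL for a negative third argument).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try F_DUPFD and report the resulting errno, or 0 on success.
 * The duplicate descriptor is closed again so nothing leaks.
 */
int
dupfd_errno(int fd, int minfd)
{
	int newfd = fcntl(fd, F_DUPFD, minfd);

	if (newfd == -1)
		return (errno);
	close(newfd);
	return (0);
}
```

Calling it with a valid descriptor and a minimum of -1 should report EINVAL, while calling it with descriptor -1 should report EBADF.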
inlining resulted in constant propagation to the extent that cmpval
was known to the compiler to be URWLOCK_WRITE_OWNER (= 0x80000000U).
Unfortunately, instead of zero-extending the unsigned constant, it
was sign-extended. As such, the cmpxchg instruction was comparing
0x0000000080000000LU to 0xffffffff80000000LU and obviously didn't
perform the exchange.
But, since the value returned by cmpxchg equalled cmpval (when zero-
extended), the _thr_rtld_lock_release() function thought the exchange
did happen and as such returned as if having released the lock. This
was not the case. Subsequent locking requests found rw_state non-zero
and the thread in question entered the kernel and blocked indefinitely.
The work-around is to zero-extend by casting to uint64_t.
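
The difference between the two widenings is easy to show in isolation. The helper names below are hypothetical, and the sign-extending variant deliberately goes through int32_t to reproduce what the miscompiled code effectively did; converting 0x80000000 to int32_t is implementation-defined in C, and the comment assumes the usual two's complement result.

```c
#include <stdint.h>

/*
 * Widen through a signed 32-bit type: the top bit is treated as a sign
 * and copied into the upper 32 bits (assumes two's complement).
 */
uint64_t
widen_sign_extended(uint32_t v)
{
	return ((uint64_t)(int64_t)(int32_t)v);
}

/* Widen by casting the unsigned value directly: upper bits are zero. */
uint64_t
widen_zero_extended(uint32_t v)
{
	return ((uint64_t)v);
}
```

For 0x80000000 the first helper yields 0xffffffff80000000 and the second yields 0x0000000080000000, which are the two operands the message describes the compare-and-exchange as seeing.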
which label mbufs. This leak can occur if one policy successfully allocates
label storage and subsequent allocations from other policies fail.
Spotted by: rwatson
MFC after: 1 week
Because clists are also used outside the TTY layer, rename the file
containing the clist routines to something more accurate.
The mpsafetty TTY layer doesn't use clists. It uses its own buffers,
which also implement the unbuffered copying to userspace. We cannot
simply remove the clist routines then, because this would break various
drivers that are present within the kernel.
Approved by: philip (mentor)
calling destroy_dev() with sleepable malloc(9). The entire operation
is being serialized through pcm cv from top down, so dropping mutex is
rather safe.
Reported by: delphij
gigabit ethernet and JMC260 fast ethernet controllers. ATM jme(4)
supports all hardware features except RSS and multiple Tx/Rx queue.
These days most ethernet controller vendors have a policy of
concealing hardware details from open source developers. In
contrast to these vendors, JMicron provided all the
information needed to write a stable driver while it was being written
and answered many questions I had. They even helped fix driver
bugs with a protocol analyzer. Many thanks to JMicron for their
support of FreeBSD.
H/W donated by: JMicron
Anyway, in the edge case where the flushing happens and the while loop is
no longer executed, nfs_flush() (and nfs4_flush()) can return with a wrong
error value of ENOLCK.
Bring it back to 0, as we expect for that case.
Reported by: kris
Reviewed by: kib
similar to _WANT_UCRED and _WANT_PRISON and seems to be much nicer than
defining _KERNEL.
It is also needed for my sys/refcount.h change going in soon.
parts relied on the now removed NET_NEEDS_GIANT.
Most of I4B has been disconnected from the build
since July 2007 in HEAD/RELENG_7.
This is what was removed:
- configuration in /etc/isdn
- examples
- man pages
- kernel configuration
- sys/i4b (drivers, layers, include files)
- user space tools
- i4b support from ppp
- further documentation
Discussed with: rwatson, re
context, where the iwn mutex is being held, and
iwn_start assumes that we do not have that mutex held.
Resolve this issue with what we do for other NICs by
splitting the iwn_start procedure into two parts,
iwn_start() does the locking, and iwn_start_locked()
assumes that the mutex is held. This resolves
a panic when WITNESS is enabled.
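The start()/start_locked() split can be sketched in userspace as follows.
This is only an illustration of the pattern, not the iwn(4) code: the
toy_mtx, toy_softc and toy_* names are hypothetical, and the "mutex" is a
plain flag that asserts on recursion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive mutex; not the kernel mutex API. */
struct toy_mtx {
	bool held;
};

struct toy_softc {
	struct toy_mtx lock;
	int queued;	/* frames waiting to be sent */
	int sent;	/* frames handed to the hardware */
	int reclaimed;	/* completed transmissions cleaned up */
};

static void
toy_lock(struct toy_mtx *m)
{
	assert(!m->held);	/* recursing on the lock would be a bug */
	m->held = true;
}

static void
toy_unlock(struct toy_mtx *m)
{
	assert(m->held);
	m->held = false;
}

/* Does the real work; the caller must already hold sc->lock. */
static void
toy_start_locked(struct toy_softc *sc)
{
	assert(sc->lock.held);
	sc->sent += sc->queued;
	sc->queued = 0;
}

/* Entry point for callers that do not hold the lock. */
static void
toy_start(struct toy_softc *sc)
{
	toy_lock(&sc->lock);
	toy_start_locked(sc);
	toy_unlock(&sc->lock);
}

/* A path that already runs with the lock held uses the _locked variant. */
static void
toy_tx_done(struct toy_softc *sc)
{
	toy_lock(&sc->lock);
	sc->reclaimed++;
	toy_start_locked(sc);	/* calling toy_start() here would recurse */
	toy_unlock(&sc->lock);
}
```

Calling toy_start() from inside toy_tx_done() would trip the assertion in
toy_lock(), which is the kind of misuse the split avoids.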
NET_NEEDS_GIANT. netatm has been disconnected from the build for ten
months in HEAD/RELENG_7. Specifics:
- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.
MFC after: 3 weeks
Reviewed by: bz
Discussed with: bms, bz, harti
refcount interface.
It also introduces the correct usage of memory barriers, as sometimes
fdrop() and fhold() are used with shared locks, which don't use any
release barrier.
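A minimal userspace sketch of reference counting with explicit barriers,
using C11 atomics. The ref_init(), ref_acquire() and ref_release() names
are hypothetical and only modelled on a refcount-style interface; the
kernel code uses its own atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helpers; not the kernel's refcount functions. */
static void
ref_init(atomic_uint *count, unsigned int value)
{
	atomic_init(count, value);
}

/* Taking a reference needs no ordering: the caller already holds one. */
static void
ref_acquire(atomic_uint *count)
{
	atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
}

/*
 * Drop a reference and return true when the last one went away.
 * The release ordering publishes this thread's earlier writes, and the
 * acquire fence makes them visible to the thread that frees the object.
 */
static bool
ref_release(atomic_uint *count)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(count, 1, memory_order_release);
	assert(old > 0);
	if (old == 1) {
		atomic_thread_fence(memory_order_acquire);
		return (true);
	}
	return (false);
}
```

With this shape the ordering no longer depends on whether the surrounding
lock provides a release barrier.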
entirety of the specified range be mapped. Specifically, it has
returned EINVAL if the entire range is not mapped. There is not,
however, any basis for this in either SUSv2 or our own man page.
Moreover, neither Linux nor Solaris imposes this requirement. This
revision removes this requirement.
Submitted by: Tijl Coosemans
PR: 118510
MFC after: 6 weeks
Leave IDTVEC(ill) where it was unless we compile with KDTRACE_HOOKS[1].
Hide the DTrace case under #ifdef KDTRACE_HOOKS.
Suggested by: attilio [1]
Reviewed by: attilio
ip6_savecontrol in preparation for udp_append() to no longer
need a WLOCK as we will no longer be modifying socket options.
Requested by: rwatson
Reviewed by: gnn
MFC after: 10 days
- Use proper synchronization primitives to protect the internal fdesc node cache
used in fdescfs.
- Properly initialize and uninitialize hash.
- Remove unused functions.
Since fdescfs might recurse on itself, adding proper locking to it needed some
tricky workarounds in some parts to make it work. For instance, a descriptor in
fdescfs could refer to an open descriptor to itself, thus forcing the thread to
recurse on vnode locks. Because of this, other race conditions also had to be
fixed.
Tested by: pho
Reviewed by: kib (mentor)
Approved by: kib (mentor)
delete "snapshot" from the persistent mount options list.
This should fix problems with doing a mount -o snapshot of a file system, followed by
an NFS export of the same file system.
PR: 122833
Reported by: Leon Kos <leon.kos lecad fs uni-lj si>,
Jaakko Heinonen <jh saunalahti fi>
MFC after: 1 month
here, because we already do them further up in vfs_donmount() in vfs_mount.c
async -> MNT_ASYNC
force -> MNT_FORCE
multilabel -> MNT_MULTILABEL
noatime -> MNT_NOATIME
noclusterr -> MNT_NOCLUSTERR
noclusterw -> MNT_NOCLUSTERW
MFC after: 1 month
Of course I was silly enough to only check LINT for build failures, but not
the userspace bits. In the mpsafetty branch I didn't notice this, because
<sys/clist.h> never got included in userspace.
Approved by: philip (mentor)
Pointy hat to: me :-(
ttyfree(), freeing the tty. Since destroy_dev() may call d_purge()
cdevsw method, which is ttypurge() for the tty, the code ends up
accessing the freed tty structure.
Put the ttyrel() after destroy_dev() in ttyfree(). To prevent the
panic that rev. 1.274 fixed, check TS_GONE in the sysctl
handler and refuse to provide information on such a tty.
Reported, debugging help and tested by: pho
Discussed with and reviewed by: jhb
MFC after: 1 week
in the giant_trick routines after the dev_refthread increments the
si_threadcount. Remove assert, do not perform dev_relthread() for failed
dev_refthread(), and handle failure in the tty_gettp() callers (cdevsw
tty methods).
Before kern_conf.c 1.210 and 1.211, the kernel usually panicked in the
giant_trick routines dereferencing NULL cdevsw, not taking this fault.
Reported by: Vince Hoffman <jhary unsane co uk>
Debugging help and tested by: pho
Reviewed by: jhb
MFC after: 1 week
sense to loop trying to vget() the vnode again.
PR: 122977
Submitted by: Arthur Hartwig <arthur.hartwig nokia com>
Tested by: pho
Reviewed by: jhb
MFC after: 1 week
For some reason, the <sys/tty.h> header file also contains routines of the
clists and console that are used inside the TTY layer. Because the clists
are not only used by the TTY layer (example: various input drivers), we'd
better move the entire clist programming interface into <sys/clist.h>. Also
remove a declaration of a nonexistent variable.
The <sys/tty.h> header also contains various definitions for the console
code (tty_cons.c). Also move these to <sys/cons.h>, because they are
not implemented inside the TTY layer.
While there, create separate malloc pools for the clist and console code.
Approved by: philip (mentor)
PIPE_MTX().
Since the pipe_present is cleared before (potentially) sleeping, the
second thread may enter the pipeclose() for the reciprocal pipe end.
The test at the end of the pipeclose() for the pipe_present == 0 would
succeed, allowing the second thread to free the pipe memory. The first
thread then accesses the freed memory after being woken up.
Properly track the closing state of the pipe in the pipe_present.
Introduce the intermediate state that marks the pipe as mostly
dismantled but might be sleeping waiting for the knote list to be
cleared. Free the pipe pair memory only when both ends pass that point.
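The three-state close described above can be modelled as follows. This is
a sketch under assumptions, not the sys_pipe.c change itself: the
END_ACTIVE, END_CLOSING and END_FINALIZED names and the pipe_pair layout
are hypothetical, and "freed" merely records when the shared memory would
be released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; the commit text does not spell them out. */
enum end_state {
	END_ACTIVE,	/* end is open and usable */
	END_CLOSING,	/* mostly dismantled, may still sleep on the knote list */
	END_FINALIZED	/* past the point where it may sleep */
};

struct pipe_pair {
	enum end_state end[2];
	bool freed;	/* set when the pair memory would be freed */
};

/* First half of close: mark the end as closing before any sleep. */
static void
end_begin_close(struct pipe_pair *pp, int i)
{
	assert(pp->end[i] == END_ACTIVE);
	pp->end[i] = END_CLOSING;
}

/* Second half: the knote list is clear; free only if the peer is done too. */
static void
end_finish_close(struct pipe_pair *pp, int i)
{
	assert(pp->end[i] == END_CLOSING);
	pp->end[i] = END_FINALIZED;
	if (pp->end[1 - i] == END_FINALIZED)
		pp->freed = true;
}
```

With a plain present/absent flag, an end that is still asleep in the
closing step looks the same as one that has finished, which is what let
the peer free the memory too early.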
Debugging help and tested by: pho
Discussed with: jmg
MFC after: 2 weeks
monitoring the pipe. The code sets pipe_present = 0 and enters
knlist_cleardel(), where the PIPE_MTX might be dropped when knl->kl_list
cannot be cleared due to influx knotes.
If the following often encountered code fragment
	if (!(kn->kn_status & KN_DETACHED))
		kn->kn_fop->f_detach(kn);
	knote_drop(kn, td); [1]
is executed while the knlist lock is dropped, then the knote memory is freed
by the knote_drop() without knote being removed from the knlist, since
the filt_pipedetach() contains the following:
	if (kn->kn_filter == EVFILT_WRITE) {
		if (!cpipe->pipe_peer->pipe_present) {
			PIPE_UNLOCK(cpipe);
			return;
Now, the memory may be reused in the zone, causing access to
freed memory. I got panics caused by the marker knote appearing on
the knlist, which, I believe, is a manifestation of the issue. In Peter
Holm's test scenarios, we got unkillable processes too.
The pipe_peer that has the knote for write shall be present. Ignore the
pipe_present value for EVFILT_WRITE in filt_pipedetach().
Debugging help and tested by: pho
Discussed with: jmg
MFC after: 2 weeks
the elf files. This is complicated by the fact that the actual CTF
parsing has to be done in CDDL'd code, so the BSD licensed code only
knows about the opaque data which it must be able to free.
to spam all vaps and this won't happen if the frame comes from a station
that is associated to an ap vap (and so has an entry in the table)
Noticed by: Jared Go
Reviewed by: thompsa
Even though singly linked lists allow items to be removed in constant time
(when the previous element is known), the queue macros don't allow this.
Implement new REMOVE_NEXT() macros. Because the REMOVE() macros also
contain the same code, make them call REMOVE_NEXT().
The OpenBSD version of SLIST_REMOVE_NEXT() needs a reference to the list
head, even though it is unused. We'd better mimic this. The STAILQ version
also needs a reference to the list. This means the prototypes of both
macros are the same.
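The idea can be sketched with plain functions instead of macros. This is
an illustration only, not the <sys/queue.h> implementation: struct node,
struct list, list_remove_next() and list_remove() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list; stands in for an SLIST. */
struct node {
	int value;
	struct node *next;
};

struct list {
	struct node *first;
};

/*
 * Unlink the element after 'prev' in constant time.  'head' is unused
 * here; it is kept only so the signature matches a variant that needs it.
 */
static void
list_remove_next(struct list *head, struct node *prev)
{
	(void)head;
	assert(prev != NULL && prev->next != NULL);
	prev->next = prev->next->next;
}

/* General removal: walk to the predecessor, then reuse remove_next. */
static void
list_remove(struct list *head, struct node *elm)
{
	struct node *cur;

	if (head->first == elm) {
		head->first = elm->next;
		return;
	}
	for (cur = head->first; cur != NULL && cur->next != elm;
	    cur = cur->next)
		;
	if (cur != NULL)
		list_remove_next(head, cur);
}
```

A caller that already knows the predecessor calls list_remove_next()
directly and skips the linear walk done by list_remove().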
Approved by: philip (mentor)
PR: kern/121117
Our current TTY layer uses a set-uid application called ptchown to
change ownership of a PTY slave device. The new TTY layer implements
this functionality through a new ioctl().
By accident I discovered Darwin's TTY layer also uses this approach.
Because of this, they also have a GID_TTY.
Approved by: philip (mentor)
routines for those modules, rather than in the raw socket code. This
allows each privilege check to occur in exactly one place and avoids
duplicate checks across layers.
MFC after: 3 weeks
Sponsored by: nCircle Network Security, Inc.
argument, call mac_socket_check_connect() on that address before
proceeding with the send. Otherwise policies instrumenting the
connect entry point for the purposes of checking destination
addresses will not have the opportunity to check implicit
connect requests.
MFC after: 3 weeks
Sponsored by: nCircle Network Security, Inc.
hard limit of 512 pending mutexes in the witness code and
we can easily have 1 million bucket mutexes initialized before
witness is up and running. Bumping the limit from 512 to 1M
is not really an option here...
the watchdog code. This delta also incorporates some missing PCI
IDs that got added.
PR 122928 - might be fixed by this, no verification at this point.
Change so that we save off a type field for display and
NULL inp just for good measure.
- sctp_output.c - Fix it so in sending to the loopback we use the
src address of the inbound INIT. We don't want
to do this for non local addresses since otherwise
we might be ingress filtered, so we need to use
the best src address and list the address sent to.
Obtained from: time bug - Neil Wilson
MFC after: 1 week
The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int devfs_get_cdevpriv(void **datap);
void devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.
- devfs_set_cdevpriv assigns the priv as private data for the file
descriptor which is used to initiate currently performed driver
operation. dtr is the function that will be called when either the
last reference to the file goes away, the device is destroyed or
devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv allows clearing the private data for a still
open file.
Implementation keeps the driver-supplied pointers in the struct
cdev_privdata, that is referenced both from the struct file and struct
cdev, and cannot outlive either of its referrers.
Man pages will be provided after the KPI stabilizes.
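The lifecycle of the private data can be modelled in userspace as below.
This is a toy model, not the devfs code: the model_* functions take an
explicit file argument, whereas the functions listed above find the file
implicitly, and the EBUSY/ENOENT error values are assumptions made for
the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*priv_dtr_t)(void *);

/* Toy model of one open file carrying driver-private data. */
struct model_file {
	void *priv;
	priv_dtr_t dtr;
};

/* Attach private data; only one pointer per open file in this model. */
static int
model_set_priv(struct model_file *fp, void *priv, priv_dtr_t dtr)
{
	if (fp->priv != NULL)
		return (EBUSY);		/* error value assumed for the sketch */
	fp->priv = priv;
	fp->dtr = dtr;
	return (0);
}

/* Accessor: hands back whatever was attached earlier. */
static int
model_get_priv(struct model_file *fp, void **datap)
{
	if (fp->priv == NULL)
		return (ENOENT);	/* error value assumed for the sketch */
	*datap = fp->priv;
	return (0);
}

/* Run the destructor once; the last close would do the same here. */
static void
model_clear_priv(struct model_file *fp)
{
	if (fp->priv == NULL)
		return;
	if (fp->dtr != NULL)
		fp->dtr(fp->priv);
	fp->priv = NULL;
	fp->dtr = NULL;
}

/* Example destructor: counts how many times it ran. */
static void
count_dtr(void *p)
{
	(*(int *)p)++;
}
```

The point of the model is that the destructor runs exactly once per
attached pointer, whichever event triggers it.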
Reviewed by: jhb
Useful suggestions from: jeff, antoine
Debugging help and tested by: pho
MFC after: 1 month
Directory IO without a VM object will store data in 'malloced' buffers
severely limiting caching of the data. Without this change VM objects for
directories are only created on an open() of the directory.
TODO: Inline test if VM object already exists to avoid locking/function call
overhead.
Tested by: kris@
Reviewed by: jeff@
Reported by: David Filo
- Adds some prep work (not all yet) for vimage, in particular
support for deleting the sctppcbinfo.xx structs. There is
still a leak in here if it were to be called, plus we still
need the regrouping (from me and Michael Tuexen)
- Adds support for UDP tunneling. For BSD there is no
socket set up yet so it's disabled, but major argument
changes are in here to encompass the passing of the port
number (zero when you don't have a udp tunnel, the default
for BSD). Will add some hooks in UDP here shortly (discussed
with Robert) that will allow easy tunneling. (Mainly from
Peter Lei and Michael Tuexen with some BSD work from me :-D)
- Some ease for Windows: evidently leave is reserved by their
compiler, so move label leave: -> out:
MFC after: 1 week
- Bug in CA that does not get us incrementing the PBA properly which
made us more conservative.
- comment updated in sctp_input.c
- memsets added before we log
- added arg to hmac id's
MFC after: 2 weeks
controller. L1 has several threshold/timer registers and they
seem to require carefully tuned parameters to get the best
performance. The datasheet for L1 is not available to open source
driver writers, so age(4) focuses on stability and correctness of
basic Tx/Rx operation. ATM the performance of age(4) is far from
optimal, which in turn means there are mis-programmed registers or
incorrectly configured registers.
Currently age(4) supports all known hardware assistance including
- MSI support.
- TCP Segmentation Offload.
- Hardware VLAN tag insertion/stripping.
- TCP/UDP checksum offload.
- Interrupt moderation.
- Hardware statistics counter support.
- Jumbo frame support.
- WOL support.
L1 gigabit ethernet controller is mainly found on ASUS
motherboards. Note, it seems that there are other hardware
variants known as L2 (Fast ethernet) and newer gigabit ethernet
(AR81xx) from Atheros. These are not supported by age(4) and
require a separate driver. Big thanks to all people who reported
feedback or tested patches.
Tested by: kevlo, bsam, Francois Ranchin < fyr AT fyrou DOT net >
Thomas Nystroem < thn AT saeab DOT se >
Roman Pogosyan < asternetadmin AT gmail DOT com >
Derek Tattersal < dlt AT mebtel DOT net >
Oliver Seitz < karlkiste AT yahoo DOT com >
which contains all the hook definitions rather than splattering
them all over the header files.
The definitions are only valid when the KDTRACE_HOOKS kernel
option is defined, so other kernel sources have no need to
see them.
and struct proc.
Add a field to struct thread to stash the error variable (or returned
status) from the last syscall so that it is available during a
DTrace probe.
data via ctor and dtor event handlers.
The size of the extra data is allocated opaquely and this file
contains a function which the dtrace module can call to check
that the kernel supports at least the amount of data that it needs.
This file is optionally compiled into the kernel if the KDTRACE_HOOKS
kernel option is defined.
This is BSD licensed code written specifically for FreeBSD.
It initialises using SYSINIT so that the SDT provider, probe and
argument description linkage is done whenever a module is loaded,
regardless of whether the DTrace modules are loaded or not.
This file is optionally compiled into the kernel if the KDTRACE_HOOKS
option is defined.
- KDTRACE_HOOKS for the shim layer of hooks which separate BSD licensed
code from CDDL code.
- DDB_CTF for the code that parses the CTF (compact C type format)
data for use by the DTrace Function Boundary Trace
provider and (possibly) ddb if we plan to do that.
devsoftc.async_proc != NULL because the latter might not be true
sometimes.
This way /etc/rc.suspend gets executed.
Reviewed by: njl
Submitted by: Mitsuru IWASAKI <iwasaki at jp.FreeBSD.org>
Tested also by: Andreas Wetzel <mickey242 at gmx.net>
MFC after: 1 week
superpage-aligned virtual address for the mapping. Revision 1.65
implemented an overly simplistic and generally ineffectual method for
finding a superpage-aligned virtual address. Specifically, it rounds
the virtual address corresponding to the end of the data segment up to
the next superpage-aligned virtual address. If this virtual address
is unallocated, then the device will be mapped using superpages.
Unfortunately, in modern times, where applications like the X server
dynamically load much of their code, this virtual address is already
allocated. In such cases, mmap(2) simply uses the first available
virtual address, which is not necessarily superpage aligned.
This revision changes mmap(2) to use a more robust method,
specifically, the VMFS_ALIGNED_SPACE option that is now implemented by
vm_map_find().
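The rounding step that revision 1.65 relied on can be sketched as below.
This is a minimal illustration only: the 2 MB SUPERPAGE_SIZE and the
function names are assumptions for the example, not values or code taken
from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed superpage size for the example; must be a power of two. */
#define	SUPERPAGE_SIZE	((uintptr_t)2 * 1024 * 1024)

/* Round 'va' up to the next superpage boundary. */
static uintptr_t
superpage_roundup(uintptr_t va)
{
	return ((va + SUPERPAGE_SIZE - 1) & ~(SUPERPAGE_SIZE - 1));
}

/* Nonzero if 'va' sits exactly on a superpage boundary. */
static int
superpage_aligned(uintptr_t va)
{
	return ((va & (SUPERPAGE_SIZE - 1)) == 0);
}
```

Rounding alone yields one candidate address; if that address is already
allocated, nothing in this step looks for another aligned one, which is
the weakness described above.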
physical address of the device's memory. This enables
pmap_align_superpage() to propose a virtual address for mapping the
device memory that permits the use of superpage mappings.
- verified that the ifp->if_snd.ifq_mtx was initialized for
all attached interfaces. This was pointless because it was
initialized for all interfaces in if_attach() so I've removed it.
- Checked that ifp->if_snd.ifq_maxlen is initialized and set it to
ifqmaxlen if unset. This makes more sense in if_attach() so
I moved it there.
- The first call of if_slowtimo(). Delete if_check() and call
if_slowtimo() directly from the SYSINIT().
All shim hooks are defined here. This is the interface between BSD
code in FreeBSD and CDDL code from OpenSolaris.
The hooks defined here are pre-processed out from the source files
when the KDTRACE_HOOKS kernel option isn't defined.
Note that this implementation differs from the one in OpenSolaris, so
it is BSD licensed and can be included anywhere.
The kernel definitions defined here are dependent on the kernel option
KDTRACE_HOOKS so that macros added to the sources are pre-processed
out completely when the DTrace kernel hooks aren't compiled in.
the mentioned PR:
- bounds check time->month as it is used as an array index
- fix usage of time->month as array index (month is 1-12)
- fix calculation based on time->day (day is 1-31)
- fix the speedup code as it doesn't calculate correct timestamps before
the year 2000 and reduce the number of calculations in the year-by-year code
- speedup month calculations by replacing the array content with cumulative
values
- add microseconds calculation
- fix an endian problem
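The month and day handling listed above can be sketched as follows. This
is not the submitted patch: days_since_epoch(), is_leap() and the cum_days
table are hypothetical names, and the function only covers the date part
of a timestamp.

```c
#include <assert.h>

/* Days before the first of each month in a non-leap year (cumulative). */
static const int cum_days[12] = {
	0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

static int
is_leap(int year)
{
	return ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0);
}

/*
 * Days since 1970-01-01 for year/month/day, or -1 if out of range.
 * month is 1..12 and day is 1..31, so both are shifted before use.
 */
static long
days_since_epoch(int year, int month, int day)
{
	long days;
	int y;

	/* Bounds check first: month is used as an array index below. */
	if (year < 1970 || month < 1 || month > 12 || day < 1 || day > 31)
		return (-1);
	days = 0;
	for (y = 1970; y < year; y++)
		days += is_leap(y) ? 366 : 365;
	days += cum_days[month - 1];
	if (month > 2 && is_leap(year))
		days++;
	days += day - 1;
	return (days);
}
```

The cumulative table replaces a per-month summing loop with one array
lookup.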
PR: kern/97786
Submitted by: Andriy Gapon <avg@topspin.kiev.ua>
Reviewed by: scottl (earlier version)
Approved by: emax (mentor)
MFC after: 1 week
-It has new hardware support
-It uses a new method of TX cleanup called Head Write Back
-It includes the provisional generic TCP LRO feature contributed
by Myricom and made general purpose by me. This should move into
the stack upon approval, but for this driver drop it is in here.
-Also bug fixes and etc...
MFC in a week if no serious issues arise.
so the index needs to be translated into an offset. While we
did add the offset (0x10), we forgot to account for the width.
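A sketch of the index-to-offset translation. The base of 0x10 comes from
the text above; the 4-byte REG_WIDTH and the index_to_offset() name are
assumptions made for illustration, since the commit text does not state
the width.

```c
#include <assert.h>

#define	REG_BASE	0x10	/* base offset named in the commit text */
#define	REG_WIDTH	4	/* assumed register width in bytes */

/* Translate a register index into a byte offset. */
static int
index_to_offset(int index)
{
	assert(index >= 0);
	/* Adding the base alone is not enough; scale by the width too. */
	return (REG_BASE + index * REG_WIDTH);
}
```

Under these assumptions index 2 maps to 0x18, whereas adding only the base
would give 0x12.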
Tested by: Thomas Vogt
MFC after: 3 days
- Obsolete redundant inst_name and unit members of struct sym_hcb.
- Fix three more NULL vs. 0 confusions.
- Use device_set_softc(9) to tell the bus layer that this driver
allocates an instance of struct sym_hcb itself.
(all types) used per socket buffer.
Add support to netstat to print out all of the socket buffer
statistics.
Update the netstat manual page to describe the new -x flag
which gives the extended output.
Reviewed by: rwatson, julian
lock_object, using a unified field called lo_data.
- Replace lo_type usage with the w_name usage and at init time pass the
lock "type" directly to witness_init() from the parent lock init
function. Handle delayed initialization, before
witness_initialize() is called, through the witness_pendhelp structure.
- Axe out LO_ENROLLPEND as it is not really needed. The case where a
mutex with delayed init is destroyed can't happen because
witness_destroy() checks for witness_cold and panics in that case.
- In enroll(), if we cannot allocate a new object from the freelist,
notify that to userspace through a printf().
- Modify the depart function in order to return nothing as in the current
CVS version it always returns true and adjust callers accordingly.
- Fix the witness_addgraph() argument name prototype.
- Remove useless code from itismychild().
This commit leads to a smaller struct lock_object and so smaller locks,
in particular on amd64 where 2 uintptr_t (16 bytes per-primitive) are
gained.
Reviewed by: jhb
- Rename BGE_FLAG_EEPROM to BGE_FLAG_EADDR to underline its absence means
"there's no chip containing an Ethernet address fitted to the BGE chip
so we have to get it from the firmware instead" rather than "there's no
EEPROM, but maybe NVRAM or something else".
- Don't treat BCM5906[M] generally like chips w/o BGE_FLAG_EADDR set, just
in the two cases really necessary. This gets us in line with the original
patch for DragonFlyBSD.
- For sparc64 restore the intended behavior of obtaining the Ethernet
address from the firmware in case BGE_FLAG_EADDR is not set, even for
BCM5906[M].
- Fix some style(9) bugs introduced with rev. 1.208 of if_bge.c
Approved by: jhb
Additional testing by: Thomas Nystroem (BCM5906)
what Linux does. This is because robust futexes are mostly a
userspace thing which we cannot alter. Two syscalls maintain a
pointer to a userspace list, and when a process exits a routine
walks this list, waking up processes sleeping on futexes
from that list.
Reviewed by: kib (mentor)
MFC after: 1 month
Add support for the Apple USB Ethernet adapter.
Work around the "latch in at the first working PHY address hack",
that fails for this adapter because it returns 0xffff when reading
from lower PHY addresses. Also add more debugging printfs
Obtained from: OpenBSD
MFC After: 3 days
o correct mapping of CCK rates to PLCP; was using nonstandard Ralink
values which just happened to also be used by Zydas (so went unnoticed)
o change ieee80211_plcp2rate api to take a phy type instead of a flag
that indicates ofdm/!ofdm
o update drivers to match (restore per-driver code to map rate->PLCP)
Reviewed by: sephe, weongyo, thompsa
o add IEEE80211_C_STA capability to indicate sta mode is supported
(was previously assumed) and mark drivers as capable
o add ieee80211_opcap array to map an opmode to the equivalent capability bit
o move IEEE80211_C_OPMODE definition to where capabilities are defined so it's
clear it should be kept in sync (on future additions)
o check device capabilities in clone create before trying to create a vap;
this makes driver checks unneeded
o make error codes return on failed clone request unique
o temporarily add console printfs on clone request failures to aid in
debugging; these will move under DIAGNOSTIC or similar before release
Instead use the well-known MAX() function.
This should fix problems with negative values showing up on
dev.cpu.%d.temperature.
This is slightly different from the fix in the PR.
Submitted by: KOIE Hidetaka <hide at koie.org>
PR: 123542
used to request superpage alignment for the submap.
Request superpage alignment for the kmem_map.
Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently
equivalent but VMFS_ANY_SPACE is the new preferred spelling.)
Remove a stale comment from kmem_malloc().
support for VMFS_ALIGNED_SPACE, which requests the allocation of an
address range best suited to superpages. The old options TRUE and FALSE
are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
immediate need to update all of vm_map_find(9)'s callers.
While I'm here, correct a misstatement about vm_map_find(9)'s return
values in the man page.
hand, it may cause other threads to sleep since kqueue_scan() may mark
some knotes as influx. This could lead to a deadlock.
Before kqueue_scan() sleeps, wake up the threads that are waiting for the
influx knotes produced by this thread.
Tested by: pho (previous version)
Reviewed by: jmg
MFC after: 2 weeks
closed is a legitimate situation. For instance, a file descriptor with
registered events may be closed in parallel with closing the kqueue.
Properly handle the case instead of asserting that this cannot happen.
Reported and tested by: pho
Reviewed by: jmg
MFC after: 2 weeks
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
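The row selection described above can be sketched as follows. This is an
illustration under assumptions, not the routing code: NUM_AF, FAM_INET,
struct fib_head, table_legacy() and table_fib() are hypothetical
stand-ins, and only the rt_tables name and the limit of 16 tables come
from the text.

```c
#include <assert.h>
#include <stddef.h>

#define	NUM_FIBS	16	/* the first commit is limited to 16 tables */
#define	NUM_AF		4	/* illustrative; not the real family count */
#define	FAM_INET	2	/* stand-in for the IPv4 family index */

/* Hypothetical per-family table head. */
struct fib_head {
	int fibnum;
	int family;
};

static struct fib_head *rt_tables[NUM_FIBS][NUM_AF];

/* Legacy-style lookup: unaware callers only ever see row 0. */
static struct fib_head *
table_legacy(int family)
{
	assert(family >= 0 && family < NUM_AF);
	return (rt_tables[0][family]);
}

/* FIB-aware lookup: only IPv4 honours the FIB number. */
static struct fib_head *
table_fib(int family, int fibnum)
{
	assert(family >= 0 && family < NUM_AF);
	if (family != FAM_INET)
		fibnum = 0;	/* other families are forced to row 0 */
	assert(fibnum >= 0 && fibnum < NUM_FIBS);
	return (rt_tables[fibnum][family]);
}
```

Because row 0 has the same layout as the old one-dimensional array, code
that never passes a FIB number keeps working unchanged.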
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods thus reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
syncache that has an invalid SEQ instead of only doing it when we succeed
in mallocing space for the log message.
MFC after: 1 week
Reviewed by: sam, bz
for UPA it should have fulfilled its purpose by now and Fireplane-
and JBus-based machines are way too messy in organization to implement
something equivalent.
- Fix a bunch of style(9) bugs.
Handle cases where dma function pointers may be NULL, and where
the max_iosize can't be derived from a DMA data structure. For
the latter, revert to the prior behaviour of using DFLTPHYS for
the max i/o size when there is no other data.
Reviewed by: marcel
No objection by: sos
aligned on an 8 byte boundary. Prior to rev 1.36 this wasn't a problem
because mbuf clusters tend to be naturally aligned. The switch to using
split buffers with the first buffer being the embedded data area of the
mbuf has broken this assumption, at least on i386, causing a complete
failure of RX functionality. Fix this for now by using a full cluster for
the first RX buffer. A more sophisticated approach could be done with the
old buffer scheme to realign the m_data pointer with m_adj(), but I'm also
not clear on performance benefits of this old scheme or the performance
implications of adding an m_adj() call to every allocation.
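For illustration only, the 8-byte alignment requirement mentioned above comes down to the following arithmetic. This is a Python sketch, not driver code; the helper names is_aligned and align_up are made up here, and the boundary is assumed to be a power of two.

```python
def is_aligned(addr, boundary=8):
    """True if addr sits on a multiple of boundary (a power of two)."""
    return (addr & (boundary - 1)) == 0

def align_up(addr, boundary=8):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)
```

A buffer that starts partway into a larger structure can easily land on a 4-byte boundary only, which is the kind of case the first check would flag.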
and its children in the form:
"parent","child"
so that head and bottom of an oriented graph can be easily detected and
various forms of diagrams can be built.
The sysctl is called debug.witness.graphs and it is read-only; in order
to get the list of relations, a simple:
# sysctl debug.witness.graphs
will do the trick.
This approach has been chosen in order to easily support things like
the DOT format and such. Soon, a self-explanatory awk script, which
filters the simple information returned by the sysctl and converts it into
a real DOT script, will be committed to the repository among the examples.
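Until that script exists, the conversion can be sketched as below. This is a Python illustration, not the awk script referred to above. It assumes every relevant line is exactly of the form "parent","child" and that lock names never contain the sequence `","`; the function name pairs_to_dot is hypothetical.

```python
def pairs_to_dot(text, name="witness"):
    """Turn lines of the form "parent","child" into a DOT digraph."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parent, sep, child = line.partition('","')
        if not sep or not parent.startswith('"') or not child.endswith('"'):
            continue  # skip anything that is not a quoted pair
        # Drop the outer quotes left over from the split.
        edges.append((parent[1:], child[:-1]))
    body = "".join('  "%s" -> "%s";\n' % edge for edge in edges)
    return "digraph %s {\n%s}\n" % (name, body)
```

The resulting text can be fed to any DOT renderer; each relation becomes one directed edge from parent to child.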
Discussed with: rwatson
counter-timer timecounter so the associated SYSCTL nodes don't clash on
machines having multiple U2P and U2S bridges as well as establishing a
clear mapping between these bridges and their timecounter device.
- Don't bother setting up a "nice" name for the IOMMU, just use the name
returned by device_get_nameunit(9), too.
- Fix some minor style(9) bugs.
- Use __FBSDID in counter.c
MFC after: 1 week
VSOCK has been added as cache target. Now they process
not only VDIR but also VSOCK.
- fixed a panic caused by the cache being freed incorrectly
by "umount -f"
Submitted by: Masanori OZAWA <ozawa@ongs.co.jp>
MFC after: 1 week
perform various operations on a controller. Specifically, for each mpt(4)
device, create a character device in devfs which accepts ioctl requests for
reading and writing configuration pages and performing RAID actions.
MFC after: 1 week
Reviewed by: scottl
than checking whether audit is enabled globally, instead check whether
the current thread has an audit record. This avoids entering the audit
code to collect argument data if auditing is enabled but the current
system call is not of interest to audit.
MFC after: 1 week
Sponsored by: Apple, Inc.
method:
- If the last of the child cpufreq drivers returns an error while trying to
fetch its list of supported frequencies but an earlier driver found the
requested frequency, don't return an error to the caller.
- If all of the child cpufreq drivers fail and the attempt to match the
frequency based on 'cpu_est_clockrate()' fails, return ENXIO rather than
returning success and returning a frequency of CPUFREQ_VAL_UNKNOWN.
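The two rules above can be sketched as follows. This is a Python model under stated assumptions, not the kernel method: the drivers are modeled as callables returning an (error, frequency) pair, estimate_clockrate stands in for a match based on cpu_est_clockrate() and returns None on failure, and the value used for CPUFREQ_VAL_UNKNOWN is a placeholder.

```python
import errno

CPUFREQ_VAL_UNKNOWN = -1  # placeholder value for this sketch

def current_level(drivers, estimate_clockrate):
    """drivers: callables returning (error, freq) for each child driver."""
    found = CPUFREQ_VAL_UNKNOWN
    for get in drivers:
        error, freq = get()
        if error == 0 and freq != CPUFREQ_VAL_UNKNOWN:
            found = freq
        # An error from a later driver must not clobber an earlier match.
    if found != CPUFREQ_VAL_UNKNOWN:
        return 0, found
    est = estimate_clockrate()
    if est is None:
        # Nothing matched at all: report ENXIO instead of "success" with
        # an unknown frequency.
        return errno.ENXIO, CPUFREQ_VAL_UNKNOWN
    return 0, est
```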
MFC after: 3 days
PR: kern/121433
Reported by: Eugene Grosbein eugen ! kuzbass dot ru
all cards/modes.
In addition to the intr forcing added with rev. 1.205, adapt the other
places to use the same logic.
We need to exclude a few chips/revisions (5700, 5788) from using the
enhanced version and fall back to the old way as that is the only
method they support.
Tested by: phk
Suggested by: davidch, Broadcom (thanks a lot for the help!)
MFC after: 16 days
- add / remove clients from cxgb_main.c now
- change ifdef TOE_ENABLED to TCP_OFFLOAD_DISABLE
- update copyrights
- fix transmit data mismatch bug caused by not setting SB_NOCOALESCE
on tx sockbuf on passive connections
- fix receive sequence mismatch bug caused by not setting SB_NOCOALESCE
on rx sockbuf on passive connections
- don't sleep without checking SBS_CANTRCVMORE first
- various ddp ordering fixes
Supported by: Chelsio Inc.
ALT_BREAK_TO_DEBUGGER. In addition to "Enter ~ ctrl-B" (to enter the
debugger), there is now "Enter ~ ctrl-P" (force panic) and
"Enter ~ ctrl-R" (request clean reboot, ala ctrl-alt-del on syscons).
We've used variations of this at work. The force panic sequence is
best used with KDB_UNATTENDED for when you just want it to dump and
get on with it.
The reboot request is a safer way of getting into single user than
a power cycle, e.g. when you've hosed the ability to log in (pam, rtld, etc).
It gives init the reboot signal, which causes an orderly reboot.
I've taken my best guess at what the !x86 and non-sio code changes
should be.
This also makes sio release its spinlock before calling KDB/DDB.
which are also likely to be irrelevant for sun4v (there's no SBus on sun4v
and only some EBus devices). While at it fix some style bugs according to
style.Makefile(5) where appropriate.
MFC after: 3 days
mount fs needing Giant to be held when processing bufobjs.
Use a different subqueue for pending workitems on filesystems requiring
Giant. This simplifies the code notably and also reduces the number of
Giant acquisitions (and the whole processing cost).
Suggested by: jeff
Reviewed by: kib
Tested by: pho
- Limit grabbing the lock to SIOCSIFFLAGS.
- Move ieee80211_start_all() to SIOCSIFFLAGS.
- Remove SIOCSIFMEDIA as it is not useful.
- Limit ether_ioctl to only SIOCGIFADDR. SIOCSIFADDR and SIOCSIFMTU have no
effect as there is no input/output path in the vap parent. The vap code
will handle the reinit of the mac address changes.
- Split off ndis_ioctl_80211 as it was getting too different from wired devices.
This fixes a copyout while locked and a lock recursion.
Reviewed by: sam
to profile outgoing packets for a number of mbuf chain
related parameters
e.g. number of mbufs, wasted space.
probably will do with further work later.
Reviewed by: various
10/100 operation and place the mailbox registers at a different offset.
They also do not have an EEPROM, so the MAC address must be read from
NVRAM instead.
MFC after: 1 month
PR: kern/118975
Submitted by: benjsc, Thomas Nyström thn at saeab dot se
Submitted by: sephe (original patch for DragonflyBSD)
o The function is defined unconditionally but depends on SPR_SVR,
which is defined conditionally.
o spr.h defines mfspr() and mtspr(), which is no worse to use.
while holding the socket buffer lock. This leads to an
immediate panic due to recursing the socket buffer lock. This
bug was introduced in uipc_syscalls.c:1.240, but masked by
another bug until that was fixed in uipc_syscalls.c:1.269.
Note that the current fix isn't perfect, but better than
panicking: normally we guarantee that simultaneous invocations
of a system call to write on a stream socket won't be
interlaced, which is ensured by use of the socket buffer sleep
lock. This is guaranteed for the sendfile headers, but not
trailers. In practice, this is likely not a problem, but
should be fixed.
MFC after: 3 days
Pointy hat to: andre (1.240), cperciva (1.269)
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
Approved by: imp
total of 6 interrupt resources for scc(4) on macio(4). This
is 3 per channel, of which the 1st of each channel is the
interrupt associated with the SCC. The other 2 are for DMA
operation.
Change scc_bfe_attach() to accept an argument that's the
number of interrupts per channel (ipc) and change each bus
front-end (bfe) to pass that argument through a wrapper
for the device_attach method.
For now, we only allocate the 1st interrupt of each channel
to preserve behaviour.
by the parent for interrupt resources. This corrects parsing of
the interrupts property.
With parsing of the property fixed, add all interrupts to the
resource list. Bump the max. number of interrupts from 5 to 6
as scc(4) attached to macio(4) has 6 interrupts (3 per channel).
Submitted by: Nathan Whitehorn <nathanw@uchicago.edu>
- detect number of LAWs at run time and initialize accordingly
- introduce decode windows target IDs used in MPC8572
- other minor updates
Obtained from: Freescale, Semihalf
doesn't require parts of the Expansion ROM to be copied around,
for obtaining the MAC address on !OFW platforms.
- Don't unnecessarily cache bus space tag and handle nor RIDs
in the softcs of the front-ends.
- Don't use function calls in initializers.
- Let the SBus front-end depend on sbus(4).
info about all currently mounted file systems. When an address is given
as an argument, prints detailed info about the given mount point.
MFC after: 2 weeks
infrastructure. Its only consumer ever was sio(4) and thus was
unused on sparc64 since removing the last traces of sio(4) in
sparc64 configuration files in favor of uart(4) over three
years ago. If similar functionality is required again it should
be brought back as an MD intr_pending() which works for all
busses by using for example interrupt controller hooks.
when creating the parent bus DMA tag. While at it correct the style
and a nearby comment.
- Take advantage of m_collapse(9) for performance reasons.
MFC after: 2 weeks
PR 122839 is fixed in both em and in igb
Second, the issue on building modules since the static kernel
build changes is now resolved. I was not able to get the fancier
directory hierarchy working, but this works, both em and igb
build as modules now.
Third, there is now support in em for two new NICs. Hartwell
(or 82574) is a low cost PCIE dual port adapter that has MSIX;
for this release it uses 3 vectors only: RX, TX, and LINK. In
the next release I will add a second TX and RX queue. Also, there
is support here for ICH10, the follow-on to ICH9. Both of these are
early releases; general availability will follow soon.
Fourth: On Hartwell and ICH10 we now have IEEE 1588 PTP support.
I have implemented this in a provisional way so that early adopters
may try and comment on the functionality. The IOCTL structure may
change. This feature is off by default; you need to edit the Makefile
and add the EM_TIMESYNC define to get the code.
Enjoy all!!
assumptions about the state of the cooling devices. Instead, switch them
off on init and, only after that, we are in TZ_ACTIVE_NONE.
Submitted by: Andriy Gapon <avg at icyb.net.ua>
Reviewed by: njl
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle(), to wake up idle threads that are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
Sponsored by: Nokia
o Add CTASSERTs ensuring that HME_NRXDESC and HME_NTXDESC are set to
legal values.
o Use appropriate maxsize, nsegments and maxsegsize parameters when
creating DMA tags and correct some comments related to them.
o The FreeBSD bus_dmamap_sync(9) has supported ORed-together flags for quite
some time now, so collapse calls accordingly.
o Add missing BUS_DMASYNC_PREREAD when syncing the control DMA maps in
hme_rint() and hme_start_locked().
o Keep state of the link state and use it to enable or disable the MAC
in hme_mii_statchg() accordingly as well as to return early from
hme_start_locked() in case the link is down.
o Introduce a sc_flags and use it to replace individual members like
sc_pci.
o Add bus_barrier(9) calls to hme_mac_bitflip(), hme_mii_readreg(),
hme_mii_writereg() and hme_stop() to ensure the respective bit
has been written before we start polling on it and for the right
bits to change.
o Rather than just returning in case hme_mac_bitflip() fails and leaving us
in an undefined state report the problem and move on; chances are
the requested configuration will become active shortly after.
o Don't call hme_start_locked() in hme_init_locked() unconditionally
but only after calls to hme_init_locked() when it's appropriate, i.e.
in hme_watchdog().
o Add a KASSERT which asserts nsegs is valid also to hme_load_txmbuf().
o In hme_load_txmbuf():
- use a maximum of the newly introduced HME_NTXSEGS segments instead
of the incorrect HME_NTXQ, which reflects the maximum TX queue
length, for loading the mbufs and put the DMA segments back onto
the stack instead of the softc as 16 should be ok there.
- use the common errno(2) return values instead of homegrown ones,
- given that hme_load_txmbuf() is allowed to fail resulting in a
packet drop for quite some time now implement the functionality of
hme_txcksum() by means of m_pullup(9), which de-obfuscates the code
and allows to always retrieve the correct length of the IP header, [1]
- also add a KASSERT which asserts nsegs is valid,
- take advantage of m_collapse(9) instead of m_defrag(9) for
performance reasons.
o Don't bother to check whether the interface is running or whether its
queue is empty before calling hme_start_locked() in hme_tint(), the
former will check these anyway.
o In hme_intr() call hme_rint() before hme_tint() as hme_tint() may
take quite a while to return when it calls hme_start_locked().
o Get rid of sc_debug and just check if_flags for IFF_DEBUG directly.
o Add a shadow sc_ifflags so we don't reset the chip when unnecessary.
o Handle IFF_ALLMULTI correctly. [2]
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o Use the maximum of 256 TX descriptors for better performance as using
all of them has no additional static cost rather than using just half
of them.
Reported by: rwatson [2]
Suggested by: yongari [1]
Reviewed by: yongari
MFC after: 1 month
in order to get rid of bus space handle and tag in struct sym_hcb.
- Remove unused members related to bus addresses in struct sym_hcb.
- sym(4) takes care of allocating an instance of struct sym_hcb
itself so don't let newbus allocate it as an unused softc also.
- Add basic MPSAFE locking. This includes changing the sym(4) CCBs
to be allocated up-front instead of on demand as needed. Besides
making these allocations more likely to succeed, this also solves
the problem of calling bus_dmamap_create(9) with the SIM mutex
held.
Reviewed by: scottl
MFC after: 1 month
- Remove superfluous returns in functions returning void.
- In sym_alloc_lcb_tags() return directly instead of jumping
to a label which just returns.
- Fix some spelling in comments.
- Remove trailing whitespace.
exit requires entering the audit code. The result is much the same,
but they mean different things.
MFC after: 3 days
Submitted by: Diego Giagio <dgiagio at gmail dot com>
the method for the (indent == NULL) case (i.e. the kern.geom.conftxt
sysctl). The purpose is to extend the conftxt output with scheme-
specific fields which can be used by libdisk. In particular, have
the schemes dump the xs and xt fields, which contain the backward
compatible values for class type and partition type. This allows
libdisk to work with the legacy slicers as well as with gpart and
helps/promotes migration.
don't send an EOI which works like on amd64/i386 and blocks all
interrupts on the relevant interrupt controller.
o Replace the post_filter and post_inthread hooks registered when
creating the interrupt events with just ic_clear as on sparc64 we
don't need to do any disable->EOI->enable dance to unblock all but
the relevant interrupt while running the filter or handler; just
not clearing the interrupt already has the same effect.
o Merge from amd64/i386:
- Split the intr_table_lock into an sx lock used for most things,
and a spin lock to protect intrcnt_index.
- Add support for binding interrupts to CPUs, including for the
bus_bind_intr(9) interface, an assign_cpu hook and initially
shuffling interrupts around in a round-robin fashion.
Reviewed by: jhb
MFC after: 1 month
for better structure.
Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.
In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.
Parts of the kernel code do, however, care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>.
<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.
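To make the split concrete, here is a small Python sketch of the calendar-to-seconds step that the <sys/clock.h> side exists for. It is not the kernel routine; the function name rtc_to_seconds and the utc_offset_sec parameter are assumptions made for this example, and a full four-digit year is expected.

```python
import calendar

def rtc_to_seconds(year, mon, day, hour, minute, sec, utc_offset_sec=0):
    """Convert RTC-style calendar fields to seconds since the Unix epoch.

    utc_offset_sec is how far the RTC's local time is ahead of UTC
    (0 when the RTC already keeps UTC).
    """
    # timegm() interprets the broken-down time as UTC.
    secs = calendar.timegm((year, mon, day, hour, minute, sec, 0, 0, 0))
    return secs - utc_offset_sec
```

Everything on the <sys/time.h> side can then work on the returned count of seconds without knowing about months, leap years or timezones.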
Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms; it
is an MD call between machdep.c/clock.c. Remove references to it
elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
communicate between two parts of this one function. This was causing
problems with shared lookups as each would trash the ino value in the
inode.
- Remove the unused i_ino field from the inode structure.
receiving or transmitting.
With IPv6 raw sockets, read lock rather than write lock the inpcb when
receiving. Unfortunately, IPv6 source address selection appears to
require a write lock on the inpcb for the time being.
MFC after: 3 months
interrupt. So, add a new function pointer, arm_post_filter, which defaults
to NULL, and which will be used as the post_filter arg for
intr_event_create(). Set it properly for the AT91, so that it boots again.
Reported by: hps
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.
Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)
A lot of testing has shown that the problem people were seeing was due
to invalid padding after the end of option list option, which was corrected
in tcp_output.c rev. 1.146.
Thanks to: anders@, s3raphi, Matt Reimer
Thanks to: Doug Hardie and Randy Rose, John Mayer, Susan Guzzardi
Special thanks to: dwhite@ and BitGravity
Discussed with: silby
MFC after: 1 day
So if we have channel 0..3 devclass_get_maxunit is 4.
It's never been a problem as devclass_get_device() has
caught a possibly bad input.
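The off-by-one is easy to see in a toy model. The Python below is only an illustration of the convention described above (maxunit is one greater than the highest unit); get_maxunit, get_device and all_channels are hypothetical helpers, not the newbus functions, and returning 0 for an empty set is this sketch's own choice.

```python
def get_maxunit(units):
    """Model of the convention above: one greater than the highest unit."""
    return max(units, default=-1) + 1

def get_device(units, unit):
    """Return a device name for a valid unit, or None for a bad one."""
    return "ch%d" % unit if unit in units else None

def all_channels(units):
    # Correct bound: unit < maxunit. Looping up to maxunit inclusive would
    # probe one nonexistent unit and rely on get_device() rejecting it.
    return [get_device(units, u) for u in range(get_maxunit(units))]
```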
Discussed with: scottl
when reading credential data from sockets.
Teach pf to unlock the pcbinfo more quickly once it has acquired an
inpcb lock, as the inpcb lock is sufficient to protect the reference.
Assert locks, rather than read locks or write locks, on inpcbs in
subroutines--this is necessary as the inpcb may be passed down with a
write lock from the protocol, or may be passed down with a read lock
from the firewall lookup routine, and either is sufficient.
MFC after: 3 months
deserves its own internet memes). The trick is to force all available,
unused pins (those being advertised as "speaker") to behave as microphone
pins instead.
Reported / Tested by: Dmitry Kutsenko <kutsenko.truebsd.org>
MFC after: 3 days
we're certain the allocation will entirely succeed. This fixes a leak in a
fairly unlikely case.
Reported by: vijay singh <vijjus at rocketmail dot com>
MFC after: 1 week
noise from sio per unit. sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console. I think I'd rather read the pending
bits from the apics, but I'm not sure it's worth the hassle.
move most offload functionality from NIC to TOE
factor out all socket and inpcb direct access
factor out access to locking in inpcb, pcbinfo, and sockbuf
as the former is becoming deprecated and exhibits some extraneous
Giant-locking. The new callout(9) is declared MPSAFE, so it may
improve concurrency.
Tested by: matteo
Silence from: wpaul
MFC after: 1 month
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock remain exclusive, and all instances of inpcb lock acquisition
are exclusive.
This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.
MFC after: 3 months
Tested by: kris (superset of committed patch)
done by understandable macros.
Fix the bug that prevented the system from responding on interfaces with
link local addresses assigned.
PR: 120958
Submitted by: James Snow <snow at teardrop.org>
MFC after: 2 weeks
have separate configuration spaces so by definition they implement
different PCI domains. Thus change psycho(4) to use PCI domains
instead of reenumerating all PCI busses so they have globally unique
bus numbers and drop support for reenumerating busses in the OFW PCI
code.
According to CVS history reenumeration was also required in order to
get some E450 to boot but given that no other open source kernel
changes the PCI bus numbers assigned by the firmware I believe the
real problem was that the old code used the bus number as the device
number for the PCI busses and unlike most of the other machines the
firmwares of the problematic ones don't use disjoint PCI bus numbers
across the host-PCI-bridges.
MFC after: 1 month
This avoids calling busdma in the request processing path which caused a traumatic performance degradation.
Allocation has been postponed until after we know how many devices we can possibly have on port multipliers to save some space.
two ticks by counting the number of switches and the load when
sched_clock() is called.
- If the busy metric exceeds a threshold allow the idle thread to spin
waiting for new work for a brief period to avoid using IPIs. This
reduces the cost on the sender and receiver as well as reducing wakeup
latency considerably when it works.
Sponsored by: Nokia
variables and sysctl nodes.
- In reset walk the children of kern_sched_stats and reset the counters
via the oid_arg1 pointer. This allows us to add arbitrary counters to
the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch().
These types are named SWT_*. These types correspond to SCHED_STATS
counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats.
There are now stats for idle switches, remote idle wakeups, remote
preemption, ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include
how much migration there is, how often affinity was successful, how
often threads were migrated to the local cpu on wakeup, etc.
Sponsored by: Nokia
the fact that we have a 1:1 mapping by virtue of the BATs.
Eliminate the now unused moea_rkva_alloc(), moea_pa_map() and
moea_pa_unmap() functions.
Pointed out by: grehan.
rev. 1.149 rework.
This saves several percent of CPU time on SMP by using UMA's
internal per-CPU allocation limits instead of our own global variable
updated with atomics each time.
Tested with: Netperf cluster
deals with the usual __opendir2() calls, and the rest part with an interface
translator to expose fdopendir(3) functionality. Manual page was obtained from
kib@'s work for *at(2) system calls.
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks
- reorder structure fields (XX_refs) a bit to group fields modified at the
same time together. According to my tests it gives up to 10%
SMP performance benefit on real workloads due to reduced inter-CPU
cache thrashing.
- change q_flags from long to int as long is not really needed there and
its usage with atomics is disputed by some people.
- move NGF_WORKQ flag into the separate field q_flags2 as it is protected by
queue mutex instead of node writer protection used by the rest of flags.
- move nd_work queue entry to ng_queue structure to which it is more
related and make it STAILQ instead of TAILQ as now it is a classic FIFO.
- remove q_node pointer from ng_queue structure as it is not really needed.
- reimplement item queue using STAILQ instead of our own equivalent implementation.
As the BT subsystem has its own item queues using ng_item.el_next, update
it also.
- change depth field in ng_item from uintptr_t to u_int. It was made
uintptr_t to keep ABI compatibility.
Reviewed by: julian, emax
Tested with: Netperf cluster
inittodr() and resettodr(). Have nexus double as the clock device,
because it's the firmware that provides RTC services. We could
create a special (pseudo-) device for it, but that wasn't superior
enough to actually do it. Maybe later...
Requested by: phk
so credit its authors with contributions to this file. Remove
prototype copyright notice, although one might be warranted if someone
wanted to claim it badly enough.
Noticed by: Simon Burge.
routines in this file. Remove 'place holder' copyright since the
amount that's actually original is small relative to the length of the
file. The contents of this file appear to have originated at DECWRL
by way of NetBSD.
Noticed by: Simon Burge
o Implement IPI_PREEMPT,
o Set td_lock for the thread being switched out,
o For ULE & SMP, loop while td_lock points to blocked_lock for
the thread being switched in,
o Enable ULE by default in GENERIC and SKI,
clearing MSI enable bit for MSI capable hardware resulted in Tx
problems. MSI enable bit is set only when MSI is requested from
user.
Tested by: remko
(i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit
log doesn't mention the change in behaviour, it is believed to be
deliberate. In the last 5.5 years this hasn't been a problem. Nor
do I think did it make any difference, but who knows. However, I
do know that it breaks SMP support for Montecito-based machines.
Switch back to fixed-CPU delivery so that SMP works again. This
gives me some time to look more closely at the problem, as well
as make sure the I-cache validation as it's implemented currently
is sufficient in SMP configurations...
mips32r2 and mips64r2 (and close relatives) processors. There
presently is support for ADMtek ADM5120, a mips 4Kc in a malta board,
the RB533 routerboard (based on IDT RC32434) and some preliminary
support for sibyte/broadcom designs. Other hardware support will be
forthcoming.
This port boots multiuser under gxemul emulating the malta board and
also bootstraps on the hardware whose support is forthcoming...
Oleksandr Tymoshenko, Wojciech Koszek, Warner Losh, Olivier Houchard,
Randall Stewert and others that have contributed to the mips2 and/or
mips2-jnpr perforce branches. Juniper contributed a generic mips port
late in the life cycle of the mips2 branch. Warner Losh merged the
mips2 and Juniper code bases, and others listed above have worked for
the past several months to get to multiuser.
In addition, the mips2 work owes a debt to the trailblazing efforts of
the original mips branch in perforce done by Juli Mallett.
merged juniper and mips2 code base. This represents the work of
Juniper Engineers, plus Oleksandr Tymoshenko, Wojciech Koszek, Warner
Losh, Olivier Houchard, Randall Stewert and others that have
contributed to the mips2 and/or mips2-jnpr perforce branches.
The original code from KAME did not take care of address
aliases or multiple ip addresses that have the same
prefix.
Reviewed by: rwatson, gnn, sam, kmacy, julian
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
was disallowed. For example,
route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2
The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
Multiple default routes can also be inserted. Here is the netstat
output:
default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0
When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,
route delete default
would fail and trigger the following error message:
"route: writing to routing socket: No such process"
"delete net default: not in table"
On the other hand,
route delete default 10.2.5.2
would be successful: "delete net default: gateway 10.2.5.2"
One does not have to specify a gateway if there is only a single
route for a particular destination.
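The add/delete behaviour described above can be modeled roughly as follows. This is a toy Python table written for illustration, not the radix code guarded by RADIX_MPATH; the class name RouteTable, its methods and the exception types are all made up here.

```python
class RouteTable:
    """Toy multipath table: destination -> list of gateways."""

    def __init__(self):
        self.routes = {}

    def add(self, dest, gateway):
        gws = self.routes.setdefault(dest, [])
        if gateway in gws:
            raise ValueError("route already in table")
        gws.append(gateway)

    def delete(self, dest, gateway=None):
        gws = self.routes.get(dest, [])
        if gateway is None:
            # Without a gateway the request is only unambiguous when a
            # single route exists for the destination.
            if len(gws) != 1:
                raise LookupError("not in table")
            gateway = gws[0]
        if gateway not in gws:
            raise LookupError("not in table")
        gws.remove(gateway)
        if not gws:
            del self.routes[dest]
        return gateway
```

With two default routes installed, delete("default") raises, while delete("default", "10.2.5.2") succeeds, after which a plain delete("default") removes the one remaining route.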
I need to perform more testing on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.
Reviewed by: robert, sam, gnn, julian, kmacy
public namespace for WITNESS as they are only used internally so just
move them in the private namespace for the subsystem (with all related
supporting definitions).
Make clock_if.m and subr_rtc.c standard on i386
Add hints for "atrtc" driver, for non-PnP, non-ACPI systems.
NB: Make sure to install GENERIC.hints into /boot/device.hints in these!
Nuke MD inittodr(), resettodr() functions.
Don't attach to PNP0B00 in the "attimer" dummy driver any more, and remove
comments that no longer apply for that reason.
Add new "atrtc" device driver, which handles IBM PC AT Real Time
Clock compatible devices using subr_rtc and clock_if.
This driver is not entirely clean: other code still fondles the
hardware to get a statclock interrupt on non-ACPI timer systems.
Wrap some overly long lines.
After it has settled in -current, this will be ported to amd64.
Technically this is MFC'able, but I fail to see a good reason.
under bootverbose.
Struct ct is used for setting/reading real time clocks and I'm about
to Do Things to some of those, so a bit of preemptive debugging is
in order.
Remove a pointless __inline.
the only difference is that the lockmgr*() functions now accept the
LK_NOWITNESS flag, which skips order checking for that particular call.
- Remove a useless stub in witness_checkorder() (the above check means it
can never be reached) and allow witness_upgrade() to accept non-try
operations too.
- Fix speaker issues with Dell Vostro 1500 (GPIO0)
Tested by: John Wright <jwright.gmail.com>
- Apply ridiculous quirk on Asus A8X series (A8JC, A8M, A8xx, etc). These
different laptop series share similar pci id, hardware codecs, etc.
but work differently. A slight difference in connection type for
widget #26 is used to differentiate it.
Tested by: eric baumbach <embaumbach.gmail.com>
- Apply GPIO0 quirk for ASUS G2K laptop
- Sort ASUS ids accordingly.
Submitted by: jkim
MFC after: 3 days
TX traffic to sit in the send chain until a received packet kick
started the interrupt handler. This would cause extremely slow
performance when used with NFS over UDP.
- Removed untested polling code.
- Updated copyright year in the file header.
- Removed inadvertent ^M's created by DOS text editor.
MFC after: 2 weeks
be handled by chn_abort() and chn_start() alone. This should fix
a few issues with single duplex hardware (mostly) or pre virtual record
(RELENG 6) under WINE emulation and possibly others that use
SNDCTL_DSP_SETTRIGGER.
MFC after: 3 days
The problem is that the PM support is part of a much larger WIP here, but due to popular demand I decided to get some of it imported.
Also, I forgot to mention:
HW sponsored by: Vitsch Electronics / VEHosting
may be held for the duration of the various dirhash operations which
avoids many complex unlock/lock/revalidate sequences.
- Permit shared locks on lookup. To protect the ip->i_dirhash pointer we
use the vnode interlock in the shared case. Callers holding the
exclusive vnode lock can run without fear of concurrent modification to
i_dirhash.
- Hold an exclusive dirhash lock when creating the dirhash structure for
the first time or when re-creating a dirhash structure which has been
recycled.
Tested by: kris, pho
indexes so directory lookup becomes shared lock safe. In the modifying
cases an exclusive lock is held here so the commit routine may
rely on the state of i_offset.
- Similarly handle i_diroff by fetching at the start and setting only once
the operation is complete. Without the exclusive lock these are only
considered hints.
- Assert that an exclusive lock is held when we're preparing for a commit
routine.
- Honor the lock type request from lookup instead of always using exclusive
locking.
Tested by: pho, kris
I've taken a slightly different approach than is used with the ICH8 controllers
in that each controller is not identified individually (eg USB A, USB B, etc).
Instead I've given the same description to each one even though the device ID
differs. This can easily be changed if desired, or ICH8 (and any others using
that approach) can be made to work as this does.
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.
Reviewed by: jhb
Sponsored by: Nokia
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.
Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().
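
The difference between the two masks can be sketched as follows. This is a minimal illustration, assuming amd64-style 2MB superpages; the mask values and the helper names pde_to_paddr() and pde_to_paddr_4k_mask() are assumptions made for the example, not the pmap code itself.

```c
#include <stdint.h>

/* Assumed amd64-style values, for illustration only. */
#define PG_V        0x001ULL                /* entry is valid */
#define PG_PS       0x080ULL                /* PDE maps a superpage */
#define PG_PDE_PAT  0x1000ULL               /* bit 12: attribute bit in a superpage PDE */
#define PG_FRAME    0x000ffffffffff000ULL   /* frame mask for 4KB pages */
#define PG_PS_FRAME 0x000fffffffe00000ULL   /* frame mask for 2MB pages */

/* Physical address of the 2MB page mapped by a superpage PDE. */
static uint64_t
pde_to_paddr(uint64_t pde)
{
	return (pde & PG_PS_FRAME);
}

/* Same extraction with the 4KB mask: low attribute bits leak through. */
static uint64_t
pde_to_paddr_4k_mask(uint64_t pde)
{
	return (pde & PG_FRAME);
}
```

With a PDE that has bit 12 set, the 4KB mask yields an address that is not 2MB aligned, which is why the superpage mask is needed here.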
Support is working on the Silicon Image SiI3124/3132.
Support is working on some AHCI chips but far from all.
Remember this is WIP, so test reports and (constructive) suggestions are welcome!
received frame under certain conditions. wpaul said the length
0xfff0 has a special meaning: it indicates that the hardware is in the
process of copying a packet into host memory. But it seems
there are other cases where the hardware is busy or stuck in a bad
state even if the received frame length is not 0xfff0.
To work around this condition, add a check that verifies that the
received frame length is in the valid range. If the received length is out
of range, reinitialize the hardware to recover from the stuck condition.
Reported by: Mike Tancsa ( mike AT sentex DOT net )
Tested by: Mike Tancsa
Obtained from: OpenBSD
MFC after: 1 week
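
The length check described above might look roughly like this. It is a sketch only: the bounds RX_LEN_MIN and RX_LEN_MAX, the enum, and the function name rx_check_len() are hypothetical, and treating 0xfff0 as "try again later" is an assumption about how the busy marker is handled.

```c
#include <stdint.h>

#define RX_LEN_BUSY 0xfff0u  /* hardware is still copying the packet */
#define RX_LEN_MIN  64u      /* hypothetical lower bound */
#define RX_LEN_MAX  1522u    /* hypothetical upper bound */

enum rx_action {
	RX_ACCEPT,       /* length looks sane, process the frame */
	RX_RETRY_LATER,  /* busy marker seen, leave the frame for now */
	RX_REINIT        /* length out of range, reinitialize the chip */
};

/* Decide what to do with a received frame based on its reported length. */
static enum rx_action
rx_check_len(uint16_t len)
{
	if (len == RX_LEN_BUSY)
		return (RX_RETRY_LATER);
	if (len < RX_LEN_MIN || len > RX_LEN_MAX)
		return (RX_REINIT);
	return (RX_ACCEPT);
}
```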
no longer needed, but for now we still want to be consistent with other
similar checks in the tree.
- Call ASSERT_VOP_ELOCKED() only when vget() returns 0.
Reviewed by: jeff
o create a private task queue thread that sets up root and current
directories (hooking mountroot event as needed); this is necessary
because task queue threads are parented from proc0 and it does not
have a reference to rootvnode (lost when / mounting moved to init)
o bounce image load + unload requests through the private task q so
we can load images even when the request is made from a thread that
does not have sufficient context (e.g. task q thread)
o add a check in the task q thread to fail requests before root is
mounted (just in case)
Reviewed by: jhb, mlaier, luigi (glance)
MFC after: 1 month
and linux_openat(). Instead just pass AT_FDCWD into linux_common_open()
for the linux_open() case. This prevents passing -1 as a dirfd to
openat() from succeeding which is wrong.
Suggested by: rwatson, kib
Approved by: kib (mentor)
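
The user-visible behaviour this is meant to preserve can be checked from userland with a small sketch like the one below. The helper name try_openat() is made up for the example; it only shows that a relative lookup against AT_FDCWD works while -1 is rejected as a directory descriptor.

```c
#define _POSIX_C_SOURCE 200809L  /* for openat() and AT_FDCWD */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open 'path' relative to 'dirfd' read-only.
 * Returns 0 on success or a negated errno value on failure.
 */
static int
try_openat(int dirfd, const char *path)
{
	int fd;

	fd = openat(dirfd, path, O_RDONLY);
	if (fd < 0)
		return (-errno);
	close(fd);
	return (0);
}
```

Calling try_openat(AT_FDCWD, ".") resolves against the current directory, whereas try_openat(-1, ".") is expected to fail with EBADF because -1 is not a valid descriptor.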
ICMP unreach, frag needed. Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.
In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu()
to avoid any further conditional maths.
Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.
PR: kern/122338
Tested by: Mark Cammidge mark peralex.com
Reviewed by: silence on net@
MFC after: 2 weeks
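
The "minimum of the two" rule can be sketched as below. The function name next_hop_mtu() is hypothetical, and treating a route MTU of 0 as "not set" is an assumption of this example.

```c
/*
 * Pick the MTU to report in an ICMP "frag needed" message:
 * the smaller of the interface MTU and the route MTU.
 * A route MTU of 0 is taken to mean "no MTU recorded on the route".
 */
static unsigned int
next_hop_mtu(unsigned int if_mtu, unsigned int rt_mtu)
{
	if (rt_mtu != 0 && rt_mtu < if_mtu)
		return (rt_mtu);
	return (if_mtu);
}
```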
so that all implemented variants have proper prototypes. The 8-bit,
16-bit and 64-bit variants are not implemented.
This really fixes the current build breakages caused by type casting
and struct aliasing rules.
commands can be written to /dev/psm%d and status can be read back from it.
- Reflect the change in psm(4) and bump version for ports.
MFC after: 1 week
Because of this we were not getting further interrupts for link state
changes, thus never went into iface UP state and thus could not transmit.
The only way out of this was an incoming packet generating an rx interrupt
and making us call into bge_link_upd.
Up to rev. 1.101, in bge_start_locked, we only returned instantly
if there was 'no link AND nothing queued for tx'. So with a packet queued
for tx, we hit the register scrubbing at the end of bge_start_locked
and were out fine. We simply lost a packet or two but got the interrupts
needed to get into UP state.
With rev. 1.102 this was turned into 'if there is no link OR there is
nothing to send' (correct behaviour) and as long as there is no link
we never hit the register scrubbing and consequently never got the link UP.
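
The change in the early-return condition can be written down as two predicates. This is only a sketch of the logic described above; the function names are invented for the illustration and are not taken from the driver.

```c
#include <stdbool.h>

/* Up to rev. 1.101: return early only if no link AND nothing queued. */
static bool
start_returns_early_old(bool link_up, bool tx_queue_empty)
{
	return (!link_up && tx_queue_empty);
}

/* From rev. 1.102: return early if no link OR nothing to send. */
static bool
start_returns_early_new(bool link_up, bool tx_queue_empty)
{
	return (!link_up || tx_queue_empty);
}
```

With no link and a packet queued, the old predicate is false (so the code went on to the register scrubbing), while the new one is true and the scrubbing is never reached.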
What we do now is force an interrupt at the end of bge_ifmedia_upd_locked
so we will call bge_link_upd, clear the link state attention and get
further interrupts.
This helps to get the iface UP on an idle network or at least to get
it UP faster not depending on an rx intr anymore.
In case you could not get a DHCP lease or it took very long,
it was because of this.
It is unknown which chips are affected by this. ASIC rev. 0x2003 was the
most popular trouble candidate.
At least the fiber cards should have been working fine.
Which register to scrub is currently under discussion. The committed
solution was tested and found to work for a lot of setups. It might
not help with MSI.
The reason why we end up in such a situation is entirely unknown.
PR: kern/111804
Tested by: phk, scottl at Y!
MFC after: 14 days
was changed in rev. 1.161 of tcp_var.h. All options now test for sufficient
space in TCP header before getting added.
Reported by: Mark Atkinson <atkin901-at-yahoo.com>
Tested by: Mark Atkinson <atkin901-at-yahoo.com>
MFC after: 1 week
bit in order to allow per-bit checks on the options flag, in particular
in the consumers code [1]
- Re-enable the check against TDP_DEADLKTREAT as the anti-waiters
starvation patch allows exclusive waiters to override new shared
requests.
[1] Requested by: pjd, jeff
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet. To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data. Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.
This removes the restriction that at most one shared memory buffer can
be owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load. This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.
Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
state transitioning flags and of msleep(9) calls.
Use, instead, an algorithm very similar to what sx(9) and rwlock(9)
already do and direct accesses to the sleepqueue(9) primitive.
In order to avoid writer starvation a mechanism very similar to what
rwlock(9) uses now is implemented, with the corresponding per-thread
shared lockmgrs counter.
This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and
lockmgr_args_rw(). These two are like the 2 "normal" versions, but they
both accept a rwlock as interlock. In order to realize this, the general
lockmgr manager function "__lockmgr_args()" has been implemented through
the generic lock layer. It supports all the blocking primitives, but
currently only these 2 mappers live.
The patch drops the support for WITNESS atm, but it will probably be
added soon. Also, there is a little race in the draining code which is
also present in the current CVS stock implementation: if some sharers,
once they wake up, are in the runqueue they can contend the lock with
the exclusive drainer. This is hard to fix but the now committed
code mitigates this issue a lot better than the (past) CVS version.
In addition, the KA_HELD and KA_UNHELD assertions have been made no-ops
because they are dangerous and will soon no longer be supported.
In order to avoid namespace pollution, stack.h is split into two
parts: one which includes only the "struct stack" definition (_stack.h)
and one defining the KPI. In this way, newly added _lockmgr.h can
just include _stack.h.
The kernel ABI is heavily changed by this commit (the now committed
version of "struct lock" is a lot smaller than the previous one) and
the KPI is broken by the introduction of lockmgr_rw() / lockmgr_args_rw(),
so manpages and __FreeBSD_version will be updated accordingly.
Tested by: kris, pho, jeff, danger
Reviewed by: jeff
Sponsored by: Google, Summer of Code program 2007
contigmalloc(9) as a last resort to steal pages from an inactive,
partially-used superpage reservation.
Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
refactor it so that a separate subroutine is responsible for breaking
the selected reservation. This subroutine is also used by
vm_reserv_reclaim_contig().
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describing what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).
Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64
part of detecting the media. Explicitly ensure that we don't send it to
bpf(4) as bpf(4) isn't setup yet. This worked by accident before the bpf
interface stuff was reworked to avoid other races (bpf_peers_present, etc.)
but now it needs an explicit check to avoid a panic.
MFC after: 3 days
PR: kern/120915
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.
Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
spinning when readers hold a lock. This spinning is speculative because,
unlike the write case, we can not test whether the owners are running.
- Add speculative read spinning for readers who are blocked by pending
writers while a read lock is still held. This allows the thread to
spin until the write lock succeeds after which it may spin until the
writer has released the lock. This prevents excessive context switches
when readers and writers both hold the lock for brief periods.
Sponsored by: Nokia
the fdesc_allocvp(). The caller of the fdesc_allocvp() expects that the
returned vnode is not reclaimed. Do lock the vnode exclusive and drop
the lock after.
Reported by: pho
Reviewed by: jeff
fixed pri boost with '1' or any priority less than the current thread's
priority with a value greater than two. Default the boost to
PRI_MIN_TIMESHARE to prevent regular user-space threads from starving
threads in the kernel. This prevents these user-threads from also
being scheduled as if they are high fixed-priority kernel threads.
- Restore the setting of lowpri in tdq_choose(). It has to be either here
or in sched_switch(). I accidentally removed it from both places.
Tested by: kris
do this either. Simply check P_NOLOAD. It'd be nice if this was
in a thread flag so we didn't have an extra cache miss every time we
add and remove a thread from the run-queue.
- Pull all the code to deal with the trampoline stuff into one
centralized place and use it from everywhere.
- Some minor style tidiness
Reviewed by: tinguely
platform, so use the latter in preference to the former. This makes
the fake_preload setup be the same between kb920x_machdep.c and
avila_machdep.c....
and the igb driver static in the kernel. But it also reflects
some other bug fixes in my development stream at Intel.
PR 122373 is also fixed in this code.
- Move callout thread creation from kern_intr.c to kern_timeout.c
- Call callout_tick() on every processor via hardclock_cpu() rather than
inspecting callout internal details in kern_clock.c.
- Remove callout implementation details from callout.h
- Package up all of the global variables into a per-cpu callout structure.
- Start one thread per-cpu. Threads are not strictly bound. They prefer
to execute on the native cpu but may migrate temporarily if interrupts
are starving callout processing.
- Run all callouts by default in the thread for cpu0 to maintain current
ordering and concurrency guarantees. Many consumers may not properly
handle concurrent execution.
- The new callout_reset_on() api allows specifying a particular cpu to
execute the callout on. This may migrate a callout to a new cpu.
callout_reset() schedules on the last assigned cpu while
callout_reset_curcpu() schedules on the current cpu.
Reviewed by: phk
Sponsored by: Nokia
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL. The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.
to trip a bug causing the latter to return a zeroed struct
aac_adapter_info. This causes two issues. One is cosmetic only --
a verbose boot prints information about the controller, and shows all
zero:
aac0: Unknown processor 0MHz, 0MB memory (0MB cache, 0MB execution),
unknown battery platform
The second problem is that the firmware version information is stored
away for aac_rev_check, for userland tools (like aaccli) to query via
the FSACTL_MINIPORT_REV_CHECK and FSACTL_LNX_MINIPORT_REV_CHECK ioctls.
When aaccli encounters this issue it prints
Command Error: <The current AFAAPI.DLL is too old to work with the
current controller software.>
Move the RequestSupplementAdapterInfo call after RequestAdapterInfo,
which seems to fix both problems.
These functions try the specified operation (rlocking and wlocking) and
true is returned if the operation completes, false otherwise.
The KPI is enriched by this commit, so __FreeBSD_version bumping and
manpage updating will happen soon.
Requested by: jeff, kris
abstraction as the RAID and CAM modules, making it nearly impossible
for enough initialization to be done in time for the RAID module to
know whether to attach. On top of this, no reset was being done on
the controller on attach, in violation of the spec. Additionally,
the port enable step was being deferred to the end of the attach
process, long after it should have been done to ensure reliable
operation from the controller. Fix all of these with a few hacks
to force the "attach" and "enable" steps of the core module early
on, and ensure that a reset and port enable also happens early on.
In the future, the driver needs to be refactored to eliminate the
core module abstraction, clean up the reset/enable steps, and
defer event messages until all of the modules are available to
receive them.
openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2),
futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2),
readlinkat(2), renameat(2), symlinkat(2)
syscalls.
Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho
openat() and the related syscalls.
Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho
to protect the v_lock pointer. Removing the interlock acquisition
here allows vn_lock() to proceed without requiring the interlock
at all.
- If the lock mutated while we were sleeping on it the interlock has
been dropped. It is conceivable that the upper layer code was
relying on the interlock and LK_NOWAIT to protect the identity or
state of the vnode while acquiring the lock. In this case return
EBUSY rather than trying the new lock to prevent potential races.
Reviewed by: tegge
Keeping the lockmgr lock valid allows us to switch the v_lock pointer
in snapshot vnodes between the embedded lockmgr lock and snapdata
lock without needing the vnode interlock to protect against races
- Keep unused snapdata structures in a list.
- Add a function to lock the devvp and allocate a snapdata to it or
acquire a new one without races. The old function was safe from
creation races because we set the mount flag when creating snapshots
and thus serializing them. However, it might have been subject to
destroying races.
Reviewed by: tegge
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.
MFC after: 2 weeks
incompatible with existing bindings.
- Try to copyout the setid in cpuset() before migrating the proc to the
setid in case the user has supplied a bad buffer.
- Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
be more descriptive and free cpuset_root to be used as a different
type of symbol.
- Make cpuset_root the cpuset_t set of all cpus in the system. This
should contain the same bitmask as all_cpus presently.
- Add a CPU_CMP() macro to compare two sets.
- Do not check destination hook presence, it will be done by netgraph.
- Use u_int instead of int in some places to simplify type conversions.
- Use NG_SEND_DATA_ONLY() macro instead of selfmade equivalent.
which simply want a reference should use vref(). Callers which want
to check validity need to hold a lock while performing any action
based on that validity. vn_lock() would always release the interlock
before returning making any action synchronous with the validity check
impossible.
SI_SUB_DRIVERS) to avoid loading schemes before all the GEOM
classes have been loaded and initialized. Otherwise we may
end up using mutexes that haven't been initialized (due to
g_retaste() posting an event).
vm_object_reference(). This is intended to get rid of vget()
consumers who don't wish to acquire a lock. This is functionally
the same as calling vref(). vm_object_reference_locked() already
uses vref.
Discussed with: alc
and netgraph in general). This also allows adding queues for an interface
that does not yet exist (you have to provide the bandwidth for the
interface, however).
PR: kern/106400, kern/117827
MFC after: 2 weeks
dropped after the call to lockmgr() so just revert this approach using
something similar to the previous one:
BUF_LOCKWAITERS() just checks if there are waiters (not the actual number
of them) and it is based on the newly introduced lockmgr_waiters(), which
returns whether the lockmgr has waiters or not. The name was chosen to
differ from the old lockwaiters() in order not to confuse them.
The KPI is enriched by this commit, so a __FreeBSD_version bump and
manpage update will be happening soon.
'struct buf' also changes, so kernel ABI is disturbed.
Bug found by: jeff
Approved by: jeff, kib
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.
For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.
offload bugs by manual padding for short IP/UDP frames. Unfortunately
it seems that this workaround does not work reliably on newer PCIe
variants of RealTek chips.
To workaround the hardware bug, always pad short frames if Tx IP
checksum offload is requested. It seems that the hardware has a
bug in IP checksum offload handling. NetBSD manually pads short
frames only when the length of IP frame is less than 28 bytes but I
chose 60 bytes for safety. Also unconditionally set IP checksum
offload bit in Tx descriptor if any TCP or UDP checksum offload is
requested. This is the same way as Linux does but it's not
mentioned in data sheet.
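
The two rules above (pad short frames when IP checksum offload is requested, and always request IP checksum offload alongside TCP or UDP offload) can be sketched as follows. The flag names and both helper functions are hypothetical and exist only for this illustration; the 60 byte threshold comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_PAD_THRESHOLD 60u  /* pad frames shorter than this */

/* Hypothetical offload request flags, for illustration only. */
#define TXCSUM_IP   0x1u
#define TXCSUM_TCP  0x2u
#define TXCSUM_UDP  0x4u

/* Number of padding bytes to append before handing the frame to the chip. */
static size_t
tx_pad_bytes(size_t frame_len, bool ip_csum_offload)
{
	if (ip_csum_offload && frame_len < TX_PAD_THRESHOLD)
		return (TX_PAD_THRESHOLD - frame_len);
	return (0);
}

/* Force the IP checksum bit whenever TCP or UDP offload is requested. */
static unsigned int
tx_desc_csum_flags(unsigned int requested)
{
	if ((requested & (TXCSUM_TCP | TXCSUM_UDP)) != 0)
		requested |= TXCSUM_IP;
	return (requested);
}
```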
Obtained from: NetBSD
Tested by: remko, danger
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
will never exit ngintr() while there are ready requests on the queue.
It was made years ago with hope of parallel queue processing by several
net threads. But even if we sometimes have several threads, we have no
right to process the queue in parallel, as that would break the original
request serialization that is critically important for some setups.
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.
Submitted by: Alexey Popov
Reviewed by: rwatson
MFC after: 3 days
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris
2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.
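
The rule in item 2 amounts to a small predicate. The sketch below uses the conventional x86 bit positions for the two flags; the helper name pte_is_dirty() is an assumption for the example and is not claimed to be the name used in the pmap.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW 0x002ULL  /* page is writable */
#define PG_M  0x040ULL  /* page has been modified (dirty) */

/*
 * Treat a PTE as dirty only if both PG_M and PG_RW are set.
 * PG_M on a read-only PTE may be left over from a stale TLB entry
 * and is ignored.
 */
static bool
pte_is_dirty(uint64_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```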
frequency generation and what frequency they generated was anyone's
guess.
In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.
The other relevant property of those machines, is that they
typically had no more than 16MB RAM.
These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means of PLLs.
Considering that it takes on average 1.5 second to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.
If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
The timer_spkr_*() functions take care of the enabling/disabling
of the speaker.
Test on the existence of timer_spkr_*() functions, rather than
architectures.
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).
MFC after: 4 months
Suggested by: csjp
(such as 'atime' vs 'noatime'). The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid
mount options should include 'nofoo' instead of 'foo'. With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error. You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).
MFC after: 1 week
Reviewed by: crodig
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.
The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
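
The relationship between the two units is a single division by the i8254 input clock. A minimal sketch, with hypothetical helper names and a guard against division by zero added for the example:

```c
#include <stdint.h>

#define I8254_FREQ 1193182u  /* i8254 input clock in Hz */

/* Convert a tone frequency in Hz to a period in 1/1193182 s units. */
static uint32_t
hz_to_i8254_period(uint32_t hz)
{
	return (hz == 0 ? 0 : I8254_FREQ / hz);
}

/* Convert a period in 1/1193182 s units back to a frequency in Hz. */
static uint32_t
i8254_period_to_hz(uint32_t period)
{
	return (period == 0 ? 0 : I8254_FREQ / period);
}
```

Integer division truncates, so a round trip through both helpers may be off by a small amount for frequencies that do not divide the clock evenly.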
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.
This eliminates the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.
In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
the owner of a queue to block and unblock execution of the tasks in the
queue while allowing tasks to continue to be added to the queue. Combining this
with taskqueue_drain() allows a queue to be safely disabled. The unblock
function may run (or schedule to run) the queue when it is called, just as
calling taskqueue_enqueue() would.
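The blocking semantics described above can be modeled in user space
roughly as follows. This is a toy, single-threaded sketch; struct
toy_queue and the toy_*() functions are hypothetical names and do not
reflect the kernel's taskqueue implementation or its locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define TQ_MAX 16	/* hypothetical fixed capacity */

typedef void (*task_fn)(void *);

struct toy_queue {
	task_fn	fn[TQ_MAX];
	void	*arg[TQ_MAX];
	size_t	count;
	bool	blocked;
};

/* Run everything that is pending, unless the queue is blocked. */
static void
toy_run(struct toy_queue *q)
{
	if (q->blocked)
		return;
	for (size_t i = 0; i < q->count; i++)
		q->fn[i](q->arg[i]);
	q->count = 0;
}

/* Adding a task is always allowed; execution is what gets deferred. */
static int
toy_enqueue(struct toy_queue *q, task_fn fn, void *arg)
{
	if (q->count == TQ_MAX)
		return (-1);
	q->fn[q->count] = fn;
	q->arg[q->count] = arg;
	q->count++;
	toy_run(q);
	return (0);
}

static void
toy_block(struct toy_queue *q)
{
	q->blocked = true;
}

/* Unblocking runs the queue, just as an enqueue would. */
static void
toy_unblock(struct toy_queue *q)
{
	q->blocked = false;
	toy_run(q);
}

/* Example task: increment the integer that arg points to. */
static void
toy_bump(void *arg)
{
	++*(int *)arg;
}
```

Tasks enqueued while the queue is blocked stay pending and only run
once toy_unblock() is called.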
Reviewed by: jhb, sam
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.
MFC after: 4 months
of pptpgre and ksocket nodes for all calls between two peers. This patch
modifies node's API by adding new "session_%04x" hook names support, while
keeping backward compatibility.
Together with appropriate user-level support (by latest mpd5) it gives
huge performance benefits for case of multiple active calls between
two peers because of avoiding data duplication and extra socket processing.
On my benchmarks I have got more than 10 times speedup for the 200
simultaneous PPTP calls between two peers.
In conclusion, it now allows building effective "clients <=> PAC <=> PNS"
setups.
o sort mbuf flags together and extend values to 32 bits
o write M_COPYFLAGS in terms of M_PROTOFLAGS
o move M_COPYFLAGS and M_PROTOFLAGS up to be together with flag defs
Reviewed by: rwatson
MFC after: 3 weeks
- Take advantage of m_collapse(9).
- Sync with other NIC drivers and prepend a TX mbuf if the first attempt
to load it fails with an error other than EFBIG and stop trying instead
of freeing it and keeping on trying to enqueue more mbufs. Also ensure
the driver queue isn't empty before trying to enqueue mbufs in order to
reduce locking operations.
- In xl_ifmedia_upd() add a missing XL_UNLOCK(). [1]
- Const'ify the xl_devs array.
- Remove an outdated comment.
PR: 113406 [1]
MFC after: 1 month
- Correct the maxsize parameter when creating the mbufs busdma tag to
reflect the actual requirement of dc(4).
- Move the KASSERT in dc_newbuf() to the right spot.
- Also convert the TX side to take advantage of bus_dmamap_load_mbuf_sg(9).
- Move the comment regarding dc_start_locked() to the right spot.
MFC after: 2 weeks
- Resource allocation in aac_alloc (moved from aac_init)
- Interrupt setup in aac_setup_intr (from aac_attach)
- Container probing in aac_get_container_info (from aac_startup and
aac_handle_aif)
- Firmware status check moved to aac_check_firmware from aac_init
In case of "new SA", we must check the hard lifetime of the old SA
to find out if it is not permanent and we can delete it.
Submitted by: sakane via gnn
MFC after: 3 days
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.
The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.
The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patches to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.
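The general idea of handing a shared buffer back and forth with atomic
operations instead of system calls can be sketched as below. This is an
illustrative model only: struct zbuf_hdr, its fields, and the zbuf_*()
functions are hypothetical and are not the header layout or protocol
documented in bpf(4).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified shared-buffer header. */
struct zbuf_hdr {
	atomic_uint	kernel_gen;	/* advanced by the producer */
	atomic_uint	user_gen;	/* set by the consumer when done */
	unsigned int	len;		/* valid bytes, written before kernel_gen */
};

/* Producer: store the length, then publish it with a release operation. */
static void
zbuf_publish(struct zbuf_hdr *h, unsigned int len)
{
	h->len = len;
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}

/* Consumer: data is pending while the two generations differ. */
static bool
zbuf_ready(struct zbuf_hdr *h)
{
	return (atomic_load_explicit(&h->kernel_gen, memory_order_acquire) !=
	    atomic_load_explicit(&h->user_gen, memory_order_relaxed));
}

/* Consumer: return the buffer by catching up to the producer's generation. */
static void
zbuf_release(struct zbuf_hdr *h)
{
	unsigned int gen;

	gen = atomic_load_explicit(&h->kernel_gen, memory_order_acquire);
	atomic_store_explicit(&h->user_gen, gen, memory_order_release);
}
```

The acquire load in zbuf_ready() pairs with the release in
zbuf_publish(), so a consumer that sees the new generation also sees
the length written before it.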
These changes modify the base BPF implementation to (roughly) abstract
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring changes that break
the netstat monitoring ABI for BPF, will be MFC'd.
Zerocopy bpf buffers are still considered experimental and are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.
Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.
Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]
[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.
references to a vnode with VI_OWEINACT set will force the vinactive()
call. The kernel makes no guarantees about which reference was the
last to close a file or when the actual inactive processing will
happen. The previous code was designed to preserve existing semantics
in the face of shared locks, however, this was unnecessary.
Discussed with: mckusick
is requested. Handle this case specially before the while loop.
- Use the held vnode lock to check for VI_DOOMED. The vnode lock and
interlock must both be held to set VI_DOOMED so either one held, even
shared, is sufficient to check it.
No objection by: kib
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.
Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futile attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.
While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.
A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)
This is still work-in-progress.
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set. However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.
(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)
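The rule "ignore PG_M unless PG_RW is set" can be written as a single
mask comparison. The sketch below uses the architectural x86 bit
positions for the two flags; the function name pte_is_dirty() is a
hypothetical helper for illustration, not a claim about the pmap code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW	0x002u	/* x86 PTE bit 1: writable */
#define PG_M	0x040u	/* x86 PTE bit 6: modified (dirty) */

/*
 * Hypothetical helper: treat a mapping as dirty only when both PG_M and
 * PG_RW are set, since PG_M alone may be left behind by a write attempt
 * that faulted.
 */
static bool
pte_is_dirty(uint64_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```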
Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.
Tested by: Peter Holm
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.
It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.
Discussed: on stable@, with bde
Reviewed by: tegge, jeff [1]
MFC after: 2 weeks
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.
As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.
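A wrap-around check of this kind can be sketched in user space as
follows. The constants assume a 4 MB page-directory span as on non-PAE
i386, and count_pde_steps() is a hypothetical stand-in for the real
per-range loop; it only counts iterations.

```c
#include <stdint.h>

#define NBPDR	(1u << 22)	/* assumed: bytes mapped by one PDE (4 MB) */
#define PDRMASK	(NBPDR - 1)

/* Count the PDE-sized steps needed to walk the range [sva, eva). */
static unsigned int
count_pde_steps(uint32_t sva, uint32_t eva)
{
	unsigned int steps = 0;
	uint32_t next;

	for (; sva < eva; sva = next) {
		/* Start of the next PDE-aligned region; may wrap to 0. */
		next = (sva + NBPDR) & ~PDRMASK;
		if (next < sva)		/* wrapped past the top of the space */
			next = eva;
		steps++;
	}
	return (steps);
}
```

Without the `next < sva` test, a range ending in the last 4 MB would
restart at address 0 and walk nearly the whole address space.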
Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week
multi-descriptor transmission attempt. The datasheet says nothing about
this requirement. This should fix long-standing VLAN hardware
tagging issues with re(4).
Reported by: Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by: Giulio Ferro ( auryn AT zirakzigil DOT org )
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.
The g_retaste() function posts an event which is handled by the
g_retaste_event().
Event suggested by: phk
exhaustion is encountered. There was a fix made previously for this
problem but the solution (breaking out of the receive loop) does not
seem to work. mbuf reuse strategy is already adopted by other drivers
such as if_bge. The problem was recreated and the patch is also
verified in the same test environment.
layouts different than the defaults:
o hint.npe.0.mac="A", "B", etc. specifies the window for MAC register accesses
o hint.npe.0.mii="A", "B", etc. specifies PHY registers
o hint.npe.1.phy=%d specifies the PHY to map to a port
This allows devices like NSLU to be setup w/o code changes and will
also be used for forthcoming support for more Avila boards.
Reviewed by: imp
MFC after: 1 week
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
- Create a new lock in the bufobj to lock bufobj fields independently.
This leaves the vnode interlock as an 'identity' lock while the bufobj
is an io lock. The bufobj lock is ordered before the vnode interlock
and also before the mnt ilock.
- Exploit this new lock order to simplify softdep_check_suspend().
- A few sync related functions are marked with a new XXX to note that
we may not properly interlock against a non-zero bv_cnt when
attempting to sync all vnodes on a mountlist. I do not believe this
race is important. If I'm wrong this will make these locations easier
to find.
Reviewed by: kib (earlier diff)
Tested by: kris, pho (earlier diff)
code.
The bug:
There exists a race condition for timeout/untimeout(9) due to the
way that the softclock thread dequeues timeouts.
The softclock thread sets the c_func and c_arg of the callout to
NULL while holding the callout lock but not Giant. It then drops
the callout lock and acquires Giant.
It is at this point where untimeout(9) on another cpu/thread could
be called.
Since c_arg and c_func are cleared, untimeout(9) does not touch the
callout and returns as if the callout is canceled.
The softclock then tries to acquire Giant and likely blocks due to
the other cpu/thread holding it.
The other cpu/thread then likely deallocates the backing store that
c_arg points to and finishes working and hence drops Giant.
Softclock resumes, acquires Giant and calls the function with
the now free'd c_arg and we have corruption/crash.
The fix:
We need to track curr_callout even for timeout(9) (LOCAL_ALLOC)
callouts. We need to free the callout after the softclock processes
it to deal with the race here.
Obtained from: Juniper Networks, iedowse
Reviewed by: jhb, iedowse
MFC After: 2 weeks.
around the check for the BV_BKGRDINPROG in the brelse() and bqrelse().
See the comment for the explanation why it is safe.
Tested by: pho
Submitted by: jeff
ffs_extread() when setting the IN_ACCESS flag by checking whether the
IN_ACCESS is already set. The possible race there is admissible.
Tested by: pho
Submitted by: jeff
to enter thread_suspend_check().
- Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
thread_suspend_check() to ast() rather than userret().
- Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
that we don't miss a suspend request. If this is set use the
expensive signal path.
- Set NEEDSUSPCHK when creating a new thread in thr in case the
creating thread is due to be suspended as well but has not yet.
Reviewed by: davidxu (Authored original patch)
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.
resource to a CPU. The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr(). Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.
Tested by: gallatin
putting the correct size in the fib header. Presumably the older firmware
silently ignored a bad size field.
(This change tested with a 3805 controller. Passthrough devices were
created when running firmware build 12814, but not 15323 or later. With
this change they're created for both old and new firmware versions.)
Submitted by: Adaptec
FSACTL_LNX_SEND_LARGE_FIB, and FSACTL_LNX_SEND_RAW_SRB, and correct size
checks on FIBs passed in from userspace. Both changes were obtained from
Adaptec's driver build 15317. Adaptec's commandline RAID tool arcconf uses
these ioctls when creating a RAID-10 array (and probably other operations
too).
so the annoying message is not printed.
o Don't warn about FUTEX_FD not being implemented
and return ENOSYS instead of 0 (i.e. success).
o Clear FUTEX_PRIVATE_FLAG as we actually implement
only private futexes so there is no reason to
return ENOSYS when app asks for a private futex.
We don't reject shared futexes because they worked
just fine with our implementation so far.
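The flag handling in the last two items can be sketched as below. The
numeric values are the Linux futex ABI constants; futex_filter_op() is a
hypothetical helper written for this illustration and is not the
emulation layer's code.

```c
#include <errno.h>

#define FUTEX_WAIT		0
#define FUTEX_FD		2
#define FUTEX_PRIVATE_FLAG	128

/*
 * Hypothetical helper: strip the private flag, reject FUTEX_FD with
 * ENOSYS, and pass every other operation through in *out.
 */
static int
futex_filter_op(int op, int *out)
{
	op &= ~FUTEX_PRIVATE_FLAG;	/* private and shared are treated alike */
	if (op == FUTEX_FD)
		return (ENOSYS);	/* unimplemented: must not report success */
	*out = op;
	return (0);
}
```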
Approved by: kib (mentor)
Tested by: bsam
MFC after: 1 week
work on architectures with a write-back cache as the PIO writes end up
in the cache which the sync(BUS_DMASYNC_POSTREAD) in usb_transfer_complete
then discards; compensate in the xfer methods that do PIO by pushing the
writes out of the cache before usb_transfer_complete is called.
This fixes USB on xscale and likely other places.
Sponsored by: hobnob
Reviewed by: cognet, imp
MFC after: 1 month
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.
PR: kern/119422
MFC after: 1 week
all uses) involve a read but usbd_start_transfer only does a PREWRITE; change
this to BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE as I'm not sure if any
users do write+read.
Reviewed by: cognet, imp
MFC after: 1 month
rqindex back in struct thread.
- Compile kern_switch.c independently again and stop #include'ing it from
schedulers.
- Remove the ts_thread backpointers and convert most code to go from
struct thread to struct td_sched.
- Cleanup the ts_flags #define garbage that was causing us to sometimes
do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
- Export the kern.sched sysctl node in sysctl.h
This one line change makes the following code found in many ethernet device drivers
(at least em, igb, ixgbe, and cxgb) gratuitous
	case SIOCSIFADDR:
		if (ifa->ifa_addr->sa_family == AF_INET) {
			/*
			 * XXX
			 * Since resetting hardware takes a very long time
			 * and results in link renegotiation we only
			 * initialize the hardware only when it is absolutely
			 * required.
			 */
			ifp->if_flags |= IFF_UP;
			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
				EM_CORE_LOCK(adapter);
				em_init_locked(adapter);
				EM_CORE_UNLOCK(adapter);
			}
			arp_ifinit(ifp, ifa);
		} else
			error = ether_ioctl(ifp, command, data);
		break;
thread_fini(). The schedulers initialize themselves properly during
sched_fork_thread() anyhow. fini is only called when we're returning
the memory to the allocator which surely doesn't care what state the
memory is in.
is only used by 4bsd.
- Create a new runq_choose_fuzz() function rather than polluting runq_choose()
with 4BSD specific code.
- Move the fuzz sysctl into sched_4bsd.c
- Remove some dead code from kern_switch.c
maxsockets limit, not maxfiles limit. The question remains why those
limits are handled differently (with error code for maxfiles but with
sleep for maxsockets), but those would be addressed in a separate commit
if necessary.
Requested by: rwatson, jeff
before doing the very expensive cursig() and related locking. NEEDSIGCHK
is updated whenever our signal mask changes or when a signal is delivered and
should be sufficient to avoid the more expensive tests. This eliminates
another source of PROC_LOCK contention in multithreaded programs.
- In the last revision the code was changed to use maxfilesperproc rather than
the per-process file limit to restrict the size of the poll array. This
eliminates a significant source of process lock contention in multithreaded
programs and is cheaper. This had been committed with the wrong batch of
changes.
a simple (wmesg, count) tuple in a hash to keep track of how many times
we sleep at each wait message. We hash on message and not channel. No
line number information is given as typically wait messages are not used in
more than one place. Identical strings defined at different addresses will
show up with separate counters.
- Use debug.sleepq.enable to enable, .reset to reset, and .stats dumps stats.
- Do an unsynchronized check in sleepq_switch() prior to switching before
calling sleepq_profile() which uses a global lock to synchronize the hash.
Only sleeps which actually cause a context switch are counted.
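A (wmesg, count) table keyed on the message pointer can be sketched as
follows. This is a simplified, single-threaded illustration: the table
size, the prof_*() names, and the open-addressing scheme are
assumptions, and the global lock mentioned above is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define PROF_SLOTS 64	/* hypothetical table size */

struct prof_entry {
	const char	*wmesg;	/* key: the pointer, not the string contents */
	unsigned long	count;
};

static struct prof_entry prof_table[PROF_SLOTS];

/* Find or create the entry for this wait-message pointer. */
static struct prof_entry *
prof_lookup(const char *wmesg)
{
	size_t start = ((uintptr_t)wmesg >> 4) % PROF_SLOTS;

	for (size_t i = 0; i < PROF_SLOTS; i++) {
		struct prof_entry *e = &prof_table[(start + i) % PROF_SLOTS];

		if (e->wmesg == NULL)
			e->wmesg = wmesg;	/* claim an empty slot */
		if (e->wmesg == wmesg)
			return (e);
	}
	return (NULL);	/* table full */
}

/* Count one sleep on the given wait message. */
static void
prof_record(const char *wmesg)
{
	struct prof_entry *e = prof_lookup(wmesg);

	if (e != NULL)
		e->count++;
}
```

Because the key is the address, two arrays holding the same text get
separate counters, which matches the behavior described above.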
1.38 in 2001. Break out of the FOREACH_THREAD_IN_PROC loop when we've
discovered a new proc in the chain.
- Increment i and check for maxlockdepth once per matching process not
once per thread. This didn't properly terminate the loop before.
- Fix a bug which has existed potentially since rev 1.1. waitblock->lf_next
can be NULL when a thread has been woken-up but not yet scheduled. Check
for this condition rather than blindly dereferencing.
Found by: libMicro
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.
2's complement.
The 2's complement transform is done so a "count down" sampling interval
can be converted into a "count up" PMC value. A 2's complemented 'count down'
value is written to the PMC counter; then the read-back counter is reverted
via another 2's complement.
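The transform is its own inverse, which the sketch below demonstrates.
It assumes a full 64-bit counter for simplicity (real PMCs are often
narrower), and the two function names are hypothetical.

```c
#include <stdint.h>

/* "Count down" interval -> "count up" start value (2's complement). */
static uint64_t
reload_count_to_pmc(uint64_t count)
{
	return (~count + 1);
}

/* Read-back counter value -> remaining "count down" value. */
static uint64_t
pmc_to_reload_count(uint64_t pmc)
{
	return (~pmc + 1);
}
```

A counter started at reload_count_to_pmc(n) overflows to zero after
exactly n increments.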
PR: kern/121660
Reviewed by: jkoshy
Approved by: jkoshy
MFC after: 1 week
vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c
has withered to the point that it contains only four short functions,
two of which are only used by vm/vm_page.c. Since I can't foresee any
reason for vm/vm_pageq.c to grow, it is time to fold the remaining
contents of vm/vm_pageq.c back into vm/vm_page.c.
Add some comments. Rename one of the functions, vm_pageq_enqueue(),
that is now static within vm/vm_page.c to vm_page_enqueue().
Eliminate PQ_MAXCOUNT as it no longer serves any purpose.
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines to trim one extra indirection.
Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}
the referenced data is only obtained/changed in the device open handler,
and the ioctl handler can only run after the open handler. Also fix a
few nearby style issues.
Submitted by: Matt Jacob
drivers.
In the giant_XXX wrappers for the device methods of the D_NEEDGIANT
drivers, do not dereference the cdev->si_devsw. It is racing with
the destroy_devl() clearing of the si_devsw. Instead, use the
dev_refthread() and return ENXIO for the destroyed device. [1]
The check for the D_INIT in the prep_cdevsw() was not synchronized with
the call of the fini_cdevsw() in destroy_devl(), that under rapid device
creation/destruction may result in the use of uninitialized cdevsw [2].
Change the protocol for the prep_cdevsw(), requiring it to be called
under dev_mtx, where the check for D_INIT is done.
Do not free the memory allocated for the gianttrick cdevsw while holding
the dev_mtx, put it into the free list to be freed later. Reuse the
d_gianttrick pointer to keep the size and layout of the struct cdevsw
(requested by phk). Free the memory in the dev_unlock_and_free(), and do
all the free after the dev_mtx is dropped (suggested by jhb).
Reported by: bsdimp + many [1], pho [2]
Reviewed by: phk, jhb
Tested by: pho
MFC after: 1 week
for a configurable number of seconds, spin the disk down. Spin it back
up on the next request.
Notice that the timeout is only armed by a request, so to spin down a
disk you may have to do:
atacontrol spindown ad10 5
dd if=/dev/ad10 of=/dev/null count=1
To disable spindown, set timeout to zero:
atacontrol spindown ad10 0
In order to debug any trouble caused, this code is somewhat noisy on the
console.
Enabling spindown on a disk containing / or /var/log/messages is not
going to do anything sensible.
Spinning a disk up and down all the time will wear it out; use sensibly.
Approved by: sos
10 microseconds is too short.
Always set the cpu to the highest frequency so that we get through
boot and don't handicap cpus where powerd(8) is not used.
monitor mode. This solves a problem that sometimes mangled frames
are passed.
Submitted by: Werner Backes <werner_at_bit-1.de>
Tested by: Werner Backes <werner_at_bit-1.de>
PR: kern/121608
Approved by: thompsa (mentor)
will have a special section, named .PPC.EMB.apuinfo, which will
tell GDB that a BookE processor is targeted and which will
result in GDB using a different register definition. In order
to support remote GDB for BookE, we need the GDB stub in the
kernel look for that section and use the BookE definitions.
uidinfo structure. This entirely removes contention observed on the
ui_mtxp mutex (as it is now gone).
- Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just
need to read-lock it.
Reviewed by: jhb, jeff, kris & others
Tested by: kris
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isn't needed.
a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and
checking the return value: 0 means that the process exists and -1 that
it doesn't exist.
Reviewed by: rwatson
MFC after: 1 week
Instead of checking each page for PG_UNMANAGED, perform a one-time
check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only
belong to OBJT_PHYS objects.)
with style(9) recommendation that macros not contain the
terminating ';', leaving that to the invoker. All SYSINIT()
consumers must now provide a trailing ';'.
Unlike the change to remove the ';'s from callers, this change
shouldn't be MFC'd unless we don't mind requiring source changes
to third party modules that might still depend on SYSINIT()
providing its own ';'.
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
Otherwise the parameter is no-op, since zone by default limits number
of descriptors to some 12K entries. Attempt to allocate more ends up
sleeping on zonelimit.
MFC after: 2 weeks
all. The reference in ia64 code is due to cutNpaste in its history
and can safely be removed.
Reviewed by: cognet, raj, marcel, jhb and maybe one other whom I'm forgetting
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:
intr_bind(rman_get_start(irq_res), cpu);
however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.
Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}
In that case return and continue processing the packet without IPsec.
PR: 121384
MFC after: 5 days
Reported by: Cyrus Rahman (crahman gmail.com)
Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]
"Fast IPsec: Initialized Security Association Processing." printf.
People kept asking questions about this after the IPsec shuffle.
This still is the Fast IPsec implementation so no worries that it would
be any slower now. There are no functional changes.
Discussed with: sam
MFC after: 4 days
No need to compile 'dead' code.
I am leaving it in because we will have to review the concept and
should use the common function in various places.
MFC after: 5 days
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs. In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would work around the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.
MFC after: 1 week
Pointy hat to: jhb
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it
can be overridden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains a new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.
Discussed with: imp
Silence on: arch@
callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous
items on the http://wiki.freebsd.org/SMPTODO list.
Reviewed by: imp, obrien, jhb
MFC after: 1 week
virtual 86 mode to query the BIOS directly. This is needed for certain
HP machines whose BIOS only provides an SMAP when invoked from real mode.
On such machines the loader will be able to query the SMAP successfully
due to the recent BTX changes, but the kernel will not.
One thing I'm not sure of is if we can skip the INT 12h probe altogether
if we have the SMAP from the loader as it seems that we do the INT 12h
probe to set up enough state so we can use vm86 to call the BIOS.
MFC after: 1 week
failing to load on a kernel that has "nodevice mem" in the config. It will
now properly bring in the mem(4) module.
Submitted by: antoine
Reviewed by: imp
MFC after: 1 week
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.
Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.
jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.
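A minimal sketch of the flag handling described above. The helper name
sigframe_flags() and the constant SKETCH_PSL_D are made up for this
example; the real change is in the MD signal frame setup code, which is
not reproduced here. The only hardware fact relied on is that DF is
bit 10 of the {e,r}flags register.

```c
#include <assert.h>
#include <stdint.h>

/* DF is bit 10 of {e,r}flags. The name is local to this sketch. */
#define SKETCH_PSL_D	0x00000400u

/*
 * Hypothetical helper: given the flags of the interrupted context,
 * return the flags the signal handler should start with. The saved
 * context keeps its own DF value; only the handler's copy is changed.
 */
uint64_t
sigframe_flags(uint64_t saved_flags)
{
	uint64_t handler_flags = saved_flags & ~(uint64_t)SKETCH_PSL_D;

	assert((handler_flags & SKETCH_PSL_D) == 0);
	return (handler_flags);
}
```

With this, a handler compiled by a gcc that assumes DF == 0 on function
entry is safe even if the interrupted code had set DF.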
Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks
- Close a sleepqueue signal race by interlocking with the per-process
spinlock. This was mistakenly omitted from the thread_lock patch and
has been a race since.
MFC After: 1 week
PR: bin/117603
Reported by: Danny Braniss <danny@cs.huji.ac.il>
PhysMask fields based on the number of physical address bits supported
by the current CPU. The old code assumed 36 bits on i386 and 40 bits on
amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.
In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.). The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.
Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available. If that
feature is not available, then assume 36 PA bits.
While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).
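The run-time mask construction can be sketched roughly as follows.
This is an illustration only, not the committed code: the function and
macro names are hypothetical, and the CPUID values are passed in as
plain integers so the sketch stays portable. It assumes the address
field of the PhysBase/PhysMask registers starts at bit 12.

```c
#include <assert.h>
#include <stdint.h>

/* Extended CPUID leaf reporting address sizes; PA bits are EAX[7:0]. */
#define SKETCH_CPUID_ADDR_SIZES	0x80000008u

/*
 * Pick the number of physical address bits. 'max_ext_leaf' is the
 * highest extended leaf the CPU reports and 'addr_sizes_eax' is EAX
 * from leaf 0x80000008 (ignored when that leaf is absent).
 */
unsigned
pa_bits(uint32_t max_ext_leaf, uint32_t addr_sizes_eax)
{
	if (max_ext_leaf >= SKETCH_CPUID_ADDR_SIZES)
		return (addr_sizes_eax & 0xffu);
	return (36);	/* fallback named in the text above */
}

/*
 * Build the mask covering the PhysBase/PhysMask address field:
 * bits 12 .. (bits - 1) of the 64-bit MTRR register.
 */
uint64_t
mtrr_phys_field_mask(unsigned bits)
{
	assert(bits > 12 && bits <= 52);
	return (((UINT64_C(1) << bits) - 1) & ~UINT64_C(0xfff));
}
```

For 36 PA bits this yields 0x0000000ffffff000, and for 40 PA bits
0x000000fffffff000, matching the two widths the old code hardcoded.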
MFC after: 1 week
PR: i386/120516
Reported by: Nokia
hangs (one at boot, one at shutdown) in recent machines. First, only try
to take ownership of the EHCI controller if the BIOS currently owns the
controller. On a HP DL160 G5, the machine hangs when we try to take
ownership. Second, don't bother trying to give up ownership of the
controller during shutdown. It's not strictly required and a Dell DCS S29
hangs on shutdown after the config write.
Both of these changes match the behavior of the Linux EHCI driver. I also
think both of these hangs are caused by bugs in the BIOS' SMM handler
causing it to get stuck in an infinite loop in SMM.
MFC after: 1 week
accept a mouse using the boot subclass. Instead, restore the original
hid_is_collection() test and fall back to testing the interface class,
subclass, and protocol if that fails.
MFC after: 1 week
PR: usb/118670
might be currently programmed into the registers.
Underlying firmware (U-Boot) would typically program MAC address into the
first unit only, and others are left uninitialized. It is now possible to
retrieve and program MAC address for all units properly, provided they were
passed on in the bootinfo metadata.
Reviewed by: imp, marcel
Approved by: cognet (mentor)
We're now more robust against cases of non-sorted and/or non-continuous
numbering of those entries.
Reviewed by: imp, marcel
Approved by: cognet (mentor)
This was introduced as a workaround a long time ago for some Alpha firmware
(which is now gone), and actually prevented net_close() from ever being
called.
Certain firmwares (U-Boot) need local shutdown operations to be performed on a
network controller upon transaction end: such platform-specific hooks are
supposed to be called via netif_close() (from within net_close()).
This change effectively reverts the following CVS commit:
sys/boot/common/dev_net.c
revision 1.7
date: 2000/05/13 15:40:46; author: dfr; state: Exp; lines: +2 -1
Only probe network settings on the first open of the network device.
The alpha firmware takes a seriously long time to open the network device
the first time.
Also suppress excessive output while netbooting via loader, unless debugging.
While there, make sys/boot/uboot more style(9) compliant.
Reviewed by: imp
Approved by: cognet (mentor)
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
sched_sleep(). This removes extra thread_lock() acquisition and
allows the scheduler to decide what to do with the static boost.
- Change the priority arguments to cv_* to match sleepq/msleep/etc.
where 0 means no priority change. Catch -1 in cv_broadcastpri() and
convert it to 0 for now.
- Set a flag when sleeping in a way that is compatible with swapping
since direct priority comparisons are meaningless now.
- Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
controls the boost behavior. Turning it off gives better performance
in some workloads but needs more investigation.
- While we're modifying sleepq, change signal and broadcast to both
return with the lock held as the lock was held on enter.
Reviewed by: jhb, peter
Before this patch callback returned result of the last finished call chain.
Now it returns last nonzero result from all call chain results in this request.
As soon as this improvement gives reliable error reporting, it is now possible
to remove dirty workaround in ng_socket, made to return ENOBUFS error statuses
of request-response operations. That workaround was responsible for returning
ENOBUFS errors to completely unrelated requests working at the same time
on socket.
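The difference between the old and new reporting rule can be modelled
in a few lines. This is a sketch with a hypothetical function name, not
the netgraph code itself: it treats one request as an array of
errno-style results, one per finished call chain.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one request that fans out into several call
 * chains. Each chain finishes with an errno-style result (0 on
 * success). Returns the last nonzero result, or 0 if all succeeded.
 */
int
request_result(const int *chain_results, size_t nchains)
{
	int error = 0;
	size_t i;

	assert(chain_results != NULL || nchains == 0);
	for (i = 0; i < nchains; i++) {
		/* A later success must not overwrite an earlier failure. */
		if (chain_results[i] != 0)
			error = chain_results[i];
	}
	return (error);
}
```

Under the old rule the result of the last finished chain won, so a
failure followed by a success was reported as success.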
set a default name. If the IRQ is added as a consequence of
configuring the IRQ without there ever being a handler
assigned to it, we will not have a name. This breaks the
fragile intrcnt/intrnames logic.
state change and reliable error recovery.
o Moved vr_softc structure and relevant macros to header file.
o Use PCIR_BAR macro to get BARs.
o Implemented suspend/resume methods.
o Implemented automatic Tx threshold configuration which will be
activated when it suffers from Tx underrun. Also Tx underrun
will try to restart only Tx path and resort to previous
full-reset (both Rx/Tx) operation if restarting Tx path has failed.
o Removed old bit-banging MII interface. Rhine provides simple and
efficient MII interface. While I'm here show PHY address and PHY
register number when its read/write operation fails.
o Define VR_MII_TIMEOUT constant and use it in MII access routines.
o Always honor link up/down state reported by mii layers. The link
state information is used in vr_start() to determine whether we
got a valid link.
o Removed vr_setcfg() which is now handled in vr_link_task(), link
state taskqueue handler. When mii layer reports link state changes
the taskqueue handler reprograms MAC to reflect negotiated duplex
settings. Flow-control changes are not handled yet and it should
be revisited when mii layer knows the notion of flow-control.
o Added a new sysctl interface to get statistics of an instance of
the driver.(sysctl dev.vr.0.stats=1)
o Chip name was renamed to reflect the official name of the chips
described in VIA Rhine I/II/III datasheet.
REV_ID_3065_A -> REV_ID_VT6102_A
REV_ID_3065_B -> REV_ID_VT6102_B
REV_ID_3065_C -> REV_ID_VT6102_C
REV_ID_3106_J -> REV_ID_VT6105_A0
REV_ID_3106_S -> REV_ID_VT6105M_A0
The following chip revisions were added.
#define REV_ID_VT6105_B0 0x83
#define REV_ID_VT6105_LOM 0x8A
#define REV_ID_VT6107_A0 0x8C
#define REV_ID_VT6107_A1 0x8D
#define REV_ID_VT6105M_B1 0x94
o Always show chip revision number in device attach. This shall help
identifying revision specific issues.
o Check whether EEPROM reloading is complete by inspecting the state
of VR_EECSR_LOAD bit. This bit is self-cleared after the EEPROM
reloading. Previously vr(4) blindly spun for 200us which may or may
not be enough to complete the EEPROM reload.
o Removed if_mtu setup. It's done in ether_ifattach().
o Use our own callout to drive watchdog timer.
o In vr_attach disable further interrupts after reset. For VT6102 or
newer hardware, disable MII state change interrupt as well because
mii state handling is done by mii layer.
o Add more sane register initialization for VT6102 or newer chips.
- Have NIC report error instead of retrying forever.
- Let hardware detect MII coding error.
- Enable MODE10T mode.
- Enable memory-read-multiple for VT6107.
o PHY address for VT6105 or newer chips is located at fixed address 1.
For older chips the PHY address is stored in VR_PHYADDR register.
Armed with this information, there is no need to re-read
VR_PHYADDR register in miibus handler to get PHY address. This
saves one register access cycle for each MII access.
o Don't reprogram VR_PHYADDR register whenever access to a register
located at a PHY address is made. Rhine family allows reprogramming
PHY address location via VR_PHYADDR register depending on
VR_MIISTAT_PHYOPT bit of VR_MIISTAT register. This used to lead to
numerous phantom PHYs attached to miibus during phy probe phase and
driver used to limit allowable PHY address in mii register accessors
for certain chip revisions. This removes one more register access
cycle for each MII access.
o Correctly set VLAN header length.
o bus_dma(9) conversion.
- Limit DMA access to be in range of 32bit address space. Hardware
doesn't support DAC.
- Apply descriptor ring alignment requirements(16 bytes alignment)
- Apply Rx buffer address alignment requirements(4 bytes alignment)
- Apply Tx buffer address alignment requirements(4 bytes alignment)
for Rhine I chip. Rhine II or III has no Tx buffer address
alignment restrictions, though.
- Reduce number of allowable number of DMA segments to 8.
- Removed the atomic(9) used in descriptor ownership managements
as it's job of bus_dmamap_sync(9).
With these changes vr(4) should work on all platforms.
o Rhine uses two separated 8bits command registers to control Tx/Rx
MAC. So don't access it as a single 16bit register.
o For non-strict alignment architectures vr(4) no longer requires
time-consuming copy operation for received frames to align IP
header. This greatly improves Rx performance on i386/amd64
platforms. However the alignment is still necessary for
strict-alignment platforms(e.g. sparc64). The alignment is handled
in new function vr_fixup_rx().
o vr_rxeof() now rejects multiple-segmented(fragmented) frames as
vr(4) is not ready to handle this situation. Datasheet said nothing
about the reason when/why it happens.
o In vr_newbuf() don't set VR_RXSTAT_FIRSTFRAG/VR_RXSTAT_LASTFRAG
bits as it's set by hardware.
o Don't pass checksum offload information to upper layer for
fragmented frames. The hardware assisted checksum is valid only
when the frame is a non-fragmented IP frame. Also mark the checksum
as valid for corrupted frames so that upper layers don't need
to recompute the checksum with software routine.
o Removed vr_rxeoc(). RxDMA doesn't seem to need to be idle before
sending VR_CMD_RX_GO command. Previously it used to stop RxDMA
first which in turn resulted in long delays in Rx error recovery.
o Rewrote Tx completion handler.
- Always check VR_TXSTAT_OWN bit in status word prior to
inspecting other status bits in the status word.
- Collision counter updates were corrected as VT3071 or newer
ones use different bits to notify collisions.
- Unlike other chip revisions, VT86C100A uses different bit to
indicate Tx underrun. For VT3071 or newer ones, check both
VR_TXSTAT_TBUFF and VR_TXSTAT_UDF bits to see whether Tx
underrun has happened. In case of Tx underrun requeue the failed
frame and restart stalled Tx SM. Also double Tx DMA threshold
size on each failure to mitigate future Tx underruns.
- Disarm watchdog timer only if we have no queued packets,
otherwise don't touch watchdog timer.
o Rewrote interrupt handler.
- status word in Tx/Rx descriptors indicates more detailed error
state required to recover from the specific error. There is no
need to rely on interrupt status word to recover from Tx/Rx
error except PCI bus error. Other event notifications like
statistics counter overflows or link state events will be
handled in main interrupt handler.
- Don't touch VR_IMR register if we are in suspend mode. Touching
the register may hang the hardware if we are in suspended state.
Previously it seems that touching VR_IMR register in interrupt
handler was to work-around panic occurred in system shutdown
stage on SMP systems. I think that work-around would hide
root-cause of the panic and I couldn't reproduce the panic
with multiple attempts on my box.
o While padding space to meet minimum frame size, zero the pad data
in order to avoid possibly leaking sensitive data.
o Rewrote vr_start_locked().
- Don't try to queue packets if the number of available Tx descriptors
is smaller than the number required.
o Don't reinitialize hardware whenever media configuration is
changed. Media/link state changes are reported from mii layer if
this happens and vr_link_task() will perform necessary changes.
o Don't reinitialize hardware if only PROMISC bit was changed. Just
toggling the PROMISC bit in hardware is sufficient to reflect the
request.
o Rearranged the IFCAP_POLLING/IFCAP_HWCSUM handling in vr_ioctl().
o Generate Tx completion interrupts for every VR_TX_INTR_THRESH-th
frames. This reduces Tx completion interrupts under heavy network
loads.
o Since vr(4) doesn't request Tx interrupts for every queued frames,
reclaim any pending descriptors not handled in Tx completion
handler before actually firing up watchdog timeouts.
o Added vr_tx_stop()/vr_rx_stop() to wait for the end of active
TxDMA/RxDMA cycles(draining). These routines are used in vr_stop()
to ensure sane state of MAC before releasing allocated Tx/Rx
buffers. vr_link_task() also takes advantage of these functions to
get to idle state prior to restarting Tx/Rx.
o Added vr_tx_start()/vr_rx_start() to restart Rx/Tx. By separating
Rx operation from Tx operation vr(4) no longer needs to full-reset
the hardware in case of Tx/Rx error recovery.
o Implemented WOL.
o Added VT6105M specific register definitions. VT6105M has the
following hardware capabilities.
- Tx/Rx IP/TCP/UDP checksum offload.
- VLAN hardware tag insertion/extraction. Due to lack of information
for getting extracted VLAN tag in Rx path, VLAN hardware support
was not implemented yet.
- CAM(Content Addressable Memory) based 32 entry perfect multicast/
VLAN filtering.
- 8 priority queues.
o Implemented CAM based 32 entry perfect multicast filtering for
VT6105M. If number of multicast entry is greater than 32, vr(4)
uses traditional hash based filtering.
o Reflect real Tx/Rx descriptor structure. Previously vr(4) used to
embed other driver (private) data into these structure. This type
of embedding makes it hard to work on LP64 systems.
o Removed unused vr_mii_frame structure and MII bit-banging
definitions.
o Added new PCI configuration registers that control mii operation
and mode selection.
o Reduced number of Tx/Rx descriptors to 128 from 256. From my
testing, increasing number of descriptors above 64 didn't help
increasing performance at all. Experimentations show 128 Rx
descriptors seems to help a lot reducing Rx FIFO overruns under
high system loads. It seems the poor Tx performance of Rhine
hardware comes from the limitation of hardware. You wouldn't
saturate the link with vr(4) no matter how fast CPU/large number of
descriptors are used.
o Added vr_statistics structure to hold various counter values.
No regression was reported but one variant of Rhine III(VT6105M)
found on RouterBOARD 44 does not work yet(Reported by Milan Obuch).
I hope this would be resolved in near future.
I'd like to say big thanks to Mike Tancsa who kindly donated a Rhine
hardware to me. Without his enthusiastic testing and feedback
overhauling vr(4) would never have been possible. Also thanks to Masayuki
Murayama who provided some good comments on the hardware's internals.
This driver is the result of combined effort of many users who provided
much feedback so I'd like to say special thanks to them.
Hardware donated by: Mike Tancsa (mike AT sentex dot net)
Reviewed by: remko (initial version)
Tested by: Mike Tancsa(x86), JoaoBR ( joao AT matik DOT com DOT br )
Marcin Wisnicki ( mwisnicki+freebsd AT gmail DOT com )
Stefan Ehmann ( shoesoft AT gmx DOT net )
Florian Smeets ( flo AT kasimir DOT com )
Phil Oleson ( oz AT nixil DOT net )
Larry Baird ( lab AT gta DOT com )
Milan Obuch ( freebsd-current AT dino DOT sk )
remko (initial version)
tdq_runq_add to select the runq rather than hoping we set it properly
when we adjusted the priority. This involves the same number of
branches as before so should perform identically without the extra
fragility.
Tested by: bz
Reviewed by: bz
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.
MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua
present in cpu_feature2. Also, use CPUID2_EST rather than a magic
number.
- Don't free the ACPI settings list in detach if we are going to fail the
request. Otherwise an attempt to kldunload est would free the array
but the driver would keep trying to use it.
MFC after: 1 week
routines (V86 requests from the client and hardware interrupt handlers):
- Install trampoline real mode interrupt handlers at IDT vectors 0x20-0x2f
to handle hardware interrupts by invoking the appropriate vector (0x8-0xf
or 0x70-0x77). This allows the 8259As to use vectors 0x20-0x2f in real
mode as well as protected mode while ensuring that the master 8259A
doesn't share IDT space with CPU exceptions in protected mode.
- Since we don't need to reserve space for page tables and a page directory
anymore since dropping paging support, move the TSS and protected mode
IDT up by 16k. Grow the ring 1 link stack by 16k as a result.
- Repurpose the ring 1 link stack to be used as a real mode stack when
invoking real mode routines either via a V86 request or a hardware
interrupt. This simplifies a few things as we avoid disturbing the
original user stack.
- Add some more block comments to explain how the code interacts with the
V86 structure as this wasn't immediately obvious from the prior comments
(e.g. that we explicitly copy the seg regs for real mode out of the V86
struct onto the stack to be popped off when going into real mode, etc.).
Also, document some of the stack frames we create going to real mode and
back.
- Remove all of the virtual 86 related code including having to simulate
various instructions and BIOS calls on a trap from virtual 86 mode.
- Explicitly panic if a user client attempts to perform a V86 CALL
request that isn't a far call.
- Bump version to 1.2.
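The vector forwarding done by the trampoline handlers in the first
item above amounts to a small mapping. The sketch below is in C for
readability only (BTX itself is assembly) and the function name is
hypothetical. It assumes the usual PC layout in which the BIOS handles
IRQ 0-7 at vectors 0x08-0x0f and IRQ 8-15 at 0x70-0x77.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a trampoline IDT vector (0x20-0x2f) to the real mode BIOS
 * vector it forwards to: the master 8259A's IRQ 0-7 live at
 * 0x08-0x0f and the slave's IRQ 8-15 at 0x70-0x77.
 */
uint8_t
real_mode_vector(uint8_t idt_vector)
{
	uint8_t irq;

	assert(idt_vector >= 0x20 && idt_vector <= 0x2f);
	irq = idt_vector - 0x20;
	if (irq < 8)
		return (0x08 + irq);
	return (0x70 + (irq - 8));
}
```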
Assuming this works ok this should fix some of the long standing issues
with USB booting as well as etherboot.
MFC after: 2 weeks
Submitted by: kib (some parts from his original real mode patch)
- Only calculate timeshare priorities once per tick or when a thread is woken
from sleeping.
- Keep the ts_runq pointer valid after all priority changes.
- Call tdq_runq_add() directly from sched_switch() without passing in via
tdq_add(). We don't need to adjust loads or runqs anymore.
- Sort tdq and ts_sched according to utilization to improve cache behavior.
Sponsored by: Nokia
- Normalize the preemption/ipi setting code by introducing sched_shouldpreempt()
so the logic is identical and not repeated between tdq_notify() and
sched_setpreempt().
- In tdq_notify() don't set NEEDRESCHED as we may not actually own the
thread lock; this could have caused us to lose td_flags settings.
- Garbage collect some tunables that are no longer relevant.
the NOPs used are 0x01.
While we could simply pad with EOLs (which are 0x00), rather use an
explicit 0x00 constant there to not confuse people with 'EOL padding'.
Put in a comment saying just that.
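A rough sketch of the padding idea, with hypothetical names throughout
(the SKETCH_TCPOPT_* constants and pad_tcp_options() are not taken from
tcp_output.c). It assumes the option block has to be padded to a
multiple of four bytes and that an EOL is written before any pad
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TCPOPT_EOL	0x00u	/* end of option list */
#define SKETCH_TCPOPT_NOP	0x01u	/* no-operation, shown for contrast */
#define SKETCH_TCPOPT_PAD	0x00u	/* explicit padding byte after EOL */

/*
 * Terminate the option list with EOL if it is not 32-bit aligned,
 * then fill up to the next 4-byte boundary with the explicit pad
 * constant. Returns the new option length; 'cap' is the buffer size.
 */
size_t
pad_tcp_options(uint8_t *opt, size_t len, size_t cap)
{
	assert(opt != NULL && ((len + 3) & ~(size_t)3) <= cap);
	if (len % 4 != 0)
		opt[len++] = SKETCH_TCPOPT_EOL;
	while (len % 4 != 0)
		opt[len++] = SKETCH_TCPOPT_PAD;
	return (len);
}
```

EOL and the pad byte have the same value on the wire; the separate
constant only documents that the trailing zeros are padding and not
further EOL options.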
Problem discussed on: src-committers with andre, silby, dwhite as
follow up to the rev. 1.161 commit of tcp_var.h.
MFC after: 11 days
the appropriate bit in the DEVACTB register.
This change allows the C2 state on those systems to work as expected.
Reviewed by: njl
Submitted by: Andriy Gapon <avg at icyb.net.ua>
MFC after: 1 week
Specifically, since the delete-behind heuristic is never applied to a
device-backed object, there is no point in checking whether each of the
object's pages is fictitious. (Only device-backed objects have
fictitious pages.)
know if it has siblings that need an actual probe. Introduce a special
return value called BUS_PROBE_NOWILDCARD. If the driver returns
this, the probe is only successful for devices that have had a
specific devclass set for them.
Reviewed by: current@, jhb@, grehan@
in*() and out*() primitives should not be used, other than by
ISA drivers. In this case they were used for memory-mapped I/O
and were not even used in the spirit of the primitives.
if netgraph reported error while delivering to destination.
Reset 'next send' counter to the last requested by peer on ack timeout
to resend all subsequent packets after the lost one again without additional hints.
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionally equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
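The equivalence can be shown with a small wrapper. This is a sketch,
not part of the commit: dup_onto() is a made-up name, and the #ifdef
fallback is there only so the example also builds on systems that do
not define F_DUP2FD.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate 'fd' onto 'target'. Uses the fcntl(2) command when the
 * platform defines it and falls back to dup2(2) otherwise.
 * Returns the new descriptor, or -1 with errno set.
 */
int
dup_onto(int fd, int target)
{
	assert(target >= 0);
#ifdef F_DUP2FD
	return (fcntl(fd, F_DUP2FD, target));
#else
	return (dup2(fd, target));
#endif
}
```

Either branch returns 'target' on success, so callers written against
dup2(2) do not need to change.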
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwatson (mentor)
MFC after: 1 month
HPT drivers would sometimes test the value of a preprocessor definition but
not always make sure that the definition existed in the first place, leading
to warnings on newer compilers. I blindly assumed the same with this driver,
and it turned out to be wrong and to enable some code that doesn't work.
process lock leading to a hang. This bug was introduced in
kern_sig.c:1.351, when the call to expand_name() was moved earlier
but this particular error case was not updated.
It so happens that U-Boot disables the D-cache when booting
an ELF image, so this change makes sure we run with the
D-cache enabled from now on. It shows too...
While here, remove the duplicate definition of the hw.model
sysctl.
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)
trashing and improve performance.
Remove waitflag argument from ng_ksocket_incoming2(), it means nothing
as function call was queued by netgraph.
Remove node validity check, as node validity is guaranteed by netgraph.
Update comments.
value at the requested address as a symbol. For example, "ex /S
aio_swake" prints the name of the function currently registered
via the aio_swake hook.
The change as committed differs slightly from the patch in the PR,
as I force the size of the retrieved value (and the automatic
address increment) to be sizeof(void *). This seems to provide
the most useful auto-increment behavior, and defaults using the
default size (4), which is not sizeof(void *) on 64-bit platforms.
MFC after: 3 days
PR: 57976
Submitted by: Dan Strick <strick at covad.net>
for all network interfaces, not just ethernet-like ones.
Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.
Upgrade if_watchdog to a WARNING.
> 0 rather than >= 0, or we will panic when trying to deliver the signal.
MFC after: 3 days
PR: 100802
Submitted by: Valerio Daelli <valerio.daelli at gmail.com>
to flush the TLB instead of hardcoding a size of 33 pages. Apertures of
32MB and 64MB only use a 16 page GATT and an aperture of 128MB only uses
a 32 page GATT, so without this the code could walk off the end of the
pointer and cause a page fault if the next page was unmapped. Also, for
aperture sizes > 128MB, not all of the pages would be read. The Linux
driver has the same bug.
MFC after: 1 week
Tested by: Frédéric PRACA frederic.praca of freebsd-fr.org
hold the newline and nul terminator. Otherwise, there are cases where
garbage may end up in the command history due to a lack of a nul
terminator, or input may end up without room for a newline.
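The space reservation described above can be sketched as follows. The
structure and function names are hypothetical and the real input code
is not shown here; the point is only that a character is accepted
while two bytes are still free for the newline and the nul terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line buffer; 'size' is the total capacity of 'buf'. */
struct line_buf {
	char	*buf;
	size_t	 size;
	size_t	 len;
};

/*
 * Accept one input character only while two bytes remain free
 * afterwards, so line_finish() can always append '\n' and '\0'.
 */
bool
line_putc(struct line_buf *lb, char c)
{
	assert(lb != NULL && lb->size >= 2);
	if (lb->len + 2 >= lb->size)
		return (false);	/* would leave no room for "\n\0" */
	lb->buf[lb->len++] = c;
	return (true);
}

/* Terminate the line; returns its length including the newline. */
size_t
line_finish(struct line_buf *lb)
{
	assert(lb != NULL && lb->len + 2 <= lb->size);
	lb->buf[lb->len++] = '\n';
	lb->buf[lb->len] = '\0';
	return (lb->len);
}
```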
MFC after: 3 days
PR: 119079
Submitted by: Michael Plass <mfp49_freebsd@plass-family.net>
TCP/UDP checksum in driver for short frames. For frames that require
hardware VLAN tag insertion, the checksum offload trick does not
work due to changes of checksum offset in mbuf after the VLAN tag.
Disable hardware checksum offload for VLAN interface to fix the bug.
Reported by: Christopher Cowart < ccowart AT rescomp DOT berkeley DOT edu >
Tested by: Christopher Cowart < ccowart AT rescomp DOT berkeley DOT edu >
MFC after: 5 days
returns EINVAL. Right now we return 0 or success for invalid commands,
which could be quite problematic in certain conditions.
MFC after: 1 week
Discussed with: rwatson
revision 1.6
date: 2004/08/21 18:50:34; author: alc; state: Exp; lines: +3 -1
Properly free the temporary sf_buf in uiomove_fromphys() if a copyin or
copyout fails.
Obtained from: DragonFlyBSD
Spotted out by: Mark Tinguely
MFC After: 3 days
restrict the utilization of direct pointers to the content of
ip packet. These modifications are functionally nop()s thus
can be merged with no side effects.
- Set M_BCAST|M_MCAST for incoming frames
- Send the frame to a local interface if the bridge returns the mbuf
Submitted by: Eugene Grosbein
Tested by: Boris Kochergin
private to the kernel, some ports define _KERNEL and include this
header. While arguably this is wrong, it's also reality. By having
the MD fields last, architectures that have CPU-specific variations
of PCPU_MD_FIELDS will at least have the MI fields at a constant
offset. Of course, having all MI fields first helps kernel debugging
as well, so this is not a change without some benefits to us.
This change does not result in an ABI breakage, because this header
is not part of the ABI. Recompilation of lsof is required though :-)
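Why the ordering matters can be seen with two toy structures. The
field sets and names below are invented for this sketch and are not
the real struct pcpu; they only show that MI fields placed first keep
their offsets whatever the MD part looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Two made-up machine-dependent field sets of different sizes. */
#define SKETCH_MD_FIELDS_A	int md_a
#define SKETCH_MD_FIELDS_B	long md_b0; long md_b1; char md_b2[24]

/* MI fields first, MD fields last, as in the layout described above. */
struct pcpu_a {
	void	*pc_curthread;
	int	 pc_cpuid;
	SKETCH_MD_FIELDS_A;
};

struct pcpu_b {
	void	*pc_curthread;
	int	 pc_cpuid;
	SKETCH_MD_FIELDS_B;
};

/* The MI offsets do not depend on which MD variant was compiled in. */
static_assert(offsetof(struct pcpu_a, pc_cpuid) ==
    offsetof(struct pcpu_b, pc_cpuid), "MI offset moved");
```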
used in the kernel only (by virtue of checking for _KERNEL),
ports like lsof (part of gtop) cheat. It sets _KERNEL, but does
not set either AIM or E500. As such, PCPU_MD_FIELDS didn't get
defined and the build broke.
The catch-all is to define PCPU_MD_FIELDS with a dummy integer
when at the end of line we ended up without a definition for it.
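In preprocessor terms the catch-all looks roughly like this. The
placeholder field name and the surrounding structure are assumptions
for the sake of a compilable example; only the PCPU_MD_FIELDS macro
name comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Neither CPU variant was selected (as happens when a port defines
 * _KERNEL but not AIM or E500), so nothing has defined
 * PCPU_MD_FIELDS by this point. Fall back to a dummy integer so the
 * structure still compiles. The field name is made up for this sketch.
 */
#ifndef PCPU_MD_FIELDS
#define PCPU_MD_FIELDS	int pc_md_placeholder
#endif

struct pcpu_sketch {
	int	pc_cpuid;	/* stand-in for the MI fields */
	PCPU_MD_FIELDS;
};

static_assert(offsetof(struct pcpu_sketch, pc_md_placeholder) >=
    sizeof(int), "placeholder must follow the MI fields");
```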
the input field from the current cursor location, rather than the end of
the input line, as the cursor may not be at the end of the line.
Otherwise, we may overshoot, overwriting a bit of the previous line and
failing to fully overwrite the current line.
MFC after: 3 days
PR: 119079
Submitted by: Michael Plass <mfp49_freebsd@plass-family.net>
allocator for jumbo frame. Also remove unneeded jlist lock which
is no longer required to protect jumbo buffers.
With these changes jumbo frame performance of nfe(4) was slightly
increased and users should not encounter jumbo buffer allocation
failure anymore.
to avoid terrible unpredicted effects for netgraph operation of their
exhaustion while allocating control messages.
Add separate configurable 512 items limit for data items allocation
for DoS/overload protection.
Discussed with: julian
it's probed first. The PowerPC platform code deals with everything.
As such, probe devices in order of their location in the memory map.
o Refactor the ocpbus_alloc_resource for readability and make sure we
set the RID in the resource as per the new convention.
- Even for the PCI Express host controller we need to use bus 0
for configuration space accesses to devices directly on the
host controller's bus.
- Pass the maximum number of slots to pci_ocp_init() because the
caller knows how many slots the bus has. Previously a PCI or
PCI-X bus underneath a PCI Express host controller would not
be enumerated properly.
o Pull the interrupt routing logic out of pci_ocp_init() and into
its own function. The logic is not quite right and is expected
to be a bit more complex.
o Fix/add support for PCI domains. The PCI domain is the unit
number as per other PCI host controller drivers. As such, we
can use logical bus numbers again and don't have to guarantee
globally unique bus numbers. Remove pci_ocp_busnr. Return the
highest bus number to the caller of pci_ocp_init() now that
we don't have a global variable anymore.
o BAR programming fixes:
- Non-type0 headers have at most 1 BAR, not 0.
- First write ~0 to the BAR in question and then read back its
size.
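The second BAR item above relies on the standard PCI sizing probe.
The decode step can be sketched as below; the function name is
hypothetical, the config space accesses themselves are left out, and
only 32-bit memory BARs are covered (I/O BARs use a different flag
mask).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the size of a 32-bit memory BAR from the value read back
 * after writing ~0 to it. The low four bits of a memory BAR are type
 * and prefetch flags, not address bits. Returns 0 for an
 * unimplemented BAR (which reads back as 0).
 */
uint32_t
mem_bar_size(uint32_t readback)
{
	uint32_t addr_bits = readback & ~UINT32_C(0xf);
	uint32_t size;

	if (addr_bits == 0)
		return (0);
	size = ~addr_bits + 1;
	/* A decoded size is always a power of two. */
	assert((size & (size - 1)) == 0);
	return (size);
}
```

Reading the BAR before writing ~0 returns the currently programmed
address, not the size, which is why the order of the two steps
matters.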
Obtained from: Juniper Networks (mostly)
It is normally initialized by ffs_statfs() after ffs_mount finished.
The extattr autostart code calls the ufs_lookup(), that uses value above
to iterate over the directory blocks, see bmask initialization in the
ufs_lookup() and ufsdirhash. Having the filesystem with root directory
spanning more than one block would result in reading random kernel
memory.
PR: kern/120781
Test case provided by: rwatson
MFC after: 1 week
expressions on i386 are evaluated in the range of the long double type,
so this is wrong in a different but hopefully less worse way than
before. Since expressions are evaluated in long double registers,
there is no runtime cost to using long double instead of double to
declare intermediate values (except in cases where this avoids compiler
bugs), and by careful use of float_t or double_t it is possible to
avoid some of the compiler bugs in this area, provided these types are
declared as long double.
I was going to change float.h to be less broken and more usable in
combination with the change here (in particular, it is more necessary
to know the effective number of bits in a double_t when double_t !=
double, since DBL_MANT_DIG no longer logically gives this, and
LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default
rounding precision). However, this was too hard for now. In particular,
LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One
thing that is completely broken now is LDBL_MAX. This may have sort
of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at
runtime gave +Inf, but you could at least compare with it), but starting
with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at
compile time in the default rounding precision.
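A small example of declaring an intermediate value as double_t, as the
first paragraph above suggests. The function is invented for this
sketch. It relies only on the C99 guarantee that double_t is at least
as wide as double; whether it is long double depends on the platform's
evaluation method.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/*
 * Sum an array using double_t for the running total. Where
 * expressions are evaluated in long double registers, double_t is
 * long double and the accumulator matches the register format;
 * elsewhere it is at least as wide as double.
 */
double
sum_doubles(const double *v, size_t n)
{
	double_t acc = 0;
	size_t i;

	assert(v != NULL || n == 0);
	for (i = 0; i < n; i++)
		acc += v[i];
	return ((double)acc);	/* round once, on the way out */
}
```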
mount options that mount_nfs could pass down, if it passed
down string mount options. Right now, mount_nfs just passes
down a single mount option named "nfs_args" with a fully
initialized 'struct nfs_args'.
In future commits, we will add code to the kernel for parsing stringified
NFS mount options, so that we can convert mount_nfs to pass string options
from userspace to kernel, instead of an initialized struct nfs_args.
the same way that it is default initialized in revision 1.77 of mount_nfs.c.
Right now, this is a no-op, because currently we initialize
struct nfs_args in mount_nfs in userspace, and pass it
down into the kernel via nmount(), so we overwrite whatever we initialize
here with the value passed in from userspace.
However, this lays the groundwork for moving away from passing
struct nfs_args from userspace to kernel via nmount(), so that we
can instead pass string mount options via nmount() which can be parsed in
the kernel. This will make it easier to add new NFS mount options.
passing it to cpuset_which(). Pass in 'set' instead. This argument
is not used but for convenience cpuset_which() nulls all incoming
parameters.
Submitted by: davidxu
Patch in the PR was modified to check active jumbo buffers in use
and other possible jumbo buffer leak.
Jumbo buffer usage in lge(4) still won't be reliable due to the lack
of a driver lock in the local jumbo buffer allocator. Either a new
lock to protect the jumbo buffers or a switch to a UMA backed page
allocator for jumbo frames is required.
PR: kern/78072
mask none of the upper bits are set.
- Be more careful about enforcing the boundaries of masks and child sets.
- Introduce a few more CPU_* macros for implementing these tests.
- Change the cpusetsize argument to be bytes rather than bits to match
other APIs.
Sponsored by: Nokia
IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values
10000 and 65535 respectively.
The rationale is that it makes the attacker's life more
difficult if he/she wants to guess the ephemeral port range and
also lowers the probability of a port collision (described in
draft-ietf-tsvwg-port-randomization-01.txt).
While there, remove code duplication in in_pcbbind_setup().
Submitted by: Fernando Gont <fernando at gont.com.ar>
Approved by: njl (mentor)
Reviewed by: silby, bms
Discussed on: freebsd-net
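The effect of the wider range can be sketched as follows. This is only an illustration: the two constants mirror the values named above, while pick_ephemeral_port() and its rnd parameter are hypothetical and are not the code in in_pcbbind_setup().

```c
#include <stdint.h>

#define EPHEMERAL_FIRST	10000	/* value given for IPPORT_EPHEMERALFIRST */
#define EPHEMERAL_LAST	65535	/* value given for IPPORT_EPHEMERALLAST */

/*
 * Hypothetical helper: map a caller-supplied random number onto the
 * inclusive range [EPHEMERAL_FIRST, EPHEMERAL_LAST].
 */
uint16_t
pick_ephemeral_port(uint32_t rnd)
{
	uint32_t span = EPHEMERAL_LAST - EPHEMERAL_FIRST + 1;

	return ((uint16_t)(EPHEMERAL_FIRST + rnd % span));
}
```

The range holds 55536 ports. The modulo step introduces a slight bias, which a real implementation may want to avoid.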
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. (Expect this to change.)
Reviewed by: ups
Tested by: kris, Peter Holm
the specific semantics of lockmgr aren't required: update UFS1 extended
attributes to protect its data structures using an sx lock.
While here, update comments on lock granularity.
MFC after: 2 weeks
The kernel config file is KERNCONF=MPC85XX, so the usual procedure applies:
1. make buildworld TARGET_ARCH=powerpc
2. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
This default config uses kernel-level FPU emulation. For the soft-float world
approach:
1. make buildworld TARGET_ARCH=powerpc TARGET_CPUTYPE=e500
2. disable FPU_EMU option in sys/powerpc/conf/MPC85XX
3. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
Approved by: cognet (mentor)
MFp4: e500
TSEC is the MAC engine offering 10, 100 or 1000 Mbps speed and is found on
different Freescale parts (MPC83xx, MPC85xx). Depending on the silicon version
there are up to four TSEC units integrated on the chip.
This driver also works with the enhanced version of the controller (eTSEC),
which is backwards compatible, but doesn't take advantage of its additional
features (various off-loading mechanisms) at the moment.
Approved by: cognet (mentor)
Obtained from: Semihalf
MFp4: e500
The QUICC engine is found on various Freescale parts including MPC85xx, and
provides multiple generic time-division serial channel resources, which are in
turn muxed/demuxed by the Serial Communications Controller (SCC).
Along with core QUICC/SCC functionality a uart(4)-compliant device driver is
provided which allows for serial ports over QUICC/SCC.
Approved by: cognet (mentor)
Obtained from: Juniper
MFp4: e500
The PQ3 is a high performance integrated communications processing system
based on the e500 core, which is an embedded RISC processor that implements
the 32-bit Book E definition of the PowerPC architecture. For details refer
to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E
This port was tested and successfully run on the following members of the PQ3
family: MPC8533, MPC8541, MPC8548, MPC8555.
The following major integrated peripherals are supported:
* On-chip peripherals bus
* OpenPIC interrupt controller
* UART
* Ethernet (TSEC)
* Host/PCI bridge
* QUICC engine (SCC functionality)
This commit brings the main functionality and will be followed by individual
drivers that are logically separate from this base.
Approved by: cognet (mentor)
Obtained from: Juniper, Semihalf
MFp4: e500
native extended attributes. This didn't interfere with the operation of
UFS2 extended attributes, but the code shouldn't be running for UFS2.
MFC after: 2 weeks
a queue entry field, just copy out the unsigned int that is the trigger
message. In practice, auditd always requested sizeof(unsigned int), so
the extra bytes were ignored, but copying them out was not the intent.
MFC after: 1 month
soft lifetime [1] introduced in rev. 1.21 of key.c.
Along with that, fix a related problem in key_debug
printing the correct data.
While there replace a printf by panic in a sanity check.
PR: 120751
Submitted by: Kazuaki ODA (kazuaki aliceblue.jp) [1]
MFC after: 5 days
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:
<machine/pcpu.h>
<machine/pcb.h>
<machine/kdb.h>
<machine/hid.h>
<machine/frame.h>
Areas which use the above are adjusted and cleaned up.
Credits for this rework go to marcel@
Approved by: cognet (mentor)
MFp4: e500
- Move the assignment of the socket down before we first need it.
No need to do it at the beginning and then drop out of the function
by one of the returns before using it 100 lines further down.
- Use t_maxopd which was assigned the "tcp_mssdflt" for the correct
AF already instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before
we possibly change mss and re-do the assignment without using t_maxseg
in between.
Reviewed by: silby
No objections: net@ (silence)
MFC after: 5 days
- When searching for affinity search backwards in the tree from the last
cpu we ran on while the thread still has affinity for the group. This
can take advantage of knowledge of shared L2 or L3 caches among a
group of cores.
- When searching for the least loaded cpu find the least loaded cpu via
the least loaded path through the tree. This load balances system bus
links, individual cache levels, and hyper-threaded/SMT cores.
- Make the periodic balancer recursively balance the highest and lowest
loaded cpu across each link.
Add support for cpusets:
- Convert the cpuset to a simple native cpumask_t while the kernel still
only supports cpumask.
- Pass the derived cpumask down through the cpu_search functions to
restrict the result cpus.
- Make the various steal functions resilient to failure since not all
threads can run on all cpus any longer.
General improvements:
- Precisely track the lowest priority thread on every runq with
tdq_setlowpri(). Before it was more advisory but this ended up having
pathological behaviors.
- Remove many #ifdef SMP conditions to simplify the code.
- Get rid of the old cumbersome tdq_group. This is more naturally
expressed via the cpu_group tree.
Sponsored by: Nokia
Testing by: kris
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a separate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.
Sponsored by: Nokia
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuset_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All threads initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.
Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.
CPUFREQ_DRV_SETTINGS(). The value of count on input is used to
prevent overflow of the settings buffer passed into CPUFREQ_DRV_SETTINGS().
This corrects the "est: CPU supports Enhanced Speedstep, but is not recognized."
error on my system.
MFC after: 1 week
than rely on the lockmgr support [1]:
* bump the waiters only if the interlock is held
* let brelvp() return the waiters count
* rely on brelvp() instead of BUF_LOCKWAITERS() in order to check
the number of waiters
- Remove a namespace pollution introduced recently with lockmgr.h
including lock.h by including lock.h directly in the consumers and
making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
* introduce LK_NOPROFILE which disables lock profiling for the
specified lockmgr
* introduce LK_QUIET which disables ktr tracing for the specified
lockmgr [2]
* disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
used
This patch breaks the KPI, so __FreeBSD_version will be bumped and manpages
updated by further commits. Additionally, the 'struct buf' changes also
disturb the ABI.
[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.
[1] Submitted by: kib
Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
partition table is empty, check to see if we have something that
looks sufficiently like a BPB. On non-i386 machines, the boot
sector typically doesn't contain boot code; the end of the boot
sector is all zeroes. This is also where the partition table is
for MBRs.
We only check the sector size and cluster size, as that seems to
be the most reliable across implementations, BPB versions and
platforms.
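A check of this kind might look like the sketch below. The field offsets (bytes per sector at offset 11, sectors per cluster at offset 13) are those of the standard FAT BPB; the function name looks_like_bpb() and the exact acceptance limits are assumptions made for illustration, not the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical probe helper: decide whether a boot sector carries
 * something that looks sufficiently like a BPB by checking only the
 * sector size and the cluster size.
 */
int
looks_like_bpb(const uint8_t *sector)
{
	uint32_t secsz = sector[11] | ((uint32_t)sector[12] << 8);
	uint32_t clsz = sector[13];

	/* Sector size: a power of two in an assumed 512..4096 range. */
	if (secsz < 512 || secsz > 4096 || (secsz & (secsz - 1)) != 0)
		return (0);
	/* Cluster size (in sectors): a non-zero power of two. */
	if (clsz == 0 || (clsz & (clsz - 1)) != 0)
		return (0);
	return (1);
}
```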
just em, there is an igb driver (this follows behavior with our Linux drivers).
All adapters up to the 82575 are supported in em, and new client/desktop support
will continue to be in that driver.
The igb driver is for new server NICs like the 82575 and its follow-ons.
Advanced features for virtualization and performance will be in this driver.
Also, both drivers now have shared code that is up to the latest we have
released. Some stylistic changes as well.
Enjoy :)
code to add padlock features to the CPU model on VIA CPUs was no longer
effective. Change the code to instead output a separate printf during
dmesg for VIA Padlock features similar to other cpuid feature bitmasks.
MFC after: 1 week
Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.
while still restricting auto-channel select to only those channels
permitted by regulatory constraints (sorta, we're still missing the
checks to honor radar and noadhoc status on channels). This somehow
got lost in the initial merge of the revised scanning code.
Reviewed by: jhay
MFC after: 2 weeks
frames. This bug seems to happen on certain hardware models/revisions
(e.g. 88E8053) but it has not been identified which hardware is affected.
Revision 1.4 of if_mskreg.h was not enough to work around the bug.
To work around it, increase the GMAC FIFO threshold by one FIFO word to
flush received pause frames.
Reported by: das, Kirill Nuzhdin < kirill.nuzhdin AT rad dot chem dot msu dot ru >
Tested by: das, Kirill Nuzhdin
only because there's a partition table where the boot sector has
boot code. Boot sectors without boot code look like an MBR for all
practical purposes. This change adds a check for the partition table
and fails the probe when it's obviously invalid. The assumption being
that the sector contains a boot sector and not an MBR.
More checks are needed to distinguish a boot sector without boot code
from an (empty) MBR.
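The message does not say which property makes a table obviously invalid. One plausible sketch, assuming the check looks at the per-entry active flag (which may only be 0x00 or 0x80 in a valid MBR), is shown below; mbr_table_plausible() is a hypothetical name.

```c
#include <stdint.h>

#define MBR_PART_OFFSET	446	/* start of the partition table */
#define MBR_PART_SIZE	16	/* bytes per partition entry */
#define MBR_NPARTS	4

/*
 * Hypothetical check: return 0 if any entry has an active flag other
 * than 0x00 or 0x80, and 1 otherwise.
 */
int
mbr_table_plausible(const uint8_t *sector)
{
	int i;

	for (i = 0; i < MBR_NPARTS; i++) {
		uint8_t flag = sector[MBR_PART_OFFSET + i * MBR_PART_SIZE];

		if (flag != 0x00 && flag != 0x80)
			return (0);
	}
	return (1);
}
```

An all-zero table passes this test, which is exactly the remaining ambiguity between a boot sector without boot code and an empty MBR.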
This fixes the panic which happens when mdcreate_vnode() calls vn_close()
and mddestroy() calls it again further down the error handling path.
Reviewed by: kris, kib
MFC after: 3 days
- Consolidate the code to humanize the size of a disk partition into a
single function based on the code for GPT partitions and use it for
GPT partitions, BSD slices, and BSD partitions.
- Teach the humanize code to use KB for small partitions (e.g. GPT boot
partitions now show up as 64KB rather than 0MB).
- Pad a few partition type names out so that things line up in the
common case.
MFC after: 1 week
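The KB fallback for small partitions can be sketched as below. The function name humanize_size() and the unit thresholds are assumptions for illustration; only the behaviour of showing 64KB instead of 0MB is taken from the description above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a size in bytes into buf, choosing the
 * largest unit that still yields a non-zero integer value, with KB as
 * the smallest unit.
 */
void
humanize_size(uint64_t bytes, char *buf, size_t len)
{
	if (bytes >= (UINT64_C(1) << 30))
		snprintf(buf, len, "%lluGB",
		    (unsigned long long)(bytes >> 30));
	else if (bytes >= (UINT64_C(1) << 20))
		snprintf(buf, len, "%lluMB",
		    (unsigned long long)(bytes >> 20));
	else
		snprintf(buf, len, "%lluKB",
		    (unsigned long long)(bytes >> 10));
}
```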
weren't displayed on the new console. However, the config string has been
altered as part of being parsed so we only display the first option. Fix
this by saving a copy of /boot.config before parsing it and displaying the
saved copy after parsing.
MFC after: 1 week
PR: i386/103972
Submitted by: Alexandre Belloni alexandre.belloni of netasq.com
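The save-before-parse approach can be sketched as follows. parse_config(), CONFIG_MAX and the use of strtok() are stand-ins chosen for illustration; the real boot code uses its own parser, but any parser that splits the string in place has the same effect on the original buffer.

```c
#include <string.h>

#define CONFIG_MAX	64	/* assumed buffer size for this sketch */

/*
 * Hypothetical parser: save a pristine copy of the option string in
 * 'saved', then split 'cmd' in place and return the number of options.
 */
int
parse_config(char *cmd, char *saved)
{
	int n = 0;
	char *p;

	strncpy(saved, cmd, CONFIG_MAX - 1);
	saved[CONFIG_MAX - 1] = '\0';

	for (p = strtok(cmd, " \t\n"); p != NULL; p = strtok(NULL, " \t\n"))
		n++;	/* a real parser would act on each option here */
	return (n);
}
```

After the call, cmd holds only the first option as a C string, while saved still holds the full line and can be displayed once the console has been switched.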
global audit mutex and condition variables, with an sx lock which protects
the trail vnode and credential while in use, and is acquired by the system
call code when rotating the trail. Previously, a "message" would be sent
to the kernel audit worker, which did the rotation, but the new code is
simpler and (hopefully) less error-prone.
Obtained from: TrustedBSD Project
MFC after: 1 month
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)
Note these sysctls in the man page and warn against increasing them
without thinking first.
MFC after: 3 weeks
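The 12ms figure above follows from simple arithmetic, sketched here under the assumption of 1500-byte packets (the packet size is not stated in the message). queue_delay_ms() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the longest delay, in milliseconds, that a
 * queue of 'slots' packets of 'pkt_bytes' each can cover on a link
 * running at 'bits_per_sec'.
 */
uint64_t
queue_delay_ms(uint64_t slots, uint64_t pkt_bytes, uint64_t bits_per_sec)
{
	return (slots * pkt_bytes * 8 * 1000 / bits_per_sec);
}
```

With 100 slots, 1500-byte packets and 100Mbps this gives 100 * 1500 * 8 = 1200000 bits, which drain in 12ms.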
device supports retrieving a serial number. Instead, first query the
list of VPD pages it does support, and only query the serial number if
it's supported, else silently move on. This eliminates a lot of noise
during verbose booting, and will likely eliminate the need for most
NOSERIAL quirks.
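The lookup in the supported-pages list can be sketched as below. The layout used (page code in byte 1, list length in byte 3, page codes from byte 4) is that of the SCSI Supported VPD Pages page (0x00), and 0x80 is the Unit Serial Number page; the function name vpd_page_supported() is hypothetical and the sketch works on a raw response buffer rather than on CAM structures.

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_SUPPORTED_PAGES	0x00	/* Supported VPD Pages */
#define VPD_UNIT_SERIAL		0x80	/* Unit Serial Number */

/*
 * Hypothetical helper: given the raw Supported VPD Pages response,
 * report whether 'page' is listed in it.
 */
int
vpd_page_supported(const uint8_t *buf, size_t len, uint8_t page)
{
	size_t i, n;

	if (len < 4 || buf[1] != VPD_SUPPORTED_PAGES)
		return (0);
	n = buf[3];
	if (n > len - 4)
		n = len - 4;	/* never read past the buffer */
	for (i = 0; i < n; i++)
		if (buf[4 + i] == page)
			return (1);
	return (0);
}
```

A caller would issue the serial number inquiry only when vpd_page_supported(buf, len, VPD_UNIT_SERIAL) returns non-zero.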
pmap_remove_all() must not be called on fictitious pages. To date,
fictitious pages have been allocated from zeroed memory, effectively
hiding this problem because the fictitious pages appear to have an empty
pv list. Submitted by: Kostik Belousov
Rewrite the comments describing vm_object_page_remove() to better
describe what it does. Add an assertion. Reviewed by: Kostik Belousov
MFC after: 1 week
the vnode interlock is not held. vn_printf() already correctly handles
locked and unlocked vnode interlocks, and all the in-tree vop_print
methods are interlock-agnostic.
Some code calls vprintf() with the vnode interlock held, which causes
unjustified panics with INVARIANTS (ffs_syncvnode() is one example).
Reported by: Peter Holm
want to adjust this code to just assume that all CPUs >= Esther should
be checked for the extended cpuid flags register.
MFC after: 3 days
PR: i386/119491
The code seems pretty MPSAFE and Giant is held over kproc_exit() which
at a lower level calls exit1(). exit1() requires Giant to be unowned so this
opens a window for races.
Reported by: Bryan Venteicher <bryanv at daemoninthecloset dot org>
Tested by: Bryan Venteicher <bryanv at daemoninthecloset dot org>
always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
source upgrades by falling back to GNU ar(1) as necessary. Option
WITH_BSDAR is gone. Option _WITH_GNUAR to aid in upgrades is *not*
supposed to be set by the user.
Stop bootstrapping BSD ar(1) on the next __FreeBSD_version bump, as
there are no known bugs in it. Bump __FreeBSD_version to anticipate
this and to flag the switch to BSD ar(1), should it be needed for
something.
Input from: obrien, des, kaiw
first before they can be set to Explorer mode.
PR: kern/118578
Submitted by: Andriy Gapon <avg@icyb.net.ua> (I added some comments)
Reviewed by: philip
MFC after: 1 month
variations (e500 currently), this provides a gcc-level FPU emulation and is an
alternative approach to the recently introduced kernel-level emulation
(FPU_EMU).
Approved by: cognet (mentor)
MFp4: e500
only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should
ever have OBJ_ONEMAPPING set. However, vm_object_deallocate() was
setting it on device (OBJT_DEVICE) objects. As a result,
vm_object_page_remove() could be called on a device object and if that
occurred pmap_remove_all() would be called on the device object's pages.
However, a device object's pages are fictitious, and fictitious pages do
not have an initialized pv list (struct md_page).
To date, fictitious pages have been allocated from zeroed memory,
effectively hiding this problem. Now, however, the conversion of rotting
diagnostics to invariants in the amd64 and i386 pmaps has revealed the
problem. Specifically, assertion failures have occurred during the
initialization phase of the X server on some hardware.
MFC after: 1 week
Discussed with: Kostik Belousov
Reported by: Michiel Boland
namespace in order to handle lockmgr fields in a controlled way instead
of spreading bogus stubs all around:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode
In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock
Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
the same order that FreeBSD 6 and before did. Doug
White and the other bloodhounds at ISC discovered that
while FreeBSD 7's ordering of options was more efficient,
it caused some cable modem routers to ignore the
SYN-ACKs ordered in this fashion.
The placement of sackOK after the timestamp option seems
to be the critical difference:
FreeBSD 6:
<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>
FreeBSD 7.0:
<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>
FreeBSD 7.0 + this change:
<mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>
MFC after: 1 week
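The restored ordering can be sketched as a small option builder. The option kind numbers are the standard TCP ones; build_synack_opts() and put_be32() are hypothetical names, and the sketch is not the code in the TCP stack. It only reproduces the byte layout of the "FreeBSD 7.0 + this change" line above.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL		0
#define OPT_NOP		1
#define OPT_MAXSEG	2
#define OPT_WINDOW	3
#define OPT_SACK_OK	4
#define OPT_TIMESTAMP	8

static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical builder: write mss, nop, wscale, nop, nop, timestamp,
 * sackOK, eol into 'opt' (at least 24 bytes) and return the length
 * padded to a 32-bit boundary.
 */
size_t
build_synack_opts(uint8_t *opt, uint16_t mss, uint8_t wscale,
    uint32_t tsval, uint32_t tsecr)
{
	size_t n = 0;

	opt[n++] = OPT_MAXSEG; opt[n++] = 4;
	opt[n++] = (uint8_t)(mss >> 8); opt[n++] = (uint8_t)mss;
	opt[n++] = OPT_NOP;
	opt[n++] = OPT_WINDOW; opt[n++] = 3; opt[n++] = wscale;
	opt[n++] = OPT_NOP; opt[n++] = OPT_NOP;
	opt[n++] = OPT_TIMESTAMP; opt[n++] = 10;
	put_be32(opt + n, tsval); n += 4;
	put_be32(opt + n, tsecr); n += 4;
	opt[n++] = OPT_SACK_OK; opt[n++] = 2;
	opt[n++] = OPT_EOL;
	while (n % 4 != 0)
		opt[n++] = OPT_EOL;	/* pad to a 32-bit boundary */
	return (n);
}
```

The two NOPs before the timestamp keep its 32-bit values aligned, and sackOK lands after the timestamp as in FreeBSD 6.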
the provided trailers. This has been broken since revision 1.240.
Submitted by: Dan Nelson
PR: kern/120948
"sounds ok to me" from: phk
MFC after: 3 days
can run on processors that don't have a FPU. This is typically the
case for Book E processors. While a tuned system will probably want
to use soft-float (or use a processor that has a FPU if the usage is
FP intensive enough), allowing hard-float on FPU-less systems gives
great portability and flexibility.
Obtained from: NetBSD
o Disable interrupts while not running U-Boot code. We clobber
registers that the U-Boot interrupt handlers assume to be
fixed as per the U-Boot register usage. At this time this only
applies to r14. U-Boot uses r2 now for what they used r29 for.
After we restore r14 in preparation of doing the syscall, we
re-enable interrupts. When we return from the syscall, we
disable interrupts and restore the callee-saved r14.
(link) address and the physical (load) address. Ideally, the mapping
between link and load addresses should be abstracted by the copyin(),
copyout() and readin() functions, so that we don't have to add kluges
in __elfN(loadimage)(). Then, we could also have paged virtual memory
for the kernel. This can be important under EFI, where you need to
allocate physical memory from the firmware if you want to work in all
scenarios.
o Move the API prototypes to a separate header (glue.h)
o Allow the platform to hint libuboot about where to look
for the API signature. The uboot_address variable is
expected to be defined by the platform.
- add support for T3C
- add DDP support (zero-copy receive)
- fix TOE transmit of large requests
- fix shutdown so that sockets don't remain in CLOSING state indefinitely
- register listeners when an interface is brought up after tom is loaded
- fix setting of multicast filter
- enable link at device attach
- exit tick handler if shutdown is in progress
- add helper for logging TCB
- add sysctls for dumping transmit queues
- note that TOE will not be MFC'd until after 7.0 has been finalized
MFC after: 3 days
consists of the null-terminated name and the contents of any structure
you wish to record. A new ktrstruct() function constructs and emits a
KTR_STRUCT record. It is accompanied by convenience macros for struct
stat and struct sockaddr.
In kdump(1), KTR_STRUCT records are handled by a dispatcher function
that runs stringent sanity checks on their contents before handing them
over to individual decoding functions for each type of structure.
Currently supported structures are struct stat and struct sockaddr for
the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK
and AF_IPX is present but disabled, as I am unable to test it properly.
Since 's' was already taken, the letter 't' is used by ktrace(1) to
enable KTR_STRUCT trace points, and in kdump(1) to enable their
decoding.
Derived from patches by Andrew Li <andrew2.li@citi.com>.
PR: kern/117836
MFC after: 3 weeks
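The record layout described above (a null-terminated name followed by the raw structure) and the kind of sanity check a decoder might run can be sketched as below. record_pack() and record_payload() are hypothetical names; they are not ktrstruct() or the kdump(1) dispatcher, and the specific checks are assumptions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical encoder: lay out "name\0" followed by the structure
 * bytes in 'buf'.  Returns the record length, or 0 if it does not fit.
 */
size_t
record_pack(char *buf, size_t buflen, const char *name,
    const void *data, size_t datalen)
{
	size_t namelen = strlen(name) + 1;	/* include the terminator */

	if (namelen > buflen || datalen > buflen - namelen)
		return (0);
	memcpy(buf, name, namelen);
	memcpy(buf + namelen, data, datalen);
	return (namelen + datalen);
}

/*
 * Hypothetical decoder check: return a pointer to the payload if the
 * record has a terminated name equal to 'name' and exactly 'expected'
 * payload bytes, or NULL otherwise.
 */
const void *
record_payload(const char *buf, size_t len, const char *name,
    size_t expected)
{
	const char *end = memchr(buf, '\0', len);
	size_t namelen;

	if (end == NULL)
		return (NULL);
	namelen = (size_t)(end - buf) + 1;
	if (strcmp(buf, name) != 0 || len - namelen != expected)
		return (NULL);
	return (buf + namelen);
}
```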
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.
Submitted by: rdivacky
MFC after: 1 week
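The two checks can be sketched as follows. The SK_-prefixed constants are local to this sketch: the flag values are the Linux ones for MREMAP_MAYMOVE and MREMAP_FIXED, a 4KB page size is assumed, and returning EINVAL for both failures is an assumption based on Linux behaviour rather than something stated above.

```c
#include <errno.h>
#include <stdint.h>

#define SK_MREMAP_MAYMOVE	1	/* Linux value of MREMAP_MAYMOVE */
#define SK_MREMAP_FIXED		2	/* Linux value of MREMAP_FIXED */
#define SK_PAGE_MASK		(4096 - 1)	/* assumes 4KB pages */

/*
 * Hypothetical argument check: reject unknown flag bits and addresses
 * that are not page aligned.
 */
int
mremap_check_args(uintptr_t addr, unsigned long flags)
{
	if (flags & ~(unsigned long)(SK_MREMAP_FIXED | SK_MREMAP_MAYMOVE))
		return (EINVAL);
	if (addr & SK_PAGE_MASK)
		return (EINVAL);
	return (0);
}
```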
- Added loose RX MTU functionality to allow frames larger than 1500 bytes
to be accepted even though the interface MTU is set to 1500.
- Implemented new TCP header splitting/jumbo frame support which uses
two chains for receive traffic rather than the original single receive
chain.
- Added additional debug support code.
binutils ar and ranlib to gar and granlib, respectively.
* Introduce a temporary variable WITH_GNUAR as a safety net.
When buildworld is run with -DWITH_GNUAR, GNU binutils ar and ranlib
will be installed as the default ones and 'BSD' ar will be disabled.
* Bump __FreeBSD_version to reflect the import of 'BSD' ar(1).
Approved by: jkoshy (mentor)
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. G_LINUX_LVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only; logical
volumes cannot be allocated or resized.
Reviewed by: Ivan Voras
Previously known as geom_lvm(4), rename requested by des, phk.
file system. In particular, stop overwriting mount point
flags in nfs_mountdiskless() because now they are set
elsewhere. (They were _initialized_ by that function in
the 4.4BSD days, when mount structures were not allocated
in a centralized manner -- see rev. 1.1 of this file.)
Fix nfs_mount(), which happened to depend on the loss of
MNT_ROOTFS when it came to update handling.
Also note that mountnfs() no longer handles updates. Now
they shouldn't reach this function, so printf a diagnostic
message if that happens due to a coding error.
macros. The only semantic change was the need to add a vc_opened field
to struct vcomm since we can no longer use the request queue returning
to an uninitialized state to hold whether or not the device is open.
MFC after: 1 month
the same operation of lockmgr() but accepting a custom wmesg, prio and
timo for the particular lock instance, overriding default values
lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo.
- Use lockmgr_args() in order to implement BUF_TIMELOCK()
- Cleanup BUF_LOCK()
- Remove LK_INTERNAL as it is no longer used in the lockmgr namespace
Tested by: Andrea Barberio <insomniac at slackware dot it>
fundamentally fairly confused about how signals work and when it is
appropriate for upcalls to be interrupted. In particular, we should
be exempting certain upcalls from interruption, we should not always
eventually time out sleeping on an upcall, and we should not be
interrupting the sleep for certain signals that we currently are
(including SIGINFO). This code needs to be reworked in the style of
NFS interruptible mounts.
MFC after: 1 month
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.
Commit made on Montecito-based Itanium...
is to be requested via a "ro" option. At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag. Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches. (See the
PRs for examples.)
Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)
To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.
Also correct the comment explaining how mount_arg() handles length
of -1.
PR: bin/106636 kern/120319
Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)
legacy interrupts rather than MSI as a special case. Prior to this
commit, the interrupt handler was doing the slow handshaking with
the device to ensure the legacy interrupt was lowered in both
the legacy and MSI-X case. This handshaking was not
required for MSI-X.
allocator for jumbo frame.
o Removed unneeded jlist lock which was used to manage jumbo
buffers.
o Don't reinitialize hardware if MTU was not changed.
o Added additional check for minimal MTU size.
o Added a new tunable hw.skc.jumbo_disable to disable jumbo frame
support for the driver. The tunable could be set for systems that
do not need to use jumbo frames and it would save
(9K * number of Rx descriptors) bytes kernel memory.
o Jumbo buffer allocation failure is no longer a critical error for
the operation of sk(4). If sk(4) encounters an allocation failure
it just disables jumbo frame support and continues to work without
user intervention.
With these changes jumbo frame performance of sk(4) was slightly
increased and users should not encounter jumbo buffer allocation
failure. Previously sk(4) tried to allocate physically contiguous
memory, 3388KB for 256 Rx descriptors. Sometimes a contiguous memory
region of that size was not available on a running system, which in
turn caused the driver to fail to load.
Tested by: Cy Schubert < Cy.Schubert () komquats dot com >
modules using invalid ABI versions (e.g. a 7.x module with an 8.x kernel)
for a given kernel:
- Add a 'kernel' module version whose value is __FreeBSD_version.
- Add a version dependency on 'kernel' in every module that has an
acceptable version range of __FreeBSD_version up to the end of the
branch __FreeBSD_version is part of. E.g. a module compiled on 701000
would work on kernels with versions between 701000 and 799999 inclusive.
Discussed on: arch@
MFC after: 1 week
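The acceptable range can be sketched with a little integer arithmetic. Both function names are hypothetical; the sketch assumes that the branch is identified by the digits above the lowest five, which matches the 701000 to 799999 example above.

```c
/*
 * Hypothetical helper: the last __FreeBSD_version of the branch that
 * 'version' belongs to, e.g. 701000 -> 799999.
 */
int
branch_last_version(int version)
{
	return (version / 100000 * 100000 + 99999);
}

/*
 * Hypothetical check: a module compiled against 'module_version' is
 * accepted by kernels from that version up to the end of the branch.
 */
int
module_version_ok(int module_version, int kernel_version)
{
	return (kernel_version >= module_version &&
	    kernel_version <= branch_last_version(module_version));
}
```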
A couple of notes for this:
* WITNESS support, when enabled, is only used for shared locks in order
to avoid problems with the "disowned" locks
* KA_HELD and KA_UNHELD only exist in the lockmgr namespace in order
to assert for a generic thread (not curthread) owning or not the
lock. Really, this kind of check is bogus but it seems very
widespread in the consumers code. So, for the moment, we cater to this
untrusted behaviour until the consumers are fixed and the
options can be removed (hopefully during the 8.0-CURRENT lifecycle)
* Implementing KA_HELD and KA_UNHELD (not supported natively by
WITNESS) made necessary the introduction of LA_MASKASSERT which
specifies the range for default lock assertion flags
* About other aspects, lockmgr_assert() follows exactly what other
locking primitives offer about this operation.
- Build real assertions for buffer cache locks on the top of
lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp)
paradigm.
- Add checks at lock destruction time and use a cookie for verifying
lock integrity at any operation.
- Redefine BUF_LOCKFREE() in order to not use a direct assert but
let it rely on the aforementioned destruction time check.
The KPI is evidently broken, so a __FreeBSD_version bump and a
manpage update are necessary and will be committed soon.
Side note: lockmgr_assert() will be used soon in order to implement
real assertions in the vnode namespace replacing the legacy and still
bogus "VOP_ISLOCKED()" way.
Tested by: kris (earlier version)
Reviewed by: jhb
access cache improvements:
- Flush just access control state on CODA_PURGEUSER, not the full
namecache for /coda.
- When replacing a fid on a cnode as a result of, e.g.,
reintegration after offline operation, we no longer need to
purge the namecache entries associated with its vnode.
MFC after: 1 month
modeled on the access cache found in NFS, smbfs, and the Linux coda
module. This is a positive access cache of a single entry per file,
tracking recently granted rights, but unlike NFS and smbfs,
supporting explicit invalidation by the distributed file system.
For each cnode, maintain a C_ACCCACHE flag indicating the validity
of the cache, and a cached uid and mode tracking recently granted
positive access control decisions.
Prefer the cache to venus_access() in VOP_ACCESS() if it is valid,
and when we must fall back to venus_access(), update the cache.
Allow Venus to clear the access cache, either the whole cache on
CODA_FLUSH, or just entries for a specific uid on CODA_PURGEUSER.
Unlike the Coda module on Linux, we don't flush all entries on a
user purge using a generation number, we instead walk present
cnodes and clear only entries for the specific user, meaning it is
somewhat more expensive but won't hit all users.
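As a rough user-space illustration of the scheme described above (not the
kernel code), the following Python sketch models a single-entry positive
access cache per cnode. The names Cnode, access, flush_all and purge_user,
the field names, and the venus_access callback signature are assumptions
made for this example; only the validity flag (C_ACCCACHE), the cached uid
and mode, and the CODA_FLUSH / CODA_PURGEUSER behaviour come from the text.

```python
from dataclasses import dataclass


@dataclass
class Cnode:
    """Toy stand-in for a Coda cnode; all field names are hypothetical."""
    name: str
    acc_valid: bool = False   # models the C_ACCCACHE flag
    cached_uid: int = -1      # uid of the most recently granted access
    cached_mode: int = 0      # mode bits granted to cached_uid


def access(cn, uid, mode, venus_access):
    """Prefer the cache; fall back to Venus and record positive results."""
    if cn.acc_valid and cn.cached_uid == uid and (cn.cached_mode & mode) == mode:
        return True                      # cache hit, no upcall needed
    granted = venus_access(cn, uid, mode)
    if granted:
        # Assumption: a new grant simply replaces the single cached entry.
        cn.acc_valid, cn.cached_uid, cn.cached_mode = True, uid, mode
    return granted                       # denials are never cached


def flush_all(cnodes):
    """Models CODA_FLUSH: invalidate every access cache entry."""
    for cn in cnodes:
        cn.acc_valid = False


def purge_user(cnodes, uid):
    """Models CODA_PURGEUSER: walk cnodes, clear only entries for uid."""
    for cn in cnodes:
        if cn.acc_valid and cn.cached_uid == uid:
            cn.acc_valid = False
```

The walk in purge_user is linear in the number of present cnodes, which is
the extra cost mentioned above, but entries cached for other users survive.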
Since the Coda module is aggressive about not keeping around
unopened cnodes, the utility of the cache is somewhat limited for
files, but it works well for directories. We should make Coda less
aggressive about GCing cnodes in VOP_INACTIVE() in order to improve
the effectiveness of in-kernel caching of attributes and access
rights.
MFC after: 1 month
VFS namecache, as is done by the Coda module on Linux. Unlike the Coda
namecache, the global VFS namecache isn't tagged by credential, so use
more conservative flushing behavior (for now) when CODA_PURGEUSER is
issued by Venus.
This improves overall integration with the FreeBSD VFS, including
allowing __getcwd() to work better, procfs/procstat monitoring, and so
on. This improves shell behavior in many cases, and improves ".."
handling. It may lead to some slowdown until we've implemented a
specific access cache, which should net improve performance, but in the
mean time, lookup access control now always goes to Venus, whereas
previously it didn't.
MFC after: 1 month
When the refcount reaches 0 in ntfs_ntput(), the inode lockmgr lock is
destroyed directly without first being released. Fix this by unlocking
the lockmgr lock even in the zero-refcount case.
Reported by: dougb, yar, Scot Hetzel <swhetzel at gmail dot com>
Submitted by: yar
nfs_xid_gen() function instead of duplicating the logic in both
nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request().
MFC after: 1 week
Submitted by: mohans (a long while ago)
through the FreeBSD ABI. IPC_INFO, SHM_INFO, SHM_STAT were added
specifically for Linux binary support. They are not documented
as being a part of the FreeBSD ABI, also, the structures necessary
for them have been hidden away from the users for a long time.
Also, the Linux ABI layer uses its own structures to populate the
responses back to the user to ensure that the ABI is consistent.
I think there is a bit more separation work that needs to happen.
Reviewed by: jhb
Discussed with: jhb
Discussed on: freebsd-arch@ (very briefly)
MFC after: 1 month
the PIC also informs the platform at which IRQ level it can start
assigning IPIs, since this can depend on the number of IRQs
supported for external interrupts.
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages. Adjust to be correct. Review of other architectures is
forthcoming.
Submitted by: Joseph Golio
With write-allocate cache we get into the following scenario:
1. data has been updated in the memory by the USB HC, but
2. D-cache holds an un-flushed value of it
3. when affected cache line is being replaced, the old (un-flushed) value is
flushed and overwrites the newly arrived
This is possible due to how write-allocate works with virtual caches (ARM for
example).
In case of USB transfers it leads to fatal tags discrepancies in umass(4)
operation, which look like the following:
umass0: Invalid CSW: tag 1 should be 2
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 3
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 4
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 5
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 6
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): error 5
(probe0:umass-sim0:0:0:0): Retries Exausted
To eliminate this, a BUS_DMASYNC_PREREAD sync operation is required in
usbd_start_transfer().
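The three-step scenario above can be reproduced with a toy model. The
sketch below is plain Python, not driver code: ToyCache, run_scenario and
sync_preread are hypothetical names, and modelling the pre-read sync as
"write back, then drop the line" is an assumption about what the sync has
to achieve, not a description of the actual busdma implementation.

```python
class ToyCache:
    """Write-back, write-allocate cache in front of a small memory list."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                    # addr -> (value, dirty)

    def cpu_write(self, addr, value):
        # Write-allocate: the store lands in the cache and dirties the line.
        self.lines[addr] = (value, True)

    def cpu_read(self, addr):
        if addr not in self.lines:         # miss: fill the line from memory
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def evict(self, addr):
        # Line replacement: a dirty line is written back to memory.
        value, dirty = self.lines.pop(addr, (None, False))
        if dirty:
            self.memory[addr] = value

    def sync_preread(self, addr):
        # Assumed effect of the pre-read sync: no dirty line is left behind.
        self.evict(addr)


def run_scenario(sync_before_dma):
    """Return what the CPU finally reads after a DMA write of the value 2."""
    memory = [0]
    cache = ToyCache(memory)
    cache.cpu_write(0, 1)                  # CPU touches the buffer first
    if sync_before_dma:
        cache.sync_preread(0)
    memory[0] = 2                          # device writes fresh data by DMA
    cache.evict(0)                         # the line is replaced later on
    return cache.cpu_read(0)
```

Without the sync, run_scenario(False) yields the stale value 1 because the
late write-back overwrites the DMA data; with it, run_scenario(True) yields
the freshly written 2.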
Credits for nailing this down go to Grzegorz Bernacki gjb AT semihalf DOT com.
Reviewed by: imp
Approved by: cognet (mentor)
historical relic, and are no longer appropriate for either LAN or WAN
mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss
from socket buffer fill events is common, and sequence numbers wrap
quickly enough that data corruption is possible. TCP solves both of
these problems without imposing significant overhead.
MFC after: 1 month
sectors so the geometry of large IDE disks has to be adjusted. This
corresponds to what the OpenSolaris dad(7D) driver does except that
the latter only tweaks sectors and effectively limits the mediasize
to 128GB so the cylinders and heads fields won't ever overflow. Not
limiting the mediasize is a compromise between allowing the use of a
Sun disk label as far as possible and being able to use the entire disk
with another disk label.
This allows the full capacity of large IDE disks to be used if they
were not labeled under (Open)Solaris (in both senses of the phrase).
MFC after: 2 weeks
Turn off TFTP support by default: when both TFTP and NFS are enabled in the
loader, strange interactions occur in the pure netbooting scenario (i.e.
loader is TFTP-ed, kernel+world mounted over NFS), leading to very slow access
to the NFS-exported files.
Reviewed by: grehan
Approved by: cognet (mentor)
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. GLVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.
Reviewed by: Ivan Voras
- Include lock.h in lockmgr.h as a nested header in order to safely use
LOCK_FILE and LOCK_LINE. Since this code will be replaced soon, we
can tolerate this namespace pollution for a while, even if the real
fix would be to let lockmgr() depend on lock.h as a separate header.
tree, restyle everything but coda.h (which is more explicitly shared
across systems) into a closer approximation to style(9).
Remove a few more unused function prototypes.
Add or clarify some comments.
MFC after: 1 month
NOP-message polling in ciss_periodic().
Note that setting the tunable to non-zero is a workaround only for the
`ADAPTER HEARTBEAT FAILED' problem, and may freeze systems that do not
have the problem.
Reviewed by: scottl
Reported by: Attila Nagy
MFC after: 3 days
owned by a NULL owner. This causes the subsequent VOP_ISLOCKED() call in
nfs_upgrade_vnlock() to panic, as it only accepts curthread now.
Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() so that they no longer
use the struct thread pointer passed as an argument (it is no longer
required there, as vn_lock() and VOP_UNLOCK() do not take it any more).
Using curthread instead introduces no ambiguity, as LK_EXCLOTHER should
be handled as a "not locked" request by both functions.
Reported by: kris
Tested by: kris
Reviewed by: ups
- Rename print_vattr to coda_print_vattr and make static, rename
print_cred to coda_print_cred.
- Remove unused coda_vop_nop.
- Add XXX comment because coda_readdir forwards to the cache vnode's
readdir rather than venus_readdir, and annotate venus_readdir as
unused.
- Rename vc_nb_* to vc_*.
- Use d_open_t, d_close_t, d_read_t, d_write_t, d_ioctl_t and d_poll_t
for prototyping vc_* as that is the intent, don't use our own
definitions.
- Rename coda_nb_statfs to coda_statfs, rename NB_SFS_SIZ to
CODA_SFS_SIZ.
- Replace one more OBE reference to NetBSD with a reference to FreeBSD.
- Tidy up a little vertical whitespace here and there.
- Annotate coda_nc_zapvnode as unused.
- Remove unused vcodattach.
- Annotate VM_INTR as unused.
- Annotate that coda_fhtovp is unused and doesn't match the FreeBSD
prototype, so isn't hooked up to vfs_fhtovp. If we want NFS export of
Coda to work someday, this needs to be fixed.
- Remove unused getNewVnode.
- Remove unused coda_vget, coda_init, coda_quotactl prototypes.
MFC after: 1 month
the mountpoint for a specific device. This was implemented incorrectly,
a bad idea in a fundamental sense, and also never used, so presumably
a long-idle debugging function.
MFC after: 1 month
for vop_bmap; delete the existing stub that returned either EINVAL
or EOPNOTSUPP, and had unreachable calls to VOP_BMAP on the cache
vnode.
MFC after: 1 month
directory, and jail directory within procstat. While this functionality
is available already in fstat, encapsulating it in the kern.proc.filedesc
sysctl makes it accessible without using kvm and thus without needing
elevated permissions.
The new procstat output looks like:
PID COMM FD T V FLAGS REF OFFSET PRO NAME
76792 tcsh cwd v d -------- - - - /usr/src
76792 tcsh root v d -------- - - - /
76792 tcsh 15 v c rw------ 16 9130 - -
76792 tcsh 16 v c rw------ 16 9130 - -
76792 tcsh 17 v c rw------ 16 9130 - -
76792 tcsh 18 v c rw------ 16 9130 - -
76792 tcsh 19 v c rw------ 16 9130 - -
I am also bumping __FreeBSD_version for this as this new feature will be
used in at least one port.
Reviewed by: rwatson
Approved by: rwatson
then later to FreeBSD. Update various NetBSD-related comments: in some
cases delete them because they don't apply, in others update to say
FreeBSD as they still apply but in FreeBSD (and might for that matter
no longer apply on NetBSD), and flag one case where I'm not sure
whether it applies.
MFC after: 1 month
locks of those vnodes. Probably, Coda should do the same lock sharing/
pass-through that is done for nullfs, but in the mean time this ensures
that locks are adequately held to prevent corruption of data structures
in the cache file system.
Assuming most operations came from the top layer of Coda and weren't
performed directly on the cache vnodes, in practice this corruption was
relatively unlikely as the Coda vnode locks were ensuring exclusive
access for most consumers.
This causes WITNESS to squeal like a pig immediately when Coda is used,
rather than waiting until file close; I noticed these problems because
of the lack of said squealing.
MFC after: 1 month
vget() calls using inode numbers to query the root of /coda, which is not
needed since we now cache the root vnode with the mountpoint.
MFC after: 1 month
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only accept curthread as argument; this will lead to axing the additional
argument from both functions, making the code cleaner.
Reviewed by: jeff, kib
the provided lock or &blocked_lock. The thread may be temporarily
assigned to the blocked_lock by the scheduler so a direct comparison
can not always be made.
- Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the scheduling
interfaces. The schedulers themselves still use more explicit asserts.
Sponsored by: Nokia
obtained from OpenBSD with an algorithm suggested
by Amit Klein. The OpenBSD algorithm has a few
flaws; see Amit's paper for more information.
For a description of how this algorithm works,
please see the comments within the code.
Note that this commit does not yet enable random IP ID
generation by default. There are still some concerns
that doing so will adversely affect performance.
Reviewed by: rwatson
MFC After: 2 weeks
- Move recursion checking into rwlock inlines to free a bit for use with
adaptive spinners.
- Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes
causing write spinners to restart their loop.
- Write spinners are limited by a count while readers hold the lock as
there is no way to know for certain whether readers are running still.
- In the read path block if there are write waiters or spinners to avoid
starving writers. Use a new per-thread count, td_rw_rlocks, to skip
starvation avoidance if it might cause a deadlock.
- Remove or change invalid assertions in turnstiles.
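The read-path rule in the list above can be written down as a small
predicate. This is only a sketch in Python under stated assumptions: it
considers a lock that is not currently write-locked, reader_must_block is
a hypothetical name, and td_rw_rlocks is assumed to count the read locks
the calling thread already holds.

```python
def reader_must_block(write_waiters, write_spinners, td_rw_rlocks):
    """Decide whether a new reader should block behind pending writers.

    Blocking avoids starving writers, but a thread that already holds a
    read lock (td_rw_rlocks > 0) must not block, because the writer it
    would wait for may itself be waiting on that earlier read lock.
    """
    if td_rw_rlocks > 0:
        return False                  # skip starvation avoidance
    return bool(write_waiters or write_spinners)
```

For example, a thread holding no read locks blocks when a writer is
waiting, while a thread that already holds one read lock proceeds.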
Reviewed by: attilio (developed parts of the patch as well)
Sponsored by: Nokia
This support tries to be as parallel as possible with other locking
primitives, but there are differences; more specifically:
- The base witness support is already equipped to allow duplicate
lock acquisition, as lockmgr relies on this.
- In the case of lockmgr_disown() the lock is reported as unlocked by
witness even if it is still held by the "kernel context"
- In the case of upgrading we can have 3 different situations:
* Total unlocking of the shared lock and nothing else
* Real witness upgrade if the owner is the first upgrader
* Shared unlocking and exclusive locking if the owner is not the first
upgrader but it is still allowed to upgrade
- LK_DRAIN is basically handled like an exclusive acquisition
Additionally, the new options LK_NODUP and LK_NOWITNESS can now be used
with lockinit(): LK_NOWITNESS disables WITNESS for the specified lock,
while LK_NODUP enables tracking of duplicated locks. This will require a
manpage update and a __FreeBSD_version bump (addressed by further commits).
This patch also fixes a problem occurring if a lockmgr is held in
exclusive mode and the same owner tries to acquire it in shared mode:
currently there is a spurious shared lock acquisition while what
we really want is a lock downgrade. Probably, this situation would be
better served by failing with an EDEADLK errno return.
Side note: first testing of this patch has already revealed several
LORs, so please expect cascades of LORs until they are resolved. NTFS is
also reported broken by the introduction of WITNESS. BTW, NTFS is exposing
a lock leak which needs to be fixed, and this patch can help with that if
rightly tweaked.
Tested by: kris, yar, Scot Hetzel <swhetzel at gmail dot com>
done in consumer code: using lock properties is much more appropriate.
Fix current code doing these bogus checks.
Note: Really, callouts are not usable by all !(LC_SPINLOCK | LC_SLEEPABLE)
primitives, as some, like rmlocks, don't implement the generic lock layer
functions, but they can be equipped for this, so the check is still
valid.
Tested by: matteo, kris (earlier version)
Reviewed by: jhb
This makes it possible to fix a problem with ARM kernel.bin not having the MFS image
embedded: it is objcopied from the kernel.noheader temporary ELF file, which
was not subject to embedding the MFS image previously.
Reviewed by: imp
Approved by: cognet (mentor)
De-hardcode usage of ARM_TP_ADDRESS and RAS local storage, and move this
special purpose page to a more convenient place i.e. after the vectors high
page, more towards the end of address space. Previous location (0xe000_0000)
caused grief if KVA was to go beyond the default limit.
Note that ARM world rebuilding is required after this change since the
location of ARM_TP_ADDRESS is shared between kernel and userland.
Submitted by: Grzegorz Bernacki (gjb AT semihalf dot com)
Reviewed by: imp
Approved by: cognet (mentor)
- Expose sbrelease_internal(), a variant of sbrelease() with no
expectations about the validity of locks in the socket buffer.
- Use sbrelease_internal() in sorflush(), and as a result avoid initializing
and destroying a socket buffer lock for the temporary stack copy of the
actual buffer, asb.
- Add a comment indicating why we do what we do, and remove an XXX since
things have gotten less ugly in sorflush() lately.
This makes socket close cleaner, and possibly also marginally faster.
MFC after: 3 weeks
referencing the file's VM pages are returned from the network stack,
making changes to the file safe.
This flag does not guarantee that the data has been transmitted to the
other end.
- Rename rt2560_read_eeprom to rt2560_read_config, we already have
rt2560_eeprom_read
- If hardware gives us wrong encryption done index, shout out loudly and
terminate the processing loop
- Process encryption done if RX done bit is set in interrupt status register
(according to Ralink Linux driver)
- Turn VALID/BUSY bits in TX descriptor only after TX descriptor is fully setup
- Fix BBP read: RT2560_BBPCSR can't be written until its RT2560_BBP_BUSY bit is
off (according to Ralink Linux driver)
- Skip invalid (0 or 0xffff) BBP register/value entries stored in EEPROM
- Fix channel TX power location in EEPROM, if channel TX power is above 31 set
it to 24 (TX power only has 5bits in RF register, "24" is according to Ralink
Linux driver)
- Configure BBP according to the BBP register/value stored in EEPROM, restore
BBP17 (RX sensitivity tuning) to default value after this.
- Set TX/RX antenna after BBP is initialized; these two operations will try to
set BBP registers
- Reconfigure ACK TX time registers according to 802.11g standard (TX @36Mb,
other side's ACK should be sent @24Mb).
- 2560 parts have two TX rings: one for management/control packets, one for data
packets. Add private OACTIVE flag for each of them. Turn on IFF_DRV_OACTIVE
if one of private OACTIVE is on; turn off IFF_DRV_OACTIVE iff all of them are
off.
- Rework watchdog to mimic old if_watchdog action. Process TX done/encryption
done in watchdog function (according to Ralink Linux driver)
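A few of the rules in the list above are simple enough to state as code.
The Python sketch below is illustrative only: the function and constant
names are hypothetical and do not come from the driver, while the numbers
(5-bit TX power field, fallback value 24, invalid entries 0 and 0xffff)
and the OACTIVE rule are taken from the list.

```python
RF_TXPOWER_BITS = 5
RF_TXPOWER_MAX = (1 << RF_TXPOWER_BITS) - 1    # 31, largest 5-bit value
RF_TXPOWER_FALLBACK = 24                       # value used when out of range


def sanitize_txpower(raw):
    """Clamp an EEPROM channel TX power that does not fit in 5 bits."""
    return RF_TXPOWER_FALLBACK if raw > RF_TXPOWER_MAX else raw


def bbp_entry_valid(entry):
    """A 16-bit BBP register/value entry of 0 or 0xffff is skipped."""
    return entry not in (0x0000, 0xFFFF)


def drv_oactive(data_ring_oactive, mgmt_ring_oactive):
    """IFF_DRV_OACTIVE is on if either private flag is on, off iff both are off."""
    return data_ring_oactive or mgmt_ring_oactive
```

Keeping one private flag per TX ring means a full management ring cannot
be hidden by a data ring that still has room, and vice versa.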
Obtained from: DragonFly
Approved by: sam (mentor)
Tested by: sam
Related to PR: kern/117655
# Forcing long slot time setting is not included in this commit, comment and
# related code is in place, so if problem pops up, quick tests could be done.
down some DCMD's without any data. Thanks to Dell and LSI for helping
to provide clues to figure out this problem. Now MegaCli can upgrade
the firmware and should work identically to how it does when run on Linux.
Reviewed by: scottl, LSI
MFC after: 1 day
ipsec*_set_policy and do the privilege check only if needed.
Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.
Reviewed by: rwatson
a variety of bootloaders. This sometimes means that different loader
scripts are required within one ${MACHINE_ARCH}, which makes the
current practice of using ldscript.${MACHINE_ARCH} unsuitable.
Instead, make the default the current convention and allow the ld
scripts to be overridden as necessary.
If we aren't arm, pc98 or sun4v, then enable treating warnings like
errors. That doesn't mean these platforms aren't -Werror clean, just
that we haven't enforced it before. Someone with some spare time
should investigate these three platforms to see if any can be removed.
PCI-express chipset (and thus has functional MSI) if there are any
PCI-express devices in the system, not requiring a root port device.
With PCI-X the chipset detection has to be very conservative because there
are known systems with PCI-X devices that do not appear to have PCI-X
chipsets. However, with PCI-express I'm not sure it is possible to have
a PCI-express device in a system with a non-PCI-express chipset. If we
assume that is the case then this change is valid. It is also required
for at least some PCI-express systems that don't have any devices with
a root port capability (some ICH9 systems).
MFC after: 1 week
Reported by: jfv
free function controllable, instead of passing the KVA of the buffer
storage as the first argument.
Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.
Likely break the only non-conventional user of the API, after informing
the relevant committer.
Update the mbuf(9) manual page, which was already out of sync on
this point.
Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.
This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.
This does not affect the memory layout or size of mbufs.
Parental oversight by: sam and rwatson.
No MFC is anticipated.
aligned (or at least not cross a page boundary). However, it turns out
that on at least one machine one table header does cross a page boundary.
This caused problems with the MADT early probe as it uses the crash dump
map to load ACPI tables by loading the RSDT/XSDT into pages 1 ... N and
loading the header of each ACPI table into page 0 looking for the
MADT. However, if a table header crossed a page boundary, then page 1
would get trashed resulting in a panic. Fix this by reserving the first
2 pages for ACPI table headers (headers are less than a page in size,
so 2 pages will be sufficient) and use pages 2 .. N for the RSDT and XSDT.
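The "2 pages will be sufficient" argument is a small piece of arithmetic,
sketched here in Python. The 4096-byte page size and the 36-byte length of
the common ACPI table header are assumptions stated for the example, and
pages_spanned is a hypothetical helper, not a kernel function.

```python
PAGE_SIZE = 4096          # assumed page size (i386)
ACPI_HEADER_LEN = 36      # assumed length of the common ACPI table header


def pages_spanned(paddr, length):
    """Number of pages touched by `length` bytes starting at `paddr`."""
    first = paddr // PAGE_SIZE
    last = (paddr + length - 1) // PAGE_SIZE
    return last - first + 1


# A header that starts 16 bytes before a page boundary needs two mappings.
aligned = pages_spanned(0x1000, ACPI_HEADER_LEN)
crossing = pages_spanned(0x1FF0, ACPI_HEADER_LEN)
```

Here aligned is 1 and crossing is 2, and no object of at most PAGE_SIZE
bytes can touch more than two pages, which is why reserving pages 0 and 1
for headers is enough.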
Note: amd64 should probably be simplified to just use pmap_mapbios()
for all these tables which will use the direct map and not need the
crash dump hack.
MFC after: 5 days
Tested on: i386
Reported by: Pete French petefrench of ticketswitch.com
read socket buffers in shutdown() and close():
- Call socantrcvmore() before sblock() to dislodge any threads that
might be sleeping (potentially indefinitely) while holding sblock(),
such as a thread blocked in recv().
- Flag the sblock() call as non-interruptible so that a signal
delivered to the thread calling sorflush() doesn't cause sblock() to
fail. The sblock() is required to ensure that all other socket
consumer threads have, in fact, left, and do not enter, the socket
buffer until we're done flushing it.
To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag. When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of the setting of the disposition of SB_NOINTR
on the socket buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.
Reviewed by: bz, kmacy
Reported by: Jos Backus <jos at catnook dot com>
The only downside is that it renames pmap_vac_me_harder() to pmap_fix_cache().
From Mark's email on -arm :
pmap_get_vac_flags(), pmap_vac_me_harder(), pmap_vac_me_kpmap(), and
pmap_vac_me_user() have been rewritten as pmap_fix_cache() to be more
efficient in the kernel map case. I also removed the reference to
the md.kro_mappings, md.krw_mappings, md.uro_mappings, and md.urw_mappings
counts.
In pmap_clearbit(), we can also skip over tests and writeback/invalidations
in the PVF_MOD and PVF_REF cases if those bits are not set in the pv_flag.
PVF_WRITE will turn caching back on and remove the PVF_MOD bit.
In pmap_nuke_pv(), the vm_page_flag_clear(pg, PG_WRITEABLE) has been moved
to the pmap_fix_cache().
We can be more aggressive in attempting to turn caching back on by calling
pmap_fix_cache() at times that may be appropriate to turn cache on
(a kernel mapping has been removed, a write has been removed or a read
has been removed and we know the mapping does not have multiple write
mappings to a page).
In pmap_remove_pages() the cpu_idcache_wbinv_all() is moved to happen
before the page tables are NULLed because the caches are virtually
indexed and virtually tagged.
In pmap_remove_all(), the pmap_remove_write(m) is added before the
page tables are NULLed because the caches are virtually indexed and
virtually tagged. This also removes the need for the caches fixing routine
(whichever is being used pmap_vac_me_harder() or pmap_fix_cache()) to be
called on any of these mappings.
In pmap_remove(), I simplified the cache cleaning process and removed
extra TLB removals. Basically if more than PMAP_REMOVE_CLEAN_LIST_SIZE
are removed, then just flush the entire cache.
This implementation is made for a downward-growing stack organization such as
i386/amd64 platforms have, but a machine-dependent version is preferred if present.
o conversion to callout(9) API.
o add a missing driver lock in bfe_ifmedia_sts().
o use our callout to drive watchdog timer.
o restart Tx routine if pending queued packets are present in
watchdog handler.
o unarm watchdog timer only if there are no queued packets.
o don't blindly reset phy and let phy driver handle link change
request in bfe_init_locked().
o return the status of mii_mediachg() to caller in
bfe_ifmedia_upd(). Previously it always returned 0 to caller.
o add check for IFF_DRV_RUNNING flag as well as IFF_DRV_OACTIVE
in bfe_start_locked().
o implement miibus_statchg method that keeps track of current
link state changes as well as negotiated speed/duplex/
flow-control configuration.
Reprogram MAC to appropriate duplex state. Flow-control
configuration was also implemented but commented out at the
moment. The flow-control configuration will be enabled again
after we have general flow-control framework in mii layer.
Reported by: Yousif Hassan < yousif () alumni ! jmu ! edu >
Tested by: Yousif Hassan < yousif () alumni ! jmu ! edu >
This makes sure that process tokens credentials with un-initialized
audit contexts are handled correctly. Currently, when invariants are
enabled, this change fixes a panic by ensuring that we have a valid
termid family. Also, this fixes token generation for process tokens
making sure that userspace is always getting a valid token.
This is consistent with what Solaris does when an audit context is
un-initialized.
Obtained from: TrustedBSD Project
MFC after: 1 week
relabel check for MLS rather than returning 0 directly.
This problem didn't result in a vulnerability currently as the central
implementation of ifnet relabeling also checks for UNIX privilege, and
we currently don't guarantee containment for the root user in mac_mls,
but we should be using the MLS definition of privilege as well as the
UNIX definition in anticipation of supporting root containment at some
point.
MFC after: 3 days
Submitted by: Zhouyi Zhou <zhouzhouyi at gmail dot com>
Sponsored by: Google SoC 2007
- Fix whitespace according to style(9).
- Sync the comment describing why we have to wait in nsphy_reset()
with nsphyter_reset(). It's true that the manual says not to do a
reset within 500us of applying power, but that's unlikely to be the
cause of the problems seen here. Generally having to wait 500us after
a reset, however, is.
DP83847 PHYs. The main reason for using a specific driver for these
PHYs is reset quirks similar to the nsphy(4) driven DP83840A.
PR: 112654
Obtained from: NetBSD
MFC after: 2 weeks
Thanks to: mlaier for testing w/ DP83815
overridden at compile-time using kernel options of the same names.
Rather than doing a compile-time CTASSERT of buffer sizes being
even multiples of block sizes, just adjust them at boottime, as
the failure mode is more user-friendly.
MFC after: 2 months
PR: 119993
Suggested by: Scot Hetzel <swhetzel at gmail dot com>
fields in FTS and FTSENT structs being too narrow. In addition,
the narrow types creep from there into fts.c. As a result, fts(3)
consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
user can create, which can have security implications.
To fix the historic implementation of fts(3), OpenBSD and NetBSD
have already changed <fts.h> in somewhat incompatible ways, so we
are free to do so, too. This change is a superset of changes from
the other BSDs with a few more improvements. It doesn't touch
fts(3) functionality; it just extends integer types used by it to
match modern reality and the C standard.
Here are its points:
o For C object sizes, use size_t unless it's 100% certain that
the object will be really small. (Note that fts(3) can construct
pathnames _much_ longer than PATH_MAX for its consumers.)
o Avoid the short types because on modern platforms using them
results in larger and slower code. Change shorts to ints as
follows:
- For variables that count simple, limited things like states,
use plain vanilla `int' as it's the type of choice in C.
- For a limited number of bit flags use `unsigned' because signed
bit-wise operations are implementation-defined, i.e., unportable,
in C.
o For things that should be at least 64 bits wide, use long long
and not int64_t, as the latter is an optional type. See
FTSENT.fts_number aka FTS.fts_bignum. Extending fts_number `to
satisfy future needs' is pointless because there is fts_pointer,
which can be used to link to arbitrary data from an FTSENT.
However, there already are fts(3) consumers that require fts_number,
or fts_bignum, to have at least 64 bits in it, so we must allow for them.
o For the tree depth, use `long'. This is a trade-off between making
this field too wide and allowing for 64-bit inode numbers and/or
chain-mounted filesystems. On the one hand, `long' is almost
enough for 32-bit filesystems on a 32-bit platform (our ino_t is
uint32_t now). On the other hand, platforms with a 64-bit (or
wider) `long' will be ready for 64-bit inode numbers, as well as
for several 32-bit filesystems mounted one under another. Note
that fts_level has to be signed because -1 is a magic value for it,
FTS_ROOTPARENTLEVEL.
o For the `nlinks' local var in fts_build(), use `long'. The logic
in fts_build() requires that `nlinks' be signed, but our nlink_t
currently is uint16_t. Therefore let's make the signed var wide
enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
on a 64-bit platform. Perhaps the logic should be changed just
to use nlink_t, but it can be done later w/o breaking fts(3) ABI
any more because `nlinks' is just a local var.
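The width arguments in the points above can be checked with Python's
ctypes, which truncates out-of-range values instead of raising. This is a
user-space illustration of the C integer ranges involved, not code from
fts(3); the variable names are made up for the example.

```python
import ctypes

NLINK_MAX = 2**16 - 1     # largest value of a uint16_t nlink_t

# A signed 16-bit variable cannot represent the largest link count ...
as_short = ctypes.c_short(NLINK_MAX).value
# ... while `long' (at least 32 bits in C99) can, and stays signed.
as_long = ctypes.c_long(NLINK_MAX).value

# fts_level needs a signed type because -1 is a magic value for it;
# stored in an unsigned 16-bit field, -1 would turn into a large count.
level_unsigned = ctypes.c_ushort(-1).value
level_signed = ctypes.c_long(-1).value
```

Here as_short comes out as -1 while as_long stays 65535, and
level_unsigned becomes 65535 while level_signed stays -1.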
This commit also includes supporting stuff for the fts change:
o Preserve the old versions of fts(3) functions through libc symbol
versioning because the old versions appeared in all our former releases.
o Bump __FreeBSD_version just in case. There is a small chance that
some ill-written 3rd-party apps may fail to build or work correctly
if compiled after this change.
o Update the fts(3) manpage accordingly. In particular, remove
references to fts_bignum, which was a FreeBSD-specific hack to work
around the too narrow types of FTSENT members. Now fts_number is
at least 64 bits wide (long long) and fts_bignum is an undocumented
alias for fts_number kept around for compatibility reasons. According
to Google Code Search, the only big consumers of fts_bignum are in
our own source tree, so they can be fixed easily to use fts_number.
o Mention the change in src/UPDATING.
PR: bin/104458
Approved by: re (quite a while ago)
Discussed with: deischen (the symbol versioning part)
Reviewed by: -arch (mostly silence); das (generally OK, but we didn't
agree on some types used; I assume the lack of objections on
-arch lets me stick to my opinion)
exposed as kernel compile options, they have more meaningful names.
PR: 119993
MFC after: 2 months
Suggested by: Scot Hetzel <swhetzel at gmail dot com>
bug that caused us to reintroduce it is believed to be fixed, and Kris
says he no longer sees problems with fifofs in highly parallel builds.
If this works out, we'll MFC it for 7.1.
MFC after: 3 months
Pointed out by: kris
resulted in the argument to make_dev() being a unit number.
Correct this by supplying a minor number to make_dev(), and using
the unit number for the calculation of the slave tty name.
Reported and tested by: Peter Holm
Reviewed by: jhb
Yet another pointy hat to: kib
MFC after: 1 day
while the thread does not hold the thread lock would stop blocking for
subsequent interruptible sleeps and would always immediately fail the
sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout).
Some background:
- KSE has a facility for allowing one thread to interrupt another thread.
During this process, the target thread aborts any interruptible sleeps
much as if the target thread had a pending signal. Once the target
thread acknowledges the interrupt, normal sleep handling resumes. KSE
manages this via the TDF_INTERRUPT flag. Specifically, it sets the
flag when it sends an interrupt to another thread and clears it when
the interrupt is acknowledged. (Note that this is purely a software
interrupt sort of thing and has no relation to hardware interrupts
or kernel interrupt threads.)
- The old code for handling the sleep timeout race handled the race
by setting the TDF_INTERRUPT flag and faking a KSE-style thread
interrupt to the thread in the process of going to sleep. It probably
should have just checked the TDF_TIMEOUT flag in sleepq_catch_signals()
instead.
- The bug was that the sleepq code would set TDF_INTERRUPT but it was
never cleared. The sleepq code couldn't safely clear it in case there
actually was a real KSE thread interrupt pending for the target thread
(in fact, the sleepq timeout actually stomped on said pending interrupt).
Thus, any future interruptible sleeps (*sleep(.. PCATCH ..) or
cv_*wait_sig()) would see the TDF_INTERRUPT flag set and immediately
fail with EWOULDBLOCK. The flag could be cleared if the thread belonged
to a KSE process and another thread posted an interrupt to the original
thread. However, in the more common case of a non-KSE process, the
thread would pretty much stop sleeping.
- Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code and
not messing with TDF_INTERRUPT and td_intrval. With yesterday's change
teaching sleepq_switch() to check TDF_TIMEOUT, this is now sufficient.
MFC after: 3 days
exposing them to all consumers of ip_fw.h. These structures are
used in both ipfw(8) and ipfw(4), but are not part of the user<->kernel
interface for other applications to use; rather, they are a shared
implementation detail.
MFC after: 3 days
Reported by: Paul Vixie <paul at vix dot com>
being properly cancelled by a timeout. In general there is a race
between the sleepq timeout handler firing while the thread is still
in the process of going to sleep. In 6.x with sched_lock, the race was
largely protected by sched_lock. The only place it was "exposed" and had
to be handled was while checking for any pending signals in
sleepq_catch_signals().
With the thread lock changes, the thread lock is dropped in between
sleepq_add() and sleepq_*wait*() opening up a new window for this race.
Thus, if the timeout fired while the sleeping thread was in between
sleepq_add() and sleepq_*wait*(), the thread would be marked as timed
out, but the thread would not be dequeued and sleepq_switch() would
still block the thread until it was awakened via some other means. In
the case of pause(9) where there is no other wakeup, the thread would
never be awakened.
Fix this by teaching sleepq_switch() to check if the thread has had its
sleep canceled before blocking by checking the TDF_TIMEOUT flag and
aborting the sleep and dequeueing the thread if it is set.
MFC after: 3 days
Reported by: dwhite, peter
`kn_sdata' member of the newly registered knote. The problem is that
this member is overwritten by a call to kevent(2) with the EV_ADD flag,
targeted at the same kevent/knote. For instance, a userland application
may set the pointer to NULL, leading to a panic.
A testcase was provided by the submitter.
PR: kern/118911
Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 1 day
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is really bogus and not currently used.
Hopefully it will soon be replaced by something suitable.
- Remove the prototype for dumplockinfo() as the function is no longer
present
Additionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL, as only those should be passed
- Do a little bit of style(9) cleanup on lockmgr.h
The KPI is heavily broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.
Tested by: matteo
doesn't overflow in arc.c in this check:
if (kmem_used() > (kmem_size() * 4) / 5)
return (1);
With this bug ZFS almost doesn't cache.
Only 32bit machines are affected that have vm.kmem_size set to values >=1GB.
Reported by: David Taylor <davidt@yadt.co.uk>
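The wrap-around in the check quoted above can be reproduced outside the kernel. What follows is a minimal sketch, not the arc.c code: the function names are hypothetical, and uint32_t is assumed here only to model the 32-bit arithmetic of the affected machines. With a size of exactly 1GB the naive product wraps to zero, so a check of that form would fire for any non-zero usage.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the check quoted above.  uint32_t models the
 * 32-bit arithmetic of affected machines; this is not the arc.c code.
 */

/* Naive form: (size * 4) wraps modulo 2^32 once size reaches 1GB. */
uint32_t
four_fifths_naive(uint32_t size)
{
	return ((uint32_t)(size * 4u) / 5u);
}

/* One overflow-safe form: divide first, then account for the remainder. */
uint32_t
four_fifths_safe(uint32_t size)
{
	return ((size / 5u) * 4u + ((size % 5u) * 4u) / 5u);
}

/* Returns 1 when usage exceeds four fifths of the limit. */
int
over_limit(uint32_t used, uint32_t size)
{
	assert(size > 0);
	return (used > four_fifths_safe(size));
}
```

Dividing before multiplying keeps every intermediate value below the original size, at the cost of one extra remainder term to preserve the exact floor of 4*size/5.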
Introduce a new privilege that allows certain IP header options
(hop-by-hop, routing headers) to be set.
Leave a few comments to be addressed later.
Reviewed by: rwatson (older version, before addressing his comments)
- Improve error handling for load operations.
- Fix a memory corruption bug when using certain linux management apps.
- Allocate all commands up front to avoid OOM deadlocks later on.
while in principle a good idea, opened us up to a race inherent to
the syncache's direct insertion of incoming TCP connections into the
"completed connection" listen queue, as it transpires that the socket
is inserted before the inpcb is fully filled in by syncache_expand().
The bug manifested with the occasional returning of 0.0.0.0:0 in the
address returned by the accept() system call, which occurred if accept
managed to execute tcp_usr_accept() before syncache_expand() had copied
the endpoint addresses into inpcb connection state.
Re-add tcbinfo locking around the address copyout, which has the effect
of delaying the copy until syncache_expand() has finished running, as
it is run while the tcbinfo lock is held. This is undesirable in that
it increases contention on tcbinfo further, but a more significant
change will be required to how the syncache inserts new sockets in
order to fix this and keep more granular locking here. In particular,
either more state needs to be passed into sonewconn() so that
pru_attach() can fill in the fields *before* the socket is inserted, or
the socket needs to be inserted in the incomplete connection queue
until it is actually ready to be used.
Reported by: glebius (and kris)
Tested by: glebius
a run-queue. If the priority is numerically raised only change lowpri
if we're certain it will be correct. Some slop is allowed; however,
previously we could erroneously raise lowpri for an idle cpu that a
thread had recently run on, which led to errors in load balancing
decisions.
to files, such as ktrace output, under CODA_VERBOSE. Otherwise, each
such call to VOP_WRITE() results in a kernel printf.
MFC after: 3 days
Obtained from: NetBSD
checksum offload by downloading AIC-6915 firmware. Changes are
o Header file cleanup.
o Simplified probe logic.
o s/u_int{8,16,32}_t/uint{8,16,32}_t/g
o K&R -> ANSI C.
o In register access function, added support for both memory mapped and
IO space register access. The function will dynamically detect
which method is to be used.
o sf_setperf() was modified to support strict-alignment
architectures.
o Use SF_MII_DATAPORT instead of hardcoded value 0xffff.
o Added link state/speed, duplex changes handling task q. The task q
is also responsible for flow control settings.
o Always honor link up/down state reported by mii layers. The link
state information is used in sf_start() to determine whether we
got a valid link.
o Added experimental flow-control setup. It was commented out but
will be activated once we have flow-control infrastructure in mii
layer.
o Simplify IFF_UP/IFCAP_POLLING and IFF_PROMISC handling logic. Rx
filter always honors promiscuous mode.
o Implemented suspend/resume methods.
o Reorganized Rx filter routine so promiscuous mode changes don't
require interface re-initialization.
o Reimplemented driver probe routine such that it looks for matching
device from supported hardware list table. This change will help to
add newer hardware revision to the driver.
o Use ETHER_ADDR_LEN instead of hardcoded value.
o Prefer memory space register mapping over I/O space as the hardware
requires lots of register access to get various consumer/producer
index. Failing to get memory space mapping, sf(4) falls back to I/O
space mapping. Use of memory space register mapping requires
somewhat large memory space(512K), though.
o Switch to simpler bus_{read,write}_{1,2,4}.
o Use PCIR_BAR macro to get BARs.
o Program PCI cache line size if the cache line size was set to 0
and enable PCI MWI.
o Add a new sysctl node 'dev.sf.N.stats' that shows various MAC
counters for Rx/Tx statistics.
o Add a sysctl node to configure interrupt moderation timer. The
timer defers interrupts generation until time specified in timer
control register is expired. The value in the timer register is in
units of 102.4us. The allowable range for the timer is 0 - 31
(0 ~ 3.276ms).
The default value is 1(102.4us). Users can change the timer value
with dev.sf.N.int_mod sysctl(8) variable/loader(8) tunable.
o bus_dma(9) conversion
- Enable 64bit DMA addressing.
- Enable 64bit descriptor format support.
- Apply descriptor ring alignment requirements(256 bytes alignment).
- Apply Rx buffer address alignment requirements(4 bytes alignment).
- Apply 4GB boundary restrictions(Tx/Rx ring and its completion ring
should live in the same 4GB address space.)
- Set the number of allowable DMA segments to 16. In fact,
AIC-6915 doesn't have a limit for number of DMA segments but it
would be a waste of Tx descriptor resources if we allow more than 16.
- Rx/Tx side bus_dmamap_load_mbuf_sg(9) support.
- Added alignment fixup code for strict-alignment architectures.
- Added endianness support code in Tx/Rx descriptor access.
With these changes sf(4) should work on all platforms.
o Don't set if_mtu in device attach, it's handled in ether_ifattach.
o Use our own callout to drive watchdog timer.
o Enable VLAN oversized frames and announce sf(4)'s VLAN capability
to upper layer.
o In sf_detach(), remove mtx_initialized KASSERT as it's not possible
to get there without initializing the mutex. Also mark that we're
about to detach so active bpf listeners do not panic the system.
o To reduce PCI register access cycles, Rx completion ring is
directly scanned instead of reading consumer/producer index
registers. In theory, Tx completion ring also can be directly
scanned. However the completion ring is composed of two types of
completion (one for Tx done and one for DMA done). So reading producer
index via register access would be a safer way to detect the
ring wrap-around.
o In sf_rxeof(), don't use m_devget(9) to align received frames. The
alignment is required only for strict-alignment architectures and
now the alignment is handled by sf_fixup_rx() if required. The
removal of the copy operation in fast path should increase Rx
performance a lot on non-strict-alignment architectures such as
i386 and amd64.
o In sf_newbuf(), don't set descriptor valid bit as sf(4) is
programmed to run with normal mode. In normal mode, the valid bit
has no meaning. The valid bit should be used only when the
hardware uses polling(prefetch) mode. The end of descriptor queue
bit could be used if needed, but sf(4) relies on auto-wrapping of
hardware on 256 descriptor queue entries so both valid and
descriptor end bit are not used anymore.
o Don't disable generation of Tx DMA completion as said in datasheet
and use the Tx DMA completion entry instead of relying on Tx done
completion entry. Also added additional Tx completion entry type
check in Tx completion handler.
o Don't blindly reset watchdog timer in sf_txeof(). sf(4) now disarms
the watchdog only if there are no active Tx descriptors in Tx
queue.
o Don't manually update various counters in driver, instead, use
built-in MAC statistic registers to update them. The statistic
registers are updated every second.
o Modified Tx underrun handlers to increase the threshold value
in units of 256 bytes. Previously it used to increase 16 bytes
at a time which seems to take too long to stabilize whenever Tx
underrun occurs.
o In interrupt handler, additional check for the interrupt is
performed such that only interrupts for this device are allowed to
process descriptor rings. Because reading SF_ISR register clears
all interrupts, nuke writing to a SF_ISR register.
o Tx underrun is an abnormal condition and SF_ISR_ABNORMALINTR includes
the interrupt. So there is no need to inspect the Tx underrun again
in main interrupt loop.
o Don't blindly reinitialize hardware for abnormal interrupt
condition. sf(4) reinitializes the hardware only when it encounters
DMA error which requires an explicit hardware reinitialization.
o Fix a long standing bug that incorrectly clears MAC statistic
registers in sf_init_locked.
o Added strict-alignment safe way of ethernet address reprogramming
as IF_LLADDR may return unaligned address.
o Move sf_reset() to sf_init_locked in order to always reset the
hardware to a known state prior to configuring hardware.
o Set default Rx DMA, Tx DMA parameters as shown in datasheet.
o Enable PCI busmaster logic and autopadding for VLAN frames.
o Rework sf_encap.
- Previously sf(4) used type 0 Tx descriptors with padding
enabled to store driver private data. Embedding private data
structures into descriptors is a bad idea as the structure size
would be different between 64bit and 32bit architectures. The
type 0 descriptor allows fixed number of DMA segments in
a descriptor format and provides relatively simple interface to
manage multi-fragmented frames.
However, it wastes lots of Tx descriptors as not all frames are
fragmented as the number of allowable segments in a descriptor.
- To overcome the limitation of type 0 descriptor, switch to type
2 descriptor which allows 64bit DMA addressing and can handle
unlimited number of fragmented DMA segments. The drawback of
type 2 descriptor is in its complexity in managing descriptors
as driver should handle the end of Tx ring manually.
- Manually set Tx descriptor queue end mark and record number of
used descriptors to reclaim used descriptors in sf_txeof().
o Rework sf_start.
- Honor link up/down state before attempting transmission.
- Because sf(4) uses only one of two Tx queues, use low priority
queue instead of high one. This will remove one shift operation
in each Tx kick command.
- Cache last producer index into softc such that subsequent Tx
operation doesn't need to access producer index register.
o Rewrote sf_stats_update to include all available MAC statistic
counters.
o Employ AIC-6915 firmware from Adaptec and implement firmware
download routine and TCP/UDP checksum offload.
Partial checksum offload support was commented out due to the
possibility of firmware bug in RxGFP.
The firmware can strip VLAN tag in Rx path but the lack of firmware
assistance of VLAN tag insertion in transmit side made it useless
on FreeBSD. Unlike checksum offload, FreeBSD requires both Tx/Rx
hardware VLAN assistance capability. The firmware may also detect
wakeup frame and can wake system up from states other than D0.
However, the lack of wakeup support from D3cold state keeps me from
adding WOL capability. Also detecting WOL frame requires firmware
support but it's not yet known to me whether the firmware can
process the WOL frame.
o Changed *_ADDR_HIADDR to *_ADDR_HI to match other definitions of
registers.
o Added definitions for interrupt moderation related constants.
o Redefined SF_INTRS to include Tx DMA done and DMA errors. Removed
Tx done as it's not needed anymore.
o Added definition for Rx/Tx DMA high priority threshold.
o Nuked unused macros SF_IDX_LO, SF_IDX_HI.
o Added complete MAC statistic register definition.
o Modified sf_stats structure to hold all MAC statistic registers.
o Nuke various driver private padding data in Tx/Rx descriptor
definition. sf(4) no longer requires private padding. Also remove
unused padding related definitions. This greatly simplifies
descriptor manipulation on 64bit architectures.
o Because we no longer pad driver private data into descriptor,
remove deprecated/not-applicable comments for padding.
o Redefine Rx/Tx descriptor status. sf(4) doesn't use bit fields
anymore to support endianness.
Tested by: bruffer (initial version)
be wrong but I couldn't find a way to make it work. In addition, the
number of TxGFP instruction does not match the firmware image size,
so I guess something was wrong when Adaptec generated the TxGFP
firmware from their DDK.
According to datasheet, normally, the first GFP instruction would be
opcode C, WaitForStartOfFrame, to synchronize checksumming with
incoming frame. But the first instruction in TxGFP firmware was
opcode 1, BrToImmIfTrue, so it could not process checksum correctly,
I guess. Checking for RxGFP firmware also indicates the first
instruction should be opcode C. Since the number of instructions in
TxGFP firmware lacks exactly one instruction, I prepended the opcode
C to TxGFP firmware image. With this change, the resulting image size
perfectly matches with the number of instructions and Tx checksum
offload seems to work without problems.
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
locks based on the top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj
BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe out BUF_REFCNT() and leave the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.
The KPI is, obviously, broken so further commits will update manpages
and freebsd version.
Tested by: kris (on UFS and NFS)
- Don't specify vnode operations for mknod, lease, and advlock--let them
fall through to vop_default.
- Implement vop_default with &default_vnodeops, rather than with VOP_PANIC,
so that unimplemented vnode operations are handled in more sensible ways
than panicking, such as EOPNOTSUPP on ACL queries generated by bsdtar,
or mknod.
MFC after: 3 days
fill out all fields, just fill out the ones the file system knows
about. Among other things, this causes the output of "mount" and
"df" to make quite a bit more sense as /dev/cfs0 is specified as the
mountfrom name.
MFC after: 3 days
vnodes during coda_unmount() in order to detect errant use of them
after the vnode references may no longer be valid.
No need to clear the VV_ROOT flag on mi_rootvp (especially after
the vnode reference is no longer valid) as this isn't done on other
file systems.
MFC after: 3 days
and then release it when it is closed: we rely on the caller to keep the
vnode around with a valid reference. This avoids vrele() destroying the
vnode that vop_close() is being called on while vop_close() is still running, and
a crash due to lockmgr recursing the vnode lock when a Coda unmount
occurs.
MFC after: 3 days
Move all extern variable definitions to associated .h files, move some
extern variable definitions between include files to place them more
appropriately.
MFC after: 3 days
Coda vnode derived from it, in the style of nullfs. This allows files
in the Coda file system to be memory-mapped, such as with execve(2) or
mmap(2).
MFC after: 3 days
Reported by: Rune <u+openafsdev-sr55 at chalmers dot se>
"BSM conversion requested for unknown event 43140"
It should be noted that we need to audit the fd argument for this system
call.
Obtained from: TrustedBSD Project
MFC after: 1 week
unp_connect(): it is expected to return with the lock held, and two
possible error paths otherwise returned with it unlocked.
The fix committed here is slightly different from the patch in the
PR, but along an alternative line suggested in the PR.
PR: 119778
MFC after: 3 days
Submitted by: James Juran <james dot juran at baesystems dot com>
was missed. As result, pty_create_slave() may index out of the names[]
bounds, creating wrong slave tty names.
Tested by: kensmith
Reviewed by: jhb
MFC after: 3 days
since the command and data that is being built to be sent to or read
from the HW lives in the softc. Commands are later run via an_setdef etc.
In the ioctl path various references are kept to the data stored in
the softc so it needs to be protected. Think of the command
in the softc almost as a global variable, since it essentially is. Since locking
wasn't done in this type of context the commands would get corrupted.
Thanks to avatar@ for catching some lock issues and dhw@ for testing.
Things are a lot more stable except for the MPI-350 cards. My an(4)
remote laptop stays on the network now.
The driver should be changed so that it uses private memory that is passed
to the functions that talk to the card. Then only those functions would
really need to grab locks.
Reviewed by: avatar@
drop the lock and then re-acquire it, revalidating TCP connection state
assumptions when we do so. This avoids a potential lock order reversal
(and potential deadlock, although none have been reported) due to the
inpcb lock being held over a page fault.
MFC after: 1 week
PR: 102752
Reviewed by: bz
Reported by: Václav Haisman <v dot haisman at sh dot cvut dot cz>
shortest possible chain of mbufs of m_defrag(9). What we want is
chains of mbufs that can be safely stored to a Tx descriptor which
can have up to STGE_MAXTXSEGS mbufs. The ethernet controller does
not need to align Tx buffers on 32bit boundary. So the use of
m_defrag(9) was a waste of time.
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.
PR: kern/119744
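A check of this kind can be sketched as follows. This is an illustration, not the code added to bridge_forward(): is_reserved_dst() is a hypothetical helper, and the address range it tests (01-80-C2-00-00-00 through 01-80-C2-00-00-0F) is an assumption about the contents of Table 7-10 that should be verified against the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/*
 * Assumption: the reserved block is 01-80-C2-00-00-00 through
 * 01-80-C2-00-00-0F.  is_reserved_dst() is a hypothetical helper,
 * not the check that was committed.
 */
int
is_reserved_dst(const uint8_t dst[ETHER_ADDR_LEN])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	assert(dst != NULL);
	/* Match the fixed five-byte prefix, then bound the last byte. */
	return (memcmp(dst, prefix, sizeof(prefix)) == 0 && dst[5] <= 0x0f);
}
```

A forwarding path would call such a helper on the destination address and drop the frame when it returns non-zero.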
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)
Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.
Eliminate \n from a nearby panic string.
Use KASSERT() to reimplement pmap_copy()'s two assertions.
in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD
said that they were evaluated in the "interesting" (buggy) i387 methods.
float_t was broken compatibly with FLT_EVAL_METHOD.
These definitions seem to be broken on powerpc and possibly on arm.
float_t is float on powerpc with gcc [-notraditional] according to
glibc, and FLT_EVAL_METHOD is marked with XXX on arm.
problems when the DRM driver is loaded and the AIXGL extension is loaded:
the AIXGL driver requests a drm_close and this will cause the radeon
driver to fail while starting X windows.
PR: kern/114688
Submitted by: vehemens <vehemens at verizon dot net>
Prodded by: Robert Noland
Approved by: imp (mentor, a while ago already), anholt
MFC After: 1 week
encounters a syntax error, and add a tip about adding first
the `vital' options and then experimental ones.
PR: docs/119658
Submitted by: Julian Stacey, jhs at berklix.org
other. The first one survives, the rest are removed. So far, it appears
only some acpi_perf(4) BIOS tables have these invalid states, but address
this in the core to be sure to handle other potential driver data.
PR: kern/114722
Tested by: stefan.lambrev / moneybookers.com
MFC after: 3 days
- Track packet zone mbufs separately from other mbufs
- free packet zone buffers via m_free rather than trying to manage the refcount
as with clusters - its refcount and management seems to be "special"
but reread it from the device_t every time the device list is fetched.
Previously the device name in pciconf -l would not be updated when a driver
was unloaded or if a device was detached and attached to a different
driver.
MFC after: 1 week
PR: kern/104777
Submitted by: "Iasen Kostoff" tbyte | otel net
queues (which we call slices). The NIC will steer traffic into up to
hw.mxge.max_slices different receive rings based on a configurable
hash type (hw.mxge.rss_hash_type).
Currently the driver defaults to using a single slice, so the default
behavior is unchanged. Also, transmit from non-zero slices is
disabled currently.
spec:
- Use read/modify/write cycles to enable and disable the HPET instead of
writing 0 to reserved bits.
- Shutdown the HPET during suspend as encouraged by the spec.
- Fail to attach to an HPET with a period of zero.
MFC after: 1 week
PR: kern/119675 [3]
Reported by: Leo Bicknell | bicknell ufp.org
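The read/modify/write cycle mentioned in the first item can be sketched like this. It is a model only: the register is an ordinary variable rather than a mapped device register, the function names are hypothetical, and the position of the enable bit (bit 0) is an assumption stated for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 of the configuration register is the enable bit. */
#define HPET_CFG_ENABLE	0x00000001u

/* Set only the enable bit; all other bits keep the value that was read. */
void
hpet_model_enable(volatile uint32_t *cfg)
{
	uint32_t val;

	assert(cfg != NULL);
	val = *cfg;
	val |= HPET_CFG_ENABLE;
	*cfg = val;
}

/* Clear only the enable bit instead of writing 0 to the whole register. */
void
hpet_model_disable(volatile uint32_t *cfg)
{
	uint32_t val;

	assert(cfg != NULL);
	val = *cfg;
	val &= ~HPET_CFG_ENABLE;
	*cfg = val;
}
```

Writing back the value that was read means reserved bits are never forced to zero, which is the behaviour the first item asks for.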
- Fix a bug introduced in 1.4.20 where speculative read by the processor in the
write-only doorbell region would cause a target-abort (as opposed to simply
returning random data). This could manifest itself as NMI or machine freeze
depending on how the BIOS/OS/chipset configuration handles target-abort.
- Add support for new revisions of -R cards (with AEL1002/AEL1010 xaui->xfi)
- Increase an internal timing (dispatch engine): fix possible spurious reset
(seen on very few cards).
lowest priority on the queue for the current cpu vs curthread's
priority. In the case that curthread is waking up many threads of a
lower priority as would happen with a turnstile_broadcast() or wakeup()
of many threads this prevents them from all ending up on the current cpu.
- In sched_add() make the relationship between a scheduled ithread and
the current cpu advisory rather than strict. Only give the ithread
affinity for the current cpu if it's actually being scheduled from
a hardware interrupt. This prevents it from migrating when it simply
blocks on a lock.
Sponsored by: Nokia
- increase asserts for mbuf accounting
- track outstanding mbufs (maps very closely to leaked)
- actually only create one thread per port if !multiq
Oddly enough this fixes the use after free
- move txq_segs to stack in t3_encap
- add checks that pidx doesn't move past cidx
- simplify mbuf free logic in collapse mbufs routine
- move cxgb_tx_common in to cxgb_multiq.c and rename to cxgb_tx
- move cxgb_tx_common dependencies
- further simplify cxgb_dequeue_packet for the non-multiqueue case
- only launch one service thread per port in the non-multiq case
- remove dead cleaning code from cxgb_sge.c
- simplify PIO case substantially by returning directly from mbuf collapse
and just using m_copydata
- remove gratuitous m_gethdr in the rx path
- clarify freeing of mbufs in collapse
o Increased number of Rx/Tx descriptors to 256 for 8169 GigEs
because it's hard to push the hardware to the limit with default
64 descriptors.
TSO requires large number of Tx descriptors to pass a full sized
TCP segment(65535 bytes IP packet) to hardware. Previously it
consumed 32 Tx descriptors, assuming MCLBYTES DMA segment size,
to send the TCP segment which means re(4) couldn't queue more
than two full sized IP packets.
For 8139C+ it still uses 64 Rx/Tx descriptors due to its hardware
limitations. With this change there is a (very) small waste of
memory for 8139C+ users but I don't think it would affect 8139C+
users in most cases.
o Various bus_dma(9) fixes.
- The hardware supports DAC so allow 64bit DMA operations.
- Removed BUS_DMA_ALLOC_NOW flag.
- Increased DMA segment size to 4096 from MCLBYTES because TSO
consumes too many descriptors with MCLBYTES DMA segment size.
- Tx/Rx side bus_dmamap_load_mbuf_sg(9) support. With these
changes the code is more readable than previous one and got a
(slightly) better performance as it doesn't need to pass/
decode arguments to/from callback function.
- Removed unnecessary callback function re_dmamap_desc() and
nuked rl_dmaload_arg structure which was used in the callback.
- Additional protection for DMA map load failure. In case of
failure reuse current map instead of returning a bogus DMA
map.
- Deferred DMA map unloading/sync operation for maximum
performance until we really need to load new DMA map. If we
happen to reuse current map(e.g. input error) there is no need
to sync/unload/load again.
- The number of allowable Tx DMA segments for an mbuf chain is
now 32 instead of magic nseg value. If the number of available
Tx descriptors is too short to send highly fragmented mbuf
chains an optimized re_defrag() is called to collapse mbuf
chains which is supposed to be much faster than m_defrag(9).
re_defrag() was borrowed from ath(4).
- Separated Rx/Tx DMA tag from a common DMA tag such that Rx DMA
tag correctly uses DMA maps that were created with DMA alignment
restriction(8bytes alignments). Tx DMA tag does not have such
alignment limitation.
- Added additional sanity checks for DMA ring map load failure.
- Added additional spare Rx DMA map for graceful handling of Rx
DMA map load failure.
- Fixed misused bus_dmamap_sync(9) and added missing
bus_dmamap_sync(9) in re_encap()/re_txeof()/re_rxeof().
o Enabled TSO again as re(4) has a reasonable number of Tx
descriptors.
o Don't touch DMA address of a Tx descriptor in re_txeof(). It's
not needed.
o Fix incorrect update of if_ierrors counter. For Rx buffer
shortage it should update if_qdrops as the buffer is reused.
o Added checks for unsupported H/W revisions and return ENXIO for
such hardware. This is required to remove resource allocation
code in re_probe as other drivers do in device probe routine.
o Modified descriptor index manipulation macros as it's now possible
to have different number of descriptors for Rx/Tx.
o In re_start, to save a lock operation, use IFQ_DRV_IS_EMPTY before
trying to invoke IFQ_DRV_DEQUEUE. Also don't blindly call re_encap
since we already know the number of available Tx descriptors in
advance.
o Removed RL_TX_DESC_THLD which was used to reserve RL_TX_DESC_THLD
descriptors in Tx path. There is no such limitation mentioned in
8139C+/8169/8110/8168/8101/8111 datasheet and it seems to work ok
without reserving RL_TX_DESC_THLD descriptors.
o Fix a comment for RL_GTXSTART. The register is an 8-bit register.
o Added comments for 8169/8139C+ hardware restrictions on descriptors.
o Removed forward declaration for "struct rl_softc", it's not needed.
o Added a new structure rl_txdesc for Tx descriptor managements and
a structure rl_rxdesc for Rx descriptor managements.
o Removed unused member variable rl_intlock in driver softc. There are
still several unused member variables which are supposed to be used
to access hardware statistics counters. But it seems that accessing
hardware counters was not implemented yet.
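The descriptor arithmetic behind the first item and the DMA segment size change can be checked with a short calculation. This is a best-case sketch: descs_needed() is a hypothetical helper, it assumes every DMA segment is filled completely, and it assumes MCLBYTES is 2048.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: minimum number of Tx descriptors needed to map
 * pkt_len bytes when one descriptor covers at most seg_size bytes.
 * Assumes every segment is completely filled (best case).
 */
size_t
descs_needed(size_t pkt_len, size_t seg_size)
{
	assert(seg_size > 0);
	/* Round up so a partial trailing segment still gets a descriptor. */
	return ((pkt_len + seg_size - 1) / seg_size);
}
```

Under those assumptions a 65535 byte packet needs 32 descriptors with 2048 byte segments, so a 64 entry ring holds two such packets, while 4096 byte segments halve that to 16 descriptors per packet.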
the kernel's direct map instead of the pmap's recursive mapping to access
the lowest level in the page table. The direct map is preferable for two
reasons: (1) The TLB is more likely to hold the required direct mapping
because pmap_enter() has already used the direct map to access a nearby
PTE and (2) loading a direct mapping into the TLB involves walking only 2
or 3 levels of the page table instead of 4.
- Turn on WOL bits in suspend/shutdown method.
- WOL is disabled in resume routine as WOL can interfere with normal
Rx operation.
- Move stge_reset() to stge_init_locked() as resetting hardware
clears configured Rx information which in turn results in
non-working Rx module after suspend/shutdown operation.
conjunction with 'thread' argument passing which is always curthread.
Remove the useless extra argument and pass curthread explicitly to lower
layer functions, when necessary.
The KPI is broken by this change, which should affect several ports, so
a version bump and manpage update will be committed later.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
zone code. The GPE handler method (i.e. _L00) generates various Notify
events that need to be run to completion before the GPE is re-enabled.
In ACPI-CA, we queue an asynch callback at the same priority as a Notify
so that it will only run after all Notify handlers have completed. The
callback re-enables the GPE afterwards. We also changed the priority of
Notifies to be the same as GPEs, given the possibility that another GPE
could arrive before the Notifies have completed and we don't want it to
get queued ahead of the rest.
The ACPI-CA change was submitted by Alexey Starikovskiy (SUSE) and will
appear in a later release. Special thanks to him for helping track this
bug down.
MFC after: 1 week
Tested by: jhb, Yousif Hassan <yousif / alumni.jmu.edu>
memcpy/memset/memcmp and friends from libkern/arm to arm/arm/support.S, and so
I did, but in the process, I didn't add the appropriate copyrights.
This is a major oversight on my part, and I apologize to the NetBSD people for it.
MFC After: 1 day
via a new socket during an NFS operation as that reconnect takes place in
the context of an arbitrary thread with an arbitrary credential. Ideally
we would like to use the mount point's credential for the entire process
of setting up the socket to connect to the NFS server. Since some of the
APIs (sobind(), etc.) only take a thread pointer and infer the credential
from that instead of a direct credential, work around the problem by
temporarily changing the current thread's credential to that of the mount
point while connecting the socket and then reverting back to the original
credential when we are done.
Reviewed by: rwatson
Tested on: UDP, TCP, TCP with forced reconnect
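The credential swap described above can be sketched in plain C as follows.
This is only an illustration: struct thread and struct ucred are reduced
stand-ins for the kernel structures, and connect_with_thread_cred() and
connect_as_mount() are hypothetical names, not functions from the NFS client.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures (assumptions). */
struct ucred {
	int cr_uid;
};

struct thread {
	struct ucred *td_ucred;
};

/*
 * Hypothetical API that, like sobind(), takes only a thread pointer and
 * infers the credential from it. It returns the uid it acted as.
 */
static int
connect_with_thread_cred(struct thread *td)
{
	return (td->td_ucred->cr_uid);
}

/*
 * Temporarily switch the thread to the mount point's credential for the
 * duration of the connect, then restore the original credential.
 */
static int
connect_as_mount(struct thread *td, struct ucred *mount_cred)
{
	struct ucred *saved;
	int used;

	assert(td != NULL && mount_cred != NULL);
	saved = td->td_ucred;
	td->td_ucred = mount_cred;
	used = connect_with_thread_cred(td);
	td->td_ucred = saved;
	return (used);
}
```

The point of the save/restore pair is that the arbitrary thread doing the
reconnect leaves with the same credential it arrived with.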
of fpget*() and fpset*()).
The i386 fpget*() were efficient but a bit obfuscated (using macros
and a case statement to demultiplex them through a single inline).
The demultiplexing mainly gave smaller source code.
The i386 fpset*() were obfuscated in the same way and were very
inefficient due to the case statement not having enough cases or
complexity so all cases used the FP environment.
This also fixes a harmless bug in rev.1.12. fpsetmask() extracted the
old value from the bit-field twice, but the doubled shift was harmless
since the shift count is 0.
All fp*() interfaces are now inline functions on i386. They used to
be macros that call (a different set of) inline functions. This is a
small ABI change which shouldn't cause problems since cases where
inlining fails (mainly -O0) only give (working) static functions.
others can be replaced cleanly by the amd64 versions. There is no
current amd64 version to merge, but there is an old one which is
similar.
Fix the following bugs in fpresetsticky():
- garbage args clobbered non-sticky bits in the status register
- the return value was usually garbage since it was masked with the
arg instead of with the field selector.
Optimize fpresetsticky() to avoid using the environment as in
feclearexcept() (use only fnclex() if possible) and also to avoid
using fnclex() for null changes. The second of these optimizations
might not be so good since its branch might cost more than it saves.
Unmasked exceptions (which can be fixed up using fpset*() before they
trap) are very rare, especially on amd64 since SSE exceptions trap
synchronously, but I want to merge the faster amd64 implementations of
fpset*() back to i386 without introducing the bug on i386.
The i386 implementation has always avoided the trap automatically by
changing things using load/store of the FP environment, but this is
very slow. Most changes only affect the control word, so they can
usually be done much more efficiently, and amd64 has always done this,
but loading the control word can trap.
This version uses the fast method only in the usual case where it will
not trap. This only costs a couple of integer instructions (including
one branch which I haven't optimized carefully yet) in the usual case,
but bloats the inlines a lot. The inlines were already a bit too large
to handle both the FPU and SSE.
a panic race on module unload. The wakeup() is internal to
kproc_exit/kthread_exit. The correct fix is to fix the msleep() in
detach to sleep on fdc->fdc_thread instead of &fdc->fdc_thread.
Noted and reviewed by: jhb
Pointy hat to: kib
MFC after: 1 week
panic but it won't actually lock anything.
This can lead some paths to reach lockmgr_disown() with an inconsistent
lock, which triggers the related assertions.
Fix those paths to recognize the panic situation and not trigger them.
Reported by: pho
Submitted by: kib
- fix a previous style fix: shifts should be in the correct direction even
if they are null.
- restore a comment about namespace pollution from floatingpoint.h 1.12 and
update it.
- remove unused namespace pollution FP_*REG.
- improve some comments.
- sort macro definitions for entry points.
- don't use underscores for macro args.
Wake up the thread doing the fdc_detach() when the fdc worker thread exits [1].
Write access to the write-protected floppy shall call device_unbusy() to
pair the device_busy() in the fd_access() [2].
PR: 116537 [1], 116539 [2]
MFC after: 1 week
rspq lock. Not doing so was causing us to skip re-enabling the interrupt.
- remove duplicate credits sysctl
- add support for dumping hardware context of the txq
- decrement budget_left when we break out of the process_responses loop
interrupt handlers for child devices by adding a dummy handler that is
always present so that the underlying interrupt thread is always around
avoiding panics from stray interrupts.
MFC after: 3 days
soconnect()) instead of &thread0 when establishing a connection to the NFS
server. Otherwise inconsistent credentials may be used when setting up
the NFS socket.
MFC after: 1 week
Reviewed by: rwatson
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.
When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.
This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well as for
libthr user threads, valuable debugging information lost with the
move to try kthreads since they are no longer independent processes.
There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the meantime. Likewise, there are reservations about the
continued validity of statclock given the speed of modern processors.
Reviewed by: attilio, emaste, jhb, julian
address space in kmem map call vm_lowmem event in a loop and wait a bit for
subsystems to reclaim some memory which in turn will reclaim address space as
well.
Note, this is a work-around.
Reviewed by: alc
Approved by: alc
MFC after: 3 days
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This change makes the code cleaner and, in
particular, removes an annoying dependency, which helps the upcoming
lockmgr() cleanup.
The KPI changes as a result.
The manpage and FreeBSD_version will be updated in further commits.
As a side note, upcoming commits will address a similar cleanup of the
VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
- return the error from cxgb_tx_common so that when an error is hit we don't
spin forever in the taskq thread
- remove unused rxsd_ref
- simplify header_offset calculation for embedded mbuf headers
- fix memory leak by making sure that mbuf header initialization took place
- disable printf's for stalled queue, don't do offload/ctrl queue restart
when tunnel queue is restarted
- add more diagnostic information about the txq state
- add facility to dump the actual contents of the hardware queue using sysctl
- fix this to compile with C++ by casting ints to enums in a few places
and by using the correct parameter type for _fpsetprec(). Remove
__cplusplus ifdefs which disabled the buggy code.
- remove __CC_SUPPORTS___INLINE ifdefs. `__inline' vs `inline', and either
of these #defined away, are supposed to be handled by very old ifdefs
in <sys/cdefs.h>. Thus the __CC_SUPPORTS___INLINE macro is not needed
here (or anywhere else that it is used). It is less needed here than in
most places, since this file is userland-only and userland is far from
supporting INTEL_COMPILER. The __CC_SUPPORTS___INLINE__ macro which
was used here is even less needed. It is to support spelling `inline'
as `__inline__' instead of the usual spelling `__inline'.
Fix some style bugs that I missed in the previous commit (remove unused
asms and sort more variables).
The lockmgr() function can now only be called with curthread, and the
KASSERT() is updated accordingly.
In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and is used in BUF_KERNPROC().
The KPI changes as a result, and the FreeBSD version will be bumped soon.
Unlike the previous code, we assume the idle thread cannot try to
acquire the lockmgr as it cannot sleep, so drop the related check [1]
in BUF_KERNPROC().
Tested by: kris
[1] kib asked for a KASSERT in lockmgr_disown() about this
condition, but after thinking about it, as this is a well-known general
rule, I found it not really necessary.
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors is garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
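The pathname namespace described above can be pictured with a small
userland sketch. It assumes the FNV-1 variant of the 32-bit hash
(multiply, then XOR) and an arbitrary bucket count; struct shm_entry,
shm_insert() and shm_lookup() are hypothetical names for illustration and
are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHM_BUCKETS 64	/* illustrative table size (assumption) */

struct shm_entry {
	const char *path;
	int object_id;		/* stand-in for the shared memory object */
	struct shm_entry *next;	/* chain of entries in the same bucket */
};

static struct shm_entry *shm_table[SHM_BUCKETS];

/* 32-bit FNV-1: multiply by the FNV prime, then XOR in each byte. */
static uint32_t
fnv32_str(const char *s)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (; *s != '\0'; s++) {
		h *= 16777619u;		/* FNV prime */
		h ^= (uint8_t)*s;
	}
	return (h);
}

/* Link an entry into the bucket selected by the hash of its path. */
static void
shm_insert(struct shm_entry *e)
{
	uint32_t b;

	assert(e != NULL && e->path != NULL);
	b = fnv32_str(e->path) % SHM_BUCKETS;
	e->next = shm_table[b];
	shm_table[b] = e;
}

/* Walk the bucket chain and compare full paths to resolve collisions. */
static struct shm_entry *
shm_lookup(const char *path)
{
	struct shm_entry *e;

	e = shm_table[fnv32_str(path) % SHM_BUCKETS];
	for (; e != NULL; e = e->next)
		if (strcmp(e->path, path) == 0)
			return (e);
	return (NULL);
}
```

The hash only picks the bucket; the full pathname comparison is still
needed because distinct paths can share a hash value.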
assertion hit in swapoff_one() when we un-mount a swap partition. We
should be using curthread where we used thread0 before. This change
also replaces the thread argument with a credential argument, as the
MAC framework only requires the cred.
It should be noted that this allows the machine to be rebooted without
panicking with "cannot differ from curthread or NULL" when MAC is enabled.
Submitted by: rwatson
Reviewed by: attilio
MFC after: 2 weeks
dev2udev() when a tty was being detached concurrently with the sysctl
handler:
- Hold the 'tty_list_mutex' lock while we read all the fields out of the
struct tty for copying out later. Previously the pty(4) and pts(4)
destroy routines could set t_dev to NULL, drop their reference on the
tty and destroy the cdev while the sysctl handler was attempting to
invoke dev2udev() on the cdev being destroyed. This happened when the
sysctl handler read the value of t_dev prior to it being set to NULL
either due to it being stale or due to timing races. By holding the
list lock we guarantee that the destroy routines will block in ttyrel()
in that case and not destroy the cdev until after we've copied all of our
data. We may see a NULL cdev pointer or we may see the previous value,
but the previous value will no longer point to a destroyed cdev if we
see it.
- Fix the ttyfree() routine used by tty device drivers in their detach
methods to use ttyrel() on the tty so we don't leak them. Also, fix it
to use the same order of operations as pty/pts destruction (set t_dev
NULL, ttyrel(), destroy_dev()) so it cooperates with the sysctl handler.
MFC after: 3 days
Tested by: avatar
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
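The dispatch through the ops vector can be sketched as below. The
structures are heavily reduced, and nonvnode_truncate(), do_ftruncate(),
vnops_sketch and sockops_sketch are hypothetical names chosen for this
example; only fo_truncate and vn_truncate() are named in the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file;

/* Reduced ops vector: only the truncate method is modeled here. */
struct fileops {
	int (*fo_truncate)(struct file *fp, long length);
};

struct file {
	const struct fileops *f_ops;
	long f_size;	/* stand-in for the backing object's size */
};

/* vnode-backed files actually change size. */
static int
vn_truncate(struct file *fp, long length)
{
	if (length < 0)
		return (EINVAL);
	fp->f_size = length;
	return (0);
}

/* Non-vnode file types reject the request. */
static int
nonvnode_truncate(struct file *fp, long length)
{
	(void)fp;
	(void)length;
	return (EINVAL);
}

static const struct fileops vnops_sketch = { vn_truncate };
static const struct fileops sockops_sketch = { nonvnode_truncate };

/* Generic entry point: given the file object, just dispatch. */
static int
do_ftruncate(struct file *fp, long length)
{
	assert(fp != NULL && fp->f_ops != NULL);
	return (fp->f_ops->fo_truncate(fp, length));
}
```

With this shape, a new file type gains ftruncate() support by supplying
its own fo_truncate method rather than by changing the generic code.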
add correct locking to the operation of unmounting.
This will prevent debugging kernels from panicking if mounting a
non-hpfs partition (I'm not sure if this can be a problem with a
successful mounting operation though).
MFC: 3 days
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.
- spell 16384 as 16384 and not as BKVASIZE. 16384 is (not quite) just a
magic size that works well in practice. BKVASIZE should be MAXBSIZE
(65536), but is 16384 because i386's don't have enough kva for it to
be MAXBSIZE; 16384 works (not so well) for it for much the same reasons
that it works well in the heuristic.
- expand and/or add comments about this and other details.
- don't explicitly inline this function.
- fix some other style bugs.
ABI override binary isn't found. This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27. It should solve
the "ld-elf32.so.1"-in-chroot problem.
allocation, free the indirect blocks before clearing the disk pointers,
that could lead to the softupdate inconsistencies in the case of the
machine or disk crash at the wrong time.
Rearrange the recovery code to do the ffs_blkfree() after the second
ffs_syncvnode(), which clears the pointer chain.
Proposed and reviewed by: tegge
Tested by: Peter Holm
MFC after: 3 weeks
happen if there are no files open. Accounting for these can
eventually return a negative value for olenp causing sysctl to
crash with a bad malloc.
Reported by: Pawel Worach <pawel.worach@gmail.com>
set, announce BIO_DELETE capability and issue ATA_CFA_ERASE when we get one.
Once we issue more BIO_DELETE, this will improve lifetime, and
possibly write speed of Flash based devices which have usable flash
adaptation layers.
For now, about the only usage is the newfs(1) -E flag.
Approved by: sos
people's code with irrelevant changes[1]:
Use bus_{read|write}_*() instead of bus_space_{read|write}_*() for
purely stylistic reasons.
Due to compiler optimizations and inlining, this is for all practical
purposes without effect in the compiled code.
[1] NB: Approved by: sos
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.
Correct a nearby style error: Pointers should be compared to NULL.
queues lock is acquired. Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent. That could
result in a page being freed twice.
Reported by: kris
- Clear all of the gc flags before doing a run. Stale flags were causing
us to skip some descriptors.
- If a unp socket has been marked REF in a gc pass it can't be dead.
Found by: rwatson's test tool.
of two compares against 0. The negative effect of cache flushing
is probably more than the gain by not doing the two compares (the
value is almost certainly in register or at worst, cache).
Note that the uses of m_freem() are in error cases and m_freem()
handles NULL anyhow. So fast-path really isn't changed much at all.
feature is represented by a node in the new 'kern.features' sysctl node.
A feature is present if the corresponding node is present and evaluates to
true.
A FEATURE() wrapper macro is added which takes the sysctl node name and
a description of the feature as the sole arguments and creates a read-only
sysctl node with a value of 1.
Discussed on: arch
correct number of acpi_thermalX devices. Having this wrong caused the
acpi_thermal thread to realloc the array of devices on each loop iteration.
MFC after: 1 week
PR: kern/118497
Submitted by: Pasi Parviainen
- Introduce a finit() which is used to initialize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
possible to end up in the interrupt handler again while processing the
previous RX interrupt in ifp->if_input() because the MD interrupt code
disables the delivery of the respective interrupt until all associated
handlers were called (in the INTR_FILTER case the MI code supposedly
does the same). Toggling the NIC interrupt enable bit in these handlers
still is necessary though as some chips (f.e. the VMware emulated one)
require this to be done in order to keep issuing interrupts.
MFC after: 1 month
implemented with macros. This patch improves code readability. Reasoning
behind vidd_* is a sort of "video discipline".
List of macros is supposed to be complete--all methods of video_switch
ought to have their respective macros from now on.
Functionally, this code should be no-op. My intention is to leave current
behaviour of touched code as is.
No objections: rwatson
Silence on: freebsd-current@
Approved by: cognet
implemented with macros. This patch improves code readability. Reasoning
behind kbdd_* is a "keyboard discipline".
List of macros is supposed to be complete--all methods of keyboard_switch
should have their respective macros from now on.
Functionally, this code should be no-op. My intention is to leave current
behaviour of code as is.
Glanced at by: rwatson
Reviewed by: emax, marcel
Approved by: cognet
machine-independent support for superpages. (The earlier part was
the rewrite of the physical memory allocator.) The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.
Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64. (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
argument. It allows ppp, mpd or any other node consumer to request
connection to specified access concentrator.
Proposed by: Alexander A. Burylov <burylov@mail.ru>
Without it, code has two problems:
- behaviour of the old and new [l]stat are different with regard of
the /compat/linux
- directly accessing the userspace data from the kernel asks for
the panics.
Reported and tested by: Peter Holm
Reviewed by: rdivacky
MFC after: 3 days
the inode, do the rollback in case the allocation failed (due to
insufficient free space or quota limits). However, the code leaves the
buffers corresponding to the indirect blocks on the vnode bufobj list.
This causes several assertions (for instance, "ffs_truncate3" in
ffs_truncate()) to fail, and could result in an indirect block
aliasing problem, such as writing the contents of such blocks to a
random disk location.
Remove the buffers from the bufobj properly.
Reported and tested by: Peter Holm
Reviewed by: tegge
MFC after: 3 weeks
so that the results end up in the DDB output stream rather than the
console output stream.
This should likely also be done for the vprint() function it calls.
MFC after: 3 months
This option just adds complexity and the new implementation will no
longer support it, so axing it now that it is unused is probably the
better idea.
The FreeBSD version is bumped in order to reflect the KPI breakage
introduced by this patch.
In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on 2.x series kernels, the version bump will
solve any problem.
with the interlock), owner of the lock should be only curthread or at
least, for its limited usage, NULL which identifies LK_KERNPROC.
The thread "extra argument" for the lockmgr interface is going to be
removed in the near future, but for the moment, just let kernel run for
some days with this check on in order to find potential deadlocking
places around the kernel and fix them.
p_candebug() will return EAGAIN which, if the other process never
leaves execve(), will result in the sysctl spinning and never returning
to userspace. Processes should always eventually leave execve(), but
spinning in kernel while we wait is bad for countless reasons, and
particularly harmful if execve() itself is deadlocked.
Possibly we should return another error, or return a marker indicating
the thread is in execve() so it can be reported that way in userspace.
Reported by: kris
equivalent with this and so operate the switch.
That call is the only remaining LK_EXCLUPGRADE consumer and removing
it will prepare the ground for LK_EXCLUPGRADE axing and further
lockmgr improvements.
Discussed with: jeff, ups
Recycle the vm object's "pg_color" field to represent the color of the
first virtual page address at which the object is mapped instead of the
color of the object's first physical page. Since an object may not be
mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color"
is valid.
mounted FS' problems. These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices. We
now close the barn door after the horse has gotten loose and has been
hit by a truck, as it were. The barn no longer catches fire in this
case, but the horse is still dead :-).
The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO. In that case, the buffer
is treated like any other clean buffer that's being returned. ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.
The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system. If the device is gone, retrying later won't help and we'll
never be able to unmount the device.
These two are part of a larger patch set submitted by the author. The
other patches will be forthcoming. I added comments to these two
patches.
Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
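The two ENXIO decisions described above reduce to a pair of small
predicates. This is a sketch of the policy only; the function names are
hypothetical and the real checks live inside vfs_bio.c and vfs_mount.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Should a buffer whose write failed with 'error' go back into the dirty
 * pool for a later retry? ENXIO means the device is gone for good, so a
 * retry is futile and the buffer is released like a clean one.
 */
static bool
should_redirty_after_write_error(int error)
{
	assert(error != 0);
	return (error != ENXIO);
}

/*
 * Should an unmount that got 'error' from the file system be treated as
 * successful? ENXIO counts as success because retrying later cannot help.
 */
static bool
unmount_may_proceed(int error)
{
	return (error == 0 || error == ENXIO);
}
```

Other errors, such as EIO, keep the old behaviour: the write is retried
and the unmount fails.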
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.
Reviewed by: net@
MFC after: 1 week
- Use the correct offsets when copying out the results of PCIOCGETCONF_OLD.
This happened to not affect the 64-bit architectures because there the
addition of pc_domain to struct pcisel didn't change the overall size of
struct pci_conf. [1]
- Always copy the name and unit information to conf_old so it's also part
of the output once this information is cached in dinfo.
- Use the correct type for flags in struct pci_match_conf_old. This
change is more or less cosmetic though.
Reported and tested by: bde [1]
Reviewed by: imp
MFC after: 3 days
Committed from: 24C3
If a mouse has both a wheel and a Z direction we report both.
XXX Due to tradition the wheel is reported as the Z direction (and the Z
direction as W).
Now Apple's Mighty Mouse is fully supported, except the X11 mouse driver
doesn't know what to do with the new coordinate.
MFC after: 3 months
Approved by: njl (mentor), imp
dump using mechanically generated/extracted debugging output rather than
a simple memory dump. Current sources of debugging output are:
- DDB output capture buffer, if there is captured output to save
- Kernel message buffer
- Kernel configuration, if included in kernel
- Kernel version string
- Panic message
Textdumps are stored in swap/dump partitions as with regular dumps, but
are laid out as ustar files in order to allow multiple parts to be stored
as a stream of sequentially written blocks. Blocks are written out in
reverse order, as the size of a textdump isn't known a priori. As with
regular dumps, they will be extracted using savecore(8).
One new DDB(4) command is added, "textdump", which accepts "set",
"unset", and "status" arguments. By default, normal kernel dumps are
generated unless "textdump set" is run in order to schedule a textdump.
It can be canceled using "textdump unset" to restore generation of a
normal kernel dump.
Several sysctls exist to configure aspects of textdumps;
debug.ddb.textdump.pending can be read to check whether a textdump is
pending, or set/unset in order to control whether the next kernel dump
will be a textdump from userspace.
While textdumps don't have to be generated as a result of a DDB script
run automatically as part of a kernel panic, this is a particularly useful
way to use them, as instead of generating a complete memory dump, a
simple transcript of an automated DDB session can be captured using the
DDB output capture and textdump facilities. This can be used to
generate quite brief kernel bug reports rich in debugging information
but not dependent on kernel symbol tables or precisely synchronized
source code. Most textdumps I generate are less than 100k including
the full message buffer. Using textdumps with an interactive debugging
session is also useful, with capture being enabled/disabled in order to
record some but not all of the DDB session.
MFC after: 3 months
to identify textdumps in the swap/dump partition. While textdumps
aren't really an architecture, they are architecture-neutral and so
don't really correspond to any existing architecture.
Define a version number for textdumps, KERNELDUMP_TEXT_VERSION, of 1.
MFC after: 3 months
define a set of named scripts. Each script consists of a list of DDB
commands separated by ";"s that will be executed verbatim. No higher
level language constructs, such as branching, are provided for:
scripts are executed by sequentially injecting commands into the DDB
input buffer.
Four new commands are present in DDB: "run" to run a specific script,
"script" to define or print a script, "scripts" to list currently
defined scripts, and "unscript" to delete a script, modeled on shell
alias commands. Scripts may also be manipulated using sysctls in the
debug.ddb.scripting MIB space, although users will prefer to use the
soon-to-be-added ddb(8) tool for usability reasons.
Scripts with certain names are automatically executed on various DDB
events, such as entering the debugger via a panic, a witness error,
watchdog, breakpoint, sysctl, serial break, etc, allowing customized
handling.
MFC after: 3 months
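Since a script is just a ";"-separated list of commands executed in
order, the execution model can be sketched in userland C as follows. The
names run_script(), log_cmd() and struct cmd_log, the space trimming, and
the CMD_MAX limit are assumptions made for this example; the real
implementation injects into the DDB input buffer instead of calling a
callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_MAX 128	/* illustrative per-command limit (assumption) */

/*
 * Split 'script' on ';' and hand each non-empty command, in order, to
 * 'inject'. Returns the number of commands injected.
 */
static int
run_script(const char *script,
    void (*inject)(const char *cmd, void *arg), void *arg)
{
	char cmd[CMD_MAX];
	const char *p, *end;
	size_t len;
	int n = 0;

	assert(script != NULL && inject != NULL);
	for (p = script; *p != '\0'; p = (*end == ';') ? end + 1 : end) {
		end = strchr(p, ';');
		if (end == NULL)
			end = p + strlen(p);
		/* Trim spaces around the command. */
		while (p < end && *p == ' ')
			p++;
		len = (size_t)(end - p);
		while (len > 0 && p[len - 1] == ' ')
			len--;
		if (len == 0 || len >= sizeof(cmd))
			continue;	/* skip empty or oversized commands */
		memcpy(cmd, p, len);
		cmd[len] = '\0';
		inject(cmd, arg);
		n++;
	}
	return (n);
}

/* Example sink: appends each received command and a newline to a log. */
struct cmd_log {
	char buf[256];
};

static void
log_cmd(const char *cmd, void *arg)
{
	struct cmd_log *log = arg;
	size_t used = strlen(log->buf);

	if (used + strlen(cmd) + 2 <= sizeof(log->buf)) {
		strcpy(log->buf + used, cmd);
		strcat(log->buf, "\n");
	}
}
```

There is no branching or looping inside a script: each command is passed
on verbatim, one after another, which matches the "no higher level
language constructs" description.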
(dummynet), ipsec_filter() would return the empty error code and the ipsec code
would continue to forward/dereference the null mbuf.
Found by: m0n0wall
Reviewed by: bz
MFC after: 3 days
captured to a memory buffer for later inspection using sysctl(8), or in the
future, to a textdump.
A new DDB command, "capture", is added, which accepts arguments "on", "off",
"reset", and "status".
A new DDB sysctl tree, debug.ddb.capture, is added, which can be used to
resize the capture buffer and extract buffer contents.
MFC after: 3 months
kern.console format as is. Thus, no difference in output format should
appear after this commit.
Reviewed by: cognet@ (mentor)
Approved by: cognet@ (mentor)
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
fget() call, that is sleeping point, and possibly dropping Giant.
The snp_target == NULL implies the snp_tty == NULL. Remove the code
that is put under snp_target == NULL and snp_tty != NULL clause.
In snpclose(), do the snp_detach() before scheduling the snp device
destruction. Otherwise, after the return from snpclose(), the snp
device is already removed from the snp_list, but tty is still in
snooped state. Any attempt to do i/o on such a tty causes a panic because
ttytosnp() returns NULL.
Tested by: Peter Holm
MFC after: 1 week
o BSD disklabels have relative offsets. Even for the BSD in MBR slice
setup, except when the mbroffset ioctl is supported. Since we don't
support that ioctl, bsdlabel(8) expects relative offsets. So, when
reading an existing disklabel, correct for disklabels that mistakenly
have the mbroffset offsets.
o Don't take the geometry seriously, because it's untrustworthy. We do
expect the numbers to be within range. This means that the secperunit
field will not be computed from secpercyl and ncyls, but simply is
the mediasize in sectors.
o Don't enforce partitions to be aligned to track boundaries. The
default label, constructed by bsdlabel(8), puts partition a at offset
BBSIZE bytes, which commonly means sector 16.
free the MAC label on the inpcb before freeing the inpcb.
MFC after: 3 days
Submitted by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
old code special cased them too early which caused a few differences for
these sort of links relative to other PCI links:
- They were always re-routed via the BIOS call instead of assuming that
they were already routed if the BIOS had programmed the IRQ into a
matching device during POST.
- If the BIOS did route that link to a different IRQ that was marked as
invalid, we trusted the $PIR table rather than the BIOS IRQ.
This change moves the special casing for "unique IRQ" links to only take
that into account when picking an IRQ for an unrouted link so that these
links will now not be routed if the BIOS appears to have routed it already
(some BIOSen have problems with that) and so that if the BIOS uses a
different IRQ than the $PIR, we trust the BIOS routing instead (this is
what we do for all other links as well).
Reported by: Bruce Walter walter of fortean com
MFC after: 1 week
page to be in the free lists. Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.
This change is required to support superpage reservations. Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
as multicast/broadcast frames. Previously re(4) ignored multicast
frames in promiscuous mode. The RTL8169 datasheet was not clear
how it handles multicast frames in promiscuous mode.
PR: kern/118572
MFC after: 3 days
NULL and doesn't point to a NULL pointer before dereferencing it. This
fixes a panic triggered by Xorg 7.3.
Reported and tested by: Bill Green
MFC after: 3 days
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.
The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.
With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.
Obtained from: Juniper, Semihalf
is actually a circular log. Deal with it rolling around. Fortunately,
the log area is big and I haven't seen any roll over yet. Update and
get rid of the obsolete comment.
When system ticks are positive, for entries in the cache
bucket, syncache_timer() ran on every tick (doing nothing
useful) instead of the supposed 3, 6, 12, and 24 seconds
later (when it's time to retransmit SYN,ACK).
When ticks are negative, syncache_timer() was scheduled
too far in the future (up to ~25 days on systems with
HZ=1000), no SYN,ACK retransmits were attempted at all,
and syncache entries added in that period that correspond
to non-established connections stayed there forever.
Only HEAD and RELENG_7 are affected.
Reviewed by: silby, kmacy (earlier version)
Submitted by: Maxim Dounin, ru
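The message does not spell out the fix, so the following is only a
general illustration of why tests on a signed, wrapping tick counter are
sign-sensitive, together with the arithmetic behind the "~25 days"
figure. All function names are hypothetical, and the wrap-safe comparison
shown is a common idiom rather than the change that was committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "deadline has passed" test: look at the signed difference. */
static bool
tick_reached(uint32_t now, uint32_t deadline)
{
	return ((int32_t)(now - deadline) >= 0);
}

/* Naive test on absolute values: wrong once the deadline has wrapped. */
static bool
tick_reached_naive(int32_t now, int32_t deadline)
{
	return (now >= deadline);
}

/* Days for a tick counter running at 'hz' to cover 2^31 ticks. */
static double
days_per_half_range(unsigned int hz)
{
	assert(hz > 0);
	return (2147483648.0 / hz / 86400.0);
}
```

With HZ=1000, days_per_half_range(1000) is about 24.9, which is where a
delay of up to roughly 25 days comes from when a deadline lands on the
wrong side of the sign change.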
- Rename output routines tcp_gen_* -> tcp_output_*.
- Rename notification routines that turn in to no-ops in the absence of TOE
from tcp_gen_* -> tcp_offload_*.
- Fix some minor comment nits.
- Add a /* FALLTHROUGH */
Reviewed by: Sam Leffler, Robert Watson, and Mike Silbersack
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.
Reported by: trasz
Tested by: trasz
MFC after: 3 days
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path. This didn't actually matter in practice as sc_primary is
always the first on the port list.
MFC after: 3 days
would be properly disposed of, but the global label structure for the
semaphore wouldn't be freed.
MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
destroy call; this transpired because the inpcb alloc path for IPv4/IPv6
is the same code, but IPv6 has a separate free path. The result was
that as new IPv6 TCP connections were created, kernel memory would
gradually leak.
MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
and t3_push_frames).
- Import latest changes to cxgb_main.c and cxgb_sge.c from toestack p4 branch
- make driver local copy of tcp_subr.c and tcp_usrreq.c and override tcp_usrreqs so
TOE can also function on versions with unmodified TCP
- add cxgb back to the build
- rename tcp_ofld.[ch] to tcp_offload.[ch]
- document usage and locking conventions of the functions in the
toe_usrreqs function vector
- document tcpcb, inpcb, and socket fields used by toe
- widen the listen interface into 2 functions
- rename DISABLE_TCP_OFFLOAD to TCP_OFFLOAD_DISABLE
- shrink conditional compilation to reduce the likelihood of bitrot
- replace sc->sc_toepcb checks in tcp_syncache.c with TOEPCB_ISSET
or any other bio chopping geom a reasonable size of work.
Check for delivered signals between chunks, because the request size
and service time are unbounded.
details from consumers.
- Track individual selecters on a per-descriptor basis such that there
are no longer collisions and after sleeping for events only those
descriptors which triggered events must be rescanned.
- Protect the selinfo (per descriptor) structure with a mtx pool mutex.
mtx pool mutexes were chosen to preserve API compatibility with
existing code which does nothing but bzero() to set up selinfo
structures.
- Use a per-thread wait channel rather than a global wait channel.
- Hide select implementation details in a seltd structure which is
opaque to the rest of the kernel.
- Provide a 'selsocket' interface for those kernel consumers who wish to
select on a socket when they have no fd so they no longer have to
be aware of select implementation details.
Tested by: kris
Reviewed on: arch
processors (it's the PowerPC Operating Environment Architecture).
AIM designates the processors made by the Apple-IBM-Motorola
alliance and those we typically support.
While here, remove the NetBSD option IPKDB. It's not an option
used by us. Also, PPC_HAVE_FPU is not used by us either. Remove
that too.
Obtained from: Juniper, Semihalf
the ABI when enabled. There is no longer an embedded lock_profile_object
in each lock. Instead a list of lock_profile_objects is kept per-thread
for each lock it may own. The cnt_hold statistic is now always 0 to
facilitate this.
- Support shared locking by tracking individual lock instances and
statistics in the per-thread per-instance lock_profile_object.
- Make the lock profiling hash table a per-cpu singly linked list with a
per-cpu static lock_prof allocator. This removes the need for an array
of spinlocks and reduces cache contention between cores.
- Use a separate hash for spinlocks and other locks so that only a
critical_enter() is required and not a spinlock_enter() to modify the
per-cpu tables.
- Count time spent spinning in the lock statistics.
- Remove the LOCK_PROFILE_SHARED option as it is always supported now.
- Specifically drop and release the scheduler locks in both schedulers
since we track owners now.
In collaboration with: Kip Macy
Sponsored by: Nokia
cards:
o RocketRAID 172x series
o RocketRAID 174x series
o RocketRAID 2210
o RocketRAID 222x series
o RocketRAID 2240
o RocketRAID 230x series
o RocketRAID 231x series
o RocketRAID 232x series
o RocketRAID 2340
o RocketRAID 2522
Many thanks to Highpoint for their continued support of FreeBSD.
Submitted by: Highpoint
its multi DAC / playback channels is not that good. Enabling vchans
makes the bug more visible since playback allocation will look for
possible free hardware channels first (i.e. the next DAC; the very first
has been consumed by the vchan mixer), which in this case has been proven faulty.
Tested by: Dominic Fandrey <LoN_Kamikaze at gmx dot de>
URL: http://lists.freebsd.org/pipermail/freebsd-stable/2007-December/039022.html
The HT1000 DMA engine seems to not always like 64K transfers and sometimes
barfs data all over memory, leading to instant crash and burn.
Also fix 48-bit addressing issues; apparently newer chips need 16-bit writes
and not the usual FIFO thing.
HW donated by: Travis Mikalson at TerraNovaNet
- make necessary changes to release offload resources when a syncache
entry is removed before connection establishment
- disable checks for offloaded connection where insufficient information
is available
Reviewed by: silby
register (MacBooks only).
This allows MacBooks to boot in SMP mode without any trick and solves
the timer problems with HZ=1000.
MFC after: 1 week
Reviewed by: njl (mentor), jhb
Approved by: njl (mentor), jhb
Previous value 16 was too small for real LAC as a temporary activity
spike could easily overflow the queue, demanding tunnel disconnection due
to possible state inconsistency.
that favours true hardware channel, the first instance of recording
request will grab this channel (the first channel is being used as
vchan master). In many cases, this does not really work as intended and gives
a false impression of broken recording.
PR: kern/118546
MFC after: 3 days
7.2.3, bytes 0-3 and 5-15 are used to calculate the checksum of a descriptor
tag.
PR: kern/90521
Submitted by: Björn König <bkoenig@cs.tu-berlin.de>
Reviewed by: scottl
Approved by: emax (mentor)
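
As a rough illustration of the byte ranges named above, the sketch below
computes a descriptor tag checksum over bytes 0-3 and 5-15, skipping byte 4
(the checksum field itself). It assumes the checksum is the sum modulo 256,
as ECMA-167 defines it; the function name is hypothetical and this is not
the kernel code touched by the change.

```python
TAG_SIZE = 16        # a descriptor tag is 16 bytes long
CHECKSUM_OFFSET = 4  # byte 4 holds the checksum and is excluded from it


def descriptor_tag_checksum(tag):
    """Sum bytes 0-3 and 5-15 of a descriptor tag, modulo 256."""
    if len(tag) < TAG_SIZE:
        raise ValueError("descriptor tag must be at least 16 bytes")
    covered = tag[0:CHECKSUM_OFFSET] + tag[CHECKSUM_OFFSET + 1:TAG_SIZE]
    return sum(covered) % 256


def tag_is_valid(tag):
    """Compare the stored checksum byte against the computed one."""
    return tag[CHECKSUM_OFFSET] == descriptor_tag_checksum(tag)
```

Because byte 4 is excluded, the result does not depend on whatever value is
currently stored in the checksum field.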
header, then don't try to pullup anything, because there is no next
header if we hit IPPROTO_NONE. Set ulp to a non-NULL value so the
search for an upper layer header terminates.
This is based on Pekka's diagnosis, but I chose a simpler fix.
PR: 115261
Submitted by: Pekka Savola <pekkas@netcore.fi>
Reviewed by: mlaier
MFC after: 2 weeks
Ethernet Controller. Multicast filtering wasn't tested and needs more
exploration. While I'm here, replace complex if statements with a switch
statement to improve readability.
Reported by: Abdullah Ibn Hamad Al-Marri < wearabnet AT yahoo DOT ca >
Tested by: Abdullah Ibn Hamad Al-Marri < wearabnet AT yahoo DOT ca >
pass back the desired buffer length. This fixes scanning with the Marvell
88W8335 and BCM4328 wireless cards.
PR: kern/118370
Submitted by: Weongyo Jeong
Tested by: Ed Schouten
the sent_queue. Sometimes I wonder why any code
ever works :-)
- Fix the pad of the last mbuf routine. It was working improperly
on non-4 byte aligned chunks which could cause memory overruns.
MFC after: 1 week
by Daniel Kamm.
Adaptec RAID 51245
Adaptec RAID 51645
Adaptec RAID 52445
Adaptec RAID 5405
Sun STK RAID REM
Sun STK RAID EM
SG-XPCIESAS-R-IN
SG-XPCIESAS-R-EX
XXX: This only works currently with GEOM_GPT which only exists in 6.x.
XXX: I didn't add 'mbroffset' support for a GPT partition holding a BSD
label as I'm not sure if they use relative or absolute offsets.
MFC after: 3 days
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.
- Missing lock when sending data and moving it to the
outqueue.
- If a mbuf alloc fails during moving to outqueue the
reassembly of the old mbuf chain was incorrect.
- some_taken becomes a counter in sctputil.c instead of a set to 1.
- Fix a panic to be only under INVARIANTS and have a proper recovery.
- msg_flags needed to be set to the value collected, not or'd.
MFC after: 1 week
initialized before use and returned integrally instead of up to size.
Submitted by: Ilja van Sprundel <ilja -at- netric.org>
Reviewed by: secteam
MFC after: 1 day
on 1/2 of each of the successive limits tied to the limit for
2k clusters.
- Adds real functionality in so that doing a sysctl to change these
actually changes them :-)
MFC after: 1 week
when applicable.
Acquire Giant slightly later for vnlru.
In the syncer, acquire Giant only when a vnode belongs to the
non-MPsafe fs.
In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from
the lbolt sleep queue after the syncer state is modified, not before.
Herded by: attilio
Tested by: Peter Holm
Reviewed by: ups
MFC after: 1 week
This makes update mounts such as:
"mount -u -o rdonly" work more like "mount -u -o ro".
References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.
Requested by: rwatson
MFC after: 1 week
within the jail are never freed. si_cred is only used by the MAC framework, so
make the cred reference conditional on it being compiled in. This is not a fix
and will need to be reviewed for any new consumers of si_cred.
This will quell some user complaints when using jails with a default kernel.
Reviewed by: rwatson
MFC after: 3 days
INCLUDE_CONFIG_FILE. Point users at what config(8) actually does
and how one can fetch the actual configuration file.
Reported by: many
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
is what gcc3 complains about.
Without this change, it's impossible to build the kernel with gcc3.
Tested by: cognet@ (mentor)
Approved by: cognet@ (mentor)
test incorrect.
- Fix the initial buf calculation to be more friendly. The calculation is
the same, but we use a different variable to make it easier amongst the
different code versions.
MFC after: 1 week
sending, once the locks are all unlocked to
do the copies in, it's possible that other
events could then raise the number of bytes
outstanding, pushing it so not all the message
would fit. This would then cause us to send
only part of the message. This fix makes it
so we keep a "reserved" amount that can be
kept in mind when making calculations to send.
- rcv msg args with a NULL/NULL for to/tolen will return an error incorrectly
for the 1-2-1 model.
- We were not doing 0 len return correctly and not setting cantrcv
correctly. Previously we "fixed" this area by taking out the socantrcv
since we then could not get the data out. The correct fix is to still
flag the socket but allow a by-pass route to continue to read until
all data is consumed.
MFC after: 1 week
with insufficient protection mode.
For the i386 and amd64, create the tunable, machdep.prot_fault_translation,
with the following behaviour:
0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE
from vm_fault based on the ELF OSABI note:
no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT
note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR
1 = always SIGBUS/BUS_PAGE_FAULT
2 = always SIGSEGV/SEGV_ACCERR
This would do mostly automatic correction of ABI breakage, with the exception
of the untagged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For
them, the sysctl allows running the binary with manual settings.
Discussed with: portmgr (kris)
PR: kern/118304
MFC after: 3 days
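
The decision table for machdep.prot_fault_translation described above can be
sketched as follows. This models only the documented behaviour; the function
name and the convention of passing None for a binary without an OSABI note
are assumptions, and the real selection happens in the kernel trap code.

```python
SIGBUS = ("SIGBUS", "BUS_PAGE_FAULT")
SIGSEGV = ("SIGSEGV", "SEGV_ACCERR")
NOTE_THRESHOLD = 700004  # __FreeBSD_version at which the ABI changed


def protection_fault_signal(tunable, osrel=None):
    """Pick the (signal, code) pair for KERN_PROTECTION_FAILURE.

    tunable is the machdep.prot_fault_translation value; osrel is the
    __FreeBSD_version from the ELF OSABI note, or None if there is no note.
    """
    if tunable == 1:
        return SIGBUS
    if tunable == 2:
        return SIGSEGV
    if tunable != 0:
        raise ValueError("prot_fault_translation must be 0, 1 or 2")
    # Autodetect: old or untagged binaries keep the historical SIGBUS.
    if osrel is None or osrel < NOTE_THRESHOLD:
        return SIGBUS
    return SIGSEGV
```

Setting the value to 1 or 2 is the manual override mentioned for untagged
7-CURRENT/RELENG_7 binaries.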
dereferencing. Unaligned access could cause panic on strict alignment
architectures.
Reviewed by: marcel, marius (also tested on sparc64, thanks !)
MFC after: 3 days
Before this fix, FreeBSD would negotiate SACK on outgoing
connections, but would always fail to negotiate it on incoming
connections.
Discovered by: James Healy and Lawrence Stewart
Submitted by: James Healy and Lawrence Stewart
MFC after: 3 days
attached. Otherwise, the snp->snp_tty would be overwritten, while the
tty line discipline is still set to the snpdisc. Then snplwrite() causes
a panic because ttytosnp() cannot find the snp.
MFC after: 1 week
support its -k argument:
kern.proc.kstack - dump the kernel stack of a process, if debugging
is permitted.
This sysctl is present if either "options DDB" or "options STACK" is
compiled into the kernel. Having support for tracing the kernel
stacks of processes from user space makes it much easier to debug
(or understand) specific wmesg's while avoiding the need to enter
DDB in order to determine the path by which a process came to be
blocked on a particular wait channel or lock.
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
We used to allocate the domains 0-14 for userland, and leave the domain 15
for the kernel. Now supersections requires the use of domain 0, so we
switched the kernel domain to 0, and use 1-15 for userland.
How it's done currently, the kernel domain could be allocated for a
userland process.
So switch back to the previous way we did things, set the first available
domain to 0, and just add 1 to get the real domain number in the struct pmap.
Reported by: Mark Tinguely <tinguely AT casselton DOT net>
MFC After: 3 days
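
A small model of the numbering scheme described above: the allocator hands
out slots starting at 0, and the real domain number is the slot plus 1, so
the kernel's domain 0 can never be given to a userland process. The class
and its names are hypothetical; only the 0 / 1-15 split comes from the
message.

```python
KERNEL_DOMAIN = 0  # reserved for the kernel (needed for supersections)
USER_SLOTS = 15    # slots 0-14 map to real domains 1-15


class DomainAllocator:
    """Track userland domain slots; real domain = slot + 1."""

    def __init__(self):
        self._used = [False] * USER_SLOTS

    def alloc(self):
        """Return the real domain number of the first free slot."""
        for slot, used in enumerate(self._used):
            if not used:
                self._used[slot] = True
                return slot + 1
        raise RuntimeError("no free userland domain")

    def free(self, domain):
        """Release a real domain number obtained from alloc()."""
        if not 1 <= domain <= USER_SLOTS:
            raise ValueError("not a userland domain")
        self._used[domain - 1] = False
```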
1. A packet comes in that is to be forwarded
2. The destination of the packet is rewritten by some firewall code
3. The next link's MTU is too small
4. The packet has the DF bit set
Then the current code is such that instead of setting the next
link's MTU in the ICMP error, ip_next_mtu() is called and a guess
is sent as to which MTU is supposed to be tried next. This is because
in this case ip_forward() is called with srcrt set to 1. In that
case the ia pointer remains NULL but it is needed to get the MTU
of the interface the packet is to be sent out from.
Thus, we always set ia to the outgoing interface.
MFC after: 2 weeks
The RAS implementation would set the end address, then the start
address. These were used by the kernel to restart a RAS sequence if
it was interrupted. When the thread switching code ran, it would
check these values and adjust the PC and clear them if it did.
However, there's a small flaw in this scheme. Thread T1 sets the end
address and gets preempted. Thread T2 runs and also does a RAS
operation. This resets end to zero. Thread T1 now runs again and
sets start and then begins the RAS sequence, but is preempted before
the RAS sequence executes its last instruction. The kernel code that
would ordinarily restart the RAS sequence doesn't because the PC isn't
between start and 0, so the PC isn't set to the start of the sequence.
So when T1 is resumed again, it is at the wrong location for RAS to
produce the correct results. This causes the wrong results for the
atomic sequence.
The window for the first race is 3 instructions. The window for the
second race is 5-10 instructions depending on the atomic operation.
This makes this failure fairly rare and hard to reproduce.
Mutexes are implemented in libthr using atomic operations. When the
above race would occur, a lock could get stuck locked, causing many
downstream problems, as you might expect.
Also, make sure to reset the start and end address when doing a syscall, or
a malicious process could set them before doing a syscall.
Reviewed by: imp, ups (thanks guys)
Pointy hat to: cognet
MFC After: 3 days
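
The race described above can be replayed with a toy model of the restart
check. Everything here is an assumption for illustration: the function name,
the addresses, and the exact comparison (start inclusive, end exclusive) are
not taken from the ARM code.

```python
def resume_pc(pc, ras_start, ras_end):
    """Return where a preempted thread should resume.

    If the thread was interrupted inside [ras_start, ras_end), the
    sequence is restarted from its beginning; otherwise the PC is kept.
    """
    if ras_start <= pc < ras_end:
        return ras_start
    return pc


SEQ_START = 0x1000  # hypothetical start of the atomic sequence
SEQ_END = 0x1010    # hypothetical end of the atomic sequence
MID_SEQUENCE = 0x1008

# Normal case: both addresses are set, so the sequence restarts.
normal = resume_pc(MID_SEQUENCE, SEQ_START, SEQ_END)

# Race: T2's RAS operation reset end to 0 while T1 was preempted, so
# T1's interrupted sequence is not restarted.
raced = resume_pc(MID_SEQUENCE, SEQ_START, 0)
```

In the raced case the thread resumes in the middle of the sequence, which is
how a libthr mutex could end up stuck locked.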
its -f and -v arguments:
kern.proc.filedesc - dump file descriptor information for a process, if
debugging is permitted, including socket addresses, open flags, file
offsets, file paths, etc.
kern.proc.vmmap - dump virtual memory mapping information for a process,
if debugging is permitted, including layout and information on
underlying objects, such as the type of object and path.
These provide a superset of the information historically available
through the now-deprecated procfs(4), and are intended to be exported
in an ABI-robust form.
January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns
units.
Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.
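
For reference, a sketch of the unit handling this fix is about: NT system
time counts 100ns units since January 1, 1601, so the 1601 - 1970 offset has
to be scaled to 100ns units as well, not added in seconds. The function name
is hypothetical and this is not the ndis code itself.

```python
from datetime import date

HUNDRED_NS_PER_SECOND = 10_000_000

# Seconds between January 1, 1601 and January 1, 1970.
EPOCH_DELTA_SECONDS = (date(1970, 1, 1) - date(1601, 1, 1)).days * 86400


def unix_to_nt_time(seconds, nanoseconds=0):
    """Convert a Unix timestamp to 100ns units since January 1, 1601."""
    total_seconds = seconds + EPOCH_DELTA_SECONDS
    return total_seconds * HUNDRED_NS_PER_SECOND + nanoseconds // 100


def buggy_unix_to_nt_time(seconds, nanoseconds=0):
    """The class of mistake described above: offset left in seconds."""
    return (seconds * HUNDRED_NS_PER_SECOND + nanoseconds // 100
            + EPOCH_DELTA_SECONDS)
```

The two results differ by almost the whole 369-year offset, which shows why
the unit mismatch matters.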
linker interfaces for looking up function names and offsets from
instruction pointers. Create two variants of each call: one that is
"DDB-safe" and avoids locking in the linker, and one that is safe for
use in live kernels, by virtue of observing locking, and in particular
safe when kernel modules are being loaded and unloaded simultaneous to
their use. This will allow them to be used outside of debugging
contexts.
Modify two of three current stack(9) consumers to use the DDB-safe
interfaces, as they run in low-level debugging contexts, such as inside
lockmgr(9) and the kernel memory allocator.
Update man page.
sx driver), change a magic value in the PLX bridge chip. Apparently later
builds of the PCI cards had corrected values in the configuration eeprom.
This change supposedly fixes some pci bus problems.
information in support of DDB(4); these functions bypass normal linker
locking as they may run in contexts where locking is unsafe (such as the
kernel debugger).
Add a new interface linker_ddb_search_symbol_name(), which looks up a
symbol name and offset given an address, and also
linker_search_symbol_name() which does the same but *does* follow the
locking conventions of the linker.
Unlike existing functions, these functions place the name in a
caller-provided buffer, which is stable even after linker locks have been
released. These functions will be used in upcoming revisions to stack(9)
to support kernel stack trace generation in contexts as part of a live,
rather than suspended, kernel.
gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently
nobody uses). From Tor's description:
This happens when the block range spans two block maps, the first in the
inode (mapping up to NDADDR direct blocks) and the second being the first
indirect block. The current check assumes that both block maps are
indirect blocks.
Work done by: tegge
Tested by: kris, kensmith
in the tcp header. With relevant parts of the tcp header changing after
the 'signature' was computed, the signature becomes invalid.
Reviewed by: tools/regression/netinet/tcpconnect
MFC after: 3 days
Tested by: Nick Hilliard (see net@)
is required by the X.Org PCI domains code and additionally needs
a workaround for Hummingbird and Sabre bridges as these don't
allow their config headers to be read at any width, which is an
unusual behavior.
- In psycho(4) take advantage of DEFINE_CLASS_0 and use more
appropriate types for some softc members.
MFC after: 3 days
hack means you can get the units and flags to match up more easily with
serial consoles on machines with acpi tables that cause the com ports
to be probed in the wrong order (and hence get the wrong sio unit number).
This replaces the common alternative hack of editing the code to comment
out the acpi attachment. This could go away entirely when device wiring
patches are committed.
stomping on the units intended for the motherboard sio ports. This is
no real substitute for the not-yet-committed device wiring enhancements.
Code taken from sio's pci attachment.
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.
Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.
MFC after: 6 weeks
Intel CPUs with family 0x6, model 0xE and later (i.e., Intel Core(TM))
have a PMC architecture that differs somewhat from previous CPUs in
family 0x6. Even though the basic programming model is similar, the
documented set of legal values that may be loaded into their PMC MSRs
differs from that of the previous PMCs in family 0x6 and reusing bit
values valid for the older PMCs could result in undefined behaviour in
the general case.
per-cpu area. cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc. The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.
sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.
I have pending changes to make top and vmstat optionally show per-cpu
stats.
I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.
I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.
Reviewed by: jhb
Partly obtained from: jhb
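
The merged view described above amounts to summing the per-cpu counters
state by state. A minimal sketch, with a hypothetical function name and
made-up sample data; CPUSTATES is 5 (user, nice, system, interrupt, idle):

```python
CPUSTATES = 5  # user, nice, system, interrupt, idle


def merge_cp_times(per_cpu):
    """Sum per-cpu tick counters into one cp_time-like array."""
    merged = [0] * CPUSTATES
    for counters in per_cpu:
        if len(counters) != CPUSTATES:
            raise ValueError("each CPU must report CPUSTATES counters")
        for state, ticks in enumerate(counters):
            merged[state] += ticks
    return merged


# Two CPUs, as kern.cp_times might report them (sample values only).
sample_cp_times = [
    [100, 0, 50, 5, 845],
    [200, 10, 40, 0, 750],
]
sample_cp_time = merge_cp_times(sample_cp_times)
```

Since each CPU only updates its own counters, the per-tick update needs no
atomic operations; the summing cost moves to the readers.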
since the branch caches on at least Athlon XP through Athlon 64 CPUs
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles. Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).
Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this. I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each. For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.
unused except to obfuscate disassemblies. -mprofiler-epilogue is
currently broken with gcc-4 (it does too little), but -finstrument-functions
is broken in a different way (it does too much).
amd64 version: merge whitespace fixes from i386 version.
Call uma_sel_align() there as well.
Set CPU_CONTROL_VECRELOC if we're using the high vectors page.
Submitted by: Rafal Jaworowski <raj AT semihalf DOT com>
MFC After: 1 week
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.
This is useful in bpf to not have overlong lines for debugging
or selecting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.
Last but not least it gives you if_enc(4) for IPv6 as well.
[ As some auxiliary state was not available in the later
input path we save it in the tdbi. That way tcpdump can give a
consistent view of either of (authentic,confidential) for both
before and after states. ]
Discussed with: thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by: thompsa, gnn
- On amd64, just assume type #1 is always used. PCI 2.0
deprecated type #2 and required type #1 for all future bridges, which
was well before amd64 existed.
- For i386, ignore whatever value was in 0xcf8 before testing for type #1
and instead rely on the other tests to determine if type #1 works. Some
newer machines leave garbage in 0xcf8 during boot and as a result the
kernel doesn't find PCI at all (which greatly confuses ACPI which expects
PCI to exist when PCI busses are in the namespace).
MFC after: 3 days
Discussed with: scottl
ZFS porting style didn't extend this, instead using a heap of additional
header files that don't get installed.
My intention had been to allow OpenSolaris external code to build on
FreeBSD out of the box (i.e. without a src tree).
Make clear that this is not a good idea when called from
tcp_output()->ipsec_hdrsiz_tcp()->ipsec4_hdrsize_tcp()
as we do not know if IPsec processing is needed at that point.
T_DIRECT filtering so that disk drives can be attached via the
pass driver. Add CAM locking. Don't mark CAM commands as SG64
since the hardware isn't designed to deal with 64-bit passthru
commands. Hopefully the bounce buffer changes that were done
for the management/ioctl interface are robust enough to handle
this deficiency for CAM as well.
- Enable pcbeep control for Acer + ALC268 (nid 29). Give enough (fake)
hints so the parser will grab it and allocate "speaker" control.
- Fix regression while preparing DAC and ADC for multichannel
format. Since playback policy is to output to every possible path,
ensure that each DAC is started.
Reported / Tested by: Guy Brand
Currently, Giant is not heavily contended, so it is OK to treat it
like any other mutex.
Please don't forget to update your own custom kernel config files.
Approved by: cognet, marcel (maintainers of arches where option is
not enabled at the moment)
of some old programs. Since sigval is union type, this change will not have
binary compatibility problem.
MFC after: 3 days
Discussed with: rwatson, glebius
It should just contain the value we want to add, as if we're interrupted
between the add and the str, we will restart from the beginning. Just use
a register we can scratch instead.
MFC After: 1 week
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.
MFC after: 1 week
Reviewed by: scottl
The call should happen with the driver lock held. We don't hold the driver
lock in newstate as it's a separate thread where we can't sleep (and we only
call wpi_cmd in async mode).
Discovered By: Attilio's callout rework
Approved By: mlaier (comentor)
currently, before spinning, the turnstile spinlock is acquired and the
waiters flag is set.
This is not strictly necessary, so just spin before acquiring the
spinlock and setting the flags.
This will simplify a lot of other functions too, as now we have the waiters
flag set only if there are actually waiters.
This should make the wakeup/sleep pair faster under intensive mutex
workload.
This also fixes a bug in rw_try_upgrade() in the adaptive case, where
turnstile_lookup() will recurse on the ts_lock lock that will never be
really released [1].
[1] Reported by: jeff with Nokia help
Tested by: pho, kris (earlier, bugged version of rwlock part)
Discussed with: jhb [2], jeff
MFC after: 1 week
[2] John had a similar patch about 6.x and/or 7.x about mutexes probably
sends frames up the stack after changing the current channel then
the lookup by ieee channel number may fail leaving a null ptr in
se_chan; if this happens fallback to the channel recorded when the
frame is processed (curchan). Since the frame doesn't contribute
to scan results for the sta this is acceptable.
Reviewed by: thompsa
MFC after: 3 days
1837014 Kernel panics after authentication of an outgoing packet
1836992 Potential bugs in packet auth code (w/patches)
1836967 Kernel panic when using auth rule with keep state
and another reported only to FreeBSD by Andriy (see PR)
PR: kern/118251
Submitted by: Andriy Syrovenko <andriys@gmail.com>
Reviewed by: darrenr
MFC after: 5 days
cast as uint32_t which is defined as unsigned int. gcc doesn't want to
consider that there might not be much difference between an int and
a long on a 32 bit architecture.
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem. Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.
MFC after: 6 weeks
This has the benefit that rmlocks have proper support for reader recursion
(in contrast to rwlock(9) which could potentially lead to writer starvation).
It also means a significant performance gain, even though only visible in
microbenchmarks at the moment.
Discussed on: -arch, -net
malloc_type_allocated(..., 0) calls that occur when contigmalloc() has
failed. Eliminate the acquisition and release of the page queues lock
from vm_page_release_contig(). Rename contigmalloc2() to
contigmapping(), reflecting what it does.
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.
PR: kern/113956
MFC after: 3 days
the inpcb when there's an inpcb without associated timewait state, and
not unlocking when the inpcb has been freed. This avoids a kernel panic
when tcpdrop(8) is run on a socket in the TIMEWAIT state.
MFC after: 3 days
Reported by: Rako <rako29 at gmail dot com>
should never be moved by one lock to another.
As, luckily, nothing in our tree is using it, axe the function.
This breaks lockmgr KPI, so interested, third-party modules should update
their source code with appropriate replacement.
Ok'ed by: ups, rwatson
MFC after: 3 days
comments from vnode_pager_setsize(). This call was introduced in
revision 1.140 to address a problem that no longer exists.
Specifically, pmap_zero_page_area() has replaced a (possibly)
problematic implementation of page zeroing that was based on
vm_pager_map(), bzero(), and vm_pager_unmap().
while the global callout spinlock is not held, and can lead to PF#.
Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com>
Tested by: dougb
Diagnosed by: jhb
The lookup hurts a bit for connections but had been there anyway
if IPSEC was compiled in. So moving the lookup up a bit gives us
TSO support at no extra cost.
PR: kern/115586
Tested by: gallatin
Discussed with: kmacy
MFC after: 2 months
a good job of it) in the copypktopts() function, just call ip6_clearpktopts()
directly. Otherwise, the callers of this function would end up freeing the
memory twice.
Reviewed by: jinmei
PR: kern/116360
o Acer Aspire 4520 laptop
- jack sensing / automute
o Toshiba Satellite A135-S4527 laptop
- jack sensing / automute
Tested by: lioux
o Apple Macbook 3 (is it?)
- require gpio0 (for speakers) and ovref50 (for headphone)
to make it work
- jack sensing / automute
Tested by: Ed Schouten
* Add Nvidia MCP67 controller ids.
* Be sensible about similar controller with multiple pci ids.
* Connect unused DAC/ADC to stream#0 rather than forcing each of them
managing their own stream.
MFC after: 3 days
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheduler might see it as having such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a scheduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Leffler for getting to the bottom of this problem.
Reviewed by: jhb, jeff, silby
- Bring HEAD up to the latest shared code
- Fix TSO problem using limited MSS and forwarding
- Dual lock implementation
- New device support
- For my ease, this code can compile in either 6.x or later
- brings this driver in sync with the 6.3
prepend a data mbuf in front of a header mbuf without moving the header
to the new mbuf, and (2) a possible alignment problem on architectures
with strict alignment as reported in kern/4184.
PR: kern/4184 (1)
addresses as the source of an AARP request. While this PR was submitted
in the context of work in OpenBSD to port netatalk (in 1997), I've
synchronized the code more to our ARP input routine, which had similar
requirements.
Submitted by: Denton Gentry
PR: kern/4184
MFC after: 1 week
only at address 0 which is supposed to be the only valid phy address
on Marvell PHY. The more correct solution would be masking PHY
address ranges allowable in PHY probe routine. Unfortunately,
FreeBSD has no way to restrict the PHY address ranges or to pass special
flags to PHY driver.
This change assumes that any PHY hardware attached to msk(4) is a
Marvell-made 88E11xx PHY.
With this change the phantom PHYs attached on 88E8036 (Yukon FE)
should disappear.
Reported by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
Tested by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
only 4KB SRAM.
o Rework setting Tx/Rx RAM buffer size. Give receiver 2/3 of memory
and round it down to the multiple of 1024. The RAM buffer size of
Yukon II should be multiple of 1024. This fixes bogus RAM buffer
configuration used in Yukon FE.
Reported by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
Tested by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
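The buffer split described above can be sketched as follows. This is an
illustration only, not the msk(4) source: the function name, the use of byte
units, and the assumption that the transmitter gets the remainder are all
hypothetical; only the 2/3 share for the receiver and the rounding down to a
multiple of 1024 come from the text.

```python
def split_ram_buffer(total_bytes):
    """Return (rx_bytes, tx_bytes) for a shared Tx/Rx RAM buffer."""
    # Receiver gets 2/3 of the memory, rounded down to a multiple of 1024.
    rx_bytes = (total_bytes * 2 // 3) // 1024 * 1024
    # Assumption: the transmitter takes whatever is left over.
    tx_bytes = total_bytes - rx_bytes
    return rx_bytes, tx_bytes


if __name__ == "__main__":
    for size_kb in (4, 48, 128):
        rx, tx = split_ram_buffer(size_kb * 1024)
        print(f"{size_kb} KB total: rx={rx} bytes, tx={tx} bytes")
```

Rounding the receive share first and deriving the transmit share from it keeps
the receive share aligned to 1024 without losing any of the buffer.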
timestamps in the initial SYN packet actually use them in the rest of the
connection. Unfortunately, during the 7.0 testing cycle users have already
found network devices that violate this constraint.
RFC 1323 states 'and may send a TSopt in other segments' rather than
'and MUST send', so we must allow it.
Discovered by: Rob Zietlow
Tracked down by: Kip Macy
PR: bin/118005
publicly available datasheet for Yukon II and don't know what
bug/workaround exist for the specific hardware revision. Also I don't
think the vendor will release hardware errata in near future.
The hardware feature lists were not used at all except setting water
mark registers. Since msk(4) should know exact chip model/revision
number to decide which hardware capability could be used the extra
feature lists were redundant.
o Enable jumbo frame support for EC Ultra and disable jumbo frame
for FE.
o Enable store and forward mode for standard MTU sized frame.
o Enable TSO for EC Ultra. However TSO/checksum offload is disabled
for jumbo frame case. Because EC Ultra can't use store and forward
mode for jumbo frame TSO/checksum offload is not available.
o Adjust Tx GMAC almost empty threshold value and add a jumbo frame
water mark. The magic value was obtained from Marvell's sk98lin
driver.
o Fix EC Ultra chip revision number.
rwlocks in conjunction with callouts. The function does basically what
callout_init_mtx() already does with the difference of using a rwlock
as extra argument.
CALLOUT_SHAREDLOCK flag can be used, now, in order to acquire the lock only
in read mode when running the callout handler. It has no effects when used
in conjunction with mtx.
In order to implement this, underlying callout functions have been made
completely lock type-unaware, so accordingly with this, sysctl
debug.to_avg_mtxcalls is now changed in the generic
debug.to_avg_lockcalls.
Note: currently the allowed lock classes are mutexes and rwlocks because
callout handlers run in softclock swi, so they cannot sleep and they
cannot acquire sleepable locks like sx or lockmgr.
Requested by: kmacy, pjd, rwatson
Reviewed by: jhb
Revert the probe in atapi-cd.c to the old usage now that it's fixed on AHCI.
This change also fixes using virtual CDs on, e.g., Parallels.
Still leaves the GEOM problem of telling media vs device access apart in the access function.
server-side RPC retransmission cache for non-idempotent operations: these
hacks substituted 0 (success) for the expected EEXIST in the event that
a target name already existed for LINK, SYMLINK, and MKDIR operations,
under the assumption that EEXIST represented a second application of the
original RPC rather than a true failure.
Background: certain NFS operations (in this case, LINK, SYMLINK, and
MKDIR) are not idempotent, as they leave behind persisting state on the
server that prevents them from being replayed without an error; if a UDP
RPC reply is lost leading to a retransmission by the client, the second
reply will return EEXIST rather than success, as the new object has
already been created. The NFS client previously silently mapped the
EEXIST return into success to paper over this problem.
However, in all modern NFS server implementations, a reply cache is kept
in order to retransmit the original reply to a retransmitted request,
rather than performing the operation a second time, allowing this hack
to be avoided. This allows link()-based file locking over NFS to operate
correctly, as an application requesting the creation of a new link for a
file can now tell whether it succeeded atomically or not.
Other NFS clients, including Solaris and Linux, generally follow this
behavior for the same reasons. Most clients also now default to TCP,
which also helps avoid the issue of retransmitted but non-idempotent
requests in most cases.
Reported by: Adam McDougall <mcdouga9 at egr dot msu dot edu>,
Timo Sirainen <tss at iki dot fi>
Reviewed by: mohans
MFC after: 1 week
o buffered write, for chunks smaller than PIPE_MINDIRECT bytes
o direct write, for everything else
A call to writev(2) may receive struct iov of various size and the
kernel may have to switch from one solution to the other. Before doing
this, it must wake reader processes and any select/poll/kqueue up.
This commit fixes a bug where select/poll/kqueue are not triggered
when switching from buffered write to direct write. It adds calls to
pipeselwakeup().
I give more details on freebsd-arch@:
http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html
This should fix issues with Erlang (lang/erlang) and kqueue.
Reported by: Rickard Green (Erlang)
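As a rough illustration of the switching described above, the toy model below
walks the chunks of a writev(2) call, picks a write mode for each chunk, and
records a wakeup whenever the mode changes. It is not the kernel code: the
function names are hypothetical, the value used for PIPE_MINDIRECT is an
assumption, and the "wakeup" event merely stands in for a call to
pipeselwakeup().

```python
PIPE_MINDIRECT = 8192  # assumed threshold in bytes, for illustration only


def write_mode(chunk_size):
    """Pick the write strategy for one iov chunk."""
    return "buffered" if chunk_size < PIPE_MINDIRECT else "direct"


def plan_writev(chunk_sizes):
    """Return the ordered list of actions for a sequence of iov chunks."""
    events = []
    previous = None
    for size in chunk_sizes:
        mode = write_mode(size)
        # Readers and select/poll/kqueue waiters must be woken before
        # switching from one strategy to the other.
        if previous is not None and mode != previous:
            events.append("wakeup")
        events.append(mode)
        previous = mode
    return events


if __name__ == "__main__":
    print(plan_writev([100, 16384, 200]))
```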
time ago (2002 according to the gcc log). Using the proper name
fixes a warning in src/lib/libc/gen/ulimit.c about the second
argument of va_start() not being the last named (when it really
was).
This has the following benefits:
- allows to use the AT keyboard maps in share/syscons/keymaps with
sunkbd(4),
- allows to use kbdmux(4) with sunkbd(4),
- allows Sun RS232 keyboards to be configured and used the same
way as Sun USB keyboards driven by ukbd(4) (which also does AT
keyboard emulation) with X.Org, putting an end to the problem
of native support for the former in X.Org being broken over and
over again.
MFC after: 3 days
an unified way for all the lock primitives to express lock assertions.
Currently, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.
Ok'ed by: jhb, jeff
stress2 suite, to quote his letter, this change:
1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.
For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.
2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.
Submitted by: kib
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().
Reviewed by: tegge
MFC after: 6 weeks
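The accounting problem described above can be modelled with a toy example.
This is a sketch under simplifying assumptions, not the VM code: the Page
class and remove_page() are hypothetical, and the wired_mappings field only
stands in for the count that pmap_page_wired_mappings() is described as
returning.

```python
from dataclasses import dataclass


@dataclass
class Page:
    wire_count: int       # total wirings held on the page
    wired_mappings: int   # wired, managed mappings of the page


def remove_page(page, fixed=True):
    """Unmap a page beyond EOF; return True if it can now be freed."""
    destroyed = page.wired_mappings
    page.wired_mappings = 0           # unmapping destroys the wired mappings
    if fixed:
        # The fix: reduce the wire count by the number of wired,
        # managed mappings that were just destroyed.
        page.wire_count -= destroyed
    return page.wire_count == 0


if __name__ == "__main__":
    print("old behaviour frees page:", remove_page(Page(1, 1), fixed=False))
    print("new behaviour frees page:", remove_page(Page(1, 1), fixed=True))
```

A page that is also wired for some other reason keeps a non-zero wire count
in this model and is therefore still not freed.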
If it is set to zero value (default) dummynet module will try to emulate
real link as close as possible (bandwidth & latency): packet will not leave
pipe faster than it should be on real link with given bandwidth.
(This is original behaviour of dummynet which was altered in previous commit)
If it is set to non-zero value only bandwidth is enforced: packet's latency
can be lower comparing to real link with given bandwidth.
- Document recently introduced dummynet(4) sysctl variables.
Requested by: luigi, julian
MFC after: 3 month
with ACCESSPERMS. Document in mount_ntfs(8) only the nine
low-order bits of mask are used (taken from mount_msdosfs(8)).
PR: kern/114856
Submitted by: Ighighi
MFC after: 1 month
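A minimal sketch of what masking with ACCESSPERMS means, assuming the usual
definition of ACCESSPERMS as S_IRWXU|S_IRWXG|S_IRWXO (0777). The
effective_mask() helper is hypothetical and only illustrates that bits above
the nine low-order ones are ignored.

```python
import stat

# Assumed definition: all user, group and other permission bits (0777).
ACCESSPERMS = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def effective_mask(mask):
    """Keep only the nine low-order permission bits of a mount mask."""
    return mask & ACCESSPERMS


if __name__ == "__main__":
    # setuid (04000) is stripped, rwxr-xr-x (0755) survives.
    print(oct(effective_mask(0o4755)))
```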
In case attach fails because of the priv check we leaked the
memory and left so_pcb as fodder for invariants.
Reported by: Pawel Worach
Reviewed by: rwatson
- Implement timing out of VPD register access.[1]
- Fix an off-by-one error of freeing malloc'd space when checksum is invalid.
- Fix style(9) bugs, i.e., sizeof cannot be followed by space.
- Retire now obsolete 'hw.pci.enable_vpd' tunable.
Submitted by: cokane (initial revision)[1]
Reviewed by: marius (intermediate revision)
Silence from: jhb, jmg, rwatson
Tested by: cokane, jkim
MFC after: 3 days
The register layout is little different from memory-mapped stats
in the previous generation chips. In fact, it is bad because
registers in this range are cleared after reading them.
Reviewed by: scottl
MFC after: 3 days
- Try to eliminate another race by replacing timeout(9) with the
callout APIs. In addition, the callout_drain() in an_detach()
helps us avoid a possible panic-on-free caused by the callout API trying
to lock a destroyed mutex.
- In an_stats_update(), check the return value of an_read_record(). This
should reduce the chance of device removal(PCCARD) panic [2].
- Adding a comment to state the fact that an_stats_update() is now called
via callout(9) with a lock held [2].
Submitted by: jhb [1], ambrisko [2]
Reviewed by: jhb, ambrisko
Reported by: dhw
Tested by: dhw
MFC after: 3 days
priorities of the technologies supported by 802.3 Selector Field
value.
1000BASE-T full duplex
1000BASE-T
100BASE-T2 full duplex
100BASE-TX full duplex
100BASE-T2
100BASE-T4
100BASE-TX
10BASE-T full duplex
10BASE-T
However, PHY drivers didn't honor this order, so 100BASE-T4 had
higher priority than 100BASE-TX full duplex. Fix that long-standing
bug so that PHY drivers choose the highest common denominator
ability.
Fix a bug in dcphy which inadvertently accepts 100BASE-T4.
PR: 92599
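The selection rule can be sketched as below. The priority order is the list
given in the commit message; the function name and the use of plain strings
for the abilities are illustrative assumptions and do not reflect how the PHY
drivers represent media.

```python
# Highest priority first, in the order listed above.
PRIORITY = [
    "1000BASE-T full duplex",
    "1000BASE-T",
    "100BASE-T2 full duplex",
    "100BASE-TX full duplex",
    "100BASE-T2",
    "100BASE-T4",
    "100BASE-TX",
    "10BASE-T full duplex",
    "10BASE-T",
]


def highest_common(local, partner):
    """Return the highest-priority ability both sides advertise, or None."""
    common = set(local) & set(partner)
    for tech in PRIORITY:
        if tech in common:
            return tech
    return None


if __name__ == "__main__":
    both = ["100BASE-T4", "100BASE-TX full duplex", "100BASE-TX"]
    print(highest_common(both, both))
```

With both sides advertising 100BASE-T4 and 100BASE-TX full duplex, the walk
down the list reaches 100BASE-TX full duplex first, which is the ordering the
fix restores.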
- Populate the register values for the trapframe put on the stack by the
double fault handler.
- Teach DDB's trace routine to treat a double fault like other trap frames.
MFC after: 3 days
process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This
will allow us to extend dynamically areas in proc/thread for dtrace ;-)
Reviewed by: rwatson
- process_ctor,dtor, init and fini
- thread_ctor,dtor, init and fini
This allows the ability to add on additional things
during construction/destruction of threads and processes.
Reviewed by: rwatson
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().
PR: ia64/118024
removing some copy&pasted code.
- Reduce copy and paste in ng_apply_item().
- Resurrect ng_send_fn() as a valid symbol, not a define.
Reviewed by: mav, julian
opposed to what process. Since threads by default have the name of the
process unless over-written with more useful information, just print the
thread name instead.
sys/dev/acpica/acpi.c rev 1.196 a while ago:
Grab Giant around calls to DEVICE_SUSPEND/RESUME in
acpi_SetSleepState().
If we are resuming non-MPSAFE drivers, they need Giant held for them.
This may fix some obscure suspend/resume problems. It has fixed keyrate
setting problems that were triggered by cardbus (MPSAFE) changing the
ordering for syscons resume (non-MPSAFE). Also, add some asserts that
Giant is held in our suspend/resume and shutdown methods.
Submitted by: Marko Zec
amd64 mechanism over. Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do. Get rid of 'struct privatespace' and a
whole mess of #ifdef SMP garbage that set it up. As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.
Background information: Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address. In order to share page tables, we switched
to using the GDT and %fs/%gs to access it. But we still did the evil
magic to set it up for the old way. The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored. (exercise for reader: free these afterwards).
that the driver will handle WEP encryption. However, this does not seem to be
implemented by this driver (or maybe the chipset doesn't support it?)
Removing the flag makes my wpi card work using wpa_supplicant(8) on a
network with 802.1x security (without this change it authenticated fine, but
tcpdump only saw garbage packets)
Reviewed by: benjsc, imp (mentor)
Approved by: imp (mentor), sam
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().
This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.
Reported by: numerous
Reviewed by: marcel
MFC after: 1 day
Enhanced Disk Drive Specification Ver 3.0 defines that the version
of extension in AH would be 30h.
Correct the check for that to be >=30h instead of >3h.
MFC after: 2 months
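The difference between the two comparisons can be shown with a small sketch.
The helper names are hypothetical, and 0x21 is used purely as an example of a
value below 30h; only the 30h threshold and the old ">3h" test come from the
text.

```python
EDD_VERSION_3_0 = 0x30  # value of AH for version 3.0, per the text above


def old_check(ah):
    """The previous test: true for almost any non-trivial value of AH."""
    return ah > 0x3


def has_edd3(ah):
    """The corrected test: true only for version 3.0 or later."""
    return ah >= EDD_VERSION_3_0


if __name__ == "__main__":
    for ah in (0x01, 0x21, 0x30):
        print(hex(ah), "old:", old_check(ah), "new:", has_edd3(ah))
```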
from messing with the spdb and sadb.
Problem sneaked in with the fast_ipsec+v6->ipsec merger by no
longer going via raw_usrreqs.pr_attach.
Reported by: Pawel Worach
Identified by: rwatson
Reviewed by: rwatson
MFC after: 3 days
bumped to 800004 to note the change though userland apps should not be
affected since they use <sys/agpio.h> rather than the headers in
sys/dev/agp.
Discussed with: anholt
Repocopy by: simon
and update the rx code to handle multiple frames in a single usb
transfer. AX772 parts (at least) exhibit many input errors when
operated with a 2K rx buffer and no errors w/ a 4K rx buffer (it's
unclear what the cause of the errors is for 2K so this may just be
covering up the real issue). Larger rx buffer sizes show no
significant performance improvement for AX772. Bypassing the common
buffer management routines also eliminates an extra context switch
on every packet which noticeably improves performance (TCP netperf
rx goes from 45 Mb/s to 85 Mb/s).
Submitted by: "J.R. Oldroyd" <fbsd@opal.com>
Reviewed by: imp
Obtained from: openbsd (partly)
MFC after: 3 weeks
The reliability of its multi DAC / playback channels is
not that good. Enabling vchans makes the bug more visible
since playback allocation will look for possible free
hardware channels first (i.e: the next DAC, the very first
has been consumed by vchan mixer) which in this case has
been proven faulty.
Reported / Tested by: Sascha Klauder
MFC after: 3 days
This includes:
o mtree (for legal/intel_wpi)
o manpage for i386/amd64 archs
o module for i386/amd64 archs
o NOTES for i386/amd64 archs
Approved by: mlaier (comentor)
proc_rwmem.
Otherwise copy on write may create an anonymous page that is
not marked as dirty. Since writing data to these pages
in this function also does not dirty these pages they may be
later discarded by the pagedaemon.
- Use unit2minor() and minor2unit() to generate minor numbers to support
unit numbers higher than 255.
- Use simple string operations on the 'names' array rather than hard-coded
constants and switch statements so that more ptys can be added by simply
expanding the 'names' array.
MFC after: 1 week
lock optimized for almost exclusive reader access. (see also rmlock.9)
TODO:
Convert to per cpu variables linkerset as soon as it is available.
Optimize UP (single processor) case.
- Patch registers CR47 and CR157 on devices that require it.
- Fix power calibration setting on ZD1211B.
Obtained from: OpenBSD
- Fix multicast transfer by properly reprogramming the multicast global
hash table, which in turn fixes promiscuous mode and IPv6
autoconfiguration / local networking.
Reviewed by: sam, Weongyo Jeong
Tested using: Aztech WL230 , Belkin F5D7050, Unicorn WL-54G,
3COM 3CRUSB10075
MFC after: 1 week
I've tried to move md(4) to use geom_disk class, like real disks do, but
this requires major rework of some of the existing features such as
configuration dumping for example. Therefore just putting devstat support
directly into md(4) seems to be optimal solution.
Now you can see md(4) stats in `systat -vm' again.
MFC after: 2 weeks
present on the MacBook, MacBook Pro, and Intel MacMini.
This driver exports information via sysctl in its private sysctl tree
dev.asmc.*. You can get information about temperatures, fan speeds, the
keyboard light sensor and the Sudden Motion Sensor (SMS).
The SMS is very useful to park the disk heads when the laptop is
moved. Basically, the SMS is setup so that, under movement, we get an
interrupt on irq 6 and a devd notification is sent.
Sponsored by: Google Summer of Code 2007
Approved by: njl (mentor)
Reviewed by: attilio (previous version, but very similar), jhb (interrupt
specific review)
LINUX_SIOCGIFCOUNT just returns 0 since it is not implemented in the
Linux 2.6.16.
LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX are mapped to the FreeBSD native
SIOCGIFINDEX.
Tested by: Peter Kostouros <kpeter@melbpc.org.au>
Reviewed by: brooks, rpaulo (on net@)
Submitted by: rdivacky
MFC after: 1 week
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:
- lower latency: if packet flow does not exceed pipe bandwidth, packets
will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaustion) eliminated. This fixes
a long-standing panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
is not changed yet.
MFC after: 3 month
while other variants have inorder ethernet address for the same
chipset. Override ethernet address ordering if we already know how
it was stored. This fixes the use of inversed ethernet address on
MCP67.
Submitted by: ariff
MFC after: 3 days
Allocate space in the keyboard state structure instead, to prevent a random
byte from a possibly overwritten stack location from being shoved into the
USB device when the transfer actually takes place.
This fixes at least one instance of LEDs not working with USB keyboards.
characters (mostly "&"). Because top(1) shows only first six characters of
wait channel, without this change we saw only one meaningful character.
Requested by: kris & others
MFC after: 1 week
must be globally performed before calling any of the TLB invalidation
functions.
With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)
Reported and reviewed by: ups
MFC after: 3 days
o do not override the home channel recorded for the sta when the frame is
received off-channel; this fixes a problem where we might think the sta
was operating on the channel the frame was received on causing association
requests to be ignored/rejected (likely cause of kern/99036)
o don't include rssi of off-channel frames in the avg rssi used to select
a bss; this gives us a better estimate of the signal we will see for the
station when on-channel
PR: kern/99036
Found by: Yubin Gong
Reviewed by: sephe
MFC after: 1 week
This import includes:
o wpi Wireless driver for the Intel 3945 Wireless Lan Controller (802.11abg) (sys/dev/wpi)
o Intel firmware revision 2.14.4 & associated LICENSE (sys/dev/contrib/wpi, sys/contrib/dev/wpi/LICENSE)
o wpifw Firmware driver (sys/modules/wpifw)
Approved by: mlaier, sam (co-mentors)
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing NULL.
As a consequence, vmspace_exec() and vmspace_unshare() return the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows to avoid explicit
checks for cached pages in vm_object_backing_scan().
For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().
Reported and tested by: Peter Holm
Reviewed by: alc
MFC after: 3 days
interface. Once the limit is reached packets with unknown source addresses are
dropped until an existing host cache entry expires or is removed. Useful to
use with the STICKY cache option.
Sponsored by: miniSuperHappyDevHouse NZ
reset problem when we reboot the system with the zyd device inserted.
Submitted by: Weongyo Jeong
Reported by: Ted Lindgreen (ted@tednet.nl)
MFC after: 1 week
it's been printing out scary messages about "Unhandled Event Notify Frame"
that are needlessly worrisome to users. Change this warning to only print
out at an elevated debugging level.
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)
Reported by: Jeremy Chadwick
Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.
MFC after: 5 days
Put in a little comment explaining why it went away.
Re-enable it in the case where an existing process is just splitting
off its address space and file descriptors.
(I don't think anything uses that code but it needs some sort of locking
and this does the job.)
Reviewed by: Davidxu, alc, others
MFC after: 3 days
CPUs to make sure idle threads are evicted from the softc before returning
from acpi_cpu_shutdown(). However, this is unnecessary since stop_cpus()
handles this for itself and at this point it's possible that our IPI will be
blocked (interrupts disabled).
Thanks to: Glen Leeder <glen.leeder / nokia.com>
MFC after: 3 days
don't do this right; instead go to the scan cache so we pass through
auth state (if the cache is warm we can do this w/o an actual scan)
MFC after: 1 week
(BIO_WRITE and BIO_FLUSH) as it is done in Solaris. The difference is
that Solaris calls it only for sync requests, but we can't tell in GEOM
whether the request is sync or async, so we do it for every request.
MFC after: 1 week
to change the freq before the other CPUs are active. The current code
always attempts to change all CPUs to match each other, and the requisite
sched_bind() call won't work before APs are launched.
/dev/agpgart and agp_free_res() frees resources like the BAR for the
aperture. Splitting this up lets chipset-specific detach routines
manipulate the aperture during their detach routines without panicking.
MFC after: 1 week
Reviewed by: anholt
* Do not hold any locks over calls to copyin/copyout.
* Clean up some #ifdefs
* fix a possible mbuf leak when NAT fails on policy routed packets
PR: 117216
- Select a tag gains ability to optionally save new tags
off in the timewait system.
- When looking up associations do not give back a stcb that
is in the about-to-be-freed state, and instead continue
looking for other candidates.
- New function to query to see if value is in time-wait.
- Timewait had a time comparison error that caused very
few vtags to actually stay in time-wait.
- When setting tags in time-wait, we now use the time
requested NOT a fixed constant value.
- sstat now gets the proper associd when we do the query.
- When we process an association, we expect the tag chosen
(if we have one from a cookie) to be in time-wait. Before
we would NOT allow the assoc up by checking if it's good.
In theory this should have caused almost all assoc not
to come up except for the time-comparison bug above (this
bug was hidden by the time comparison bug :-D).
- Don't save tags for nonce values in the time-wait cache
since these are used only during cookie collisions and do
not matter if they are unique or not.
MFC after: 1 week
set this flag and it was more or less just copied and pasted from
another FreeBSD driver while porting this driver from NetBSD, whose
gentbi(4) doesn't set MIIF_NOISOLATE either.
- Fix spelling in a comment.
OK'ed by: yongari
MFC after: 3 months
zero (0). Actual RFCOMM channel will be assigned after listen(2)
call is done on a RFCOMM socket bound to a ''wildcard'' RFCOMM
channel zero (0).
Address locking issues in ng_btsocket_rfcomm_bind()
Submitted by: Heiko Wundram (Beenic) < wundram at beenic dot net >
MFC after: 1 week
- Remove AU_.* hard-coded audit class constants, as audit classes are now
entirely dynamically configured using /etc/security/audit_class.
Obtained from: TrustedBSD Project
supports the removal of hard-coded audit class constants in OpenBSM
1.0. All audit classes are now dynamically configured via the
audit_class database.
Obtained from: TrustedBSD Project
changes:
01 - Enhanced LRO:
LRO feature is extended to support multi-buffer mode. Previously,
Ethernet frames received in contiguous buffers were offloaded.
Now, frames received in multiple non-contiguous buffers can be
offloaded, as well. The driver now supports LRO for jumbo frames.
02 - Locks Optimization:
The driver code was re-organized to limit the use of locks.
Moreover, lock contention was reduced by replacing wait locks
with try locks.
03 - Code Optimization:
The driver code was re-factored to eliminate some memcpy
operations. Fast path loops were optimized.
04 - Tag Creations:
Physical Buffer Tags are now optimized based upon frame size.
For better performance, Physical Memory Maps are now re-used.
05 - Configuration:
Features such as TSO, LRO, and Interrupt Mode can be configured
either at load or at run time. Rx buffer mode (mode 1 or mode 2)
can be configured at load time through kenv.
06 - Driver Statistics:
Run time statistics are enhanced to provide better visibility
into the driver performance.
07 - Bug Fixes:
The driver contains fixes for the problems discovered and
reported since last submission.
08 - MSI support:
Added Message Signaled Interrupt feature which currently uses 1
message.
09 Removed feature:
Rx 3 buffer mode feature has been removed. Driver now supports 1,
2 and 5 buffer modes of which 2 and 5 buffer modes can be used
for header separation.
10 Compiler warning:
Fixed compiler warning when compiled for 32 bit system.
11 Copyright notice:
Source files are updated with the proper copyright notice.
MFC after: 3 days
Submitted by: Alicia Pena <Alicia dot Pena at neterion dot com>,
Muhammad Shafiq <Muhammad dot Shafiq at neterion dot com>
made by Michael Eisele and the patch was slightly modified by me.
With this change several NVIDIA ethernet controllers (e.g. MCP61)
work.
RTL8211B(L) is RealTek's new gigabit PHY. The PHY has several
features including crossover correction, polarity correction as
well as supporting triple speed (10/100/1000Mbps). Data transfer
between MAC and PHY is via RGMII for 1000baseT, MII for
10baseT/100baseTX.
Unfortunately, RealTek used the same model number for RTL8211B(L)
PHY so there is no way to discriminate between RTL8211B(L) and its
predecessors. ATM RTL8211B uses revision number 2 so checking the
revision number seems to be only way to identify it.
Obtained from: Michael Eisele [1]
Tested by: clemens fischer < ino-qc AT spotteswoode DOT de DOT eu DOT org >
mii_anegticks to MII_ANEGTICKS_GIGE and use it. Previously it used
MII_ANEGTICKS, which may not be long enough to wait before retrying
the autonegotiation process at 1000Mbps.
o Reset autonegotiation timer if media option is not IFM_AUTO or we
got a valid link.
o Announce link loss right after it happens.
o Autonegotiation is retried every mii_anegticks seconds.
o Report link state changes right after setting autonegotiation.
Blade 1500/SX1500 boards have inherited the firmware bug of the
AX1105 mainboards to not include an interrupt map entry for the
parallel port controller (for the AX1105 the heuristic code for
E450s probably erroneously kicks in and guesses an interrupt).
- Take advantage of bus_generic_setup_intr(9).
- Fix some whitespace bugs.
entry point, which is no longer required now that we don't support
old-style multicast tunnels. This removes the last mbuf object class
entry point that isn't init/copy/destroy.
Obtained from: TrustedBSD Project
Framework by moving from mac_mbuf_create_netlayer() to more specific
entry points for specific network services:
- mac_netinet_firewall_reply() to be used when replying to in-bound TCP
segments in pf and ipfw (etc).
- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and
add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite
a label in place, but in others we apply the label to a new mbuf.
Obtained from: TrustedBSD Project
in the TrustedBSD MAC Framework:
- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.
- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.
- Add explicit entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.
- Remove previous genering link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.
- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.
- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.
Obtained from: TrustedBSD Project
reason (not all BIOSen have _DIS methods for all link devices for example).
This matches the behavior of attach() with respect to _DIS as well.
Submitted by: njl
userland preemption directly from hardclock() via sched_clock() when a
thread uses up a full quantum instead of using a periodic timeout to cause
a userland preemption every so often. This fixes a potential deadlock
when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
by a thread pinned or bound to another CPU. The current thread on that
CPU will never be preempted while softclock is blocked.
Note that ULE already drives its round-robin userland preemption from
sched_clock() as well and always enables IPI_PREEMPT.
MFC after: 1 week
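
The quantum-driven preemption described in the entry above can be
sketched roughly as follows. This is a simplified user-space model, not
the scheduler code: struct model_thread, MODEL_QUANTUM_TICKS and
model_sched_clock() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quantum length, in clock ticks. */
#define MODEL_QUANTUM_TICKS 10

struct model_thread {
	int  ticks_used;    /* ticks consumed in the current quantum */
	bool need_resched;  /* set when the thread should be preempted */
};

/*
 * Called once per clock tick for the running thread.  When the thread
 * has used a full quantum, request a reschedule directly from the tick
 * handler instead of relying on a separate periodic timeout.
 */
void
model_sched_clock(struct model_thread *td)
{
	assert(td != NULL);
	if (++td->ticks_used >= MODEL_QUANTUM_TICKS) {
		td->ticks_used = 0;
		td->need_resched = true;
	}
}
```

Because the request is raised from the tick handler itself, it does not
depend on another thread (such as softclock in the original report)
getting to run first.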
a private softc list is needed neither for tracking clones in general
nor for destroying all clones before the module unload -- if_clone
takes care of all that. (Note that some other interface drivers do
need a softc list to be able to scan it for their private purposes.)
noatime, noexec, suiddir, nosuid, nosymfollow, union,
noclusterr, noclusterw, multilabel, acls, force, update,
async. These options correspond to MOPT_STDOPTS, MOPT_FORCE, MOPT_UPDATE,
and MOPT_ASYNC.
Currently, mount_nfs converts these "-o" options from strings
to MNT_ flags via getmntopts(),
and passes the flags from userspace to the kernel.
This change will allow us in future to pass these mount options
as strings directly to the kernel via nmount() when doing NFS mounts.
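
As an illustration of passing options as strings, nmount(2) takes an
array of struct iovec holding name/value pairs. The helper below only
builds such an array and does not call nmount(); model_add_opt() and
MODEL_MAX_IOV are hypothetical names, and this is not the mount_nfs code.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

#define MODEL_MAX_IOV 16

/*
 * Append one name/value pair to an iovec array: two consecutive
 * entries per option, with string lengths that include the NUL.
 * A flag-style option such as "noatime" is passed with a NULL value.
 * Returns the new entry count.
 */
int
model_add_opt(struct iovec *iov, int n, const char *name, const char *val)
{
	assert(iov != NULL && name != NULL);
	assert(n >= 0 && n + 2 <= MODEL_MAX_IOV);
	iov[n].iov_base = (void *)name;
	iov[n].iov_len = strlen(name) + 1;
	iov[n + 1].iov_base = (void *)val;
	iov[n + 1].iov_len = (val != NULL) ? strlen(val) + 1 : 0;
	return (n + 2);
}
```

A caller would add one pair per "-o" option and hand the finished array
and its entry count to nmount().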
out instead of returning an error.
(1) This makes the behavior consistent with mount(2).
(2) This makes update mounts on the root file system work properly.
(3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
and src/usr.sbin/mountd/mountd.c which were put in to
eliminate errors during update mounts on the root file system
can be removed.
The only place where MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().
Reviewed by: phk
MFC after: 3 days
handle to the PCI device_t if the ACPI device_t is already attached to a
driver. This happens on the Tablet TC1000 which for some reason includes
two PCI-ISA bridges and treats the second bridge as an ACPI system resource
device.
Reviewed by: njl (a while ago)
MFC after: 3 days
that would have an offset beyond the end of the target object. Such
pages should remain in the source object.
MFC after: 3 days
Diagnosed and reviewed by: Kostik Belousov
Reported and tested by: Peter Holm
defined. This lets each boot program choose which version of cgbase() it
wants to use rather than forcing ufsread.c to have that knowledge.
MFC after: 1 week
Discussed with: imp
saves about 500 bytes in the boot code. While the AT91RM9200 has 12k
of space for the boot loader, which is more than i386's 8k, the code
generated by gcc is a bit bigger.
I've had this in p4 for about two years now.
we move towards netinet as a pseudo-object for the MAC Framework.
Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.
kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.
All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.
fix top to show kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.
make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and its swapper thread is now 'swapper'
man page fixes to follow.
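
The difference between the two new calls can be modelled in user space
as below. This is only a sketch of the "create the process if the
pointer is NULL" behavior described above; the model_ names and the
reduced argument lists are hypothetical and do not match the kernel
prototypes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct model_proc {
	char name[32];
	int  nthreads;
};

/* Model of kthread_add(): add a thread to an existing process. */
int
model_kthread_add(struct model_proc *p)
{
	assert(p != NULL);
	p->nthreads++;
	return (0);
}

/*
 * Model of kproc_kthread_add(): takes a pointer to a process pointer.
 * If *procp is NULL the process is created first, then the thread is
 * added to it.
 */
int
model_kproc_kthread_add(struct model_proc **procp, const char *procname)
{
	assert(procp != NULL && procname != NULL);
	if (*procp == NULL) {
		struct model_proc *p = calloc(1, sizeof(*p));

		if (p == NULL)
			return (-1);
		snprintf(p->name, sizeof(p->name), "%s", procname);
		*procp = p;
	}
	return (model_kthread_add(*procp));
}
```

Calling model_kproc_kthread_add() repeatedly with the same pointer
collects all threads in one process, which is how the idle and interrupt
threads end up grouped.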
refactored it to be a generic device.
Instead of being part of the standard kernel, there is now a 'nvram' device
for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off
with 'nodevice nvram'. This matches the previous behavior when it was
first committed.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event. When a process
dumps a core, it could be security relevant. It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.
The record that is generated looks like this:
header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111
- We allocate a completely new record to make sure we aren't clobbering
the audit data associated with the syscall that produced the core
(assuming the core is being generated in response to SIGABRT and not
an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
beginning of the coredump call. Make sure we free the storage referenced
by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts
Obtained from: TrustedBSD Project
Reviewed by: rwatson
MFC after: 1 month
primary object type, and then secondarily by method name. This sorts
entry points relating to particular objects, such as pipes, sockets, and
vnodes together.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
the PS/2 mouse controller. Thus, when acpi_ibm(4) claimed the mouse
device, the mouse would stop working. The one ACPI dump of an R40 that
I've looked at includes an HKEY device with the proper "IBM0068" ID, so
I'm not sure how the "IBM0057" ID could have helped at all.
MFC after: 1 week
Approved by: njl
Rework the read/write support in the bios disk driver some to cut down
on duplicated code.
- All of the bounce buffer and retry logic duplicated in bd_read() and
bd_write() are merged into a single bd_io() routine that takes an
extra direction argument. bd_read() and bd_write() are now simple
wrappers around bd_io().
from mac_vfs.c to mac_process.c to join other functions that set up
process labels for specific purposes. Unlike the two proc create calls,
this call is intended to run after creation when a process registers as
the NFS daemon, so remains an _associate_ call.
Obtained from: TrustedBSD Project
than mac_<policy>_whatever, as this shortens the names and makes the code
a bit easier to read.
When dealing with label structures, name variables 'mb', 'ml', 'mm' rather
than the longer 'mac_biba', 'mac_lomac', and 'mac_mls', likewise making
the code a little easier to read.
Obtained from: TrustedBSD Project
order. The kernel used to shuffle them around to get things right,
but that was recently fixed. This makes our boot loader match the
behavior of most other boot loaders for the atmel parts. This bug was
inherited from the Kwikbyte loader that we started from.
This bug was discovered by Björn König back in June, but fell on the
floor. He provided patches to the kernel, including backwards
compatibility options that were similar to Olivier's if_ate.c commit.
in the same order as it's set in ate_set_mac.
I remember a discussion about this on -arm, but apparently nothing was done.
Warner, is this wrong?
X-MFC After: proper review
on i386 and amd64 machines. The overall process is that /boot/pmbr lives
in the PMBR (similar to /boot/mbr for MBR disks) and is responsible for
locating and loading /boot/gptboot. /boot/gptboot is similar to /boot/boot
except that it groks GPT rather than MBR + bsdlabel. Unlike /boot/boot,
/boot/gptboot lives in its own dedicated GPT partition with a new
"FreeBSD boot" type. This partition does not have a fixed size in that
/boot/pmbr will load the entire partition into the lower 640k. However,
it is limited in that it can only be 545k. That's still a lot better than
the current 7.5k limit for boot2 on MBR. gptboot mostly acts just like
boot2 in that it reads /boot.config and loads up /boot/loader. Some more
details:
- Include uuid_equal() and uuid_is_nil() in libstand.
- Add a new 'boot' command to gpt(8) which makes a GPT disk bootable using
/boot/pmbr and /boot/gptboot. Note that the disk must have some free
space for the boot partition.
- This required exposing the backend of the 'add' function as a
gpt_add_part() function to the rest of gpt(8). 'boot' uses this to
create a boot partition if needed.
- Don't cripple cgbase() in the UFS boot code for /boot/gptboot so that
it can handle a filesystem > 1.5 TB.
- /boot/gptboot has a simple loader (gptldr) that doesn't do any I/O
unlike boot1 since /boot/pmbr loads all of gptboot up front. The
C portion of gptboot (gptboot.c) has been repocopied from boot2.c.
The primary changes are to parse the GPT to find a root filesystem
and to use 64-bit disk addresses. Currently gptboot assumes that the
first UFS partition on the disk is the / filesystem, but this algorithm
will likely be improved in the future.
- Teach the biosdisk driver in /boot/loader to understand GPT tables.
GPT partitions are identified as 'disk0pX:' (e.g. disk0p2:) which is
similar to the /dev names the kernel uses (e.g. /dev/ad0p2).
- Add a new "freebsd-boot" alias to g_part() for the new boot UUID.
MFC after: 1 month
Discussed with: marcel (some things might still change, but am committing
what I have so far)
the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken
with the introduction of PCI domain support.
As the size of struct pci_conf_io wasn't changed with that commit,
this unfortunately requires the ABI of PCIOCGETCONF to be broken
again in order to be able to provide backwards compatibility to
the old version of that IOCTL.
Requested by: imp
Discussed with: re (kensmith)
Reviewed by: PCI maintainers (imp, jhb)
MFC after: 5 days
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:
mac_<object>_<method/action>
mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.
All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
on duplicated code and support 64-bit LBAs for GPT.
- The code to manage EDD and C/H/S I/O requests is now in separate
routines. The EDD routine now handles a full 64-bit LBA instead of
truncating LBAs to the lower 32-bits. (MBRs and BSD labels only
have 32-bit LBAs anyway, so the only LBAs ever passed down were 32-bit).
- All of the bounce buffer and retry logic duplicated in bd_read() and
bd_write() are merged into a single bd_io() routine that takes an
extra direction argument. bd_read() and bd_write() are now simple
wrappers around bd_io().
- If a disk supports EDD then always use it rather than only using it if
the cylinder is > 1023. Other parts of the boot code already do
something similar to this. Also, GPT just uses LBAs, so for a GPT disk
it's probably best to ignore C/H/S completely. Always using EDD when
it is supported by a disk is an easy way to accomplish this.
MFC after: 1 week
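
The bd_io() arrangement described in the entry above can be sketched
with an in-memory disk. This is a user-space illustration only: the
model_ names, the sector count and the flat array are assumptions, and
the bounce buffer and retry logic are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SECTOR_SIZE 512
#define MODEL_NSECTORS    8

enum model_dir { MODEL_READ, MODEL_WRITE };

/* A tiny in-memory "disk" standing in for the BIOS device. */
static uint8_t model_disk[MODEL_NSECTORS * MODEL_SECTOR_SIZE];

/* Single I/O routine; the direction argument selects read or write. */
static int
model_bd_io(uint64_t lba, unsigned nblks, void *buf, enum model_dir dir)
{
	uint8_t *p;
	size_t len = (size_t)nblks * MODEL_SECTOR_SIZE;

	assert(buf != NULL);
	/* The LBA stays 64 bits wide all the way to the range check. */
	if (lba >= MODEL_NSECTORS || nblks > MODEL_NSECTORS - lba)
		return (-1);
	p = model_disk + (size_t)lba * MODEL_SECTOR_SIZE;
	if (dir == MODEL_READ)
		memcpy(buf, p, len);
	else
		memcpy(p, buf, len);
	return (0);
}

/* The public entry points are thin wrappers around model_bd_io(). */
int
model_bd_read(uint64_t lba, unsigned nblks, void *buf)
{
	return (model_bd_io(lba, nblks, buf, MODEL_READ));
}

int
model_bd_write(uint64_t lba, unsigned nblks, void *buf)
{
	return (model_bd_io(lba, nblks, buf, MODEL_WRITE));
}
```

An LBA of 0x100000000 is rejected here; truncating it to 32 bits would
silently turn it into sector 0.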
Slightly clean up the 'bootdev' concept on x86 by changing the various
macros to treat the 'slice' field as a real part of the bootdev instead
of as a hack that spans two other fields (adaptor (sic) and controller)
that are not used in any modern FreeBSD boot code.
MFC after: 1 week
audit it at the beginning of the syscall. This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.
Obtained from: TrustedBSD Project
MFC after: 1 week
state is stored in an extended subject token now. Make sure
that we are using the extended data. This fixes the termID
for process tokens.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
After discussions with jeff, alc, (various Ironport people), david Xu,
and mostly Alfred (who found the problem) it has been demonstrated that this
is not needed for our implementations of threads and represents a real
(as in we've seen it happen a lot) deadlock danger.
Several points:
Since forking multiple threads is not allowed, and POSIX states that
any mutexes owned by other threads will be owned in the child by
phantom threads, and threads shouldn't be accessing shared structures without
protection, it can be proved that if this leads to the child process accessing
inconsistent data, it's a programming error.
The mode of thread_single() being used in fork() is the wrong one.
It is using SINGLE_NO_EXIT when it should be using SINGLE_BOUNDARY.
Even if this were used, system processes have no need to do it as they have
no userland to get inconsistent.
This commit first fixes the above bugs to get them correct in CVS,
then removes them with #ifdef.
This is so that history contains the corrected version should it
be needed in the future.
This code may be needed if we implement the forkall() syscall from
Solaris. It may be needed for other non-posix thread libraries
at some time in the future, so let the code sit for a short while
while I do some work on it anyhow.
This removes a reproducible lockup in NFS.
It may be argued that maybe doing a fork while holding a vnode lock may
not be the best idea in the first place but it shouldn't cause a deadlock.
The removal has been running under soak test for several days now.
This removal should be seriously considered for 7.0 and RELENG_6.
Note: There is code in the core-dumping code that may have a similar problem
with coredumping threaded processes.
MFC After: 4 days
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures
powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE
powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct
powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
updating the old thread's lock. Note: uniprocessor-only, will require
modification for MP support.
powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.
Reviewed by: marcel, jeffr (slightly older version)
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route. The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow. Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock. Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread. In practice this would probably just
fix a lost reference that would result in a route never being freed.
This fixes panics observed in rt_check() and rtexpunge().
MFC after: 1 week
PR: kern/112490
Insight from: mehuljv at yahoo.com
Reviewed by: ru (found the "not-setting it to NULL" part)
Tested by: several
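
The first half of the fix in the entry above (clear the gateway pointer
after dropping the reference, before the lock is released) can be
sketched as below. The structure and function names are hypothetical and
locking is only implied by the comment; this is not the routing code.

```c
#include <assert.h>
#include <stddef.h>

struct model_rtentry {
	int refcnt;
	struct model_rtentry *gwroute;  /* cached gateway route or NULL */
};

/*
 * Drop the cached gateway route of 'rt'.  The caller is assumed to
 * hold rt's lock.  Because the pointer is cleared before that lock is
 * released, a second thread that takes the lock afterwards sees NULL
 * and cannot drop the same reference again.
 */
void
model_drop_gwroute(struct model_rtentry *rt)
{
	struct model_rtentry *gw;

	assert(rt != NULL);
	gw = rt->gwroute;
	if (gw == NULL)
		return;
	assert(gw->refcnt > 0);
	gw->refcnt--;
	rt->gwroute = NULL;
}
```

Calling the function twice in a row drops exactly one reference, which
is the property the reference underflow violated.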
- markvoldirty() needs to write to underlying GEOM provider. We
have to do that *before* g_access() which sets the GEOM provider
to read-only.
- Remove dirty flag before free'ing iconv related resources. The
dirty flag removal could fail, and it is hard to revert the
iconv-free after the fail.
- Mark volume as dirty if we have failed to mark it clean, for safety.
- Other style fixes to the touched functions.
cache: vnode_pager_setsize() must handle the case where a file is
truncated to a non-page-size-aligned boundary and there is a cached
page underlying the new end of file.
Reported by: kris, tegge
Tested by: kris
MFC after: 3 days
since revision 1.1. Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached. Consequently, the first
traversal can wrap around and bogusly return an error.
This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.
Reported by: peter
MFC after: 1 week
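
The missing end-of-map check in the entry above can be illustrated with
a circular entry list that uses a sentinel header. This is a simplified
model under that assumption; the types and model_range_mapped() are
hypothetical, not the vm map code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entry {
	unsigned long start, end;       /* covers [start, end) */
	struct model_entry *next;
};

struct model_map {
	struct model_entry header;      /* sentinel; the list is circular */
};

/*
 * Return true if [start, end) is fully covered by consecutive entries.
 * The loop stops at the sentinel, so a range that extends past the
 * last entry is reported as a hole instead of wrapping around to the
 * first entry again.
 */
bool
model_range_mapped(struct model_map *map, unsigned long start,
    unsigned long end)
{
	struct model_entry *e;

	assert(map != NULL && start < end);
	for (e = map->header.next; e != &map->header; e = e->next) {
		if (e->end <= start)
			continue;               /* entirely below the range */
		if (e->start > start)
			return (false);         /* hole before this entry */
		if (e->end >= end)
			return (true);
		start = e->end;                 /* continue with the rest */
	}
	return (false);                         /* reached the end of the map */
}
```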
for kldstat(2).
This allows libdtrace to determine the exact file from which
a kernel module was loaded without having to guess.
The kldstat(2) API is versioned with the size of the
kld_file_stat structure, so this change creates version 2.
Add the pathname to the verbose output of kldstat(8) too.
MFC: 3 days
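
Versioning an interface by structure size, as described in the entry
above, can be sketched as follows. The two layouts and model_kldstat()
are hypothetical and much smaller than the real kld_file_stat; only the
dispatch on the caller-supplied size is the point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layouts; field names and sizes are illustrative only. */
struct model_stat_v1 {
	int  version;            /* caller sets this to sizeof(its struct) */
	char name[64];
};

struct model_stat_v2 {
	int  version;
	char name[64];
	char pathname[128];      /* new in version 2 */
};

/*
 * Fill in the structure the caller declared.  Returns 1 or 2 for the
 * version served, or -1 if the size matches no known version.
 */
int
model_kldstat(void *ubuf, const char *name, const char *path)
{
	int version;

	assert(ubuf != NULL && name != NULL && path != NULL);
	memcpy(&version, ubuf, sizeof(version));
	if (version == (int)sizeof(struct model_stat_v1)) {
		struct model_stat_v1 *st = ubuf;

		snprintf(st->name, sizeof(st->name), "%s", name);
		return (1);
	}
	if (version == (int)sizeof(struct model_stat_v2)) {
		struct model_stat_v2 *st = ubuf;

		snprintf(st->name, sizeof(st->name), "%s", name);
		snprintf(st->pathname, sizeof(st->pathname), "%s", path);
		return (2);
	}
	return (-1);
}
```

An old binary keeps passing the smaller size and never sees the new
field, so it continues to work unchanged.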
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
optimization level (-march=pentium-mmx for example) does not insert
intermediate ops which would trash the carry.
Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.
To my best understanding the same problem was addressed in rev. 1.16
of src/sys/i386/include/in_cksum.h for just a single function 3y ago.
Reviewed by: jhb
Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1])
MFC after: 5 days
PR: 115678, 69257
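
For context on why a lost carry matters in the entry above: the Internet
checksum is a ones' complement sum, so every carry out of the top bit
has to be added back in. The portable C sketch below keeps the carries
in a wide accumulator instead of the CPU carry flag. It is not the i386
inline assembly the commit changes, and model_in_cksum() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum over 'len' bytes.  Carries out of bit 15 collect
 * in the upper bits of the accumulator and are folded back in at the
 * end (end-around carry).
 */
uint16_t
model_in_cksum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;

	assert(data != NULL || len == 0);
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8;  /* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

Dropping any one of the folded carries changes the result, which is why
an instruction that clobbers the carry flag between two add-with-carry
steps corrupts the checksum.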
codecs. Codec at address 0 seems purely digital, or perhaps an HDMI
interface. Let the driver skip it and continue scanning the codecs
starting with address 2 (Realtek ALC885).
* Due to possibilities of future similar cases, put enough logic
in hdac_scan_codecs() to force codec scanning starting from
XX address via tunable "hint.pcm.%d.codec_index".
Reported / Tested by: Toomas Pelberg <toomasp@gmx.net>
- Trivial headphone / speaker automute fixup for Fujitsu-Siemens
AMILO Si 1848 laptop.
Reported / Tested by: Ed <ed@bsd.it>
- Trivial headphone / speaker automute fixup for Fujitsu-Siemens
Lifebook S7020D laptop.
Reported / Tested by: Jaromir Dvoracek <jarek@ataxo.com>
- Some smart vendor is trying to create an interplanetary wormhole by
screwing up PCI config space during their BIOS update. The side effects
of their failed attempt include a mutilated hardware id, broken
speaker automuting and losing the entire analog CD connectivity,
thus causing enough collateral damage to collapse the entire
universe. Move along with it.
Please exercise extra caution when applying BIOS updates.
Reported / Tested by: Pietro Cerutti <gahr@gahr.ch>
- assembled laptop, based on the MSI-1034
(662) which is now becoming MSI-034A.
- Fix no sound issues (on headphones) for Lenovo ThinkCentre A55 due
to global automute table entry which is not applicable for
non-laptops.
Reported / Tested by: Piotr Smyrak <piotr.smyrak@heron.pl>
- Speaker mute control for HP DC7700 since the front headphone jack
does not generate any interesting unsolicited signal/response.
Reported / Tested by: tyop @ irc.freenode.net
Approved by: re (kensmith)
MFC after: 3 days
When an item is forwarded, the reference counter is incremented; when
the item is processed, the counter is decremented. When the counter
reaches zero, the apply handler is called.
This now makes it possible to report the right connect() call status
from user-level at the right time.
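
A minimal sketch of such a counted apply callback, assuming a
hypothetical struct model_apply; the error bookkeeping and the example
handler are additions for illustration and are not taken from the
commit.

```c
#include <assert.h>
#include <stddef.h>

typedef void model_apply_t(void *ctx, int error);

struct model_apply {
	int            refs;    /* outstanding forwarded items */
	int            error;   /* first error seen, 0 if none */
	model_apply_t *apply;   /* called once when refs reaches zero */
	void          *ctx;
};

/* An item carrying this apply context was forwarded. */
void
model_apply_ref(struct model_apply *a)
{
	assert(a != NULL);
	a->refs++;
}

/* An item was processed; run the handler on the last release. */
void
model_apply_unref(struct model_apply *a, int error)
{
	assert(a != NULL && a->refs > 0);
	if (error != 0 && a->error == 0)
		a->error = error;
	if (--a->refs == 0 && a->apply != NULL)
		a->apply(a->ctx, a->error);
}

/* Example handler: store the final status where the caller can see it. */
void
model_record_status(void *ctx, int error)
{
	*(int *)ctx = error;
}
```

The handler runs only after every forwarded copy has been processed, so
the status it reports reflects the whole operation.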
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.
This is more complete and correct than in ffs. Several places in ffs
are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).
However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
us to scale up to sb_max, aka kern.ipc.maxsockbuf.
We do this because there are broken firewalls that will corrupt the window
scale option, leading to the other endpoint believing that our advertised
window is unscaled. At scale factors larger than 5 the unscaled window will
drop below 1500 bytes, leading to serious problems when traversing these
broken firewalls.
With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by
this algorithm. Those who choose a larger maxsockbuf should watch out
for the compatibility problems mentioned above.
Reviewed by: andre
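
The figures quoted in the entry above can be reproduced with the small
sketch below. The loop is one way to pick the smallest scale factor
whose maximum window covers sb_max; it is written to match the numbers
in the text and is not necessarily the committed code. The model_ names
are hypothetical.

```c
#include <assert.h>

#define MODEL_TCP_MAXWIN       65535UL  /* largest unscaled window */
#define MODEL_TCP_MAX_WINSHIFT 14       /* largest scale factor */

/* Smallest scale factor whose maximum window covers 'sb_max' bytes. */
int
model_request_scale(unsigned long sb_max)
{
	int scale = 0;

	while (scale < MODEL_TCP_MAX_WINSHIFT &&
	    (MODEL_TCP_MAXWIN << scale) < sb_max)
		scale++;
	return (scale);
}

/*
 * Window field sent for a real window of 'win' bytes.  If a firewall
 * corrupts the scale option, the peer takes this value as bytes.
 */
unsigned long
model_advertised_field(unsigned long win, int scale)
{
	assert(scale >= 0 && scale <= MODEL_TCP_MAX_WINSHIFT);
	return (win >> scale);
}
```

With a 256K sb_max the loop stops at 3. Assuming a real window of 65535
bytes, the field is 2047 at a scale factor of 5 and 1023 at 6, which is
where it falls below 1500 bytes.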
queue so the output network card must support the same tagging mechanism as
how the frame was input (prepended Ethernet header tag or stripped HW mflag).
Now the vlan Ethernet header is _always_ stripped in ether_input and the mbuf
flagged; only network cards with VLAN_HWTAGGING enabled would properly
re-tag any outgoing vlan frames.
If the outgoing interface does not support hardware tagging then re-add the vlan
header to the front of the frame. Move the common vlan encapsulation into
ether_vlanencap().
Reported by: Erik Osterholm, Jon Otterholm
MFC after: 1 week
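
Re-adding the vlan header in software, as in the entry above, amounts to
inserting the four 802.1Q bytes after the two MAC addresses. The sketch
below works on flat buffers rather than mbufs; model_vlan_encap() is a
hypothetical helper, not ether_vlanencap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETHER_ADDR_LEN 6
#define MODEL_ETHERTYPE_VLAN 0x8100   /* 802.1Q tag protocol ID */
#define MODEL_VLAN_HDR_LEN   4

/*
 * Copy an untagged Ethernet frame from 'in' to 'out' with an 802.1Q
 * tag inserted between the source address and the original EtherType.
 * 'out' must have room for len + 4 bytes.  Returns the new length, or
 * 0 if the input is too short to hold an Ethernet header.
 */
size_t
model_vlan_encap(const uint8_t *in, size_t len, uint16_t tci, uint8_t *out)
{
	const size_t addrs = 2 * MODEL_ETHER_ADDR_LEN;

	assert(in != NULL && out != NULL);
	if (len < addrs + 2)
		return (0);
	memcpy(out, in, addrs);                 /* dst + src MAC */
	out[addrs]     = MODEL_ETHERTYPE_VLAN >> 8;
	out[addrs + 1] = MODEL_ETHERTYPE_VLAN & 0xff;
	out[addrs + 2] = (uint8_t)(tci >> 8);   /* priority + high VID bits */
	out[addrs + 3] = (uint8_t)(tci & 0xff); /* low VID bits */
	memcpy(out + addrs + MODEL_VLAN_HDR_LEN, in + addrs, len - addrs);
	return (len + MODEL_VLAN_HDR_LEN);
}
```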
leaving space for adding missing options. Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)
Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).
This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
in the way we implement handling of relocations.
As for the kernel part this fixes the loading of lots of modules,
which failed to load due to unresolvable symbols when built after
the GCC 4.2.0 import. This wasn't due to a change in GCC itself
though but one of several changes in configuration done along the
import. Specifically, HAVE_AS_REGISTER_PSEUDO_OP, which causes GCC
to denote global registers used for scratch purposes and in turn
GAS uses R_SPARC_OLO10 relocations for, is now defined.
While at it replace some more ELF_R_TYPE which should have been
ELF64_R_TYPE_ID but didn't cause problems so far.
- Sync a sanity check between kernel and rtld(1) and change it to be
maintenance free regarding the type used for the lookup table.
- Sprinkle const on lookup tables.
- Use __FBSDID.
Reported and tested by: yongari
MFC after: 5 days
- fix a bug during cookie collision that prevented an
association from coming up in a specific restart case.
- Fix it so the shutdown-pending flag gets removed (this is
more for correctness than needed) when we enter shutdown-sent
or shutdown-ack-sent states.
- Fix a bug that caused the receiver to sometimes NOT send
a SACK when a duplicate TSN arrived. Without this fix
it was possible for the association to fall down if the
- Deleted primary destination is also stored when SCTP_MOBILITY_BASE.
(Previously, it was stored only when SCTP_MOBILITY_FASTHANDOFF.)
- Fix a locking issue where we might call send_initiate_ack() and
incorrectly state the lock held/not held. Also fix it so that
when we release the lock the inp cannot be deleted on us.
- Add the debug option that can cause the stack to panic instead
of aborting an assoc. This does not and should never show up
in options but is useful for debugging unexpected aborts.
- Add cumack_log sent to track sending cumack information for
the debug case where we are running a special log per assoc.
- Added extra () around sctp_sbspace macro to avoid compile warnings.
MFC after: 1 week
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.
MFC after: 1 week
ukbd_poll to mark this keyboard instance as polling before calling
usbd_set_polling at USB level. usbd_set_polling runs softintr before
returning, stealing our input and making subsequent polling getchar
kind of pointless.
This allows USB keyboards to coexist peacefully with serial console in DDB
and other contexts where polling is used.
MFC after: 1 week
properly due to the shortage of the RX buffer size. In the case of zyd
devices, up to 3 frames can be combined in a USB transaction. So, the RX
buffer should be at least ((MCLBYTES + extra structs) * 3).
Submitted by: Weongyo Jeong <weongyo.jeong@gmail.com>
MFC after: 3 days
(it is established practice) and ``-o whiteout=whenneeded'' is a mode that
uses less disk space, which is especially useful for resource-restricted
environments such as embedded systems. (Contributed by Ed Schouten. Thanks)
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
Some people who reported issues have solved them with transparent mode.
We think it is time to change the default copy mode. Transparent mode is
the best in most situations.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
applications that use procfs on unionfs.
- Removed the unionfs internal cache mechanism because unionfs has
vfs_cache support instead. As a result, the unionfs code is simpler.
- Fixed kern/111262 issue.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
make sure to never call sched_bind() for uninitialised CPUs.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
This commit includes the following core components:
* sample configuration file for sensorsd
* rc(8) script and glue code for sensorsd(8)
* sysctl(3) doc fixes for CTL_HW tree
* sysctl(3) documentation for hardware sensors
* sysctl(8) documentation for hardware sensors
* support for the sensor structure for sysctl(8)
* rc.conf(5) documentation for starting sensorsd(8)
* sensor_attach(9) et al documentation
* /sys/kern/kern_sensors.c
o sensor_attach(9) API for drivers to register ksensors
o sensor_task_register(9) API for the update task
o sysctl(3) glue code
o hw.sensors shadow tree for sysctl(8) internal magic
* <sys/sensors.h>
* HW_SENSORS definition for <sys/sysctl.h>
* sensors display for systat(1), including documentation
* sensorsd(8) and all applicable documentation
The userland part of the framework is entirely source-code
compatible with OpenBSD 4.1, 4.2 and -current as of today.
All sensor readings can be viewed with `sysctl hw.sensors`,
monitored in semi-realtime with `systat -sensors` and also
logged with `sensorsd`.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
Obtained from: OpenBSD (parts)
which is ukbd0. Specifically, the keyboard driver structures for ukbd0
are not allocated/freed but are statically allocated via a persistent
global variable. There is some additional magic for the ukbd0 such that
if the keyboard is marked as probed in this global variable, then we
don't check to see if the device_t we are probing has an interface.
This causes a problem if an attach of ukbd0 fails without fully clearing
the state in the global variable. Specifically, if the keyboard fails to
initialize in init_keyboard() or kbd_register(), then the keyboard will
still be marked as probed. The USB layer will then try to offer the
"generic" version of the USB keyboard device (as opposed to the
per-interface sub-devices) and the ukbd(4) driver will see that the
keyboard is marked probed and will skip the "is this a per-interface device"
check. Later in ukbd_attach() it panics because it tries to dereference
the interface pointer which is NULL.
The fix is to clear the flags in the persistent keyboard data for ukbd0
when init_keyboard() or kbd_register() fail.
MFC after: 1 week
Reviewed by: imp
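
The failure mode and fix in the entry above can be modelled with a
persistent flag word. Everything here is hypothetical (the flag bit, the
global and the init_ok parameter standing in for init_keyboard() or
kbd_register() succeeding); it only shows the error path undoing the
persistent state.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_KB_PROBED 0x1   /* hypothetical flag bit */

/* Persistent state for the first keyboard; it outlives failed attaches. */
static int model_kbd0_flags;

/* Attach the first keyboard; 'init_ok' models initialization success. */
bool
model_attach_kbd0(bool init_ok)
{
	assert((model_kbd0_flags & ~MODEL_KB_PROBED) == 0);
	model_kbd0_flags |= MODEL_KB_PROBED;
	if (!init_ok) {
		/* Clear the persistent flags so a later probe starts clean. */
		model_kbd0_flags = 0;
		return (false);
	}
	return (true);
}

bool
model_kbd0_is_probed(void)
{
	return ((model_kbd0_flags & MODEL_KB_PROBED) != 0);
}
```

Without the reset in the error path, a later probe would find the
keyboard still marked as probed and skip its interface check.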
- Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders
thru the sndlock.
- Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add
logic to wait for pending request senders to finish sending before
reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec.
- Break out the nfs xid manipulation under a new nfs xid lock, rather than
overloading the nfs request lock for this purpose.
- Fix some of the locking in nfs_request.
Many thanks to Kris Kennaway for his help with this and for initiating the
MP scaling analysis and work. Kris also tested this patch thoroughly.
Approved by: re@ (Ken Smith)
on multiple different audit pipes. The old method used cv_signal()
which would result in only one thread being woken up after we
appended a record to its queue. This resulted in un-timely wake-ups
when processing audit records real-time.
- Assign PSOCK priority to threads that have been sleeping on a read(2).
This is the same priority threads are woken up with when they select(2)
or poll(2). This yields fairness between various forms of sleep on
the audit pipes.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
This fixes the process portion of the bpf(4) stats if the peer forks
into the background after it's opened the descriptor. This bug
results in the following behavior for netstat -B:
# netstat -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
netstat: kern.proc.pid failed: No such process
78023 em0 p--s-- 2237404 43119 2237404 13986 0 ??????
MFC after: 1 week
- Add proper scanning support rather than letting the firmware grab the first
access point
- Overhaul state changes
- Use macros for locking and provide _locked() versions of some functions
- Increase debugging output
- Use a callout rather than the old watchdog interface
- Improve style, function names and defines
- Add WPA (TKIP) support
Based heavily on a patchset provided by Sam Leffler.
VR_STICKHW register would result in unexpected results on these
hardware. wpaul said the following about the issue:
The vr_attach() routine unconditionally does this for all supported
chips:
/*
* Windows may put the chip in suspend mode when it
* shuts down. Be sure to kick it in the head to wake it
* up again.
*/
VR_CLRBIT(sc, VR_STICKHW, (VR_STICKHW_DS0|VR_STICKHW_DS1));
The problem is, the VR_STICKHW register is not valid on all Rhine
devices. The VT86C100A chip, which is present on the D-Link DFE-530TX
boards, doesn't support power management, and its register space is
only 128 bytes wide. The VR_STICKHW register offset falls outside this
range. This may go unnoticed in most scenarios, but if you happen to have
another PCI device in your system which is assigned the register
space immediately after that of the Rhine, the vr(4) driver will
incorrectly stomp it. In my case, the BIOS on my test board decided
to put the register space for my PRO/100 ethernet board right next
to the Rhine, and the Rhine driver ended up clobbering the IMR register
of the PRO/100 device. (Long story short: the board kept locking up on
boot. Took me the better part of the morning to suss out why.)
The strictly correct thing to do would be to check the PCI config space
to make sure the device supports the power management capability and only
write to the VR_STICKHW register if it does.
Instead of inspecting chip revision numbers for the availability of
VR_STICKHW register, check the existence of power management capability
of the hardware as wpaul suggested.
Reported by: wpaul
Suggested by: wpaul
OK'ed by: jhb
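The capability check wpaul describes can be sketched in plain C over a raw
copy of a device's PCI configuration space. This is only an illustration: a
real driver would use the PCI bus helpers rather than parsing bytes by hand,
and the function and macro names below are invented for the example. The
offsets (status register at 0x06, capability pointer at 0x34, capability ID
0x01 for power management) are the standard PCI ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_STATUS	0x06	/* low byte of the 16-bit status register */
#define CFG_STATUS_CAPS	0x10	/* bit 4: capability list is present */
#define CFG_CAP_PTR	0x34	/* offset of the first capability */
#define CAP_ID_PM	0x01	/* power management capability ID */

/*
 * Hypothetical helper: returns true if the 256-byte configuration space
 * image advertises the given capability ID.
 */
static bool
cfg_has_cap(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t ptr;
	int guard;

	if ((cfg[CFG_STATUS] & CFG_STATUS_CAPS) == 0)
		return (false);
	ptr = cfg[CFG_CAP_PTR] & 0xfc;
	/* Bound the walk so that a corrupt list cannot loop forever. */
	for (guard = 0; ptr >= 0x40 && guard < 48; guard++) {
		if (cfg[ptr] == cap_id)
			return (true);
		ptr = cfg[ptr + 1] & 0xfc;
	}
	return (false);
}

/* Only touch the power-management-related register if PM is advertised. */
static bool
should_touch_stickhw(const uint8_t cfg[256])
{
	return (cfg_has_cap(cfg, CAP_ID_PM));
}
```

With a check of this kind, a chip without the power management capability is
never written to at the VR_STICKHW offset.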
1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.
2. IFQ_HANDOFF will free the mbuf if the queue is full; the mbuf would then be
freed again by lagg_start(), causing a panic. Reorganised the error handling
and freeing to fix this.
MFC after: 3 days
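A userland sketch of the round-robin idea in item 1, using C11 atomics in
place of the kernel's atomic(9) primitives. The structure and function names
are hypothetical and are not taken from the lagg(4) source. Each sender takes
a ticket from a shared counter and maps it onto a port index, so no pointer
in the softc is modified under a shared lock.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the driver softc. */
struct rr_softc {
	atomic_uint	seq;	/* monotonically increasing packet sequence */
	unsigned	nports;	/* number of active ports, assumed > 0 */
};

/*
 * Pick the transmit port for the next packet.  atomic_fetch_add()
 * returns the previous value, so concurrent callers each get a
 * distinct sequence number.
 */
static unsigned
rr_pick_port(struct rr_softc *sc)
{
	unsigned n;

	n = atomic_fetch_add(&sc->seq, 1);
	return (n % sc->nports);
}
```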
SAS-enabled cards. It also makes the driver MPSAFE, eliminating some
problems that resulted from CAM becoming MPSAFE. Many thanks to 3Ware/AMCC
for continuing to support FreeBSD.
Submitted by: Manjunath Ranganathaiah
Approved by: re
voltage of 0. This can result in a divide by zero trap. Add a guard
for this case. The value of lfcap is checked in acpi_battery_bif_valid()
just before this, so it is safe.
Reported by: sam
Approved by: re
MFC after: 3 days
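A minimal sketch of a guard of this kind, assuming the division converts a
power value in mW into a current in mA using a voltage in mV. The helper name
and the choice to return 0 for an unknown voltage are assumptions made for
illustration, not the driver's actual code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert mW to mA given a voltage in mV.
 * A reported voltage of 0 yields 0 instead of a divide-by-zero trap.
 */
static uint32_t
mw_to_ma(uint32_t mw, uint32_t mv)
{

	if (mv == 0)
		return (0);
	/* Widen before multiplying so that mw * 1000 cannot overflow. */
	return ((uint32_t)(((uint64_t)mw * 1000) / mv));
}
```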
of directly from acpi0. Before it would attach prior to the sysresource
devices, causing the later allocation of its memory range to fail and
print a warning like "acpi0: reservation of fed00000, 1000 (3) failed".
Use an explicit define for our probe order base value of 10.
Help from: jhb
Tested by: Abdullah Ibn Hamad Al-Marri <almarrie / gmail.com>
MFC after: 3 days
Approved by: re
fixes a bug on UP machines with SMP kernels where the idle thread
constantly switches after trying to steal work from the local cpu.
- Make the idle stealing code more robust against self selection.
- Prefer to steal from the cpu with the highest load that has at least one
transferable thread. Before we selected the cpu with the highest
transferable count which excludes bound threads.
Collaborated with: csjp
Approved by: re
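The selection rule in the last two items can be sketched as follows. This is
a simplified model with hypothetical names, not the sched_ule.c code: among
all cpus other than the caller, pick the one with the highest load that has
at least one transferable thread.

```c
#include <stddef.h>

/* Hypothetical per-cpu summary used only for this sketch. */
struct cpu_load {
	int	load;		/* total runnable threads */
	int	transferable;	/* threads not bound to this cpu */
};

/*
 * Returns the index of the cpu to steal from, or -1 if there is none.
 * The caller's own cpu is never selected.
 */
static int
pick_steal_victim(const struct cpu_load *cpus, size_t ncpus, size_t self)
{
	int best = -1;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		if (i == self || cpus[i].transferable < 1)
			continue;
		if (best == -1 || cpus[i].load > cpus[best].load)
			best = (int)i;
	}
	return (best);
}
```

On a UP machine the only candidate is the caller's own cpu, so the function
returns -1 and the idle thread has nothing to steal.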
to simply switch rather than lowering priority and switching. This allows
threads of equal priority to run but not lesser priority.
Discussed with: davidxu
Reported by: NIIMI Satoshi <sa2c@sa2c.net>
Approved by: re
critical_exit() owepreempt check. ULE will always use owepreempt to
preempt the idle thread. This change does not affect 4BSD since it will
never set owepreempt without PREEMPTION enabled.
- Remove some unused code from choosethread().
Discussed with: jhb
Approved by: re
it must first ensure that the page is no longer mapped. This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache(). While I'm in the neighborhood, make a related
panic message a little more useful.
Approved by: re (kensmith)
Reported by: Peter Holm and Konstantin Belousov
Reviewed by: Konstantin Belousov
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs
when uma_small_free() frees a page. The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.
Approved by: re (kensmith)
PR: 116794
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received data after socket was closed, sending RST and removing tcpcb
So that it also includes how many bytes of data were received. It now looks
like this:
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received X bytes of data after socket was closed, sending RST and removing tcpcb
Approved by: re (gnn)
not being independently freeable. This allows one to embed an mbuf in
the cluster itself. This confers the benefits of the packet zone on
all cluster sizes. Embedded mbufs currently suffer from the same
limitation that packet zone mbufs do in that one cannot disconnect
them and pass them around independently of the cluster. It would
likely be possible to eliminate this limitation in the future by
adding a second reference for the mbuf itself.
Approved by: re (gnn)
problems with the syncache, it produces a lot of console noise and has led
to quite a few false positive bug reports. It can be selectively
re-enabled when debugging specific problems by frobbing the same sysctl.
Discussed with: silby
Approved by: re (gnn)
directory itself (rather than any of its contents) is visible to the
current thread.
MFC after: 1 week
PR: kern/90063
Submitted by: john of 8192.net
Approved by: re (kensmith)
with all functions supported. This is done adding usb device IDs
to the table of recognised devices (because there is no standard
'scanner' class, so no other way to recognise them), and with
a small change to the uscanner attach routine that prevents
reconfiguring the whole USB device while we are dealing only with
one of its USB interfaces.
The latter part has been suggested by Steinar Hamre in
http://www.freebsd.org/cgi/query-pr.cgi?pr=107665 , I have
only added a bit of explanation to the code.
I have personally tried this on the Epson DX-5050 and DX-6000
devices (on the US market they have different names, CX-something).
I have good reasons to think that, possibly with the mere addition
of more USB ids to the table in uscanner.c, this should work with
all Epson multifunction devices in that family (from DX-3800 to
DX-7000 - these units are in the 50-120$ price range).
More details on related topics (SANE configuration, OCR, etc.)
at http://info.iet.unipi.it/~luigi/FreeBSD/dx5050.html
Manpage updates coming soon.
Approved by: re, imp
MFC after: 3 days
UDMA modes.
Please note that Soekris NET5501 BIOS versions before 1.32f have a bug
that prevents this from working.
Approved by: re (gnn)
MFC: 2 weeks
Before that fix, it was possible for the function to fail if the number
of sharers changed between the 'x = sx->sx_lock' step and the
atomic_cmpset_acq_ptr() call.
This fixes a ZFS problem where ZFS returns strange EIO errors under load.
ZFS contains code that depends on the fact that sx_try_slock() can
only fail if there is an exclusive owner.
Discussed with: attilio
Reviewed by: jhb
Approved by: re (kensmith)
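The retry behaviour can be illustrated with a userland analogue built on C11
atomics. The bit layout and names below are hypothetical and much simpler
than the real sx(9) lock word. The point is only that a failed
compare-and-swap caused by a changed sharer count leads to a retry, and the
function fails solely when the lock is held exclusively.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LK_SHARED	((uintptr_t)0x1)	/* set while unlocked or shared */
#define LK_ONE_SHARER	((uintptr_t)0x10)	/* increment per shared holder */

/* Hypothetical try-shared-lock; the lock word starts out as LK_SHARED. */
static bool
try_slock(atomic_uintptr_t *lock)
{
	uintptr_t x;

	x = atomic_load(lock);
	for (;;) {
		/* An exclusive owner is the only reason to give up. */
		if ((x & LK_SHARED) == 0)
			return (false);
		if (atomic_compare_exchange_weak_explicit(lock, &x,
		    x + LK_ONE_SHARER, memory_order_acquire,
		    memory_order_relaxed))
			return (true);
		/* The failed exchange reloaded x; re-check and retry. */
	}
}
```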
after the switch leads to a race where the outgoing thread still owns
the local queue lock while another cpu may switch it in. This race
is only possible on machines where cpu_switch can take significantly
longer on different cpus which in practice means HTT machines with
unfair thread scheduling algorithms.
Found by: kris (of course)
Approved by: re
starvation caused by unbalanced interrupt loads.
- Change the rebalancer to work on stathz ticks but retain randomization.
- Simplify locking in tdq_idled() to use the tdq_lock_pair() rather than
complex sequences of locks to avoid deadlock.
Reported by: kris
Approved by: re
retransmission by handover event (fast mobility code)
- Fixed problem of mobility code which is caused by remaining
parameters in the deleted primary destination.
- Add a missing lock. When a peer sends an INIT, and while we
are processing it to send an INIT-ACK the socket is closed,
we did not hold a lock to keep the socket from going away.
Add protection for this case.
- Fix so that arwnd always uses the minimal rwnd if the user
has set the socket buffer smaller. Found this when the test
org decided to see what happens when you set in a rwnd of 10
bytes (which is not allowed per RFC .. 4k is minimum).
- Fixes so a cookie-echo ootb will NOT cause an abort to
be sent. This was happening in a MPI collision case.
- Examined all panics and unless there was no recovery, moved
any that were not already to INVARIANTS.
Approved by: re@freebsd.org (gnn)
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
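The relationship between the two lookup functions can be sketched over a flat
device table. The types and function names here are simplified stand-ins
invented for the example, not the pci(9) interfaces themselves. The
domain-less lookup is the domain-aware one restricted to domain 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PCI location record. */
struct pci_loc {
	uint32_t	domain;
	uint8_t		bus;
	uint8_t		slot;
	uint8_t		func;
};

/* Find a device by domain, bus, slot and function. */
static const struct pci_loc *
find_dbsf(const struct pci_loc *devs, size_t n, uint32_t domain,
    uint8_t bus, uint8_t slot, uint8_t func)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].domain == domain && devs[i].bus == bus &&
		    devs[i].slot == slot && devs[i].func == func)
			return (&devs[i]);
	}
	return (NULL);
}

/* Legacy lookup: only searches domain 0. */
static const struct pci_loc *
find_bsf(const struct pci_loc *devs, size_t n, uint8_t bus, uint8_t slot,
    uint8_t func)
{

	return (find_dbsf(devs, n, 0, bus, slot, func));
}
```

Two devices with the same bus, slot and function numbers but different
domains no longer alias each other under this scheme.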
After discussion with Sam, switch back to use firmware(9) instead of
having the firmware in hex format.
Put the binary firmware uuencoded into sys/contrib/dev/npe, and slap a
LICENSE file, as found on the Intel website.
Approved by: re (blanket), mux (mentor)
MFC After: 1 week
Without this change the following situation was possible:
1. Provider is orphaned from within class' access() method on last write
close - orphan provider event is sent.
2. GEOM detects last write close on a provider and sends new provider event.
3. g_orphan_register() is called, and calls all orphan methods of attached
consumers.
4. New provider event is executed on orphaned provider, all classes can
taste already orphaned provider, and some may attach consumers to it.
Those consumers will never go away, because the g_orphan_register()
was already called.
We end up with a zombie provider.
With this change, at step 3, we will cancel new provider event.
How to repeat this problem:
# mdconfig -a -t malloc -s 10m
# geli init -i 0 md0
# geli attach md0
# newfs -L test /dev/md0.eli
# mount /dev/ufs/test /mnt/tmp
# geli detach -l md0.eli
# umount /mnt/tmp
# glabel status
Name Status Components
ufs/test N/A N/A
Reviewed by: phk
Approved by: re (kensmith)
value for kern.sched.preempt_thresh appropriately. It can still be
adjusted at runtime. ULE will still use IPI_PREEMPT in certain
migration situations.
- Assert that we're not trying to compile ULE on an unsupported
architecture. To date, I believe only i386 and amd64 have implemented
the third cpu switch argument required.
Approved by: re
cache: vm_object_page_remove() should convert any cached pages that
fall within the specified range to free pages. Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.
Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.
Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
to gem_attach() as the former accesses softc members not yet initialized
at that time and gem_reset() actually is enough to stop the chip. [1]
o Revise the use of gem_bitwait(); add bus_barrier() calls before calling
gem_bitwait() to ensure the respective bit has been written before we
start polling on it and poll for the right bits to change, f.e. even
though we only reset RX we have to actually wait for both GEM_RESET_RX
and GEM_RESET_TX to clear. Add some additional gem_bitwait() calls in
places we've been missing them according to the GEM documentation.
Along with this some excessive DELAYs, which probably only were added
because of bugs in gem_bitwait() and its use in the first place, as
well as half of a gem_bitwait() reimplementation in gem_reset_tx()
were removed.
o Add gem_reset_rxdma() and use it to deal with GEM_MAC_RX_OVERFLOW errors
more gracefully as unlike gem_init_locked() it resets the RX DMA engine
only, causing no link loss and the FIFOs not to be cleared. Also use it
to deal with GEM_INTR_RX_TAG_ERR errors, which previously were unhandled.
This was based on information obtained from the Linux GEM and OpenSolaris
ERI drivers.
o Turn on workarounds for silicon bugs in the Apple GMAC variants.
This was based on information obtained from the Darwin GMAC and Linux GEM
drivers.
o Turn on "infinite" (i.e. maximum 31 * 64 bytes in length) DMA bursts.
This greatly improves especially RX performance.
o Optimize the RX path, this consists of:
- kicking the receiver as soon as we've a spare descriptor in gem_rint()
again instead of just once after all the ready ones have been handled;
- kicking the receiver the right way, i.e. as outlined in the GEM
documentation in batches of 4 and by pointing it to the descriptor
after the last valid one;
- calling gem_rint() before gem_tint() in gem_intr() as gem_tint() may
take quite a while;
- doubling the size of the RX ring to 256 descriptors.
Overall the RX performance of a GEM in a 1GHz Sun Fire V210 was improved
from ~100Mbit/s to ~850Mbit/s.
o In gem_add_rxbuf() don't assign the newly allocated mbuf to rxs_mbuf
before calling bus_dmamap_load_mbuf_sg(); if bus_dmamap_load_mbuf_sg()
fails we free the newly allocated mbuf and would otherwise be unable to
recycle the previous one, ending up with a NULL pointer dereference instead.
o In gem_init_locked() honor the return value of gem_meminit().
o Simplify gem_ringsize() and don't return garbage in the default case.
Based on OpenBSD.
o Don't turn on MAC control, MIF and PCS interrupts unless GEM_DEBUG is
defined as we don't need/use these interrupts for operation.
o In gem_start_locked() sync the DMA maps of the descriptor rings before
every kick of the transmitter and not just once after enqueuing all
packets as the NIC might instantly start transmitting after we kicked
it the first time.
o Keep state of the link state and use it to enable or disable the MAC
in gem_mii_statchg() accordingly as well as to return early from
gem_start_locked() in case the link is down. [3]
o Initialize the maximum frame size to a sane value.
o In gem_mii_statchg() enable carrier extension if appropriate.
o Increment if_ierrors in case of a GEM_MAC_RX_OVERFLOW error and in
gem_eint(). [3]
o Handle IFF_ALLMULTI correctly; don't set it if we've turned promiscuous
group mode on and don't clear the flag if we've disabled promiscuous
group mode (these were mostly NOPs though). [2]
o Let gem_eint() also report GEM_INTR_PERR errors.
o Move setting sc_variant from gem_pci_probe() to gem_pci_attach() as
device probe methods are not supposed to touch the softc.
o Collapse sc_inited and sc_pci into bits for sc_flags.
o Add CTASSERTs ensuring that GEM_NRXDESC and GEM_NTXDESC are set to
legal values.
o Correctly set up for 802.3x flow control, though #ifdef out the code
that actually enables it as this needs more testing and mainly a proper
framework to support it.
o Correct and add some conversions from hard-coded function names to
__func__ which were borked or forgotten in if_gem.c rev. 1.42.
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o In gem_pci_attach() in case attaching fails release the resources in
the opposite order they were allocated.
o Make gem_reset() static to if_gem.c as it's not needed outside that
module.
o Remove the GEM_GIGABIT flag and the associated code; GEM_GIGABIT was
never set and the associated code was in the wrong place.
o Remove sc_mif_config; it was only used to cache the contents of the
respective register within gem_attach().
o Remove the #ifdef'ed out NetBSD/OpenBSD code for establishing a suspend
hook as it will never be used on FreeBSD.
o Also probe Apple Intrepid 2 GMAC and Apple Shasta GMAC, add support for
Apple K2 GMAC. Based on OpenBSD.
o Add support for Sun GBE/P cards, or in other words actually add support
for cards based on GEM to gem(4). This mainly consists of adding support
for the TBI of these chips. Along with this the PHY selection code was
rewritten to hardcode the PHY number for certain configurations as for
example the PHY of the on-board ERI of Blade 1000 shows up twice causing
no link as the second incarnation is isolated.
These changes were ported from OpenBSD with some additional improvements
and modulo some bugs.
o Add code to if_gem_pci.c allowing to read the MAC-address from the VPD on
systems without Open Firmware.
This is an improved version of my variant of the respective code in
if_hme_pci.c
o Now that gem(4) is MI, enable it for all archs.
Pointed out by: yongari [1]
Suggested by: rwatson [2], yongari [3]
Tested on: i386 (GEM), powerpc (GMACs by marcel and yongari),
sparc64 (ERI and GEM)
Reviewed by: yongari
Approved by: re (kensmith)
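The gem_add_rxbuf() ordering fix above follows a general pattern: only
publish a replacement buffer into the ring slot once mapping it has
succeeded, so that on failure the previous buffer is still there to be
recycled. A userland sketch of that pattern, with malloc() standing in for
mbuf allocation and a hypothetical map_buffer() standing in for
bus_dmamap_load_mbuf_sg():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical receive ring slot. */
struct rx_slot {
	void	*buf;
};

/* Stand-in for the DMA load step; fails on request for demonstration. */
static bool
map_buffer(void *buf, bool simulate_failure)
{

	(void)buf;
	return (!simulate_failure);
}

/*
 * Replace the buffer in a slot.  On success the previous buffer is
 * handed back through oldp and the slot holds the new one.  On failure
 * the slot is left untouched so the old buffer can be reused.
 */
static int
rx_slot_refill(struct rx_slot *slot, size_t len, bool simulate_failure,
    void **oldp)
{
	void *newbuf;

	newbuf = malloc(len);
	if (newbuf == NULL)
		return (-1);
	if (!map_buffer(newbuf, simulate_failure)) {
		free(newbuf);
		return (-1);
	}
	*oldp = slot->buf;
	slot->buf = newbuf;
	return (0);
}
```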
33MHz for calculating the latency timer values for its children.
Inspired by NetBSD doing the same and Linux as well as OpenSolaris
using a similar approach.
While at it rename a variable and change its type to be more
appropriate for values of PCI properties so the variable can be
more easily reused.
- Initialize the cache line size register of PCI devices to a
legal value; the cache line size is limited to 64 bytes by the
Fireplane/Safari, JBus and UPA interconnection busses. Setting
it to an unsupported value caused bad performance at least with
GEM as it causes them to not do cache line bursts and to not
issue cache line commands on the PCI bus.
Approved by: re (kensmith)
MFC after: 1 week
timeout occurring at exactly the same time. If this happens, the nfsiod
exits although there may be a queued async IO request for it.
Found by: Kris Kennaway
Approved by: re
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
This patch was part of ACPI-CA 20070508 release and the
following is excerpt from its change log:
Fixed a problem where the Global Lock handle was not properly
updated if a thread that acquired the Global Lock via executing
AML code then attempted to acquire the lock via the
AcpiAcquireGlobalLock interface. Reported by Joe Liu.
Approved by: re (kensmith)
Tested by: ambrisko
Obtained from: Intel
polling/interrupt-driven fallback and instead use polling only during
boot and pure interrupt-driven mode after boot. Polled mode could be
relegated completely to a legacy role if we could enable interrupts
during boot. Polled mode can be forced after boot by setting
debug.acpi.ec.polled="1", i.e. if there are timeouts.
- Use polling only during boot, shutdown, or if requested by the user.
Otherwise, use a generation count of GPEs, incremented atomically. This
prevents an old status value from being used if the EC is really slow
and the same condition (i.e. multiple IBEs for a write transaction) is
being checked.
- Check for and run the query handler directly if the SCI bit is set in
the status register during boot. Previously, the query handler wouldn't
run until interrupts were finally enabled late in boot.
- During boot and after starting a command, check if the event appears
to already have occurred before we even start waiting. If so, it's
possible the EC is very slow and we might accept an old status value.
Print a warning in this case. Once we've booted, interrupt-driven mode
should work just fine but polled mode could be unreliable. There's not
much more we can do about this until interrupts are enabled during boot.
- In the above case, we also do one final check if the interrupt-driven
mode gets a timeout. If the status is complete, it will force the
system back into polled mode since interrupt mode doesn't work. For
polled mode during boot, if the status appears to be already complete
before beginning the check loop, it waits 10 us before actually checking
the status, just in case the EC is really slow and hasn't gotten to work
on the new request yet.
- Use upper-case hex for the _Qxx method
- Use device_printf for errors, don't hide them under verbose
- Increase default total timeout to 750 ms and decrease polling interval
to 5 us.
- Don't pass the status value via the softc. Just read it directly.
- Remove the mutex. We use the sx lock for transaction serialization
with the query handler.
- Remove the Intel copyright notice as no code of theirs was ever
present in this file (verified against rev 1.1)
- Allow KTR module-only builds for ease of testing
Thanks to jkim and Alexey Starikovskiy for helpful discussions and testing.
Approved by: re
MFC after: 2 weeks
- Reintegrate the ANSI C function declaration change
from tcp_timer.c rev 1.92
- Reorganize the tcpcb structure so that it has a single
pointer to the "tcp_timer" structure which contains all
of the tcp timer callouts. This change means that when
the single tcp timer change is reintegrated, tcpcb will
not change in size, and therefore the ABI between
netstat and the kernel will not change.
Neither of these changes should have any functional
impact.
Reviewed by: bmah, rrs
Approved by: re (bmah)
route and once they are done with it, call rtfree(). rtfree() should
only be used when we are certain we hold the last reference to the
route. This bug results in console messages like the following:
rtfree: 0xc40f7000 has 1 refs
This patch switches the rtfree() to use RTFREE_LOCKED() instead,
which should handle the reference counting on the route better.
Approved by: re@ (gnn)
Reviewed by: bms
Reported by: many via net@ and current@
Tested by: many
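The difference between unconditionally freeing a route and releasing a
reference can be sketched with a toy reference-counted object. The names
below are hypothetical and the real macro also deals with the route lock.
The sketch only shows that teardown happens when the caller holds the last
reference, and that otherwise the count is just decremented.

```c
#include <stdbool.h>

/* Hypothetical reference-counted object standing in for a route. */
struct route_ref {
	int	refcnt;
	bool	freed;
};

/*
 * Drop one reference.  Returns true if this was the last reference and
 * the object was torn down, false if other holders remain.
 */
static bool
route_release(struct route_ref *rt)
{

	if (rt->refcnt > 1) {
		rt->refcnt--;
		return (false);
	}
	rt->refcnt = 0;
	rt->freed = true;
	return (true);
}
```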
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.
This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy. This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search. Starting the search
at 0 instead of at (the in-core copy of) fsinxtfree did more than
defeat the reasons for existence of fsinxtfree. fsinxtfree exists
mainly to avoid having to start at 0 for just the first search per
mount, but has the side effect of reducing bias towards allocating
near cluster 0. The bias would normally only be generated by the
first search per mount (if fsinxtfree is not supported), but since
we also adjusted the in-core copy of fsinxtfree here, we were doing
extra work to maximize the bias.
Approved by: re (kensmith)
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
- Improve the long-term load balancer by always IPIing exactly once.
Previously the delay after rebalancing could cause problems with
uneven workloads.
- Allow nice to have a linear effect on the interactivity score. This
allows negatively niced programs to stay interactive longer. It may be
useful with very expensive Xorg servers under high loads. In general
it should not be necessary to alter the nice level to improve interactive
response. We may also want to consider never allowing positively niced
processes to become interactive at all.
- Initialize ccpu to 0 rather than 0.0. The decimal point was leftover
from when the code was copied from 4bsd. ccpu is 0 in ULE because ULE
only exports weighted cpu values.
Reported by: Steve Kargl (Load balancing problem)
Approved by: re
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.
PR: 89966, 92000, 104393
Reported by: H. Matsuo <hiroshi50000 yahoo co jp>,
Chris <m2chrischou gmail.com>,
Andrey V. Elsukov <bu7cher yandex ru>,
Jan Henrik Sylvester <me janh de>
Approved by: re (kensmith)
simplifies code and should speedup pppoe_findsession() function which is
called for every incoming packet.
Approved by: re (kensmith), glebius (mentor)
changes the units from seconds to the value of 'ticks' when swapped
in/out. ULE does not have a periodic timer that scans all threads in
the system and as such maintaining a per-second counter is difficult.
- Change computations requiring the unit in seconds to subtract ticks
and divide by hz. This does make the wraparound condition hz times
more frequent but this is still in the range of several months to
years and the adverse effects are minimal.
Approved by: re
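The "subtract ticks and divide by hz" computation can be sketched as below.
The function name is hypothetical and a 32-bit int tick counter is assumed.
Doing the subtraction in unsigned arithmetic keeps the result well defined
when the counter wraps between the two samples.

```c
/*
 * Hypothetical helper: seconds elapsed between two samples of a
 * wrapping tick counter, given hz ticks per second (hz > 0).
 */
static unsigned
elapsed_seconds(int now_ticks, int then_ticks, int hz)
{
	unsigned delta;

	/* Unsigned subtraction wraps modulo 2^32 instead of overflowing. */
	delta = (unsigned)now_ticks - (unsigned)then_ticks;
	return (delta / (unsigned)hz);
}
```

The result is only meaningful while the true interval is shorter than one
full wrap of the counter, which is the wraparound condition the message
refers to.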
last interface should own the address, but the current code
fumbles the handoff. This fixes that.
- move address related debugs to PCB4 and add additional ones to
help in debugging address problems.
Approved by: re@freebsd.org (K Smith)
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
the table, however, has been the source of a false positive LOR report
with the dt_lock. However, smp rendezvous lock would have had sched_lock
there for older lock, so it wasn't still a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
appropriate stub.
Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just let it be
declared as static.
Tested by: kris
Approved by: jeff (mentor)
Approved by: re
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.
Submitted by: rdivacky
Suggested by: jhb
Sponsored by: Google Summer of Code 2007
PR: kern/77710
Approved by: re (kensmith)
- The output routine of low level console is not protected by any lock
by default.
- Increment and decrement of sc->write_in_progress are not atomic and
this may cause console hang.
- We also have many other states used by emulator that should be protected
by the lock.
- This change does not fix interspersed messages which PRINTF_BUFR_SIZE
kernel option should fix.
Approved by: re (bmah)
MFC after: 1 week
commented out until I can re-test them on all our architectures. I
had re@ approval to commit this a long time ago, but that's before we
were this close to the branch.
Approved by: re@
controller if its sole child device has the "usb" device class.
Previously ehci(4) would think that PCI-ISA bridges on the same slot
(such as in some Intel ICHs) were "neighbors" resulting in spurious
warnings about neighbor count mismatches.
- Fix a memory leak when looking for neighbors.
MFC after: 1 week
Approved by: re (kensmith)
Tested by: phk
to become unkillable when that process is sent a termination signal. The
process will sit in waitvt looping in the kernel, and chewing up all
available CPU until the system is rebooted.
Submitted by: Jilles Tjoelker <jilles@stack.nl>
Reviewed by: bde
Approved by: re (kensmith)
MFC after: 1 week
to the node before starting the work, otherwise the node may go
away before a reference is made in ieee80211_send_mgmt.
Approved by: re (blanket wireless)
Obtained from: Atheros
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.
The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.
Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)
the mfi(4) LSI MegaSAS RAID card. Looking at the Linux driver for the
mpt(4) it should be 0x0062 and not 0x0060. Tested with an mfi card
of this device id.
Approved by: re (bmah)
Reviewed by: scottl
MFC after: 3 days
also involves macro changes to have a RLOCK and a WLOCK
and placing the correct version within the code.
- The INP-INFO lock is changed to a rwlock.
- When sctp_shutdown() is called on Mac OS X, the socket lock is held.
So call sctp_chunk_output with SCTP_SO_LOCKED and
not SCTP_SO_NOT_LOCKED.
- Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X.
- u_int64_t -> uint64_t
- add missing addr unlock for error return path
Approved by: re@freebsd.org (K Smith)
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.
The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.
This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.
Approved by: re (kensmith)
o add driver callback to handle notification of beacon changes;
this is required for devices that manage beacon frames themselves
(devices must override the default handler which does nothing)
o move beacon update-related flags from ieee80211com to the beacon
offsets storage (or handle however a driver wants)
o expand beacon offsets structure with members needed for 11h/dfs
and appie's
o change calling convention for ieee80211_beacon_alloc and
ieee80211_beacon_update
o add overlapping bss support for 11g; requires driver to pass
beacon frames from overlapping bss up to net80211 which is not
presently done by any driver
o move HT beacon contents update to a routine in the HT code area
Reviewed by: avatar, thompsa, sephe
Approved by: re (blanket wireless)
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
does not have a rate table in older hal's so if we scan such a
channel the driver will hit an assertion or crash; for old hal's
fallback to using the static turbo rate table for this mode
(not correct but good enough for now given none of the rate
control algorithms understand how to switch between base+boost)
Approved by: re (blanket wireless)
mode works properly, previously the hostap channel could not be changed off #3.
Fix an ifp/sc misuse while I am here.
Reported by: many
Approved by: re (bmah)
- Fix panic from mutex unlock on freed lock when ASCONF-ACK
aborts an assoc
- Fix panic from addr lock recursion when ASCONFs are queued
in the front states
- ASCONFs "queued" in the front states should really be
bundled after the COOKIE-ACK, not in front of it
- Fix issue with addresses deleted in the front states from
being sent with ASCONF(DELETE)-- replaced
sctp_asconf_queue_add_sa() with delete specific function
- Comment change in sctp.h: the drafts are now RFCs
Approved by: re@freebsd.org (B Mah)
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This change fixes that.
It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().
Approved by: re (kensmith)
status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
already been recycled, i.e., freed and reallocated to a new purpose;
thus, asserting that such pages cannot be written is inappropriate.
Reported by: kris
Submitted by: tegge
Approved by: re (kensmith)
MFC after: 1 week
change interrupt if the link is established with link partner. However
interrupt handler didn't acknowledge the interrupt if nfe(4) was not
running at the time of interrupt delivery. This caused endless
interrupt generation. Fix the bug by acknowledging the interrupt
regardless of running state of the driver.
PR: kern/116295
Submitted by: Mark Derbyshire (mark At taom dot com)
Approved by: re (kensmith)
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() do not respect this flag.
The bug leads to locks marked with no-profiling to be profiled as well.
In the case of the clock_lock, used by the timer i8254 this leads to
unpredictable behaviour both on amd64 and ia32 (double faults panic,
sudden reboots, etc.). The amd64 clock_lock is also not marked as
not profilable as it should be.
Fix these bugs by adding proper checks in the lock profiling code and at
clock_lock initialization time.
i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re
incorrect and should be OFF letting IP fragment
large cookie-echos.
- Rename sysctl variable logging to log_level.
- Fix description of sysctl variable stats.
- Add sysctl variable log to make sctp_log readable via sysctl
mechanism (this is by compile switch and targets non KTR platforms or
when someone wants to do performance wise tracing).
- Removed debug code
Approved by: re@freebsd.org (B Mah)
stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of
static (this is needed for Panda).
Approved by: re@freebsd.org (B Mah)
races for some struct thread members.
More specifically, this bug seems responsible for some memory dumping
problems people were experiencing.
Fix this by adding correct thread locking.
Tested by: rwatson
Submitted by: tegge
Approved by: jeff
Approved by: re
matches the BPF registers (which are the only thing that is assigned
to/from BPF memory). This is a pedantic change that shouldn't change
any behaviour.
PR: 115931
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (bmah)
MFC after: 3 weeks
Fix a few while (!uart_getreg() & SR1_TNF) when
while (!(uart_getreg() & SR1_TNF)) was really meant.
This driver should die anyway, it's awful, and uart_ns8250 should be fine
for the StrongArm 1110. I'll kill it later.
Submitted by: Mikhael Skvorts
Approved by: re (blanket)
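The precedence problem fixed above can be sketched in isolation. This is
only an illustration: the SR1_TNF value below is an assumed placeholder,
and the helper names are hypothetical, not taken from the driver.

```c
#include <stdint.h>

/* Assumed placeholder value standing in for the driver's SR1_TNF bit. */
#define SR1_TNF 0x04u

/*
 * Buggy form: '!' binds tighter than '&', so the compiler parses
 * "!reg & SR1_TNF" as "(!reg) & SR1_TNF".  Since !reg is 0 or 1 and
 * bit 0 is not part of the mask, the result is always 0 here.
 */
static int
fifo_full_buggy(uint32_t reg)
{
	return ((!reg) & SR1_TNF);
}

/* Intended form: true while the "transmit FIFO not full" bit is clear. */
static int
fifo_full_fixed(uint32_t reg)
{
	return (!(reg & SR1_TNF));
}
```

With the buggy form a wait loop falls through immediately, whether or
not the FIFO has room.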
prevents insmntque() from placing the reallocated syncer vnode on the mount
list, which causes a panic in vfs_allocate_syncvnode().
Introduce the MNTK_NOINSMNTQ flag, which marks the period when insmntque() is
not allowed to succeed, instead of MNTK_UNMOUNT. MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on the umount error
path, where it is cleared just before the syncer vnode is going to be
allocated.
Reported by: Peter Jeremy <peterjeremy optushome com au>
Suggested by: tegge
Approved by: re (rwatson)
other changes too).
(without any real order)
1. Use device_get_nameunit for mutex naming
2. Add timer for low-latency playback
3. Move most mixer controls from sysctls to mixer(8) controls.
This is the largest part of this patch.
4. Add analog/digital switch (as a temporary sysctl)
5. Get back support for low-bitrate playback (with help of (2))
6. Change locking for exclusive I/O. Writing to non-PTR register
is almost safe and does not need to be ordered with PTR operations.
7. Disable MIDI until we get it to detach properly and fix memory
management problems.
8. Enable multichannel playback by default. It is as stable as
single-channel mode. Multichannel recording is still an
experimental feature.
9. Multichannel options can be changed by loader tunables.
10. Add a way to disable card from a loader tunable.
11. Add new PCI IDs.
12. Debugger settings are loader tunables now.
14. Remove some unused variables.
15. Mark pcm sub-devices MPSAFE.
16. Partially revert (bus_setup_intr -> snd_setup_intr) since it needs
to be done independently
Submitted by: Yuriy Tsibizov (driver maintainer)
Approved by: re (bmah)
The control input routine passes a NULL as its void argument when it
has reached the innermost header, which terminates the loop.
Reported by: Pawel Worach <pawel.worach@gmail.com>
Approved by: re
topology foo functions.
Working on the patch for topology problems on ia32/amd64 revealed some
problems regarding function ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new modifications to the involved functions,
a correct ordering is now semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )
Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re
- duplicate #define in header, thanks to Kevin Lo for pointing out.
- incorrect BUSMASTER enable logic, thanks Patrick Oeschger
- 82543 fails due to bogus IO BAR logic
- Allow 82571 to use MSI interrupts
- Checksum Offload for UDP not working on 82575
Approved by: re
value, so we don't run out of KVA. The default vnodes limit fits better for
UFS, but ZFS allocates more file system specific memory for a vnode than UFS.
Don't touch vnodes limit if we detect it was tuned by system administrator
and restore original value when ZFS is unloaded.
This isn't the final fix, but until we implement something better, this will
help to stabilize ZFS under heavy load on i386.
Approved by: re (bmah)
code comes from.
- Fix a LOR on Mac OS X: Do not hold an stcb lock when
calling soisconnected for a socket which has the
SS_INCOMP bit set on so_state.
- fix a comment to be non c++ style.
Approved by: re@freebsd.org (B Mah)
jumping to dropunlock to avoid a panic. While here move the calls to
ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain
the lock on inp.
Original patch to avoid panic: pjd
Review of locking adjustments: gnn, sam
Approved by: re (rwatson)
- Resort includes a bit.
- Correct typos and wording problems in comments.
- Rename udpcksum to udp_cksum to be consistent with other UDP-related
configuration variables.
- Remove indirection of udp_notify through local notify variable in
udp_ctlinput(), which is presumably due to copying and pasting from TCP,
where multiple notify routines exist.
Approved by: re (kensmith)
is given (with newfs or tunefs) and dirsize overflows.
In case dirsize is <= 0 because of an overflow set maxcontigdirs
to 0 so it will be 1 later. This is what would happen for large
fs_avgfilesize. [1]
Identified with help from: roberto, pjd
Submitted by: pjd [1]
Approved by: re (rwatson)
MFC after: 8 days
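A minimal sketch of the overflow handling described above. The function
and parameter names are hypothetical, and the upper cap of 255 is an
assumption made for the sketch rather than a value stated in this log.

```c
#include <stdint.h>

/*
 * avgfilesize and avgfpdir stand for the tunables set with newfs or
 * tunefs; avail_bytes is an illustrative amount of free space.
 */
static int
max_contig_dirs(int32_t avgfilesize, int32_t avgfpdir, int64_t avail_bytes)
{
	/* Multiply as unsigned so that wraparound is well defined. */
	int32_t dirsize = (int32_t)((uint32_t)avgfilesize * (uint32_t)avgfpdir);
	int64_t maxcontigdirs;

	if (dirsize <= 0) {
		/* Overflowed: behave as if dirsize were very large. */
		maxcontigdirs = 0;
	} else {
		maxcontigdirs = avail_bytes / dirsize;
		if (maxcontigdirs > 255)	/* assumed cap */
			maxcontigdirs = 255;
	}
	/* A result of 0 is bumped to 1, as for a large fs_avgfilesize. */
	if (maxcontigdirs == 0)
		maxcontigdirs = 1;
	return ((int)maxcontigdirs);
}
```

The point of the check is that a wrapped dirsize never reaches the
division, so neither a divide by zero nor a negative quotient can occur.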
ioctl().
Note that other information provided by ifconfig(8) such as "list chan"
or "list ap" is still not available at this moment.
Before an(4) is connected to wlan(4), users are encouraged to use
ancontrol(8) to retrieve aforementioned information.
Reported by: dhw (http://lists.freebsd.org/pipermail/freebsd-current/2007-July/074848.html)
Reviewed by: ambrisko
Tested by: dhw
Approved by: re (bmah)
flags, the absence of these flags causes problems in other areas such as
bridging which expect them to be correct.
At the moment only Ethernet DLTs are checked.
Reviewed by: bms, csjp, sam
Approved by: re (bmah)
point to mac_check_vnode_unlink(), reflecting UNIX naming conventions.
This is the first of several commits to synchronize the MAC Framework
in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X
Leopard.
Reviewed by: csjp, Samy Bahra <sbahra at gwu dot edu>
Submitted by: Jacques Vidrine <nectar at apple dot com>
Obtained from: Apple Computer, Inc.
Sponsored by: SPARTA, SPAWAR
Approved by: re (bmah)
- fix the use after free seen when sending packets small enough to fit as an immediate
and bpf peers are present
- update to firmware rev 4.7 along with various small vendor fixes
Supported by: Chelsio
Approved by: re (blanket)
MFC after: 3 days
the recent send code, but uio may be NULL on sendfile
calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
and needs a small tweak to look if the msg would
ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
cause a panic. This is a follow on to the codenomicon
fix.
- PDAPI level 1 and 2 do not work unless the reader
gets his returned buffer full. Fix so we can break
out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
accepted sockets
- Fix sctp_peeloff() system call when no true system call
exists to screen arguments for errors. In cases where a
real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
socket is closing.
- When deleting an address verify the interface is correct
before allowing the delete to process. This protects panda
and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
the assoc has not gone to about to be freed. If so
(in the middle) abort out. Note this especially affects
MAC I think due to the lock/unlock they do (or with
LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
destinations. This especially affects CMT but also has an
impact on regular SCTP as well.
- During init collision when we detect seq number out
of sync we need to treat it like Case C and discard
the cookie (no invariant needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
into the time wait hash to prevent further use. When
the tag is put into the assoc hash, we need to remove it
from the twait hash (where it will surely be). This prevents
duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
as src addr.
- in icmp handling fixed so we actually look at the icmp codes
to figure out what to do.
- Modified mobility code.
Reception of DELETE IP ADDRESS for a primary destination and
SET PRIMARY for a new primary destination is used for
retransmission trigger to the new primary destination.
Also, in this case, destination of chunks in send_queue are
changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
mode set upon it.
Approved by: re@freebsd.org (B Mah)
additional flags to many function calls. The flags only
get used in BSD when we compile with lock testing. These
flags allow apple to escape the "giant" lock it holds on
the socket and have more fine-grained locking in the NKE.
It also allows us to test (with witness) the locking used
by apple via a compile switch (manually applied).
Approved by: re@freebsd.org (B Mah)
- Fix copyrights, comments in UDPv6.
- Remove macro defines for in6pcb and udp6stat.
- Consistently refer to inpcbs as 'inp' and not also 'in6p'.
Reviewed by: gnn, jinmei, bz
Approved by: re (bmah)
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change. This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).
In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI. The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.
This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.
Approved by: re (kensmith)
Submitted by: silby
Tested by: rwatson, kris
Problems reported by: peter, kris, others
that can lead to a panic when the stick is yanked.
- make sure that zyd_attach() returns 0 or errno.
Submitted by: Weongyo Jeong <weongyo.jeong@gmail.com>
Reported by: Ted Lindgreen <ted@tednet.nl>
Reviewed by: sam
Approved by: re (blanket wireless)
with the INTR_FILTER-enabled MI code. Basically this consists of
registering an interrupt controller (of which there can be multiple
and optionally different ones either per host-to-foo bridge or shared
amongst host-to-foo bridges in any one machine) along with an interrupt
vector as specific argument for all the interrupt vectors used by a
given host-to-foo bridge (roughly similar to registering interrupt
sources on amd64 and i386), providing functions to enable, clear and
disable the interrupts of the children beneath the bridge.
This also includes:
- No longer entering a critical section in tl0_intr() and tl1_intr()
for executing interrupt handlers but rather let the handlers enter
it themselves so in the case of intr_event_handle() we don't enter
a nested critical section.
- Adding infrastructure for binding delivery of interrupt vectors to
specific CPUs which later on can be interfaced with the code from
amd64/i386 for binding interrupts to specific CPUs.
- Getting rid of the wrapper hack introduced along the lines of the
API changes for INTR_FILTER which as a side-effect caused interrupts
associated with ithread handlers only to get the elevated priority
of those associated with filters ("fast handlers") (this removes the
hack also in the non-INTR_FILTER case).
- Disabling (by not clearing) an interrupt in the interrupt controller
until all associated handlers have been executed, which is crucial
for the typical locking strategy of NIC drivers in order to work
correctly in case of shared interrupts. This was a more or less
theoretical problem on sparc64 though, as shared interrupts are
rather uncommon there except for the on-board SCCs and UARTs.
Note that due to the behavior of at least some of the interrupt
controllers used on sparc64 an enable+EOI instead of a disable+EOI
approach (as implied by the INTR_FILTER MI code and implemented on
other architectures) is used as the latter can cause lost interrupts
or in the worst case interrupt starvation.
o Correct a typo in sbus_alloc_resource() which caused (pass-through)
allocations to only work down to the grandchildren of the bus, which
wasn't a real problem so far as we don't support any devices which are
great-grandchildren or greater of a U2S bridge, yet.
o In fhc(4) use bus_{read,write}_4() instead of bus_space_{read,write}_4()
in order to get rid of sc_bh and sc_bt in the fhc_softc. Also get rid
of some other unneeded members in fhc_softc.
Reviewed by: marcel (earlier version)
Approved by: re (kensmith)
o reset ni_inact when ni_inact_reload is changed so we're
assured a valid setting
o never let ni_inact go negative
o add a knob to disable hostap sta idle handling (e.g. so it can be done
by a user application)
o remove bogus reload on associate
Reviewed by: avatar
Approved by: re (blanket wireless)
o update ic_lastdata to reflect time of last outbound frame
o outbound traffic must preempt/cancel bg scanning to avoid delays
This stuff was somehow missed in the initial import.
Reviewed by: thompsa, avatar, sephe (earlier version)
Approved by: re (blanket wireless)
o add ic_extieee to hold the HT40 extension channel number
o add ic_state to track dynamic channel state for DFS
o add flags to mark regulatory channel requirements
o add state defs for DFS/radar support
Reviewed by: avatar
Approved by: re (blanket wireless)
o update 11n definitions to D2.0 spec
o add IEEE80211_CAPINFO_SPECTRUM_MGMT for DFS support
o add CSA ie definition for DFS support
o purge some unused definitions
o correct 802.11 reason and status codes
o correct reason code returned when a sta tries to associate to an
ap operating with WPA/RSN but without a WPA/RSN ie
Reviewed by: thompsa, avatar
Approved by: re (blanket wireless)
device and have had the crypto bits stripped from the 802.11 header
o strip mbuf flags in the rx path before passing up the stack
Reviewed by: thompsa, sephe, avatar
Approved by: re (blanket wireless)
The first drop was Beta, this code is expected to be the release version.
Note that this driver code will build in either 6.2 or 7. If you
use the code in 6.2 you will not get TSO or MSI/X support but it will
function in a legacy mode.
Approved by: re
used, rather than the one passed via 'req', which may not reflect a
rewrite. This call to useracc() is redundant to validation performed by
later copyin()/copyout() calls, so there isn't a security issue here,
but this could technically lead to excessive validation of addresses if
the length in newlen is shorter than req.newlen.
Approved by: re (kensmith)
Reviewed by: jhb
Submitted by: Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru>
Sponsored by: Google Summer of Code 2007
When any PnP device exists, isa_release_resource() is called with no
activated resource. So a bushandle is not allocated yet.
Approved by: re (kensmith)
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.
The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.
The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.
The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.
Reviewed by: kib
Approved by: re (kensmith)
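The "_r" convention mentioned above can be illustrated with a generic
sketch. It is not the msdosfs code; upcase_name_r is a hypothetical
helper that only shows the shape of the interface, where all scratch
state lives in a caller-supplied buffer instead of a static one.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Reentrant helper: the caller owns 'buf', so two threads working on
 * different names cannot clobber each other's intermediate state.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
upcase_name_r(const char *src, char *buf, size_t buflen)
{
	size_t i;

	if (buflen == 0)
		return (0);
	/* Leave room for the terminating NUL; truncate if src is longer. */
	for (i = 0; i + 1 < buflen && src[i] != '\0'; i++)
		buf[i] = (char)toupper((unsigned char)src[i]);
	buf[i] = '\0';
	return (i);
}
```

Each caller would typically declare the buffer as a local variable,
which is where the stack cost noted in the message comes from.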
the callout_lock spin lock and the sleepqueue spin locks. In the fix,
callout_drain() has to drop the callout_lock so it can acquire the
sleepqueue lock. The state of the callout can change while the
callout_lock is held however (for example, it can be rescheduled via
callout_reset()). The previous code assumed that the only state change
that could happen is that the callout could finish executing. This change
alters callout_drain() to effectively restart and recheck everything
after it acquires the sleepqueue lock thus handling all the possible
states that the callout could be in after any changes while callout_lock
was dropped.
Approved by: re (kensmith)
Tested by: kris
needed at least to convince the BIOS to give us access to CPU freq
control on MacBooks.
Submitted by: Rui Paulo <rpaulo / fnop.net>
Approved by: re
MFC after: 5 days
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.
Found by: Writing the Handbook section
Approved by: re (kensmith)
ktruserret() is invoked, an unlocked check of the per-process queue
is performed inline, thus, we don't lock the ktrace_sx on every userret().
Pointy hat to: jhb
Approved by: re (kensmith)
Pointy hat recovered from: rwatson
ZD1211/ZD1211B USB IEEE 802.11b/g wireless network devices. Not (yet)
connected to the build process (next batch of commits once I've looped
the current back).
Submitted by: Weongyo Jeong
Reviewed by: sam@
Approved by: re@
import. The PF mbuf-tagging support routines changed to link the
allocated tags into the provided mbuf themselves, so the left-over
m_tag_prepend() was trying to add a bogus (usually NULL) tag.
Reviewed by: mlaier
Approved by: re
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
that the resulting block counts fit into the long-sized (long for the
ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs
VOP did this already. This can lie about the block size to 4.x binaries,
but it presents a more accurate picture of the ratios of free and
available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
values at INT32_MAX rather than losing the upper 32-bits to match the
behavior of the 4.x statfs conversion routine in vfs_syscalls.c
Approved by: re (kensmith)
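The two conversions described above can be sketched roughly as follows.
The structure and function names are hypothetical stand-ins rather than
the kernel's own, and the sketch ignores the unlikely case of the block
size itself overflowing.

```c
#include <stdint.h>

/* Hypothetical subset of the 64-bit statfs counters. */
struct wide_stats {
	uint64_t bsize;		/* block size in bytes */
	uint64_t blocks;	/* total blocks */
	uint64_t bfree;		/* free blocks */
};

/* Double the reported block size until both block counts fit in 'max'. */
static void
scale_blocks(struct wide_stats *ws, uint64_t max)
{
	while (ws->blocks > max || ws->bfree > max) {
		ws->bsize <<= 1;
		ws->blocks >>= 1;
		ws->bfree >>= 1;
	}
}

/* Clamp a non-block counter instead of dropping its upper 32 bits. */
static int32_t
cap_counter(uint64_t v)
{
	return (v > (uint64_t)INT32_MAX ? INT32_MAX : (int32_t)v);
}
```

Scaling keeps the ratio of free to total blocks roughly intact, whereas
plain truncation would report unrelated low-order bits.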
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.
Submitted by: rdivacky
Reviewed by: jkim
Sponsored by: Google Summer of Code 2007
Approved by: re (kensmith)
Both WWNN and WWPN are 64-bit unsigned integers and they are prefixed
with "0x", which requires two more bytes each.
Submitted by: Danny Braniss (danny at cs dot huji dot ac dot il)
via Matthew Jacob (lydianconcepts at gmail dot com)
Approved by: re (bmah)
MFC after: 3 days
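The buffer arithmetic above can be checked with a small sketch. The
names are hypothetical, and the zero-padded "0x%016" format is an
assumption about how the values are printed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 16 hex digits for a 64-bit value, 2 for "0x", 1 for the NUL. */
#define WWN_STRLEN	(16 + 2 + 1)

/* Format a WWN; returns nonzero only if the whole string fit in buf. */
static int
format_wwn(char *buf, size_t buflen, uint64_t wwn)
{
	int n = snprintf(buf, buflen, "0x%016" PRIx64, wwn);

	return (n >= 0 && (size_t)n < buflen);
}
```

A buffer sized for the 16 digits and the NUL alone (17 bytes) is two
bytes short once the prefix is added, which matches the description.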
the last message on the send stream was "null" but still
there, a state we allow, we could get hung and not clean
it up and wait for the shutdown guard timer to clear the
association without a graceful close. Fix this so
that we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
(so far) accept input of these and cannot yet generate
a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
ABORT in this case.
- According to the Kyoto summit of socket api developers
(Solaris, Linux, BSD). We need to have:
o non-eeor mode messages be atomic - Fixed
o Allow implicit setup of an assoc in 1-2-1 model if
using the sctp_**() send calls - Fixed
o Get rid of HAVE_XXX declarations - Done
o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks, always checksum, the same as for
shutdown-complete.
- inpcb_free front state had a bug where in queue
data could wedge an assoc. We need to just abandon
ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
assemble a response packet which may be larger than
64k. This then would be dropped by IP. Instead make
a "minimum" size for us 64k-2k (we want at least
2k for our initack). If we receive such an init
discard it early without all the processing.
- When we peel off we must increment the tcb ref count
to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
when given faulty data, fixed so can't happen and we
also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
(in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
to null if the iterator is calling the output routine. This
means we would possibly crash when we gather the MTU info.
Fix so we only do the gather where we have a src address
cached.
- we need to be sure to set abort flag on conn state when
we receive an abort.
- peeloff could leak a socket. Moved code so the close will
find the socket if the peeloff fails (uipc_syscalls.c)
Approved by: re@freebsd.org (Ken Smith)
pack a set number correctly.
Submitted by: oleg
o Plug a memory leak.
Submitted by: oleg and Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 1 week
- remove cpl->iff panic - we can't know the port number from the rspq on the 4-port
- pick the ifnet based on the interface in the CPL header
- switch to using qset 0 for egress on the 4-port for now - may change
when we start using RSS
- move ether_ifdetach to before the port lock gets deinitialized to avoid
hang in the case where there are BPF peers (cxgb_ioctl is called indirectly
when BPF peers are present)
- don't call t3_mac_reset if multiport is set, this was causing tx errors
by misconfiguring the MAC on the 4-port
- change V_TXPKT_INTF to use txpkt_intf as the interfaces are not contiguous
- free the mbuf immediately in the case where the payload is small enough to be copied
into the rspq
- only update the coalesce timer for a queue if packets were taken off of it
- add in missed 20ms DELAY in initialization of vsc8211
- prompt MFC as this only applies to the 4-port which is currently completely
broken - OK'd by kensmith
Supported by: Chelsio
Approved by: re (blanket)
MFC after: 0 days
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
So that it does not send the "null" data mbuf out and cause
it to get freed twice.
- Fix so auto-asconf sysctl actually affects the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed), found and
reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked on >=, not just >, to conform to the socket
API draft; this fixes the two places in sctp_indata.c that need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
that is not in the association with a valid assoc_id. This meant
you got or set the stcb level values instead of the destination
you thought you were going to get/set. Now validate if the
stcb is non-null and the net is NULL that the sa_family is
set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
at the exact time it was running. rework the worker thread to
use the increment/decrement to prevent this and no longer use
the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
had a locking problem where locks were not in place where they
should have been.
- Free association calls were not testing the return value in
sctp_inpcb_free() properly... others should be cast void returns
where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
we should not do the actual free.. but instead let the timer
free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
flag is set, we must NOT process the packet but handle it like
ootb. This is because while freeing an assoc we release the
locks to get all the higher order locks so we can purge all
the hash tables. This leaves a hole if a packet comes in
just at that point. Now sctp_common_input_processing() will
call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
it so we don't have a conflict (I think this is a Coverity change).
We made this change AFTER some conversation and looking to make sure
that M_PROTO5 does not have a problem between SCTP and the 802.11
stuff (which is the only other place it's used).
- Fixed lock order reversal and missing atomic protection around
locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to an ABORT() with a bad checksum with
a PKT-DROP. This led to a PKT-DROP/ABORT war. Add code to NOT
send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
you need better performance during debugging). Note that no commands
are provided to get the log info; you must just use kgdb.
- The timer code needs to be aware of if it needs to call
sctp_sack_check() to slide the maps and adjust the cum-ack.
This is because it may be out of sync cum-ack wise.
- Added threshold management logging.
- If the user picked just the right size, that just filled the send
window minus one mtu, we would enter a forever loop not copying and
at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
to 1.
- My rwnd control was not being used to control the rwnd properly (we
did not add and subtract to it :-() this is now fixed so we handle
small messages (1 byte etc) better to bring our rwnd down more
slowly.
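To illustrate the reversed-argument memset bug mentioned in the list above, here is a minimal sketch. The helper name sketch_clear_buf() is hypothetical and is not code from sctp_output.c; only the argument order of memset() is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * memset(dst, value, len): the fill value comes second, the length last.
 * Hypothetical helper for illustration only.
 */
void
sketch_clear_buf(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	/* Reversed form, memset(buf, len, 0), writes zero bytes and clears nothing. */
	memset(buf, 0, len);
}
```

With the arguments swapped the call still compiles cleanly, which is why this kind of bug tends to be found by inspection or static analysis rather than by the compiler.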
Approved by: re@freebsd.org (Bruce Mah)
ICMP error message, do not access th_flags. The field is beyond
the first eight bytes of the header that are required to be present
and were pulled up in the mbuf.
A random value of th_flags can have TH_SYN set, which made the
sequence number comparison not apply the window scaling factor,
which led to legitimate ICMP(v6) packets getting blocked with
"BAD ICMP" debug log messages (if enabled with pfctl -xm), thus
breaking PMTU discovery.
Triggering the bug requires TCP window scaling to be enabled
(sysctl net.inet.tcp.rfc1323, enabled by default) on both end-
points of the TCP connection. Large scaling factors increase
the probability of triggering the bug.
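A minimal sketch of why th_flags is out of reach, assuming the usual TCP header layout. struct sketch_tcphdr and field_is_present() are hypothetical stand-ins, not the pf code; the only fact taken from above is that just the first eight bytes of the header are guaranteed to be present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TCP header layout (stand-in for struct tcphdr). */
struct sketch_tcphdr {
	uint16_t th_sport;	/* bytes 0-1 */
	uint16_t th_dport;	/* bytes 2-3 */
	uint32_t th_seq;	/* bytes 4-7 */
	uint32_t th_ack;	/* bytes 8-11 */
	uint8_t  th_off_x2;	/* byte 12 */
	uint8_t  th_flags;	/* byte 13 */
	uint16_t th_win;
	uint16_t th_sum;
	uint16_t th_urp;
};

#define SKETCH_QUOTED_BYTES 8	/* header bytes guaranteed in the ICMP error */

/* Return 1 if the field [off, off + len) lies inside the first avail bytes. */
int
field_is_present(size_t off, size_t len, size_t avail)
{
	return (off + len <= avail);
}
```

The ports and th_seq fit in the first eight bytes, so they can be used for state matching; th_flags sits at byte 13 and any value read there is whatever happened to follow in the mbuf.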
PR: kern/115413: [ipv6] ipv6 pmtu not working
Tested by: Jacek Zapala
Reviewed by: mlaier
Approved by: re (kensmith)
on a down mxge interface
- Fix a bug where mxge reported the link state as
active when it wasn't (after ifconfig down).
- Prevent spurious watchdog resets when link partner is not consuming
- Add support for CX4 and popular XFP media detection
- Update the firmware and associated header files to 1.4.25
Approved by: re (kensmith)
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.
Approved by: re (kensmith)
ENOENT if the option wasn't provided, instead of setting it to 0.
xfs however didn't catch up on this, so it assumed something went bad if
vfs_getopts() sets the error to non-zero, and just returns the error.
Unbreak xfs mount by just ignoring the error if vfs_getopts() sets the
error to ENOENT, as we should have sane defaults.
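The handling described above can be sketched as follows. sketch_getopt() is a hypothetical userland stand-in that only mimics the "ENOENT when the option is absent" behaviour; it is not the real vfs_getopts() and the xfs code may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an option lookup that reports ENOENT
 * when the option was not supplied.
 */
const char *
sketch_getopt(const char *value, int *error)
{
	*error = (value == NULL) ? ENOENT : 0;
	return (value);
}

/* Treat a missing option as "use the default"; pass other errors through. */
int
sketch_parse_opt(const char *value, const char **out, const char *dflt)
{
	const char *v;
	int error;

	v = sketch_getopt(value, &error);
	if (error == ENOENT) {
		*out = dflt;		/* option absent: sane default */
		return (0);
	}
	if (error != 0)
		return (error);		/* a real failure */
	*out = v;
	return (0);
}
```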
Reviewed by: kan
Approved by: re (rwatson)
Tested by: rpaulo
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.
Dropping the lock allowed for parallel thread to occupy the freed space.
Reported by: Tijl Coosemans <tijl ulyssis org>
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
aio_proc_rundown.
Do not allow for zero-length read to be passed to the fo_read file method
by aio.
Reported and tested by: Peter Holm
Approved by: re (kensmith)
of the bits we want to ignore on the first pass rather than doing a
linear scan. This puts us within a few instructions of the cost of
runq_findbit() and removes this function from the top of profiling output
for context switch heavy workloads.
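A minimal sketch of the masking idea, assuming a single 32-bit status word and POSIX ffs(). sketch_findbit_from() is a hypothetical name and not the scheduler's function; the real run queue uses several status words.

```c
#include <assert.h>
#include <strings.h>	/* ffs() */

/*
 * Find the first set bit at or after position from, wrapping to the
 * lowest set bit if none is found. Returns -1 if no bit is set.
 */
int
sketch_findbit_from(unsigned int bits, int from)
{
	unsigned int mask;
	int i;

	assert(from >= 0 && from < 32);
	mask = ~0u << from;		/* ignore bits below from on the first pass */
	i = ffs((int)(bits & mask));
	if (i == 0)
		i = ffs((int)bits);	/* second pass: wrap around */
	return (i - 1);
}
```

Each pass is one mask and one find-first-set operation, instead of testing the bits one at a time.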
Approved by: re
on 2cpu machines by reducing it to 1 by default. This improves loaded
operation on 8cpu machines by increasing it to 3 where the extra idle
time is not as critical.
Approved by: re
have caused a hang, but we got lucky with the available multi-CPU states
on actual hardware.
Submitted by: Bjorn Koenig <bkoenig / alpha-tierchen.de>
Approved by: re
MFC after: 3 days
sys/vm/device_pager.c:
Protect the creation of the phys pager with non-NULL handle with the
phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now
synchronized with its removal from the list, and phys_pager_mtx is put
before vm object lock in lock order. Dispose the phys_pager_alloc_lock
and tsleep calls, together with acquiring Giant, since phys_pager_mtx
now covers the same block.
Reviewed by: alc
Approved by: re (kensmith)
Without it some errors may be left unnoticed and unhandled,
which will leave hooks in a half-connected state.
Reviewed by: julian@
Approved by: re (kensmith), glebius (mentor)
- Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock.
- Fix panic that may occur when freeing an assoc that has partial
delivery in progress (may dereference null socket pointer when
queuing partial delivery aborted notification)
- Some spacing and comment fixes.
- Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
Approved by: re@freebsd.org (Bruce Mah)
projected_offset against isn_offset to account for
wrap around.
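The exact comparison is not shown here. As an assumption about what a wrap-around-safe comparison of two 32-bit counters looks like, the common idiom is to compare the signed difference; sketch_seq_gt() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if a is "after" b in modulo-2^32 arithmetic. */
int
sketch_seq_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}
```

A plain a > b gives the wrong answer once one of the values has wrapped past zero, while the signed difference stays correct as long as the two values are less than 2^31 apart.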
Reviewed by: gnn, kmacy, silby
Submitted by: yusheng.huang@bluecoat.com
Approved by: re
MFC: 3 days
and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature). When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.
Submitted by: Rui Paulo <rpaulo@fnop.net>
Sponsored by: Google Summer of Code 2007
Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by: re (kensmith)
MFC after: 3 weeks
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)
Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow
In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home
Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)
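The equivalence of "atime" and "nonoatime" can be sketched like this. sketch_negates() is a hypothetical helper, not the mount(8) or kernel option parser; it only shows the two spellings that turn a "no"-prefixed option back off.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the option given negates the option base. */
int
sketch_negates(const char *given, const char *base)
{
	/* "no" + base, e.g. "nonoatime" negates "noatime". */
	if (strncmp(given, "no", 2) == 0 && strcmp(given + 2, base) == 0)
		return (1);
	/* base without its own "no" prefix, e.g. "atime" negates "noatime". */
	if (strncmp(base, "no", 2) == 0 && strcmp(given, base + 2) == 0)
		return (1);
	return (0);
}
```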
Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc
Without this the PHY wouldn't work as expected. This should fix
dual-boot Windows XP machine where RealTek Windows drivers put the
PHY in power down mode during shutdown. The magic PHY register
accesses come from RealTek driver. No datasheets mention the magic
PHY registers.
In general, the PHY wakeup code should go into PHY driver. However it
seems that it only applies to the RTL8169S single chip and it would be
another hack if we have rgephy(4) check what parent driver/chip model
is attached.
Reported by: lofi, Laurens Timmermans ( laurens AT timkapel DOT nl )
Tested by: lofi
Obtained from: RealTek FreeBSD driver
Approved by: re (Ken Smith)
- Don't leak the config lock if detach() fails due to the controller char
dev being open.
- Close a race between detach() and a process opening the controller char
dev.
MFC after: 1 week
Approved by: re (bmah)
Fix a resource allocation bug (explained by jhb on -acpi)
Thanks to Mike Tancsa for testing and helping track down the bug.
Approved by: re (kensmith)
MFC after: 3 weeks
detailed status on each of the backing subdisks. This allows userland
to see which subdisks are online, failed, missing, or a hot spare.
MFC after: 1 week
Approved by: re (bmah)
Reviewed by: sos
differ in their details with calls to a new function, ehci_hcreset(),
that performs the reset.
The original sequences either had no delay or a 1ms delay between
telling the controller to stop and asserting the controller reset
bit. One instance of the original reset sequence waited for the
controller to indicate that its reset was complete before continuing,
but the other two immediately let the subsequent code execute. The
latter is a problem on some hardware, because a read of the HCCPARAMS
register returns an incorrect value while the reset is in progress,
which triggers an infinite loop in ehci_pci_givecontroller(), which
hangs the system on shutdown.
The reset sequence in ehci_hcreset() starts with the most complete
instance from the original code, which contains a loop to wait for
the controller to indicate that its reset is complete. This appears
to be the correct thing to do according to "Enhanced Host Controller
Interface Specification for Universal Serial Bus" revision 1.0,
section 2.3.1. Add another loop to wait for the controller to
indicate that it has stopped before setting the HCRESET bit. This
is required by the section 2.3.1 in the specification, which says
that setting HCRESET before the controller has halted "will result
in undefined behaviour".
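The stop, wait, reset, wait sequence can be sketched against a simulated controller. Everything prefixed sketch_ or SK_ is hypothetical: the bit positions follow the EHCI USBCMD/USBSTS layout, but the register access, poll count and error handling are assumptions, and a real driver would delay between polls.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CMD_RS	0x00000001u	/* USBCMD Run/Stop */
#define SK_CMD_HCRESET	0x00000002u	/* USBCMD Host Controller Reset */
#define SK_STS_HCH	0x00001000u	/* USBSTS HCHalted */
#define SK_MAX_POLLS	100

/* Simulated controller: each register read advances its state. */
struct sketch_hc {
	uint32_t cmd, sts;
	int halt_polls;		/* reads needed before HCHalted is set */
	int reset_polls;	/* reads needed before HCRESET self-clears */
};

uint32_t
sketch_read_sts(struct sketch_hc *hc)
{
	if ((hc->cmd & SK_CMD_RS) == 0 && --hc->halt_polls <= 0)
		hc->sts |= SK_STS_HCH;
	return (hc->sts);
}

uint32_t
sketch_read_cmd(struct sketch_hc *hc)
{
	if ((hc->cmd & SK_CMD_HCRESET) != 0 && --hc->reset_polls <= 0)
		hc->cmd &= ~SK_CMD_HCRESET;
	return (hc->cmd);
}

/* Stop, wait for halt, reset, wait for completion. Returns 0 on success. */
int
sketch_hcreset(struct sketch_hc *hc)
{
	int i;

	hc->cmd &= ~SK_CMD_RS;			/* tell the controller to stop */
	for (i = 0; i < SK_MAX_POLLS; i++)
		if (sketch_read_sts(hc) & SK_STS_HCH)
			break;
	if (i == SK_MAX_POLLS)
		return (-1);			/* never halted */
	hc->cmd |= SK_CMD_HCRESET;		/* only after the halt is seen */
	for (i = 0; i < SK_MAX_POLLS; i++)
		if ((sketch_read_cmd(hc) & SK_CMD_HCRESET) == 0)
			return (0);		/* reset complete */
	return (-1);				/* reset never completed */
}
```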
Reviewed by: imp (previous patch version without the extra wait loop)
Tested by: se (previous patch version without the extra wait loop)
Approved by: re (bmah)
MFC after: 1 week
o Revamp the PIC I/F to only abstract the PIC hardware. The
resource handling has been moved to nexus, where it belongs.
o Include EOI and MASK+EOI methods to the PIC I/F in support of
INTR_FILTER.
o With the allocation of interrupt resources and setup of
interrupt handlers in the common platform code we can delay
talking to the PIC hardware after enumeration of all devices.
Introduce a call to powerpc_intr_enable() in configure_final()
to achieve that and have powerpc_setup_intr() only program the
PIC when !cold.
o As a consequence of the above, remove all early_attach() glue
from the OpenPIC and Heathrow PIC drivers and have them
register themselves when they're found during enumeration.
o Decouple the interrupt vector from the interrupt request line.
Allocate vectors increasingly so that they can be used for
the intrcnt index as well. Extend the Heathrow PIC driver to
translate between IRQ and vector. The OpenPIC driver already
has the support for vectors in hardware.
Approved by: re (blanket)
- LK_RETRY prohibits vget() and vn_lock() from returning an error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
in the same way as modern NetBSD versions do. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.
Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL dereference
- Be more careful handling vnodes (*)
(*) This is a WIP
[1] by pjd via howardsu
Thanks kib@ for his valuable VFS related comments.
Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.
Loosely based on: OpenBSD
Approved by: re
a test that assumes that char is signed by default and causes a
warning with GCC 4.2 on PowerPC.
A patch has been sent to the maintainer that addresses this.
Approved by: re (blanket)
of device pager in the pagers list by handle is now synchronized with
its removal from the list, and dev_pager_mtx is put before vm object
lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx
now covers the same block.
Noted by: kensmith
Reviewed by: alc
Approved by: re (kensmith)
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).
In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.
In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.
In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.
In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.
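The ordering of the two checks in msdosfs_write() can be sketched as below. sketch_write_check() and SK_FILESIZE_MAX are hypothetical names; the limit value assumes the FAT maximum file size of 4GB minus one byte, and the signal is modelled as an output flag rather than actually being sent.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_FILESIZE_MAX	0xffffffffULL	/* assumed FAT file size limit */

/*
 * Returns 0 or EFBIG. *sig is set to 1 only when the file size rlimit,
 * not the filesystem limit, is what would be exceeded.
 */
int
sketch_write_check(uint64_t offset, uint64_t resid, uint64_t rlimit, int *sig)
{
	*sig = 0;
	/* Filesystem limit first, phrased so that no sum can overflow. */
	if (offset > SK_FILESIZE_MAX || resid > SK_FILESIZE_MAX - offset)
		return (EFBIG);
	/* offset + resid is now known to be at most SK_FILESIZE_MAX. */
	if (offset + resid > rlimit) {
		*sig = 1;		/* the caller would also get SIGXFSZ */
		return (EFBIG);
	}
	return (0);
}
```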
Approved by: re (kensmith) (blanket)
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).
Improve an error message by quoting ".", and by not printing large positive
values as negative ones.
Approved by: re (kensmith) (blanket)
namespace pollution in <sys/vnode.h>.
Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.
Approved by: re (kensmith) (blanket)
the use of divert sockets to dead locks. A number of LORs have been reported
between divert and a number of other network subsystems including: IPSEC, Pfil,
multicast, ipfw and others. Other dead locks could occur because of recursive
entry into the IP stack. This change should take care of most if not all of
these issues.
A summary of the changes follows:
- We disallow multicast operations on divert sockets. It really doesn't make
semantic sense to allow this, since typically you would set multicast
parameters on multicast end points.
NOTE: As a part of this change, we actually dis-allow multicast options on
any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family
- We check to see if any socket options have been specified on
the socket, and if there are (which is very uncommon and also probably
doesn't make sense to support) we duplicate the mbuf carrying the options.
- We then drop the INP/INFO locks over the call to ip_output(). It should be
noted that since we no longer support multicast operations on divert sockets
and we have duplicated any socket options, we no longer need the reference
to the pcb to be coherent.
- Finally, we replaced the call to ip_input() to use netisr queuing. This
should remove the recursive entry into the IP stack from divert.
By dropping the locks over the call to ip_output() we eliminate all the lock
ordering issues above. By switching over to netisr on the inbound path,
we can no longer recursively enter the ip_input() code via divert.
I have tested this change by using the following command:
ipfwpcap -r 8000 - | tcpdump -r - -nn -v
This should exercise the input and re-injection (outbound) path, which is
very similar to the work load performed by natd(8). Additionally, I have
run some ospf daemons which have a heavy reliance on raw sockets and
multicast.
Approved by: re@ (kensmith)
MFC after: 1 month
LOR: 163
LOR: 181
LOR: 202
LOR: 203
Discussed with: julian, andre et al (on freebsd-net)
In collaboration with: bms [1], rwatson [2]
[1] bms helped out with the multicast decisions
[2] rwatson submitted the original netisr patches and came up with some
of the original ideas on how to combat this issue.
for bakeoff.. using the next sequential ones)
- In cookie processing 1-2-1, we did not increment the stcb
refcnt before releasing the tcb lock. We need to do this
to keep the tcb from being freed by an abort or ?? unlikely
but worth doing. Also get rid of unneeded INP_WLOCK.
- extra receive info included the rcvinfo which killed the
padding/alignment. We now redefine all the fields properly
so they both align properly to 128 bytes.
- A peeled off socket would not close without an error due to
its misguided idea that sctp_disconnect() was not supported
on it. This fixes it so it goes through the proper path.
- When an assoc was being deleted after abort (via a timer) a
small race condition exists where we might take a packet for
the old assoc (since we are waiting for a cleanup timer). This
state especially happens in mac. We now add a state in the asoc
so these can properly handle the packet as OOTB.
Approved by: re@freebsd.org(Ken Smith)
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminating
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.
While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.
Approved by: re (blanket)
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.
While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.
Further simplify assembly code by dealing with ASTs from C.
Approved by: re (blanket)
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device. The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does. In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object. Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock. Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock. Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock. So, now the
doomed object has a reference count of one! Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc(). Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object. However, the doomed object is still in use by Pa.
Repeating my key point, vm_pager_object_lookup() must not return a
doomed object. Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.
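The key point, testing for the doomed state and taking the reference as one atomic step, can be sketched in userland. This swaps in a C11 compare-and-swap on a single word in place of the kernel's locking, so it shows the property and not the actual fix; struct sketch_obj and its functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_DEAD	0x80000000u	/* high bit: object is being terminated */

struct sketch_obj {
	_Atomic uint32_t state;	/* SK_DEAD flag | reference count */
};

/* Take a reference only if the object is not doomed. Returns 1 on success. */
int
sketch_try_ref(struct sketch_obj *o)
{
	uint32_t old = atomic_load(&o->state);

	do {
		if (old & SK_DEAD)
			return (0);	/* never hand out a doomed object */
	} while (!atomic_compare_exchange_weak(&o->state, &old, old + 1));
	return (1);
}

/* Mark the object as doomed; later sketch_try_ref() calls fail. */
void
sketch_doom(struct sketch_obj *o)
{
	atomic_fetch_or(&o->state, SK_DEAD);
}
```

Because the flag and the count share one word, there is no window in which a lookup can see a live object and then reference it after it has been marked dead.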
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().
Approved by: re (blanket)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).
This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff.
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.
Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.
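A minimal sketch of fixing up the advisory next-free-cluster hint. sketch_fix_nxtfree() and SK_CLUST_FIRST are hypothetical names; the value 2 relies on FAT data clusters being numbered from 2, and the real code may bound the hint differently.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CLUST_FIRST	2u	/* FAT data clusters are numbered from 2 */

/* Replace an out-of-bounds hint, including 0xffffffff, with the first cluster. */
uint32_t
sketch_fix_nxtfree(uint32_t hint, uint32_t maxcluster)
{
	if (hint < SK_CLUST_FIRST || hint > maxcluster)
		return (SK_CLUST_FIRST);
	return (hint);
}
```

The magic value 0xffffffff needs no branch of its own here, since it is simply larger than any valid cluster number.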
Approved by: re (kensmith)
instead of per IOMMU, so we no longer need to program all of them
identically in systems having multiple IOMMUs. This continues the
rototilling of the nexus(4) done about 5 months ago, which amongst
others changed nexus(4) and the drivers for host-to-foo bridges
to provide bus_get_dma_tag methods, allowing to handle DMA tags in
a hierarchical way and to link them with devices.
This still doesn't move the silicon bug workarounds for Sabre (and
in the uncommitted schizo(4) for Tomatillo) bridges into special
bus_dma_tag_create() and bus_dmamap_sync() methods though, as w/o
fully newbus'ified bus_dma_tag_create() and bus_dma_tag_destroy()
this still requires too much hackery, i.e. per-child parent DMA
tags in the parent driver.
- Let the host-to-foo drivers supply the maximum physical address
of the IOMMU accompanying the bridges. Previously iommu(4) hard-
coded an upper limit of 16GB, which actually only applies to the
IOMMUs of the Hummingbird and Sabre bridges. The Psycho variants
as well as the U2S in fact can translate to up to 2TB, i.e.
translate to 41-bit physical addresses. According to the recently
available Tomatillo documentation these bridges even translate to
43-bit physical addresses and hints at the Schizo bridges doing
43 bits as well.
This fixes the issue the FreeBSD 6.0 todo list item "Max RAM on
sparc64" was referring to and pretty much obsoletes the lack of
support for bounce buffers on sparc64.
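The sizes quoted above follow directly from the address widths. As a worked check (sketch_max_paddr() is a hypothetical helper; the 34-bit figure for the 16GB limit is derived arithmetic, not taken from the driver):

```c
#include <assert.h>
#include <stdint.h>

/* Highest physical address reachable with the given number of address bits. */
uint64_t
sketch_max_paddr(unsigned int bits)
{
	assert(bits >= 1 && bits <= 63);
	return ((UINT64_C(1) << bits) - 1);
}
```

With this, 34 bits gives a 16GB range, 41 bits gives 2TB and 43 bits gives 8TB.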
Thanks to Nathan Whitehorn for pointing me at the Tomatillo manual.
Approved by: re (kensmith)
requiring DC_TX_ALIGN or DC_TX_COALESCE, which was previously done
in dc_start_locked(), into dc_encap().
o In dc_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
dc_start_locked() to not try to prepend the mbuf (chain) if
dc_encap() has freed it.
- Likewise, if bus_dmamap_load_mbuf() fails as dc_dma_map_txbuf()
failed, free the mbuf possibly allocated by the above call to
m_defrag() and drop the packet.
o In dc_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 6 free TX
descriptors. Further down the road dc_encap() will bail if there
are only 5 or fewer free TX descriptors, causing dc_start_locked()
to abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 5 with a macro DC_TX_LIST_RSVD.
- Just always assign idx to sc->dc_cdata.dc_tx_cons; it doesn't
make much sense to exclude the idx == sc->dc_cdata.dc_tx_cons
case.
o In dc_dma_map_txbuf() there's no need to set sc->dc_cdata.dc_tx_err
to error if the latter is != 0, bus_dmamap_load_mbuf() already
returns the same error value in that case anyway.
o For less overhead, convert to use bus_dmamap_load_mbuf_sg() for
loading RX buffers.
o Remove some banal and/or outdated comments.
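The IFF_DRV_OACTIVE condition from the dc_txeof() item above can be sketched as a single predicate. SK_TX_LIST_RSVD mirrors the reserve of 5 descriptors named above; sketch_can_resume_tx() is a hypothetical helper, not the driver code.

```c
#include <assert.h>

#define SK_TX_LIST_RSVD	5	/* descriptors the encap path keeps in reserve */

/* Return 1 if enough TX descriptors are free for transmission to resume. */
int
sketch_can_resume_tx(int free_descs)
{
	return (free_descs > SK_TX_LIST_RSVD);
}
```

Using the same constant in both the encap and the completion path keeps the two thresholds from drifting apart.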
Approved by: re (kensmith)
MFC after: 1 week
to clear RL_TDESC_VLANCTL_TAG). This fixes sending packets in the
native VLAN when running both tagged and an untagged VLAN over the
same trunk and descriptors are recycled.
Approved by: re (kensmith)
MFC after: 1 week
d_mmap methods. prep_cdevsw() already installs the shims that
acquire/drop Giant for the methods of a driver that specified the
D_NEEDGIANT flag.
Reviewed by: alc
Approved by: re (kensmith)
- If the path cost is calculated when the link is down, set a pending flag so
it is calculated again when it comes back up.
- To not use 00:00:00:00:00:00 as the bridge id, all interfaces are scanned and
the lowest number wins. All zeros is too low.
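The bridge id selection can be sketched as a scan for the lowest non-zero address. sketch_pick_bridge_id() is a hypothetical helper operating on plain byte arrays; the real code walks the interface list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_ADDR_LEN 6

/* Return the index of the lowest non-zero address, or -1 if there is none. */
int
sketch_pick_bridge_id(const unsigned char addrs[][SK_ADDR_LEN], size_t n)
{
	static const unsigned char zero[SK_ADDR_LEN];
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(addrs[i], zero, SK_ADDR_LEN) == 0)
			continue;	/* all zeros is too low to be a valid id */
		if (best < 0 || memcmp(addrs[i], addrs[best], SK_ADDR_LEN) < 0)
			best = (int)i;
	}
	return (best);
}
```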
Approved by: re (rwatson)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.
Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.
Approved by: re (blanket)
tdq_group structure. Hyper-threaded cores won't really benefit from
separate locks anyway.
- Separate out the migration case from sched_switch to simplify the main
switch code. We only migrate here if called via sched_bind().
- When preempted place the preempted thread back in the same queue at
the head.
- Improve the cpu group and topology infrastructure.
Tested by: many on current@
Approved by: re
message explained why the size is 1 sector, but the code used a
size of 1 cluster.
I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.
Approved by: re (kensmith)
- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.
Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.
This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Opteron cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.
Obtained from: jhb
Approved by: re (kensmith)
64bit counters are needed to simplify traffic accounting and
reduce system load at the big PPP concentrators.
Approved by: re (rwatson), glebius (mentor)
Till now node's transmit path was completely unprotected
and so wasn't thread safe in multilink mode. Its receive path was
declared as WRITER as the simplest protection method but it
reduces performance when compression or encryption is enabled.
Approved by: re (rwatson), glebius (mentor)
communicate with another private port.
All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.
An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.
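The forwarding rule for private ports reduces to one test. SK_PORT_PRIVATE and sketch_may_forward() are hypothetical names for illustration; the bridge's actual flag and code path may differ.

```c
#include <assert.h>

#define SK_PORT_PRIVATE	0x1	/* hypothetical per-port flag */

/* Forward unless both the ingress and the egress port are private. */
int
sketch_may_forward(int src_flags, int dst_flags)
{
	return (!((src_flags & SK_PORT_PRIVATE) && (dst_flags & SK_PORT_PRIVATE)));
}
```

In the customer vlan example, customer-to-server and server-to-customer traffic passes, and only customer-to-customer traffic is dropped.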
Approved by: re (rwatson)
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections. When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.
To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet
and establish a reliable round-trip-time (ie: type a few commands).
Then unplug the ethernet and press a key. Time how long it takes to
drop the connection.
The old behavior (with hz=100) caused the connection to typically drop
between 90 and 110 seconds of getting no response.
Now boot with kern.hz=1000 (default). The same test causes the ssh session
to drop after just 9-10 seconds. This is a big deal on a wifi connection.
With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100. Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.
This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should
always be about 30. If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.
This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.
As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf. This is safe to run from 6.0 onwards.
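The scaling can be checked with a little arithmetic. This sketch assumes the minimum is computed as hz divided by 33 (about 30 ms worth of ticks) and clamped to at least one tick; the divisor and the function names are assumptions, not quoted from the commit.

```c
#include <assert.h>

/* Minimum retransmit timeout in ticks, aiming at roughly 30 ms. */
int
sketch_rexmit_min_ticks(int hz)
{
	int t = hz / 33;	/* assumed divisor: 1000 / 33 is about 30 */

	if (t < 1)
		t = 1;		/* never let a very low hz produce 0 */
	return (t);
}

/* Convert ticks to milliseconds at the given hz. */
int
sketch_ticks_to_ms(int ticks, int hz)
{
	return (ticks * 1000 / hz);
}
```

Under these assumptions both hz=100 (3 ticks) and hz=1000 (30 ticks) give 30 ms, whereas a fixed 3 ticks at hz=1000 is only 3 ms.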
Approved by: re (rwatson)
Reviewed by: andre, silby
that could cause panics and corruption under moderate load. Many thanks
to Matt Reimer, Tom McDonald, and the rest of the guys at VPOP.net for
their help in identifying and testing this.
Approved by: re
only USB 1.1 speeds available, but this shouldn't hurt. Now that we have
working usb support for this board, this is a natural followup.
Approved by: re (kensmith)
7 months. You must have JP6 in the 1-2 position to supply power to the
USB devices, but I've used uftdi, uplcom and umass successfully. If you
have it in 2-3, then nothing will show up. Also, if you have the FQPA
packaging for the AT91RM9200 (like the KN9202 boards have), you will get
the following message
uhub0: device problem (IOERROR), disabling port 2
due to a hardware erratum. It is safe to ignore as it is about pins that
aren't brought out on the FQPA package and aren't properly terminated either.
Alas, there's no register to read to tell the FQPA from the BGA versions.
Submitted by: Daan Vreeken
Approved by: re (kensmith)
revision 1.66
date: 2007/07/31 06:23:26; author: marcel; state: Exp; lines: +2 -2
Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (blanket)
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (rwatson)
errors (especially when jumbo frames are enabled or in low memory systems)
because the RX chain was corrupted when an mbuf was mapped to an unexpected
number of buffers.
- Fixed a problem that would cause kernel panics when an excessively
fragmented TX mbuf couldn't be defragmented and was released by
bce_tx_encap().
Approved by: re(hrs)
MFC after: 7 days
bucket pointer. The virtual mapping may not be present in the
translation cache. This will result in a nested TLB fault at
a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
a rfi. Behaviour is undefined otherwise.
Approved by: re (blanket)
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.
Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.
Approved by: re (blanket)
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.
Approved by: re (blanket)
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.
This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.
Approved by: re (kensmith)
MFC After: 3 days
the fast or safe/slow method is in use. Fast remains at 1000, slow is
now at 850 (always preferred to TSC). Since the HPET has proven slower
than ACPI-fast on some systems, drop its quality to 900. In the future,
it is hoped that HPET performance will improve as it is the main
timer Intel supports. HPET may move back to 2000 in -current once RELENG_7
is branched to ensure that it gets tested.
Approved by: re
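To illustrate how the quality values interact, here is a minimal sketch
of "highest quality wins" selection. The struct and function names are
hypothetical, and the TSC value in the usage note is an assumption (the
log only says that ACPI-safe at 850 is always preferred to TSC).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timecounter: a name and a quality. */
struct tc_sketch {
	const char *name;
	int quality;
};

/* Return the entry with the highest quality, or NULL if n is 0. */
static const struct tc_sketch *
pick_timecounter(const struct tc_sketch *tcs, size_t n)
{
	const struct tc_sketch *best = NULL;
	size_t i;

	assert(tcs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (best == NULL || tcs[i].quality > best->quality)
			best = &tcs[i];
	}
	return (best);
}
```

With ACPI-fast (1000), HPET (900) and ACPI-safe (850) all present, the
sketch picks ACPI-fast; if only ACPI-safe, HPET and a TSC entry with an
assumed quality below 850 are present, it picks HPET.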
<netinet/tcp_fsm.h> is included into any compilation unit that needs
tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG
conditionals. This allows kernels both with and without TCPDEBUG to
build, and unbreaks the tinderbox.
Approved by: re (rwatson)
pc98 motherboards do not provide us with the correct day of week
either. Ignore the day of week when setting the clock here too.
Approved by: re (bmah)
Requested from: nyan
MFC after: 3 weeks
the duration of the function. The device we would otherwise
have left in a useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid a MCA.
Approved by: re (kensmith)
sys.net.inet.tcp.log_debug = 1
It defaults to enabled for the moment and is to be turned off for
the next release like other diagnostics from development branches.
It is important to note that sysctl sys.net.inet.tcp.log_in_vain
uses the same logging function as log_debug. Enabling of the former
also causes the latter to engage, but not vice versa.
Use consistent terminology in tcp log messages:
"ignored" means a segment contains invalid flags/information and
is dropped without changing state or issuing a reply.
"rejected" means a segment contains invalid flags/information but
is causing a reply (usually RST) and may cause a state change.
Approved by: re (rwatson)
SYNCACHE_TIMEOUT to new function syncache_timeout().
o Fix inverted timeout callout engagement logic to actually
enable the timer for the bucket row. Before SYN|ACK was
not retransmitted.
o Simplify SYN|ACK retransmit timeout backoff calculation.
o Improve logging of retransmit and timeout events.
o Reset timeout when duplicate SYN arrives.
o Add comments.
o Rearrange SYN cookie statistics counting.
Bug found by: silby
Submitted by: silby (different version)
Approved by: re (rwatson)
syncache_rst().
o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before
a RST for a connection in syncache did not properly free the entry.
o Add more detailed logging.
Approved by: re (rwatson)
a proper solution.
- Add a dummy entry point which just calls the C entry points, and try to make
sure it's the first code in the binary.
- Copy a bit more than func_end to try to copy the whole load_kernel()
function. gcc4 puts code behind the func_end symbol.
Approved by: re (blanket)
framework for non-MPSAFE network protocols:
- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associated SYSINITs used by it to force
debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
which keyed off debug_mpsafenet to determine various aspects of their own
locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
no-op's, as their entire behavior was determined by the value in
debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.
Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.
Reviewed by: bz, jhb
Approved by: re (kensmith)
day of week field correctly, or they remember bad values that are
written into the day of week field. For this reason, ignore the day
of week field when reading the clock on i386 rather than bailing if
it is set incorrectly.
Problems were seen on a number of platforms, including VMWare, qemu,
EPIA ME6000, Epox-3PTA and ABIT-SL30T.
This is a slightly different fix to that proposed by Ted in his PR,
but the same basic idea.
PR: 111117
Submitted by: Ted Faber <faber@lunabase.org>
Approved by: re (rwatson)
MFC after: 3 weeks
should call uma_zfree() with various spinlocks held. Rearranging the
code would not help here because we cannot break atomicity with respect to
the process spinlock, so the only choice we have is to defer the operation.
In order to do this use a global queue synchronized through the kse_lock
spinlock which is freed at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal as we should want a per-process
list of zombie upcalls, but it follows initial guidelines of KSE authors.
Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
scope security check for the UDPv6 socket credential lookup service,
allowing security policies to bound access to credential information.
While not an immediate issue for Jail, which doesn't allow use of UDPv6,
this may be relevant to other security policies that may wish to control
ident lookups.
While here, eliminate a very unlikely panic case, in which a socket in
the process of being freed is inspected by the sysctl.
Approved by: re (kensmith)
Reviewed by: bz
- make NDIS_DEBUG a sysctl
- default to IEEE80211_MODE_11B if the card doesn't tell us the channels
- don't mess with ic_des_chan when we associate
- Allow a directed scan by setting the ESSID before scanning (verified
with wireshark). Hidden APs probably wouldn't have worked before.
- Grab the channel type and use it to look up the correct curchan for
the scan results (mistakenly used 11B before)
- Fix memory leak in the ndis_scan_results
Tested by: matteo
Reviewed by: sam
Approved by: re (rwatson)
stack overflow in complicated traffic filtering setups.
There can be minor performance degradation for the MHLEN < len <= 256 case
due to additional buffer allocation, but it is a rare case.
Approved by: re (rwatson), glebius (mentor)
MFC after: 1 week
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault. Make the 'type' variable holding the
trap number unsigned to avoid this.
MFC after: 2 weeks
Approved by: re (rwatson)
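A small sketch of why the unsigned type helps: with an unsigned index, a
value that would have been negative becomes a very large number and fails
the ordinary upper-bound check. The table contents and the names
trap_msg_sketch, MAX_TRAP_MSG_SKETCH and trap_name() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message table; the real trap_msg[] contents differ. */
static const char *const trap_msg_sketch[] = {
	"reserved",
	"privileged instruction fault",
	"reserved",
	"breakpoint instruction fault",
};
#define	MAX_TRAP_MSG_SKETCH \
	(sizeof(trap_msg_sketch) / sizeof(trap_msg_sketch[0]) - 1)

/* One upper-bound test is enough because "type" cannot be negative. */
static const char *
trap_name(unsigned int type)
{
	if (type <= MAX_TRAP_MSG_SKETCH)
		return (trap_msg_sketch[type]);
	return ("unknown");
}

/* Self-check: a bogus negative trap number no longer indexes the table. */
void
trap_name_demo(void)
{
	int bogus = -1;

	assert(trap_name((unsigned int)bogus)[0] == 'u');
	assert(trap_name(1)[0] == 'p');
}
```

Had "type" stayed signed, the same comparison would have accepted -1 and
read memory in front of the array.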
to repeat if you had more than two keys down at any given time (which
happened to me all the time with emacs).
This is taken from PR 110681, although what URATAN Shigenobu describes
there is different than the pathology that I have been seeing. I'm
seeing this only in X, while he sees it on his console, yet I think
the two problems are related. I've also reworked the patch slightly
to conform to the coding standards of adjacent code.
It is unclear to me if this merely masks the maddening bug that I have
seen, or if this is a real fix. I typically see the problem when I'm
typing fast in emacs and using lots of motion keys (meta and control).
In either case, my workstation at work again is finally useful with
this patch.
PR: 110681
Submitted by: URATAN Shigenobu
Approved by: re (blanket)
the protocol to be report on each open, but ignore any errors as set
protocol for mice that don't implement the boot protocol can generate
an error. Evidently, the Gyration GyroPoint RF Technology Receiver
(Gyration Ultra Cordless) device has this problem.
Submitted by: Eugene M. Kim
PR: 106565
Approved by: re (blanket)
- Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than
SCTP_SMALL_IOVEC_SIZE
- re-add back inpcb_bind local address check bypass capability
- Fix it so sctp_opt_info is independent of assoc_id position.
- Fix cookie life set to use MSEC_TO_TICKS() macro.
- asconf changes
o More comment changes/clarifications related to the old local address
"not" list which is now an explicit restricted list.
o Rename some functions for clarity:
- sctp_add/del_local_addr_assoc to xxx_local_addr_restricted()
- asconf related iterator functions to sctp_asconf_iterator_xxx()
o Fix bug when the same address is deleted and added (and removed from
the asconf queue) where the ifa is "freed" twice refcount wise,
possibly freeing it completely.
o Fix bug in output where the first ASCONF would not go out after the
last address is changed (e.g. only goes out when retransmitted).
o Fix bug where multiple ASCONFs can be bundled in the same packet
with the same serial numbers.
o Fix asconf stcb iterator to not send ASCONF until after all work
queue entries have been processed.
o Change behavior so that when the last address is deleted (auto asconf
on a bound all endpoint) no action is taken until an address is
added; at that time, an ASCONF add+delete is sent (if the assoc
is still up).
o Fix local address counting so that address scoping is taken into
account.
o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending
of ASCONF (after an RTO). The default now is to send
ASCONF immediately (except for the case of changing/deleting the
last usable address).
Approved by: re(ken smith)@freebsd.org
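For the cookie life item above, the conversion from milliseconds to
clock ticks can be sketched as below. This is not the kernel's
MSEC_TO_TICKS() macro; ms_to_ticks() is a hypothetical helper that takes
hz as a parameter and rounds up so that a short non-zero timeout never
becomes zero ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Convert milliseconds to ticks at "hz" ticks per second, rounding up. */
static unsigned int
ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	/* Widen to 64 bits so ms * hz cannot overflow. */
	return ((unsigned int)(((uint64_t)ms * hz + 999) / 1000));
}
```

At hz=100 a 1 ms value becomes one tick instead of zero, and at hz=1000
the value passes through unchanged.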
This fixes tmpfs calculations on 32-bit systems equipped with more than
4GB swap.
Reported by: Craig Boston <craig xfoil gank org>
PR: kern/114870
Approved by: re (tmpfs blanket)
included man pages on how to use it. This code is still somewhat experimental
but has been successfully tested on a number of targets. Many thanks to
Danny for contributing this.
Approved by: re
Ever since switching to adaptive polling re(4) occasionally spews
watchdog timeouts on systems with MSI capability. This change is a
minimal one for supporting MSI; re(4) also needs MSI-X support
for RTL8111C in the future. Because the softc structure of re(4) is shared
with rl(4), rl(4) was touched to use the modified softc.
Reported by: cnst
Tested by: cnst
Approved by: re (kensmith)
Because nfe(4) hardware doesn't support SG on Rx path, supporting
jumbo frame requires very large contiguous kernel memory (i.e. several
megabytes). In case of lack of contiguous kernel memory that
allocation request may always fail. However nfe(4) can operate on normal
sized MTU frames, so go ahead and just disable jumbo frame support.
While I'm here add a new tunable "hw.nfe.jumbo_disable" to disable
jumbo frame support.
In nfe_poll, make sure to invoke correct Rx handler.
Approved by: re (kensmith)
results unused; with gcc's -Werror option, this raises a warning
that breaks buildkernel.
Fix this by removing upcall_free().
Reported by: various
Approved by: jeff
Approved by: re
Pointy hat to: attilio
dangerous races.
Fix these problems by adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazy" locked.
While here, clean up the code by removing the function kse_GC() (unused),
and merging upcall_link(), upcall_unlink(), upcall_stash() in their
respective callers (static functions, very short and only called in one
place).
Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsored by: NGX Italy (http://www.ngx.it)
vnode label for a check rather than the directory vnode label a second
time.
MFC after: 3 days
Submitted by: Zhouyi ZHOU <zhouzhouyi at FreeBSD dot org>
Reviewed by: csjp
Sponsored by: Google Summer of Code 2007
Approved by: re (bmah)
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).
Approved by: re
MFC after: 3 weeks
udp6_output() from udp6_output.c to udp6_usrreq.c, matching the UDPv4
structure, and allowing us to remove udp6_output.c.
Reviewed by: bz, gnn
Approved by: re (bmah)
o Initialize ownerships and permissions. They were garbage (0) for
root mounts since vfs_mountroot_try() doesn't ask for them to be set
and msdosfs's old incomplete code to set them was removed. The
garbage happened to give the correct ownerships root:wheel, but it
gave permissions 000 so init could not be execed. Use the macros
for root: wheel and 0755. (The removed code gave 0:0 and 0777. 0755
is more normal and secure, though wrong for /tmp.)
o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
correct place, as in ffs. For root mounts, it is only passed in
mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
and nothing translates the flag to the "ro" option string. msdosfs
only looked for it in the string, so it gave a rw mount for root
mounts without even clearing the flag in mp->mnt_flags, so the final
state was inconsistent. Checking the flag only in mp->mnt_flags
works for initial userland mounts too. The MNT_UPDATE case is
messier.
The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro. This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw. The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful. So fsck must be turned off to use
msdosfs as root. This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty filesystems rw.
Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.
Approved by: re (kensmith)
physical memory pages into account for tm_maxfilesize.
Reported by: Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by: Howard Su
Approved by: re (tmpfs blanket)
consumers.
This patch stops KSE from being an optional stub in kernel structures,
fixing the breakage.
As a tail note, this bug has broken kqemu for a long period now.
Tested by: Ulf Lilleengen <lulf@FreeBSD.org>
Discussed with: rwatson, jeff
Approved by: jeff (mentor)
Approved by: re
be woken up by kthread_exit. This is racy and in some cases the kthread will
exit before ndis gets around to sleep so it will be stuck indefinitely. This
change reuses the kq_exit variable to indicate that the thread has gone and
will loop on tsleep with a timeout waiting for it. If the kthread has already
exited then it will not sleep at all.
Approved by: re (rwatson)
advancing. Read from the timer before attaching to be sure it advances
in 1 us. Since the slowest rate allowed by the spec is 10 MHz, the
timer is guaranteed to change in this interval if it is working.
Tested by: Rui Paulo
Approved by: re
MFC after: 3 days
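The arithmetic behind the 1 us probe can be sketched as follows. The
helper names are hypothetical; only the 10 MHz minimum rate and the 1 us
interval come from the log message above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole counter ticks that elapse in interval_ns nanoseconds. */
static uint64_t
ticks_in_interval(uint64_t freq_hz, uint64_t interval_ns)
{
	assert(freq_hz > 0);
	return (freq_hz * interval_ns / 1000000000ULL);
}

/* A working counter must show a different value after the delay. */
static int
counter_advanced(uint32_t before, uint32_t after)
{
	return (after != before);
}
```

At 10 MHz one tick lasts 100 ns, so ticks_in_interval(10000000, 1000)
gives 10: even the slowest conforming timer moves ten counts in 1 us, and
two reads that return the same value indicate a timer that is stuck.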
- Synchronized audit event list to Solaris, picking up the *at(2) system call
definitions, now required for FreeBSD and Linux. Added additional events
for *at(2) system calls not present in Solaris.
Obtained from: TrustedBSD Project
Approved by: re (hrs)
- remove duplicate #include <sys/priv.h> that is not under
#ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
o some SA_IGNORE comments
o some checks for NULL before unlock.
o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this
we pass a NULL in to bind on implicit assoc setup and crash :-(
Approved by: re@freebsd.org(Ken Smith)
4KB pages as i386, data structures that just fit in one page on i386 (and
on 64 bit architectures with 8KB pages) can be distributed over two pages
on amd64. This is a problem in the case of the Symbios driver, since the
SCRIPTS engine in the SCSI chip operates on physical addresses and needs
physically contiguous memory. Earlier patches used contigmalloc on amd64,
but this version replaces part of a structure by a pointer to that data.
In order to not introduce an extra indirection for other architectures,
the change has been made conditional on __amd64__.
Earlier attempts to repair this problem are removed (i.e. the macros that
made amd64 use contigmalloc). The fix was submitted by Jan Mikkelsen and
modified by me to only affect amd64.
PR: 89550
Submitted by: janm at transactionware dot com (Jan Mikkelsen)
Approved by: re (Hiroki Sato)
MFC after: 2 weeks
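The size problem can be illustrated with a little arithmetic. The sketch
below assumes a structure made of 1024 pointer-sized entries, which is a
hypothetical figure chosen only to make the numbers round; pages_needed()
is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages of page_size bytes needed to hold "size" bytes. */
static size_t
pages_needed(size_t size, size_t page_size)
{
	assert(page_size > 0);
	return ((size + page_size - 1) / page_size);
}
```

With 4-byte pointers the table is 4096 bytes and fits one 4KB page. With
8-byte pointers it is 8192 bytes, which is one 8KB page but two 4KB
pages, and two virtually adjacent pages need not be physically adjacent.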
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderately
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE_SIZE.
msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.
The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdosfs_bmap() is only
called when there is physical i/o to do.
Approved by: re (hrs)
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().
In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf(). I think this is just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.
In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.
In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function. This comment belongs in
VOP_BMAP.9 but that doesn't exist.
In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.
Approved by: re (hrs)
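The EFBIG change in msdosfs_bmap() follows a common pattern: range-check
a wide value before narrowing it. A minimal sketch, assuming a 64-bit
block number being narrowed to a 32-bit cluster number (the function name
and the exact types are hypothetical, not the msdosfs code):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Narrow a 64-bit block number to a 32-bit cluster number.
 * Returns 0 on success or EFBIG if the value does not fit.
 */
static int
block_to_cluster(uint64_t bn, uint32_t *cnp)
{
	assert(cnp != NULL);
	if (bn > UINT32_MAX)
		return (EFBIG);	/* refuse instead of truncating */
	*cnp = (uint32_t)bn;
	return (0);
}
```

Without the check, a request for block 2^32 would silently turn into a
request for cluster 0.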
11b channel is not found, e.g. Atheros 5211.
Reported by: matteo
Problem outlined by: thompsa
Reviewed by: sam, thompsa
Approved by: re (kensmith), sam (mentor)
Tested by: matteo (an early version)
We allocate coda_ctlvp when /coda is mounted, but never release it.
During the unmount this vnode was marked as UNMOUNTING and when venus
is started a second time the system would hang, possibly waiting for
the old vnode to disappear.
So now we call vrele on the control vnode when file system is unmounted
to drop the reference we got during the mount. I'm pretty sure it is
also necessary to not skip the handling in coda_inactive for the control
vnode, it seems like that is the place we actually get rid of the vnode
once the refcount has dropped to 0.
Submitted by: Jan Harkes <jaharkes at cs dot cmu dot edu>
Approved by: re (kensmith)
filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty,
causing a 0xdeadc0de dereference. Change kn_hook value from cdev to
struct tty to avoid dereferencing freed cdev.
In ttygone(), wake up select(), sigio and kevent() users in addition
to the queue sleepers.
Return EV_EOF from kevent filters if TS_GONE is set.
Submitted by: peter
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 2 weeks
- Adjust lock_profiling stubs semantic in the hard functions in order to be
more accurate and trustworthy
- As for sx locks, disable shared paths for lock_profiling. Actually,
lock_profiling has a subtle race which makes results coming from shared
paths not completely trustworthy. A macro stub (LOCK_PROFILING_SHARED) can
be used for re-enabling these paths, but is currently intended
for development use only.
- style(9) fixes
Approved by: jeff, kmacy, jhb[1]
Approved by: re
[1] Had initial reservations not shared by others, conceded
in the end.
1. Rewrite the backward scan. Specifically, reverse the order in which
pages are allocated so that upon failure it is never necessary to
free pages that were just allocated. Moreover, any allocated pages
can be put to use. This makes the backward scan behave just like the
forward scan.
2. Eliminate an explicit, unsynchronized check for low memory before
calling vm_page_alloc(). It serves no useful purpose. It is, in
effect, optimizing the uncommon case at the expense of the common
case.
Approved by: re (hrs)
MFC after: 3 weeks
interrupt that is shared with other devices(e.g. USB) in system and
provide a new tunable "hw.msk.legacy_intr" to activate the legacy
interrupt handler. Setting the tunable automatically disables MSI
for msk(4). Previously msk(4) used adaptive polling with taskqueue(9)
as all msk(4) hardware I know of supports MSI. However, there are cases
where MSI can't be used on some hardware due to bugs in MSI
implementations.
Tested by: Li-Lun Wang < llwang AT infor DOT org >
Approved by: re (kensmith)
UDPv4 features to UDPv6:
- Add MAC checks on delivery and MAC labeling on transmit.
- Check for (and reject) datagrams with destination port 0.
- For multicast delivery, check the source port only if the socket being
considered as a destination has been connected.
- Implement UDP blackholing based on net.inet.udp.blackhole.
- Add a new ICMPv6 unreachable reply rate limiting category for failed
delivery attempts and implement rate limiting for UDPv6 (submitted by
bz).
Approved by: re (kensmith)
Reviewed by: bz
machines.
- Leave the long-term load balancer running by default once per second.
- Enable stealing load from the idle thread only when the remote processor
has more than two transferable tasks. Setting this to one further
improves buildworld. Setting it higher improves mysql.
- Remove the bogus pick_zero option. I had not intended to commit this.
- Entirely disallow migration for threads with SRQ_YIELDING set. This
balances out the extra migration allowed for with the load balancers.
It also makes pick_pri perform better as I had anticipated.
Tested by: Dmitry Morozovsky <marck@rinet.ru>
Approved by: re
properly. We have to temporarily unlock the TDQ lock so we can lock
the thread and add it to the run queue. This is used only for KSE.
- When we add a thread from the tdq_move() via sched_balance() we need to
ipi the target if it's sitting in the idle thread or it'll never run.
Reported by: Rene Landan
Approved by: re
- Add custom .c wrappers for the firmware, rather than the standard
firmware(9) generated firmware objects to work around toolchain
problems on ia64 involving linking objects produced by
ld -b -binary into the kernel.
- Move from using Myricom's ".dat" firmware blobs to using Myricom's
zlib compressed ".h" firmware header files. This is done to
facilitate the custom wrappers, and saves a fair amount of wired
memory in the case where the firmware is built in, or preloaded.
- Fix two compile issues in mxge which only appear on non-i386/amd64.
Reviewed by: mlaier, mav (earlier version with just zlib support)
Glanced at by: sam
Approved by: re (kensmith)
IPV6_IPSEC_POLICY always visible again. This unbreaks some
third party user space applications.
PR: 114491
Reported by: sumikawa
Reviewed by: sumikawa
Approved by: re (hrs)
should finally fix fsx test case.
The printf's added here would be eventually turned into
assertions.
Submitted by: Mingyan Guo (mostly)
Approved by: re (tmpfs blanket)
new code and third party modules which try to depend on it.
- Initialize sched_lock in sched_4bsd.c.
- Declare sched_lock in sparc64 pmap.c and assert that we're compiling
with SCHED_4BSD to prevent accidental crashes from running ULE. This
is the sole remaining file outside of the scheduler that uses the
global sched_lock.
Approved by: re
been in development for over 6 months as SCHED_SMP.
- Implement one spin lock per thread-queue. Threads assigned to a
run-queue point to this lock via td_lock.
- Improve the facility for assigning threads to CPUs now that sched_lock
contention no longer dominates scheduling decisions on larger SMP
machines.
- Re-write idle time stealing in an attempt to make it less damaging to
general performance. This is still disabled by default. See
kern.sched.steal_idle.
- Call the long-term load balancer from a callout rather than sched_clock()
so there are no locks held. This is disabled by default. See
kern.sched.balance.
- Parameterize many scheduling decisions via sysctls. Try to document
these via sysctl descriptions.
- General structural and naming cleanups.
- Document each function with comments.
Tested by: current@ amd64, x86, UP, SMP.
Approved by: re
require fewer blocking loops.
- Don't use atomic ops with 4BSD or on UP.
- Only use the blocking loop if ULE is compiled in.
- Use the correct memory barrier.
Discussed with: attilio, jhb, ssouhlal
Tested by: current@
Approved by: re
- use proper tick gathering macro instead of ticks directly.
- Placed reasonable boundaries on sets that a user can do
that are converted to ticks from ms.
- Fix CMT_PF to always check to be sure CMT is on.
- Fix ticks use of CMT_PF.
- put back code to allow asconfs to be queued while INITs are in flight
and before the assoc is established.
- During window probes, an ack'd packet might be left with the window
probe mark on it causing it to be retransmitted. Change so that
the flight decrease macro clears the window_probe mark.
- Additional logging flight size/reading and ASOC LOG. This
is only enabled if you manually insert things into opt_sctp.h
since its a set of debug code only.
- Found an interesting SMP race in the way data was appended which
could cause a reader to lose a part of a message, had to
reorder when we marked the message was complete to after
the data was appended.
- bug in ADD-IP for the subset bound socket case when the peer has only
one address
- fix ASCONF implicit success/error handling case
- proper support of jails in Freebsd 6>
- copy out the timeval for the 64 bit sparc world on cookie-echo
(alignment error crashes without this).
Approved by: re(Ken Smith)
config info. from device.hints. Some machines have ipmi controllers
that do not have attachment info in either PCI, SMBIOS or ACPI.
This idea was hacked together by me and then done properly by
jhb.
Submitted by: jhb
Reviewed by: jhb (man page)
Approved by: re (Ken Smith)
MFC after: 1 week
The SDM states that writing to ar.bspstore invalidates the ar.rnat
register as a side-effect. This was interpreted as "bits in the
ar.rnat register that correspond to registers whose value is on
the stack are undefined'. Since we keep the kernel stack NaT-
aligned with the user stack (i.e. the lower 9 bits of the backing
store pointer remain unchanged when we switch to the kernel stack)
bits that need preserving would be preserved.
That interpretation is questionable. So, now, the interpretation
is more absolute: ar.rnat is undefined after writing to ar.bspstore.
As such, we write the saved value of ar.rnat back to ar.rnat after
writing to ar.bspstore.
Discussed with: christian.kandeler@hob.de
Approved by: re (kensmith)
- Keep last transaction label for each destination.
- If the next label is not free, just give up.
- This should reduce CPU load for TX on if_fwip under heavy load.
Approved by: re (hrs)
NET_NEEDS_GIANT, which will shortly be removed. This is done in a
way that it may be easily reattached to the build before 7.1 if
appropriate locking is added. Specifics:
- Don't install netatm include files
- Disconnect netatm command line management tools
- Don't build libatm
- Don't include ATM parts in rescue or sysinstall
- Don't install sample configuration files and documents
- Don't build kernel support as a module or in NOTES
- Don't build netgraph wrapper nodes for netatm
This removes the last remaining consumer of NET_NEEDS_GIANT.
Reviewed by: harti
Discussed with: bz, bms
Approved by: re (kensmith)
vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages(). Add comments regarding the need for the free page
queues lock to be held by callers to these functions. No functional
changes.
Approved by: re (hrs)
- CMT_PF states added (w/sysctl to turn the PF version on)
- sctp_input.c had a missing incr of cookie case when the
auth was bad. This meant a free was called without an
increment to refcnt, added increment like rest of code.
- There was a case, unlikely, when the scope of the destination
changed (this is a TSNH case). In that case, it would not free
the alloc'ed asoc (in sctp_input.c).
- When listed addresses found a colliding cookie/Init, then
the collided upon tcb was not unlocked in sctp_pcb.c
- Add error checking on arguments of sctp_sendx(3) to prevent it from
referencing a NULL pointer.
- Fix an error return of sctp_sendx(3), it was returning
ENOMEM not -1.
- Get assoc id was changed to use the sanctified socket api
method for getting an assoc id (PEER_ADDR_INFO instead of
PEER_ADDR_PARAMS).
- Fix it so a peeled off socket will get a proper error return
if it tries to send to a different address than it is connected to.
- Fix so that select_a_stream can avoid an endless loop that
could hang a caller.
- time_entered (state set time) was not being set in all cases
to the time we went established.
Approved by: re(ken smith)
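The two sctp_sendx(3) items above concern the usual library convention:
validate arguments first, and report failure as -1 with errno set rather
than returning the error number itself. A minimal sketch of that
convention, using a hypothetical sendx_sketch() that stands in for the
real function and does no actual sending:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 0 on success, or -1 with errno set on failure. */
static int
sendx_sketch(const void *addrs, int addrcnt, size_t addrlen)
{
	void *copy;

	/* Reject bad arguments before touching the address list. */
	if (addrs == NULL || addrcnt <= 0 || addrlen == 0) {
		errno = EINVAL;
		return (-1);
	}
	copy = malloc((size_t)addrcnt * addrlen);
	if (copy == NULL) {
		errno = ENOMEM;		/* not "return (ENOMEM)" */
		return (-1);
	}
	/* A real implementation would pack the addresses and send here. */
	free(copy);
	return (0);
}

/* Self-check of the error convention. */
void
sendx_sketch_demo(void)
{
	errno = 0;
	assert(sendx_sketch(NULL, 1, 16) == -1);
	assert(errno == EINVAL);
}
```

A caller testing for "< 0" would have missed the old ENOMEM return, since
ENOMEM is a positive number.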
This adds a function to agp.c to set the aperture resource ID if it's
not the usual AGP_APBASE. Previously, agp.c had been assuming
AGP_APBASE, which resulted in incorrect agp_info, and contortions by
agp_i810.c to work around it.
This also adds functions to agp.c for default AGP_GET_APERTURE() and
AGP_SET_APERTURE(), which return the aperture resource size and disallow
aperture size changes. Moving to these for our AGP drivers will likely
result in stability improvements. This should fix 855-class aperture
size detection.
Additionally, refuse to attach agp_i810 when some RAM is above 4GB and
the GART can't reference memory that high. This should be very rare.
The correct solution would be bus_dma conversion for agp, which is
beyond the scope of this change. Other AGP drivers could likely use
this change as well.
G33/Q35/Q33 AGP support is also included, but disconnected by default
due to lack of testing.
PR: kern/109724 (855 aperture issue)
Submitted by: FUJIMOTO Kou<fujimoto@j.dendai.ac.jp>
Approved by: re (hrs)
Add support for the CENTIPAD board (http://www.harerod.de/centipad/index.html)
(which is a very cool, very small ARM board)
Add support for KB9202B (it has different memory)
Make BOOT_FLAVOR settable
Minor cleanup nits
Approved by: re@
by removing files from src/sys/coda, and updating include paths in the
new location, kernel configuration, and Makefiles. In one case add
$FreeBSD$.
Discussed with: anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
Repo-copy madness: simon
- change include style so that builds in the kernel tree OR standalone work.
- Limit HWCSUM - I was led to believe that it would work with RSS,
but our testing had odd issues which suggests this is false.
- A fatfinger error in the ioctl code made ifconfig up not work.
Approved by: re
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processes.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).
Approved by: re (kensmith)
to the FAT is possible.
Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors. The magic 3 is
the default number of 512-byte FAT sectors on a floppy drive. That
many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096. Remove
MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.
For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work. We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K). I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.
This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write). Microsoft documents support for sector
sizes up to 4K in msdosfs. ffs is currently limited to 8K for both
read and write.
Approved by: re (kensmith)
Approved by: nyan (several years ago)
Rev 1.9 introduced another path where machclk_freq would be initialized
before the rest of setup was done (i.e. initializing the callout). Make
the one-time initialization a separate function and make init_machclk()
able to be called multiple times, any time. We depend on tsc_freq first
being updated from the highest priority eventhandler, thus we run last
and call init_machclk() to set machclk_freq. Also, don't initialize
static variables to 0.
Tested by: Eygene Ryabinkin
Approved by: re
part of fixing msdosfs for large sector sizes. One of the fixed bugs
was fatal for large sector sizes.
1. The fsinfo block has size 512, but it was misunderstood and declared
as having size 1024, with nothing in the second 512 bytes except a
signature at the end. The second 512 bytes actually normally (if
the file system was created by Windows) consist of a second boot
sector which is normally (in WinXP) empty except for a signature --
the normal layout is one boot sector, one fsinfo sector, another
boot sector, then these 3 sectors duplicated. However, other
layouts are valid. newfs_msdos produces a valid layout with one
boot sector, one fsinfo sector, then these 2 sectors duplicated.
The signature check for the extra part of the fsinfo was thus
normally checking the signature in either the second boot sector
or the first boot sector in the copy, and thus accidentally
succeeding. The extra signature check would just fail for weirder
layouts with 512-byte sectors, and for normal layouts with any other
sector size.
Remove the extra bytes and the extra signature check.
2. Old versions did i/o to the fsinfo block using size 1024, with the
second half only used for the extra signature check on read. This
was harmless for sector size 512, and worked accidentally for sector
size 1024. The i/o just failed for larger sector sizes.
The version being fixed did i/o to the fsinfo block using size
fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)). This
expression makes no sense. It happens to work for small
sector sizes, but for sector size 32K it gives the preposterous
value of 64M and thus causes panics. A sector size of 32768 is
necessary for at least some DVD-RW's (where the minimum write size
is 32768 although the minimum read size is 2048).
Now that the size of the fsinfo block is 512, it always fits in
one sector so there is no need for a macro to express it. Just
use the sector size where the old code uses 1024.
Approved by: re (kensmith)
Approved by: nyan (several years ago for a different version of (2))
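To make the arithmetic in (2) concrete, here is a small userland sketch of
the old size expression. It is not the kernel code: it assumes that
pm_BlkPerSec is the sector size counted in 512-byte blocks, and the function
names blk_per_sec() and old_fsi_size() are made up for illustration.

```c
#include <assert.h>

/* Assumption: pm_BlkPerSec is the sector size in 512-byte units. */
unsigned long
blk_per_sec(unsigned long sector_size)
{
	assert(sector_size >= 512 && sector_size % 512 == 0);
	return (sector_size / 512);
}

/* The old fsi_size() expression quoted above, applied to a plain integer. */
unsigned long
old_fsi_size(unsigned long sector_size)
{
	return (1024UL << (blk_per_sec(sector_size) >> 2));
}
```

Under that assumption, sector sizes 512 and 1024 both give 1024, while a
sector size of 32768 gives 1024 << 16, i.e. the 64M mentioned above.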
than indirecting through ifaddr_byindex, which makes things easier with
respect to virtualized network stacks.
Submitted by: Marko Zec <zec at icir dot org>
Reviewed by: Leonid Grossman <Leonid dot Grossman at neterion dot com>
Approved by: re (kensmith)
non-sleepable lock held. drm_pci_alloc() calls them, thus drm mutex shall
not be held during the call.
Move the drm_pci_alloc() to the start of the i915_initialize() and drop
the drm mutex around it.
Reported by: Ganbold <ganbold micom mng net>
Reviewed by: anholt
Approved by: re (hrs)
MFC after: 1 week
- use net80211 for scanning and pass the results back to the scan cache
- use ieee80211_init_channels to fill our channel list
- fix up state transitions
- deprecate the old wicontrol ioctls
- add some debugging lines (#define NDIS_DEBUG)
Reviewed by: sam
Approved by: re (kensmith)
ENOTTY. Make the control vnode a regular file so that ioctls are passed
through to our kernel module.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
some previously disabled code which according to the comment caused a
problem during shutdown. But even that is still better than
triggering a kernel panic whenever venus is started.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
we can't open container files by device/inode number pair anymore.
Replace the CODA_OPEN upcall with CODA_OPEN_BY_FD, where venus returns
an open file descriptor for the container file. We can then grab a
reference on the vnode coda_psdev.c:vc_nb_write and use this vnode for
further accesses to the container file.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
ioctls can be removed. These have been #ifdef'd out and left as a reference in
case any of the RIDs need to be turned into sysctls at a later date.
Reviewed by: sam, avatar
Approved by: re (kensmith)
operations. But we don't have to, if we find the coda_mntinfo structure
for this device in our linked list, we know the device is good.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
need to initialize dev so that we can actually find the allocated
coda_mntinfo structure later on.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
macros for lock_profiling.
Reported by: Tom McLaughlin <tmclaugh@sdf.lonestar.org>
Tested by: Tom McLaughlin <tmclaugh@sdf.lonestar.org>
Approved by: jeff (mentor)
Approved by: re
ELF files. On ia64 the ELF header contains information about
characteristics of the machine code and ld(1) needs that to
determine whether input files are compatible for linking. To
this end non-ELF files are not supported by binutils on ia64.
However, the resulting ELF file seems to be correct despite the
warnings and the non-supportedness of non-ELF files and it
appears enough to unbreak the build of firmware(9) files on ia64
by simply suppressing the warning.
Ran into by: gallatin@
Approved by: re (hrs)
Looks good to me: mlaier@
vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled. However, when
transmission of the page completes, the page is unwired and returned to
the page queues. At that point, the page is not in any special state
that prevents it from being recycled. Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page. Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.
While I'm here, add some assertions and comments.
Approved by: re (rwatson)
MFC After: 3 weeks
of the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers. This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.
Use simple sequential allocation instead. Actually maintain the fsinfo
sequence index for this. The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage. If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.
Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.
The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:
Before (msdosfs 4K-clusters):
untar: 459.57 real untar from cached file (actually a pipe)
tar: 342.50 real tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar: 39.18 real
tar: 29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar: 31.35 real
tar: 18.30 real
After (msdosfs 4K-clusters):
untar 54.83 real
tar 16.18 real
All of these times can be improved further.
With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative. 342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on. (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!). E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.
Approved by: re (kensmith)
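The ratios implied by the timings above can be checked with a few lines of
C. This is only a worked-arithmetic sketch; the helper names speedup() and
transfer_seconds() are hypothetical and not part of the change.

```c
#include <assert.h>

/* Ratio of a "before" time to an "after" time, both in seconds. */
double
speedup(double before, double after)
{
	assert(after > 0.0);
	return (before / after);
}

/* Seconds needed to move size_mb megabytes at rate_mb_per_s. */
double
transfer_seconds(double size_mb, double rate_mb_per_s)
{
	assert(rate_mb_per_s > 0.0);
	return (size_mb / rate_mb_per_s);
}
```

With the numbers quoted above, speedup(342.50, 16.18) is about 21 for tar,
speedup(459.57, 54.83) is about 8.4 for untar, and
transfer_seconds(342, 47) is about 7.3.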
- Move udp_sendspace and udp_recvspace global variables and associated
sysctls to the top of the file where most other such things are present.
- Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize
so that we can add blackhole support for UDPv6 using the same MIB
variable.
- Move udp_append() above udp_input() to match the function order in
udp6_usrreq.c.
Approved by: re (kensmith)
- reduce cpu usage by as much as 25% (40% -> 30%) by doing txq reclaim more efficiently
- use mtx_trylock when trying to grab the lock to avoid spinning during long encap loop
- add per-txq reclaim task
- if mbufs were successfully re-claimed try another pass
- track txq overruns with sysctl
Approved by: re (blanket)
- Add controller id for Intel 82801I (ICH9).
PR: kern/114399
Submitted by: Michael Fuckner <michael@fuckner.net>
- MSI support. Disabled by default due to various issues with too much
broken hardware. MSI can be enabled through device.hints(5) or
kenv(8) by setting "hint.pcm.%d.msi=1".
Partially submitted by: kevlo
YAMAMOTO Taku <taku@tackymt.homeip.net>
Tested by: joel, kevlo, YAMAMOTO Taku
Approved by: re (hrs)
MFC after: 3 days
prototypes, don't use register, etc. Synchronize structure and
layout to the IPv4 versions of these functions to a greater extent,
making visual comparison easier.
Remove now stale or incorrect comments.
Enable full lock assertions, and correct one exception handling
case where the wrong label was jumped to.
Tested by: bz
Approved by: re (bmah)
do the heavy lifting of the 'mii_tick' function, rue was left behind.
Implement this in a naive way. Reports from the field show this makes
the driver functional with some locking issues, as opposed to an
instant panic. Those will be addressed in a later version of the
driver.
Approved by: re@ (bmah)
With the in_mcast.c code, if an interface for an IPv4 multicast join was
not specified, and a route did not exist for the specified group in the
unicast forwarding tables, the join would be rejected with the error
EADDRNOTAVAIL.
This change restores the old behaviour whereby if no interface is specified,
and no route exists for the group destination, the IPv4 address list is
walked to find a non-loopback, multicast-capable interface to satisfy
the join request.
This should resolve problems with starting multicast services during
system boot or when a default forwarding entry does not exist.
Approved by: re (rwatson)
Sort NETGEAR list per convention.
Swap QUALCOMM and QUALCOMM2.
Add a few vendor products.
no md5 changes with this file (except when USBVERBOSE is enabled)
Approved by: re@ (blanket)
vm_fault_additional_pages() that was introduced in revision 1.47. Then
as now, it is unnecessary because dev_pager_haspage() returns zero for
both the number of pages to read ahead and read behind, producing the
same exact behavior by vm_fault_additional_pages() as the special case
handling.
Approved by: re (rwatson)
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user wants root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein
in order to prevent data corruption.
- Fix several bugs reported by Coverity.
Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)
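For readers unfamiliar with the two macros named above, the sketch below
shows what they compute. The definitions mirror how sys/param.h has
traditionally written them, but they are reproduced here from memory as an
assumption, and the wrappers pages_for() and round_to_page() are
hypothetical helpers, not tmpfs code.

```c
#include <assert.h>
#include <stddef.h>

#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif
#ifndef roundup2
#define roundup2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))	/* y: power of 2 */
#endif

/* Number of pages needed to hold len bytes. */
size_t
pages_for(size_t len, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (howmany(len, page_size));
}

/* len rounded up to the next multiple of page_size. */
size_t
round_to_page(size_t len, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (roundup2(len, page_size));
}
```

For a 4096-byte page, a 4097-byte file needs 2 pages and rounds up to 8192
bytes.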
- Handle directories and leaves other than unit directories and text leaves
correctly.
- Now we can retrieve CROM of iSight correctly.
Approved by: re (hrs)
Tested by: flz
MFC after: 3 days
- When a LDT entry changes, the old one is freed while it is still
referenced by gdt and ldtr. This can lead to disruptive behaviours in
particular on SMP machines.
- When a LDT entry changes, it is assumed that the only entities sharing
the same LDT are threads in the same proc. It doesn't take into account
edge cases where two processes share the same VM (rfork'ed ones, for
example).
This patch addresses these two problems and additionally it fixes the
usage of refcount, switching it back to the old manually-grown refcount
(since in this case it would be faster).
Diagnosed by: tegge
Tested by: pho (a former version)
Reviewed by: kib
Approved by: jeff (mentor)
Approved by: re
free to be consistent with other error handling, and release socket buffer
lock before freeing mbufs and statistics updates rather than after.
Approved by: re (kensmith)
tracks the total number of reactivated pages. (We have not been
counting reactivations by vm_fault() since revision 1.46.)
Correct a comment in vm_fault_additional_pages().
Approved by: re (kensmith)
MFC after: 1 week
in. These are exclusively in the name of the company for this round.
No new devices have been added, but the MITEL entry has been
eliminated because nothing uses it. You won't see any difference
unless you have USBVERBOSE defined for the kernel.
Approved by: re@ (blanket)
- Adjust lock_profiling stubs semantic in the hard functions in order to be
more accurate and trustable
- Disable shared paths for lock_profiling. Actually, lock_profiling has a
subtle race which makes results coming from shared paths not completely
trustable. A macro stub (LOCK_PROFILING_SHARED) can be actually used for
re-enabling these paths, but is currently intended for developing use only.
- Use homogeneous names for automatic variables in hard functions regarding
lock_profiling
- Style fixes
- Add a CTASSERT for some flags building
Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
sys/i4b/include/ so they will be available to all architectures
once I4B compiles on those.
We no longer need these "glue" files.
Reminded by: nyan
Approved by: re (kensmith)
nxge: cast page size fragments down to (int). If the vm's demand paging
PAGE_SIZE is ever too big for that, we've got far bigger problems.
ofw: move va_start() a little earlier. gcc-4.2 doesn't like us modifying
the last arg before the va_start().
Approved by: re (rwatson)
would be 93C46(1Kbit) or 93C56(2Kbit). One of differences between them
is number of address lines required to access the EEPROM. For example,
93C56 EEPROM needs 8 address lines to read/write data. If 93C56
received premature end of required number of serial clock(CLK) to set
OP code/address of EEPROM, the result would be unexpected behavior.
Previously it tried to detect 93C46, which requires 6 address lines,
and then assumed it would be 93C56 if read data was not expected
value. However, this approach didn't work in some models/situations
as 93C56 requires 8 address lines to access its data. In order to fix
it, change EEPROM probing order such that 93C56 is detected reliably.
While I'm here change hard-coded address line numbers with defined
constant to enhance readability.
PR: 112710
Approved by: re (mux)
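Why the address width matters can be seen from how a read command is framed.
The sketch below assumes the usual 93Cx6 serial framing (a start bit, a
two-bit READ opcode, then the address), which the text above does not spell
out; the names ee_read_cmd() and ee_read_cmd_clocks() are hypothetical and
are not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define EE_93C46_ADDR_BITS	6	/* from the text above */
#define EE_93C56_ADDR_BITS	8	/* from the text above */
#define EE_READ_PREFIX		0x6u	/* assumed: start bit 1, opcode 10 */

/* Build the bit pattern shifted out for a READ of word 'addr'. */
uint32_t
ee_read_cmd(unsigned addr_bits, uint32_t addr)
{
	assert(addr_bits > 0 && addr_bits <= 16);
	assert(addr < (1u << addr_bits));
	return ((EE_READ_PREFIX << addr_bits) | addr);
}

/* Serial clocks needed to shift the whole command out. */
unsigned
ee_read_cmd_clocks(unsigned addr_bits)
{
	return (3 + addr_bits);
}
```

Under these assumptions a 93C46-style command is 9 clocks long and a
93C56-style command is 11, so a part expecting 8 address bits sees a
truncated command when it is probed with only 6.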
- Sort copyrights by date.
- Re-wrap, and in some cases, fix comments.
- Fix tabbing, white space, remove extra blank lines.
- Remove commented out debugging printfs.
Approved by: re (kensmith)
it with netipsec now that KAME IPsec is gone.
While here add missing netinet6 directories.
Add comments about the ports needed to be able to run those targets.
Reviewed by: philip
Approved by: re (rwatson)
o Adonics Cable 205
o Aiptek PocketCAM 3Mega
o Belkin USB2SCSI
o Casio QV DigiCam
o CCYU EasyDisk ED1064
o Desknote UCR-61S2B
o Epson Stylus Photo 875DC Card Reader
o Epson Stylus Photo 895 Card Reader
o Feiya 5-in-1 Card Reader
o Hitachi Dvd-CAM DZ-MV100A Camcorder
o HP CD-WRiter+ CD-4e
o Insystem Storage Adapter v2
o Kyocera Finecam S3x
o Kyocera Finecam S4
o Kyocera Finecam S5
o Kyocera Finecam L3
o Lexar USB CF Reader
o MindAtWork Digital Wallet
o Minolta Dimage F300
o Minolta Dimage E223
o Mitsumi USB Fdd
o Netac USB-CF-Card
o NetChip USB Clik! 40
o Onspec MDCFE-B USB CF Reader
o Onspec SIIG/Datafab Memory Stick + CF Reader/Writer
o Onspec Datafab-based Reader
o Onspec PNY/Datafab CF+SM Reader
o Onspec SimpleTech/Datafab CF+SM Reader
o Onspec MDSM-b Reader
o Onspec USB To CF + SM Combo (LC1)
o Onspec ImageMate SDDR55
o Panasonic LS-120 Camera
o Samsung Techwin Digimax 410
o Shuttle eUSB SmartMedia / CompactFlash Adapter
o Skanhex MD 7425 Camera
o Skanhex SX 520z Camera
o Sony Memorystick NW-MS7
o Sony Portable USB Hardrive V2
o Sony Memorystick PEG N760c
o Sony Memorystick MSC-U03
o TREK/IBM USB memory key
o Trumpion T33520 USB Flash Card Controller
o Trumpion MP3 Player
o Vivitar Vivicam 35Xx
o WinMaxGroup USB Flash Disk 64M-C
o Zoran Digital Camera EX-20 DSC
and maybe a few others...
Submitted by: Vaidas Damosevicius and flz
PR: 79893
Reviewed by: njl, flz
Approved by: re (blanket)
ftruncate(), but without the pad arg.
There are several reasons for this. Consider 'mmap()'. On AMD64, the
function call (and syscall) ABI allow for 6 register arguments. Additional
arguments go on the stack. mmap(2) has 6 arguments. However, the syscall
definition has an extra 'int pad' argument. This pushes it to 7 arguments,
which means one must spill into the memory stack. Since the kernel API
doesn't match userland API, we have a hack in libc - libc/sys/mmap.c.
This implements the userland API by calling __syscall() with an extra
argument and the pad argument, for a total of 8 args. This is all
unnecessary and inconvenient for several things, including the kernel's
syscall handler code which now has to handle merging stack arguments with
register arguments. It is a big deal for certain 3rd party code.
I'm adding libc glue to make the transition totally painless. I had
intended to mark the old syscalls as COMPAT6, but the potential to shoot
your feet by building a new kernel without COMPAT_FREEBSD6 but with a
slightly older userland was too great. For now, they have manual
"freebsd6_" prefixes rather than being COMPAT6. They will go back to
being marked 'COMPAT6' after 7-stable starts.
Approved by: re (kensmith)
Also, change the visibility of compat syscalls slightly. Compat
syscalls were missing from 'syscalls.h' entirely. This additionally adds
them with their compat prefix. eg: SYS_freebsd6_mmap.
Also, the syscalls.c names strings have different prefixes to differentiate
syscalls. Instead of several "old.mmap" strings, there will now be a
"compat.mmap" and "compat6.mmap" etc. Before, both would have had the
same "old.mmap" label.
Approved by: re
that should be a no-op (for example, requesting SYNC on record path).
The standard does not indicate that such requests are illegal, so
just return it as success instead of EINVAL.
Approved by: re (mux)
shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as
mutex, thus creating this LOR situation.
Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.
Reviewed by: phk
Tested by: Peter Holm
LOR: 80
Approved by: re (kensmith)
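The deferred-free pattern described above can be sketched in userland as
follows. This is an illustration under stated assumptions, not the
subr_unit.c code: the struct and function names are hypothetical, locking is
left to the caller, and every queued item is assumed to be a malloc(3)
allocation at least as large as struct deferred.

```c
#include <assert.h>
#include <stdlib.h>

/* Header overlaid on an item that is waiting to be freed. */
struct deferred {
	struct deferred *next;
};

/* Called with the (hypothetical) lock held: queue the item, do not free. */
void
defer_free(struct deferred **listp, void *item)
{
	struct deferred *d = item;

	assert(item != NULL);
	d->next = *listp;
	*listp = d;
}

/* Called after the lock is dropped: free everything, return the count. */
int
clean_deferred(struct deferred **listp)
{
	struct deferred *d;
	int n = 0;

	while ((d = *listp) != NULL) {
		*listp = d->next;
		free(d);
		n++;
	}
	return (n);
}
```

The point of the split is that free() is never reached while the lock is
held; the caller drains the list once it is safe to do so.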
the 7.0 timeframe.
This is needed because I4B is not locked and NET_NEEDS_GIANT goes away.
The plan is to lock I4B and bring everything back for 7.1.
Approved by: re (kensmith)
setenv(3) by tracking the size of the memory allocated instead of using
strlen() on the current value.
Convert all calls to POSIX from historic BSD API:
- unsetenv returns an int.
- putenv takes a char * instead of const char *.
- putenv no longer makes a copy of the input string.
- errno is set appropriately for POSIX. Exceptions involve bad environ
variable and internal initialization code. These both set errno to
EFAULT.
Several patches to base utilities to handle the POSIX changes from
Andrey Chernov's previous commit. A few I re-wrote to use setenv()
instead of putenv().
New regression module for tools/regression/environ to test these
functions. It also can be used to test the performance.
Bump __FreeBSD_version to 700050 due to API change.
PR: kern/99826
Approved by: wes
Approved by: re (kensmith)
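Two of the POSIX behaviours listed above can be observed from userland. The
sketch below is only a demonstration under assumptions: the variable name
GR_DEMO_VAR and both function names are made up, and the EINVAL check relies
on POSIX requiring unsetenv() to reject a name containing '='.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose putenv()/unsetenv() on strict compilers */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static char demo_env[] = "GR_DEMO_VAR=first";	/* hypothetical variable */

/* Returns 1 if putenv() used our buffer directly instead of copying it. */
int
putenv_shares_buffer(void)
{
	const char *v;

	if (putenv(demo_env) != 0)
		return (-1);
	/* Overwrite the value in place; both values are 5 bytes long. */
	memcpy(demo_env + strlen("GR_DEMO_VAR="), "other", 5);
	v = getenv("GR_DEMO_VAR");
	return (v != NULL && strcmp(v, "other") == 0);
}

/* Returns 1 if unsetenv() reports failure through its int return value. */
int
unsetenv_rejects_bad_name(void)
{
	errno = 0;
	return (unsetenv("BAD=NAME") == -1 && errno == EINVAL);
}
```

Because putenv() no longer copies its argument, the string passed to it must
stay valid for as long as the variable is set, which is one reason callers
were converted to setenv().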
can acquire shared filedescriptor locks in the appropriate cases.
- Remove Giant from calls that issue ioctls. The ioctl path has been
mpsafe for some time now.
- Only acquire giant for VOP_ADVLOCK when the filesystem requires giant.
advlock is now mpsafe.
Reviewed by: rwatson
Approved by: re
to protect this datastructure instead.
- Preallocate an extra lockf structure in case we want to split a lock
on insert or delete.
- msleep() on the vnode interlock when blocking on a lock.
Reviewed by: rwatson
Approved by: re
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
CPUs will always see an up-to-date value of smp_rv_setup_func.
Reviewed by: attilio
Approved by: re (kensmith)
Tested on: alpha, amd64, i386, sparc64 SMP (for several years)
nfsnode could lead to attrs being stale. One example (that we
ran into) was a READDIR+, WRITE. The responses came back in
order, but the attrs from the WRITE were loaded before the
attrs from the READDIR+, leading to the wrong size being
read on the next stat() call.
MFC after: 1 week
Submitted by: mohans
Approved by: re (kensmith)
recoverable and unrecoverable. For the former, we redirty the
buffer and hang onto it for future retries. For the latter (eg.
ESTALE), we discard the buffer and return the error back to the
user on the next syscall. This fixes a number of vfs panics and
fixes having a large number of dirty buffers (that cannot be
written out and reclaimed) from hanging around. Thanks to ups@
for discussions on this issue.
Reported by: kris, Kai, others
Approved by: re (kensmith)
Lock cdev mutex too to close the race with tty being freed.
Relock clone_drain_lock to prevent the LOR with proctree lock, thus
add #include <fs/devfs/devfs_int.h>.
Suggested by: tegge
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
Lock Giant in the clone handler.
Use destroy_dev_sched() explicitly from pty_maybecleanup() and postpone
pty_release() until both master and slave cdevs are destroyed by setting
it as callback for destroy_dev_sched().
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
destroy_dev() is called from csw method, and no d_purge driver method is
provided. Transform the direct call to destroy_dev() into destroy_dev_sched().
Reviewed by: njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bumps device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.
destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the clone events to make sure no lingering devices are left after
dev_clone event handler is deregistered.
make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().
Reviewed by: tegge (early versions), njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
First, we were never correctly checking for a 24XX Status Type 0
response- that caused us to fall through to evaluate status for
commands as if this were a 2100/2200/2300 Status Type 0 response.
This is *close*, but not quite the same. This has been reported
to be apparent with some weird lun configuration problems with
some arrays. It became glaringly apparent on sparc64 where none
of the correct byte swap things were done.
Fixing this omission then caused a whole universe shifting debug
cycle of endian issues for the 2400. The manual for 24XX f/w turns
out to be wrong about the endianness of a couple of entities. The
lun and cdb fields for the type 7 request are *not* unconditionally
big endian- they happen to be opposite of whatever the endian of
the current machine type is. Same with the sense data for the
24XX type 0 response.
While we're at it investigate and resolve some NVRAM endian
issues.
Approved by: re (ken)
MFC after: 3 days
call the sctp_free_remote_address() function.
- Assure that when we allocate a chunk the whoTo is NULL,
also when we free it and place it into the cache we NULL
it (that way the consolidation code will always work).
- Fix a small race, when an empty data holder is left on the stream
out queue, and both sides do a shutdown, the empty data holder
would prevent us from sending a SHUTDOWN-ACK and at the same time we
never would cleanup the empty holder (since nothing was ever in queue).
We now add a utility function that a) cleans up empty holders and
b) properly determines if there are still pending data chunks on
the stream out wheel.
Approved by: re@freebsd.org (Ken Smith)
of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
the acquisition of the vnode interlock before releasing the vm object's
lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
performed. Unfortunately, this allows the vnode to be recycled between
the release of the vm object's lock and the vget() on the vnode.
In this revision, I prevent the vnode from being recycled by acquiring
another reference to the vm object and underlying vnode before releasing
the vm object's lock.
This change also addresses another preexisting but trivial problem. By
acquiring another reference to the vm object, I also prevent the vm
object from being recycled. Previously, the "vnodes skipped" counter
could be wrong if it examined a recycled vm object.
Reported by: kib
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
and replace with software-testable sysctl node (security.audit) that
can be used to detect kernel audit support.
Obtained from: TrustedBSD Project
Approved by: re (kensmith)
Submitted by: Simon Schubert <corecode@fs.ei.tum.de>
- Defer flushing unsolicited response into taskqueue thread rather
than handle it directly in interrupt handler, since few of its
operations (like measuring/calibrating jack impedance) are quite
expensive.
- Misc. debugging cleanups.
Tested by: joel
Approved by: re (hrs)
MFC after: 3 days
Note: The offending quirk should have been made model/codec specific,
but since there were no records / log which model requires it, the quirk
logic had to be inverted (blacklist instead of whitelist).
Tested by: Arkadiy Dudevitch <dudevitch@englerllc.com>
Approved by: re (hrs)
MFC after: 3 days
This commit includes only the kernel files, the rest of the files
will follow in a second commit.
Reviewed by: bz
Approved by: re
Supported by: Secure Computing
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.
Approved by: re (kensmith)
- provide dummy routines for ic_scan_curchan and ic_scan_mindwell, we do not support those operations.
- add ieee80211_scan_done() to tell the scanning module that all channels have been scanned.
- pass IEEE80211_S_SCAN state off to net80211 so it can initiate scanning
- fix overflow in the rates array
- scale the rate value passed back from the firmware scan to the units that net80211 uses.
Submitted by: Token
Reviewed by: sam, avatar
Approved by: re (kensmith)
operating channel and use this in the scan cache rather than directly using
ic_curchan. Some firmware cards can only do a full scan and so ic_curchan does
not have the correct value.
Also add IEEE80211_CHAN2IEEE to directly dereference ic_ieee from the channel
to be used in the fast path.
Reviewed by: sam, sephe
Approved by: re (kensmith)
to be indexed by IEEE channel number but that is no longer the case and it needs
to be searched for.
Submitted by: avatar
Reviewed by: sam
Approved by: re (kensmith)
(1) Add size parameter to usbd_get_string()
(2) Properly limit speed when a full speed hub is plugged into a high
speed hub.
Submitted by: Hans Petter Selasky
PR: 80773, 79725
Approved by: re@ (kensmith)
yet supported by this driver. Support will be committed soon, or a
filter on all the 'newer' devices will be installed before the
release.
Approved by: re@ (blanket)
Obtained from: NetBSD, OpenBSD
Small Furry Animals by: Pink Floyd
switch (i.e. lid) is set to have an action of NONE. This is not an
invalid state, so silently return. This fixes the warning:
"acpi: request to enter state S6 failed (err 22)"
Approved by: re
the command. Make UFI devices return 'success' when asked to do a
SYNC_CACHE. There's no support for write caching in the UFI spec, so
this is the most appropriate action to undertake.
Reviewed by: scottl
Approved by: re@ (blanket)
Hellmuth with some refinements by myself and flz@. It works for me
with my non-MS mice, so nothing should be broken by it.
Submitted by: Hellmuth Michaelis
PR: 90162
Approved by: re (blanket)
pr, the submitter says:
Found this while running freebsd as guest in qemu with -usb
parameter. The patch implements the missing dynamic size based on
number of ports a hub has.
Submitted by: Lonnie Mendez
PR: 94946
Approved by: re@ (blanket)
- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already acts that way. [2]
Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)
properly (un)padded on the arm platform. With this change, FreeBSD/arm
boxes are able to route AppleTalk properly.
Submitted/tested by: Nathan Whitehorn <nathanw at uchicago dot edu>
Tested on: arm, i386, amd64
Approved by: re (kensmith)
patch that converts ms to ticks was used. Another PR states that a
return code of 0 is the right one for libusb.
Submitted by: Lonnie Mendez
PR: 94311
Approved by: re (blanket)
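A minimal sketch of the ms-to-ticks conversion mentioned above. This is not the code from the PR: the name ms_to_ticks and the explicit hz parameter are assumptions made for illustration only. It rounds up so that a short non-zero timeout does not collapse to zero ticks.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not from the PR): convert a timeout in
 * milliseconds to clock ticks for a clock running at 'hz' ticks per
 * second, rounding up and clamping to INT_MAX.
 */
static int
ms_to_ticks(unsigned int ms, unsigned int hz)
{
	unsigned long long t;

	assert(hz > 0);
	/* Widen before multiplying so ms * hz cannot overflow. */
	t = ((unsigned long long)ms * hz + 999) / 1000;
	return (t > INT_MAX ? INT_MAX : (int)t);
}
```

With hz = 100, a 15 ms timeout becomes 2 ticks rather than 1, which errs on the side of waiting slightly longer than requested.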
adequate. Increase them to 1k. The referenced PR made this a sysctl,
but that seems like overkill to me. The difference between 320 and
2048 bytes in modern systems, even embedded ones, seems too far in the
noise to be worth the extra hair to make it settable.
PR: 74609
Submitted by: Divacky Roman
Approved by: re (blanket)
applied to, but I'd think both), honor the timeout that's been set.
Return 0 bytes to be consistent with what libusb expects. By default,
the timeout will be zero, so only applications that change the default
will see a change. The patch only seems to apply to the interrupt end
points, but it should also apply to isochronous endpoints as well.
Submitted by: Maurice Castro
PR: 110122
Approved by: re (blanket)
- In audit_bsm.c, make sure all the arguments: ARG_AUID, ARG_ASID, ARG_AMASK,
and ARG_TERMID{_ADDR} are valid before auditing their arguments. (This is done
for both setaudit and setaudit_addr.)
- Audit the arguments passed to setaudit_addr(2)
- AF_INET6 does not equate to AU_IPv6. Change this in au_to_in_addr_ex() so the
audit token is created with the correct type. This fixes the processing of the
in_addr_ex token in user space.
- Change the size of the token (as generated by the kernel) from 5*4 bytes to
4*4 bytes (the correct size of an ip6 address)
- Correct regression from ucred work which resulted in getaudit() not returning
E2BIG if the subject had an ip6 termid
- Correct slight regression in getaudit(2) which resulted in the size of a pointer
being passed instead of the size of the structure. (This resulted in invalid
auditinfo data being returned via getaudit(2))
Reviewed by: rwatson
Approved by: re@ (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month
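The getaudit(2) regression above (size of a pointer passed instead of size of the structure) is a common C mistake. A minimal userland sketch of the pattern follows; struct info and the helper names are hypothetical stand-ins, not the kernel's audit structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an audit information record. */
struct info {
	unsigned int	id;
	unsigned int	mask[2];
	unsigned int	session;
	unsigned int	term[4];
};

/* Buggy length: sizeof(src) is the size of the pointer. */
static size_t
copy_len_buggy(const struct info *src)
{
	return (sizeof(src));
}

/* Correct length: sizeof(*src) is the size of the record itself. */
static size_t
copy_len_fixed(const struct info *src)
{
	return (sizeof(*src));
}

/* Copy 'len' bytes of the record, as a copyout-style routine would. */
static void
copy_info(struct info *dst, const struct info *src, size_t len)
{
	assert(len <= sizeof(*dst));
	memcpy(dst, src, len);
}
```

With the buggy length only the first few bytes of the record reach the destination, so the trailing fields are left as whatever was there before.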
some false positives but at this moment it is better to add
support than to not have it at all (comment from Soren).
PR: kern/111516
Submitted by: Thomas Nystrom <thn at saeab dot se>
Approved by: re (kensmith)
Approved by: imp (mentor)
OK'ed by: sos (With the comment noted above about false
positives).
could lead to a deadlock).
- sleepq_set_timeout acquires callout_lock (via callout_reset()) only
with sleepq chain lock held
- msleep_spin in _callout_stop_safe locks the sleepqueue chain with
callout_lock held
In order to solve this don't use msleep_spin in _callout_stop_safe() but
use sleepqueues directly, inlining the msleep_spin code. Rearrange the
wakeup path in order to have it consistent too.
Reported by: kris (via stress2 test suite)
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
This is very similar to sx_init_flags: it initializes the rwlock using
special flags passed as third argument (RW_DUPOK, RW_NOPROFILE,
RW_NOWITNESS, RW_QUIET, RW_RECURSE).
Among these, the most important new feature is probably that rwlocks
can be acquired recursively now (for both shared and exclusive paths).
Because of the recursion counter, the ABI is changed.
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
put out a ispreqt2e_t structure onto the request queue- not a ispreqt2_t
structure. I forgot that the 23XX can use a t2 structure.
Approved by: re (ken, implicitly)
MFC after: 3 days
mpo_check_proc_setaudit_addr to be used when controlling use of
setaudit_addr(), rather than mpo_check_proc_setaudit(), which takes a
different argument type.
Reviewed by: csjp
Approved by: re (kensmith)
changes for example:
(From Craig Leres):
tip to a rocketport line
run "/etc/rc.d/devfs restart"
exit tip
(wait for the system to reboot)
Thanks to Robert Watson for poking me to fix this.
PR: kern/109152
Approved by: imp (mentor)
Approved by: re (kensmith)
Reviewed by: jhb
Submitted by: Craig Leres <leres@ee dot lbl dot gov>
of the file numerically for vendors and then each product numerically
by vendor (with all the foo2's sorting after the foo's). Someday, all
the usbdevs will be merged, I hope, but until then, we have these
mega-merges.
This also finishes the LINKSYS4 -> CISCOLINKSYS rename.
Approved by: re@ (blanket)
firmware reset. Also zero out struct iwi_rateset although it's not strictly
necessary.
Reported by: Maxim Konovalov
Reviewed by: sam
Approved by: re (bmah)
- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
it does not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destroy interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely on uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Wire a page instead of holding it.
- Pass allocate_zero to obtain zeroed pages upon first
use.
Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)
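The "|= instead of =" item above can be sketched as follows. The flag values here are hypothetical and chosen only for illustration; they are not the real vnode flag definitions.

```c
#include <assert.h>

/* Hypothetical flag bits; the real vnode flag values are not assumed. */
#define	FLAG_ROOT	0x0001u
#define	FLAG_OTHER	0x0040u

/* Plain assignment: clobbers every bit that was already set. */
static unsigned int
set_root_assign(unsigned int flags)
{
	flags = FLAG_ROOT;
	return (flags);
}

/* OR-assignment: adds the bit and keeps the others. */
static unsigned int
set_root_or(unsigned int flags)
{
	flags |= FLAG_ROOT;
	return (flags);
}
```

Starting from a word that already has FLAG_OTHER set, only the second variant preserves it.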
to put out a ispreqt3e_t structure onto the request queue-
not a ispreqt3_t structure. We weren't. This turns out only
to really matter for big endian machines.
Approved by: re (ken)
MFC after: 3 days
around an output freezing problem (see the CVS log for details). This
is the same approach that sio takes to solve that problem. However,
ucom has a problem that sio doesn't have.
Consider the case where output is pending, and the device is closed.
ttyclose calls tt_close (which indirects to ucomclose) and then calls
ttyflush which calls tt_stop (which indirects to ucomstop). Since
ucomclose removed all the usb transfer points, sc_oxfer will be NULL
when ucomstop calls ucomstart. This results in a null pointer
dereference.
Since calling ucomstart in ucomstop solves other problems, we need to
work with this calling sequence. The easiest way to do that is to
bail early if sc_oxfer is NULL.
Kazuaki ODA-san came up with this patch, and filed a PR. I had seen
this bug at work and this patch does seem to solve it. He had no idea
why it worked, but knew that either this patch, or backing out ucom.c
1.56 fixed his panic. I just did the legwork of chasing down the code
paths that would cause this, and added a comment. This is obscure
enough to warrant a comment, I think.
Submitted by: Kazuaki ODA-san
PR: 113964
Approved by: re (bmah)
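The early bail-out described above can be sketched in userland as follows. The structures are much-reduced, hypothetical stand-ins for the driver's softc and transfer handle; only the field name sc_oxfer is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-ins for the driver structures. */
struct xfer {
	int	started;
};

struct softc {
	struct xfer	*sc_oxfer;	/* NULL once close tore it down */
};

/* Returns 1 if a transfer was started, 0 if there was nothing to do. */
static int
start_output(struct softc *sc)
{
	/* After close the transfer is gone; do not dereference it. */
	if (sc->sc_oxfer == NULL)
		return (0);
	sc->sc_oxfer->started = 1;
	return (1);
}
```

The check costs one comparison on the normal path and turns the close-then-flush sequence into a no-op instead of a panic.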
around to force the IO port to a fixed address. They were only turned
on in the module build and were present since the original import. This
breaks soft power-off on the Asus A7V since it reprograms the SMBus base
address to a different one than the BIOS expects. A similar issue was
found in the alpm(4) module build.
PR: kern/113986, i386/97468
MFC after: 3 days
Approved by: re
where a device timeout that occurs with a mgt frame on the tx q
will leave the net80211 layer w/o any way to make progress.
Reviewed by: thompsa, sephe
Approved by: re (hrs)
request queues rather than shove it down a word at a time, we have
to remember to put it into little endian format. Use the macros
ISP_IOXPUT_{16,32} for this purpose. Otherwise, on sparc the firmware
is loaded garbled and we get a (not surprisingly) firmware checksum
failure and the card won't start and we don't attach it.
Approved by: re (bruce)
MFC after: 3 days
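A minimal sketch of what storing values in little endian format means, independent of host byte order. These helpers are illustrative only and are not the ISP_IOXPUT_{16,32} macros themselves; the names put_le16 and put_le32 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 16-bit value least significant byte first. */
static void
put_le16(uint8_t *dst, uint16_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value least significant byte first. */
static void
put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)(v >> 24);
}
```

Because the bytes are written one at a time, the result is the same on a big endian host such as sparc as on a little endian one.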
both 6.x and 7.x. This is based on feedback on this thread
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81818+0+current/freebsd-stable
and my use of it on 6.x.
MFC after: 3 days
- Update the warning about UNION filesystem. It is now actively maintained,
although there are still some issues being resolved.
Reviewed by: freebsd-stable@, kris, bmah
Approved by: re (bmah)
- Fix fwd-tsn to use proper accessor so it does not overrun mbufs
- Fix stream reset error reporting to actually work (it has always been
broken if the peer rejects a stream reset)
- Some 64 bit friendly changes
Approved by: re(bmah@freebsd.org)
some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE()
call there could cause inodeblock to have both freefile and inodedep
dependencies without any inode in the block being marked for write.
Then, softdep_check_suspend() would return EAGAIN forever.
Force write of inodeblock with allocated freefile softdependency by
setting IN_MODIFIED flag in softdep_freefile and unconditionally calling
UFS_UPDATE() in ufs_reclaim.
Reported by: kris
Debug help and tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
Previously it didn't honor parent dma tag's restrictions such that
an invalid dma segment could be passed to device. The driver for the
device may panic in sanity check routine for the dma segment or may
produce unexpected results. I have no idea how it could ever have
worked before.
Reviewed by: grehan
Tested by: gad
Approved by: re (hrs)
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
Reviewed by: scottl (long ago)
Approved by: re (hrs)
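A sketch of picking a segment size that respects a DMA tag's restrictions instead of always returning a full page. This is not the busdma code; the function name, the parameters and the 4096-byte page size are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/*
 * Hypothetical helper: size of the next DMA segment starting at 'addr'.
 * 'maxsegsz' of 0 means no limit; 'boundary' is a power of two, or 0
 * for no boundary restriction.
 */
static uint64_t
seg_size(uint64_t addr, uint64_t remaining, uint64_t maxsegsz,
    uint64_t boundary)
{
	uint64_t room, sz;

	/* Never cross a page in one step. */
	sz = SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1));
	if (sz > remaining)
		sz = remaining;
	/* Honor the tag's maximum segment size. */
	if (maxsegsz != 0 && sz > maxsegsz)
		sz = maxsegsz;
	/* Do not let the segment cross the tag's boundary. */
	if (boundary != 0) {
		room = boundary - (addr & (boundary - 1));
		if (sz > room)
			sz = room;
	}
	return (sz);
}
```

Without the last two checks a segment could exceed what the device accepts, which is the kind of invalid segment the messages above describe.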
Improvements:
* /etc/rc.suspend,rc.resume are always run, no matter the source of the
suspend request (user or kernel, apm or acpi)
* suspend now requires positive user acknowledgement. If a user program
wants to cancel the suspend, they can. If one of the user programs
hangs or doesn't respond within 10 seconds, the system suspends anyway.
* /dev/apm is clonable, allowing multiple listeners for suspend events.
In the future, xorg-server can use this to be informed about suspend
even if there are other listeners (i.e. apmd).
Changes:
* Two new ACPI ioctls: REQSLPSTATE and ACKSLPSTATE. Request begins the
process of suspending by notifying all listeners. acpi is monitored by
devd(8) and /dev/apm listener(s) are also counted. Users register their
approval or disapproval via Ack. If anyone disapproves, suspend is vetoed.
* Old user programs or kernel modules that used SETSLPSTATE continue to
work. A message is printed once that this interface is deprecated.
* acpiconf gains the -k flag to ack the suspend request. This flag is
undocumented on purpose since it's only used by /etc/rc.suspend. It is
not intended to be a permanent change and will be removed once a better
power API is implemented.
* S5 (power off) is no longer supported via acpiconf -s 5 or apm -z/-Z.
This restores previous behavior of halt/shutdown -p being the interface.
* Miscellaneous improvements to error reporting
Approved by: re
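The request/ack flow described above can be sketched as a small state check. This is a userland illustration, not the acpi code; the structure and function names are hypothetical, and only the 10 second timeout and the veto rule come from the text.

```c
#include <assert.h>

#define	ACK_TIMEOUT_SECS	10	/* from the description above */

enum sleep_outcome { SLEEP_WAIT, SLEEP_SUSPEND, SLEEP_CANCEL };

/* Hypothetical bookkeeping for one pending suspend request. */
struct sleep_request {
	int	pending;	/* listeners that have not answered yet */
	int	vetoed;		/* set once any listener disapproves */
};

/* Start a request and notify 'listeners' listeners. */
static void
req_begin(struct sleep_request *r, int listeners)
{
	assert(listeners >= 0);
	r->pending = listeners;
	r->vetoed = 0;
}

/* A listener answers: approve != 0 is an ack, 0 is a veto. */
static void
req_ack(struct sleep_request *r, int approve)
{
	if (r->pending > 0)
		r->pending--;
	if (!approve)
		r->vetoed = 1;
}

/* Decide what to do 'elapsed' seconds after the request was made. */
static enum sleep_outcome
req_poll(const struct sleep_request *r, int elapsed)
{
	if (r->vetoed)
		return (SLEEP_CANCEL);
	if (r->pending == 0 || elapsed >= ACK_TIMEOUT_SECS)
		return (SLEEP_SUSPEND);
	return (SLEEP_WAIT);
}
```

A hung listener therefore delays the suspend by at most the timeout, while an explicit disapproval cancels it.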
o Consistently use device_foo_t and bus_foo_t for functions implementing
device_foo and bus_foo respectively. Adjust those routines that were wrong
(we should do this throughout the tree).
o make all the modules depend on usb. Otherwise these modules won't
load.
o ucycom doesn't need usb_port.h
o Minor unifdefing
o uhub, umass, ums, urio, uscanner conversion complete.
o ukbd: Remove the NO_SET_PROTO quirk (fixes PR 77940). NetBSD removed
their check and setting the proto a long time ago.
o umodem panic fixed. UQ_ASSUME_CM_OVER_DATA quirk removed because I've never
seen a umodem that needed this rejection for protection (this gets rid of
~20% of the quirks).
Approved by: re@ (kensmith)
PR: 77940
Older drivers that do not wish to convert to the native API (which
will work with both 6.x and 7.x) can simply include
<dev/usb/usb_port.h>. Drivers in the tree shouldn't use these macros,
unless they actually work on other OSes and are actively maintained.
Approved by: re@
when 'make obj' was done first. I found this when fixing
a problem reported by tinderbox, but forgot to send the
patchset to re@ altogether.
Approved by: re (kensmith)
Postpone call to devfs_free() after cdev mutex is dropped. Reuse
cdp_list link for queuing devices awaiting deletion in the
cdevp_free_list.
Reported by: Hans Petter Selasky <hselasky c2i net>
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 2 weeks
mouse pointer instead of a 8 x 16 one so device drivers don't
need to bring their own one there and in gfb_mouse() (ab)use
the pixel_mask argument of putm() to pass along on/off info as
erasing the mouse cursor image by redrawing the text underneath
doesn't work as we use hardware cursors on sparc64.
allowing the driver for the host-PCI-bridge to indicate that
reenumeration of the PCI busses isn't supported by returning
-1 instead of a valid PCI bus number. This is needed in order to
support both Tomatillo, which don't support reenumeration and
thus are apparently intended to be used for independently
numbered PCI domains only, and Psycho bridges, whose busses
need to be reenumerated on at least some E450, without the
#ifndef currently used for sun4v in order to support multiple
independently numbered PCI domains. The actual allocation/incrementation
of the PCI bus numbers is now done in psycho(4), though it
no longer establishes a mapping between bus numbers and device
nodes like ofw_pci_alloc_busno() did as that functionality
wasn't used (but can easily be brought back if really needed).
The now no longer used sys/sparc64/pci/ofw_pci.c is also
removed from sys/conf/files.sun4v as ofw_pci_alloc_busno()
wasn't used there in the first place.
- In ofw_pci_default_{adjust_busrange,intr_pending}() sanity
check that the device has a parent before passing it on.
- Make psycho_softcs static to sys/sparc64/pci/psycho.c as
it's not used outside of that module.
- In sys/sparc64/pci/ofw_pcib_subr.c remove the superfluous
inclusion of opt_global.h and correct the debug output for
adjusting the subordinate bus number.
instead of using the PCI bus number, like it's already done for
sun4v in order to deal properly with independently numbered PCI
domains which can't be reenumerated (in the case of sun4u f.e.
Tomatillo bridges). For machines where we need to reenumerate
all PCI busses this change obviously introduces the theoretical
cosmetic problem that the device number of the PCI bus no longer
equals to its PCI bus number. In practice this doesn't happen
as both are assigned linearly and in parallel.
- to show a specific set: ipfw set 3 show
- to delete rules from the set: ipfw set 9 delete 100 200 300
- to flush the set: ipfw set 4 flush
- to reset rules counters in the set: ipfw set 1 zero
PR: kern/113388
Submitted by: Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 6 weeks
passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because
it came from the inactive queue and PG_UNMANAGED pages are not in any
page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS
objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED.
So, if the page that is passed to vm_pageout_clean() is not
PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
neighbors from the same object cannot themselves be PG_UNMANAGED.
Reviewed by: tegge
don't have it. Some partitioning schemes, as well as file systems,
operate on the geometry and without it such schemes (e.g. MBR)
and file systems (e.g. FAT) can't be created. This is useful for
memory disks.
will initialize the header length and re-initialize the mbuf pointer
to reference the mbuf that is allocated after moving user supplied packet
data in.
to hold off freeing if there is data pending ... someone
might do send/close. Which means we want the data to
go and then close it after startup. Added comments to
the code as well to note that this is done for a reason.
Remove device_t dv, since it is no longer needed.
Add sizeof(device_t) to replace sizeof dv.
Change device_detach(dev) to device_detach(dev->subdevs[i]) since the type
of dev isn't right! Not sure when this was introduced, but it likely would
lead to a crash on disconnect.
MFC After: 1 week
of the magic string is passed in a 32-bit register, we can't use high
memory in the PAE case. This also eliminates a use of vtophys().
Tested by: Jeff Shimbo <jts767 / gmail.com>
MFC after: 1 week
now takes a device_t to be the parent of the bus that is being created.
Most SIMs have been updated with a reasonable argument, but a few exceptions
just pass NULL for now. This argument isn't used yet and the newbus
integration likely won't be ready until after 7.0-RELEASE.
can be allocated atomically
- add debug macros for printing lock initialization / teardown
- add buffers to port_info and adapter to allow each lock to have a
unique name
- destroy mutexes initialized by cxgb_offload_init
- remove recursive calls to ADAPTER_LOCK
- move callout_drain calls so that they don't occur with the lock held
- ensure that only as many qsets as are needed are initialized and
destroyed
MFC after: 3 days
Sponsored by: Chelsio Inc.
- re-factor the packet drop in sctp_output a bit more, we don't need the
trim after all, but the size calc is now corrected.
- When an assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user
closes, it should not matter if data is queued, the assoc should be
purged.
- Add a missing free_chunk in the error leg when iph comes in NULL (should
not happen but just in case).
to nonzero you fulfill the same function as the variable 'cmp', so you
might as well zero match and test against it later.
Reviewed by: timeout on review request
of obtaining them over and over again and pretending we could do
anything useful without them (for chosen this includes adding a
declaration and initializing it in OF_init()).
- In OF_init() if obtaining the memory or mmu handle fails just call
OF_exit() instead of panic() as the loader hasn't initialized the
console at these early stages yet and trying to print out something
causes a hang. With OF_exit() one at least has a chance to get back
to the OFW boot monitor and debug the problem.
- Fix OF_call_method() on 64-bit machines (this is a merge of
sys/dev/ofw/openfirm.c rev 1.6).
- Replace OF_alloc_phys(), OF_claim_virt(), OF_map_phys() and
OF_release_phys() in the MI part of the loader with wrappers around
OF_call_method() in the sparc64. Besides the fact that they duplicate
OF_call_method() the former should never have been in the MI part
of the loader as contrary to the OFW spec they use two-cell physical
addresses.
- Remove unused functions which are also MD dupes of OF_call_method().
- In sys/boot/sparc64/loader/main.c add __func__ to panic strings as
different functions use otherwise identical panic strings and make
some of the panic strings a tad more user-friendly instead of just
mentioning the name of the function that returned an unexpected
result.
handlers as filter/"fast" handlers so shutdown_nice() can
acquire the process lock.
- Use bus_{read,write}_8() instead of bus_space_{read,write}_8()
in order to get rid of sc_bushandle and sc_bustag in the softc.
- Remove the banal and outdated comment above sbus_filter_stub().
allowing it to be a filter/"fast" handler. Locking the interrupt
handlers with a spin lock is mainly a requirement in schizo(4)
but as we ought to register the spin lock anyway it should not
hurt to take advantage of it in psycho(4).
- Pass both a driver_filter_t and a driver_intr_t argument to
psycho_set_intr(), allowing to get rid of the FAST interrupt
flag hack.
- Don't register the over-temperature interrupt handler as filter/
"fast" handler so shutdown_nice() can acquire the process lock.
- Use bus_{read,write}_8() instead of bus_space_{read,write}_8()
in order to get rid of sc_bushandle and sc_bustag in the softc.
- Correct the debug output for adjusting the subordinate bus number.
- Remove the banal and outdated comment above psycho_filter_stub().
- Fix some white space nits.
a privilege is checked against the real uid rather than the effective
uid, instead decide which uid to use in priv_check_cred() based on the
privilege passed in. We use the real uid for PRIV_MAXFILES,
PRIV_MAXPROC, and PRIV_PROC_LIMIT. Remove the definition of
SUSER_RUID; there are now no flags defined for priv_check_cred().
Obtained from: TrustedBSD Project
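A sketch of choosing the uid from the privilege being checked, as described above. This is a userland illustration and not priv_check_cred() itself: the enum values, the reduced credential structure and the plain uid 0 test are assumptions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical privilege identifiers mirroring the names in the text. */
enum sk_priv {
	SK_PRIV_MAXFILES,
	SK_PRIV_MAXPROC,
	SK_PRIV_PROC_LIMIT,
	SK_PRIV_OTHER
};

/* Hypothetical, reduced credential: real and effective uid only. */
struct sk_cred {
	unsigned int	ruid;
	unsigned int	euid;
};

/* Returns 0 if the privilege is granted, EPERM otherwise. */
static int
sk_priv_check(const struct sk_cred *cr, enum sk_priv priv)
{
	unsigned int uid;

	assert(cr != 0);
	switch (priv) {
	case SK_PRIV_MAXFILES:
	case SK_PRIV_MAXPROC:
	case SK_PRIV_PROC_LIMIT:
		/* Resource limit privileges follow the real uid. */
		uid = cr->ruid;
		break;
	default:
		uid = cr->euid;
		break;
	}
	return (uid == 0 ? 0 : EPERM);
}
```

Callers no longer pass a flag to select the uid; the decision lives in one place, keyed on the privilege.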
- Move the rtc_mtx spin lock out from under #ifdef SMP as it's just
not SMP-specific.
- Add a new spin lock pcib_mtx for locking "fast" interrupt handlers
of host-to-PCI bridge drivers on sparc64.
of the register rather than in the offset describing the register.
- In gem_reset_rx() let gem_bitwait() check for the Rx reset bit
rather than the Tx reset bit to clear.
Obtained from: OpenBSD (same/similar bugs being fixed)
These CPUs use an enhanced layout of the interrupt vector dispatch
and dispatch status registers in order to allow sending IPIs to
multiple targets simultaneously. Thus support for these CPUs was
put in a newly added cheetah_ipi_selected(). This is intended to
be pointed to by cpu_ipi_selected, which now is a function pointer,
in order to avoid cpu_impl checks once booted. Alternatively it
can point to spitfire_ipi_selected(), which was renamed from
cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed
to spitfire_ipi_send() (there's no need for a cheetah equivalent
of this so far). Initialization of the cpu_ipi_selected pointer
and other requirements is done in mp_init(), which was renamed
from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP
systems while cpu_ipi_selected() is. As a side-effect this allows
to make mp_tramp static to sys/sparc64/sparc64/mp_machdep.c.
For the sake of avoiding #ifdef SMP and for keeping the history in
place cheetah_ipi_selected() and spitfire_ipi_{selected,send}()
were not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c
- Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't
exceed the data types we use to store the CPU bit fields or the
number of USIII and greater CPUs supported by the current
cheetah_ipi_selected() implementation (which for JBus-CPUs is
only 4; that should be fine though as according to OpenSolaris
there are no sun4u machines with more than 4 JBus-CPUs).
- In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs
as we can't handle more than that.
- In cpu_mp_start() check for upa-portid vs. portid depending on
cpu_impl for consistency with nexus(4).
- In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't
told to IPI itself as sun4u CPUs just can't do that.
- In spitfire_ipi_send() do a MEMBAR #Sync after writing the
interrupt vector data as we want to make sure the payload was
actually written before we trigger the dispatch.
- In spitfire_ipi_send() also verify IDR_BUSY when checking whether
the dispatch was successful as it has to be cleared for this to
be the case.
- Remove some redundant variables.
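The function pointer arrangement described above (select the implementation once at init, then call through the pointer without further CPU checks) can be sketched as follows. All names are hypothetical and the stubs only report which variant ran; no real IPI is sent.

```c
#include <assert.h>

enum sk_cpu_family { SK_FAMILY_OLD, SK_FAMILY_NEW };	/* hypothetical */

/* Stub for the older layout: one target at a time. */
static int
sk_ipi_selected_old(unsigned int cpus)
{
	(void)cpus;
	return (1);
}

/* Stub for the enhanced layout: several targets at once. */
static int
sk_ipi_selected_new(unsigned int cpus)
{
	(void)cpus;
	return (2);
}

/* Set once at init; callers never look at the CPU type again. */
static int (*sk_ipi_selected)(unsigned int cpus);

static void
sk_ipi_init(enum sk_cpu_family family)
{
	sk_ipi_selected = (family == SK_FAMILY_NEW) ?
	    sk_ipi_selected_new : sk_ipi_selected_old;
}
```

The init routine has to run on uniprocessor systems as well, since the pointer is called there too, which matches the reasoning given above for doing the setup in mp_init().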
RTC function of a National Semiconductor PC87317/PC97317. This
consists of using the century register the same way Solaris does
for compatibility reasons. Once there is a MD power(4) we'd also
want to interface the APC (Advanced Power Control) functionality
of the same chip function with it.
- Use a macro for the device description and take advantage of
ISA_PNP_PROBE() setting the device description.
- Use the generated typedefs for the prototypes of the device
interface functions.
reasons outlined in the comment removed along with it, because the
OFW hostid has no real meaning for FreeBSD and mainly so the OFW
hostid is not confused with the FreeBSD hostid.
moving OF_set_mmfsa_traptable() (SUNW,set-trap-table with the two
arguments used here is specific to sun4v) to MD code.
- In sys/dev/ofw/openfirm.h remove prototypes for unimplemented
functions and unused Solaris compatibility macros.
to be compiled into every driver making use of it. Use a const instance
of struct gfb_font for this as the font isn't intended to be changed at
run-time and in order to accompany the font data with height and width
info.
- Add missing prototypes.
- Define global variables not used outside of this module as static.
- Replace some outdated hard-coded functions names in panic strings
with __func__.
- Fix some style(9) bugs.
data and remove the array size from the definition as f.e. the gallant
12 x 22 font data is 256 * 44 in size, exceeding the previously hard-
coded size.
- Declare the bold8x16 instance of struct gfb_font as const as it's not
intended to be changed at run-time as a whole either.
- Use __FBSDID in xboxfb.c
Tested by: rink
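A sketch of why carrying height and width next to the font data removes the need for a hard-coded array size. The structure below is a hypothetical reduction, not the real struct gfb_font; it assumes 256 glyphs and rows padded to whole bytes, which is consistent with the 256 * 44 figure quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of a font descriptor. */
struct sk_font {
	int			 width;		/* pixels per row */
	int			 height;	/* rows per glyph */
	const unsigned char	*data;		/* glyph bitmaps */
};

/* Bytes per glyph, with each row padded to a whole byte. */
static size_t
sk_glyph_bytes(const struct sk_font *f)
{
	assert(f->width > 0 && f->height > 0);
	return (((size_t)f->width + 7) / 8 * (size_t)f->height);
}

/* Total size of the bitmap data, assuming 256 glyphs. */
static size_t
sk_font_bytes(const struct sk_font *f)
{
	return (256 * sk_glyph_bytes(f));
}
```

For a 12 x 22 font each row needs 2 bytes, so a glyph is 44 bytes and the table is 256 * 44 = 11264 bytes, against 256 * 16 = 4096 for an 8 x 16 font.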
the passed in auth_type is unacceptable to rpcauth_buildheader-
this avoids a null pointer panic. Clean up allocations if this
happens. This also quiets a gcc 4.2 complaint about using mheadend
without it being initialized.
Reviewed by: alfred
through wpa_supplicant. If a sta is deauth'd (e.g. due to inactivity)
with roaming mode set to manual then a subsequent MLME assoc request
will be incorrectly handled and the station will never reauthenticate.
To fix this interpret a reason code of zero as sufficient to send an
auth request frame.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
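For readers unfamiliar with the term, the core arithmetic of a generic binary buddy system is sketched below. This is textbook buddy addressing only; it is not the code of this allocator and does not model its twist. The function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A block of order k covers 2^k pages and starts at a page frame
 * number that is a multiple of 2^k. Its buddy is the neighbouring
 * block of the same order, found by flipping bit k.
 */
static uint64_t
sk_buddy_of(uint64_t pfn, unsigned int order)
{
	assert(order < 64);
	return (pfn ^ ((uint64_t)1 << order));
}

/* Start of the order k+1 block formed when both buddies are free. */
static uint64_t
sk_merged_start(uint64_t pfn, unsigned int order)
{
	assert(order < 64);
	return (pfn & ~((uint64_t)1 << order));
}
```

Repeated merging of free buddies is what yields the large contiguous, naturally aligned runs that superpages and contigmalloc(9) need.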
eradication in/from userland path, countless locking fixes, etc.
- General sleep call through msleep(9) has been converted to condvar(9)
with better consistency.
- Heavily guard every possible "slow path" entry (open(), close(),
a few ioctl()s, sysctls), but once they enter the "fast path" (io, interrupt
started), they are free to fly on their own.
- Rearrange locking sequences, resulting in better concurrency and
serialization. A large part doesn't even need locking at all, and will be
removed in the future. Less clutter, except in a few places due to lock
ordering.
- Anonymous mixer object creation/deletion to simplify mixer handling
beyond typical mixer ioctls.
Submitted by: chibis (with modifications)
- Add a few mix_[get|set|..] functions to avoid calling mixer_ioctl()
directly using cryptic arguments.
- Locking fixes to avoid possible deadlock with (still under Giant) USB.
- Better simplex/duplex device handling.
- Recover mmap() functionality for recording, which has been lost
since 2.2.x - 3.x (the introduction of newpcm). Full-duplex mmap still
doesn't work (due to VM/page design), but people can still mmap
both by opening each direction separately. mmaped playback is guaranteed
to work either way.
- New sysctl: "hw.snd.compat_linux_mmap" to allow PROT_EXEC page
mapping, due to recent changes in linux compatibility layer which
require it. All linux applications that use sound + mmap() (mostly games)
require this to be enabled. Disabled by default.
- Other goodies.. too many, that will increase releng7 shareholder value
and make users of releng6 (and below) cry ;)
* This commit should be atomic. If anything goes wrong (not counting problems
originating from elsewhere), I will not hesitate to revert everything
within 12 hours. This substantial change is itself not rocket science;
the process began almost 2 years ago, and lots of incremental
changes were already put in place during that period of time.
* Some issues do occur in snd_emu10kx (note the 'x') due to various
internal locking issues and it is currently being worked on by chibis.
Tested by: chibis (Yuriy Tsibizov), joel, Alexandre Vieira,
many innocent souls...
Without bus_dma clean up and increment of number of Tx descriptors
it's hard to guarantee correct Tx operation in TSO case. The TSO
support will be enabled again when I get more feedback on the re(4)
patch posted to current.
Please note that this is currently considered an
experimental feature so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.
For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.
This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
have worked on it to continue this effort.
Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)
- Remove unnecessary timestamps.
- Return CAM_RESRC_UNAVAIL for ORB shortage.
- Fix a lock problem when doorbell is used.
- Fix a potential bug for unordered execution.
'result' is still NULL and we do not need to free anything.
That allows us to gc the entire goto parts and a now unused variable.
Found with: Coverity Prevent(tm)
CID: 2519
do not continue with a NULL pointer. [1]
While here change the return of the error handling code path above.
I cannot see why we should always return 0 there. Neither does KAME
nor do we in here for the similar check in all the other functions.
Found with: Coverity Prevent(tm) [1]
CID: 2521
114 bytes of cmos ram in the PC clock chip. The big difference between
this and the Linux version is that we do not recalculate the checksums
for bytes 16..31.
We use this at work when cloning identical machines - we can copy the
bios settings as well. Reading /dev/nvram gives 114 bytes of data but
you can seek/read/write whichever bytes you like.
Yes, this is a "foot, gun, fire!" type of device.
without an mtag in ipsec4_common_input_cb.
So in case of !IPCOMP (AH,ESP) only change the m_tag_id if an mtag
was passed to ipsec4_common_input_cb.
Found with: Coverity Prevent(tm)
CID: 2523
handle, document those sprotos using an IPSEC_ASSERT so that it will
be clear that 'spi' will always be initialized when used the first time.
Found with: Coverity Prevent(tm)
CID: 2533
- In tdq_choose() only assert that a thread does not have too high a
priority (low value) for the queue we removed it from. This will catch
bugs in priority elevation. It's not a serious error for the thread
to have too low a priority as we don't change queues in this case as
an optimization.
Reported by: kris
thinking it had the whole chunk. This could cause a crash if
a large packet drop came in. Fixed by adjusting the trunc length
down to the limit.
- Large sacks with lots of segments could also have same issue. Changed
duplicate and segment handling to use proper get_m_ptr function to
pull each block from mbuf chains.
ioctl routines if we are running with !mpsafenet
- Change un-conditional Giant acquisition around ifpromisc
to occur only if we are running with !mpsafenet
With these locking bits in place, we can now remove the Giant
requirement from BPF, so drop the D_NEEDGIANT device flag.
This change removes Giant acquisitions around BPF device
handlers (read, write, ioctl etc).
MFC after: 1 month
Discussed with: rwatson
or idle priority of another process owned by the same user. This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
of the logging #ifdef's with a few exceptions reducing
the number of config options for SCTP.
more exposure. The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still needs some
work/testing on 64-bit platforms.
Approved by: re (kensmith)
Discussed with: rrs
Also, remove usb_malloc_type: it was unused.
Remove METHODS_NONE: it was unused.
Move include of opt_usb.h from usb_port.h to usb.h, since usb_port.h is
going away (there will be a usb_compat.h for out-of-tree drivers that want it).
- Depessimize userret() in kernels where KTRACE is enabled by doing an
unlocked check of the per-process queue of pending events before
acquiring any locks. Previously ktr_userret() unconditionally acquired
the global ktrace_sx lock on every return to userland for every thread,
even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
p_traceflag to see if ktrace is enabled and only acquire locks and
teardown ktrace if the test succeeds. Also, explicitly disable tracing
before draining any pending events so the pending events actually get
written out. The unlocked read is safe because proc lock is acquired
earlier after single-threading so p_traceflag can't change between then
and this check (well, it can currently due to a bug in ktrace I will fix
next, but that race existed prior to this change as well).
Reviewed by: rwatson
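A minimal sketch of the "unlocked check before locking" pattern that the
userret() item above describes. This is not the kernel code: struct evq,
evq_drain() and the spin flag are hypothetical stand-ins for the
per-process queue of pending events and for ktrace_sx.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-process queue of pending events. */
struct evq {
    atomic_int  pending;     /* number of queued events */
    atomic_flag lock;        /* stand-in for the global lock */
    int         lock_taken;  /* how many times the lock was acquired */
};

static void
evq_init(struct evq *q)
{
    assert(q != NULL);
    atomic_init(&q->pending, 0);
    atomic_flag_clear(&q->lock);
    q->lock_taken = 0;
}

/* Drain pending events; returns how many were drained. */
static int
evq_drain(struct evq *q)
{
    int n;

    /* Fast path: unlocked read, no lock traffic in the common case. */
    if (atomic_load_explicit(&q->pending, memory_order_relaxed) == 0)
        return (0);

    /* Slow path: take the lock and look again under it. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        continue;
    q->lock_taken++;
    n = atomic_exchange(&q->pending, 0);
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
    return (n);
}
```

With an empty queue evq_drain() returns without touching the lock, which
is the whole point of the optimization; the value is read again once the
lock is held, so a stale unlocked read only costs a wasted acquisition.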
during execve() when turning off tracing due to executing a setuid binary
as non-root. Previously this could fail to acquire Giant and fail an
assertion if the ktrace file was on a non-MPSAFE filesystem and the
executable was on an MPSAFE filesystem.
MFC after: 3 days
Reported by: kris
bridged, previously legitimate traffic was not passed as the bridge could not
tell that it was on a different Ethernet segment.
All non-tagged traffic is treated as vlan1 as per IEEE 802.1Q-2003
than the 5288.
It is not correctly implemented in earlier silicon, and the BIOS often
lies about AHCI capability on platforms where these chips are deployed.
With this change I am able to boot FreeBSD on the ASUS Vintage AH-1
barebones system.
Approved by: sos
tunnels, and was not MPSAFE. The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.
This removes one of five remaining consumers of NET_NEEDS_GIANT.
Approved by: re (kensmith)
timing loops being optimized away.
Once upon a time, gcc promised not to optimize away timing loops, but
gcc started optimizing away the call to a null function in the timing
loop here some time between gcc-3.3.3 and gcc-3.4.6, and it started
optimizing away the timing loop itself some time between gcc-3.4.6
and gcc-4.2.
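For illustration only, one common way to keep a compiler from deleting
an empty delay loop is to make the loop counter volatile. This sketch
does not claim to be the fix applied by this commit, and spin_delay() is
a hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical busy-wait helper. The volatile qualifier forces the
 * compiler to perform every load and store of the counter, so the
 * loop cannot be removed as dead code.
 */
static unsigned long
spin_delay(unsigned long iterations)
{
    volatile unsigned long i;

    for (i = 0; i < iterations; i++)
        continue;
    assert(i == iterations);    /* the loop really ran to completion */
    return (i);
}
```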
- update to firmware version 4.1.0
- switch over to standard method for initializing cdevs (contributed by scottl@)
- break out timer_reclaim_task to be per-port
- move msix teardown into separate function
- fix bus_setup_intr for msi-x for the multi-port case so that msi-x resources
are not corrupted on unload
- handle 10/100/1000 base-T media and auto negotiation
- bind qset to cpu even for singleq case
- white space cleanups
- remove recursive PORT_LOCK
- move mtu setting to separate function
- stop and re-init port when changing mtu
- replace all direct references to m_data with calls to mtod
- handle attach failure better by not trying to de-initialize
taskqueues when they have not been allocated
- no longer default to jumbo frames
Sponsored by: Chelsio
MFC after: 3 days
- Add and document the KVM and KVM_SUPPORT options that
are needed for the ifmcstats(3) makefile
- Garbage collect unused variables
- Add missing inclusion of bsd.own.mk where needed
Approved by: kan (mentor)
Reviewed by: ru
it's an INIT collision case.
- Fixed RTO calc to maintain a separate variable to track
if a RTO calc has been done, this allows the RTO var to be
doubled during initial timeouts.
- Reduces the amount of stack used by process control.
- Use a constant for the peer chunk overhead.
- Name change to spell candidate correctly.
- Remove unused kse fields from struct proc.
- Group remaining fields and #ifdef KSE them.
- Move some kern_kse.c only prototypes out of proc and into kern_kse.
Discussed with: Julian
and protocol-independent host mode multicast. The code is written to
accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazily allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)
but are a separate call that can be re-used if needed.
- 64 bit issues
o re-arrange cookie so it is better 64 bit aligned
o For wire level things we need the packed attribute.
- Add a count of exiting threads, p_exitthreads, to struct proc.
- Increment p_exitthreads when we set the deadthread in thread_exit().
- When we thread_stash() a deadthread use an atomic to drop the count.
- Spin until the p_exitthreads count reaches 0 in thread_wait().
- Lock the last exiting thread momentarily to be certain that it has
exited cpu_throw().
- Restructure thread_wait(). It does not need a loop as there will only
ever be one thread.
Tested by: moose@opera.com
Reported by: kris, moose@opera.com
MCLBYTES for the segment size but it used too many Tx descriptors in
TSO case.
While I'm here adjust maximum size of the sum of all segment lengths
in a given DMA mapping to 65535, the maximum size, in bytes, of an IP
packet.
o s/printf/device_printf/g
o Nuke OpenBSDism.
o Nuke NetBSD/OpenBSD specific DMA sync operations.(we don't have a way
to sync a single descriptor within a DMA map.)
o Remove recursive mutex.
o bus_dma(9) clean up.
o 40bit DMA address support.
o Add protection for Rx map load failure.
o Fix a long standing bug for watchdog timeout. [1]
o Add additional protections, missing Tx completion interrupt, losing
start Tx command, for watchdog timeout.
o Switch to taskqueue(9) API to handle interrupts.
o Use our own timer for watchdog instead of if_watchdog/if_timer
interface.
o Advertise VLAN header length/capability correctly to upper layer.
o Remove excessive kernel stack consumption in nfe_encap().
o Handle highly fragmented mbuf chains correctly.
o Enable ethernet address reprogramming with ifconfig(8).
o Add ALTQ/TSO, MSI/MSIX support.
o Increased Rx ring to 256 descriptors from 128.
o Align Tx/Rx descriptor ring on sizeof(struct nfe_desc64) boundary.
o Remove alignment restrictions on Tx/Rx buffers.
o Rewritten jumbo frame support code.
o Add support for hardware assisted VLAN tag insertion/stripping.
o Add support for Tx/Rx flow control based on patches from Peer Chen. [2]
o Add a routine that detects whether an ethernet address swap routine is
required. [3]
o Add a workaround that takes MAC/PHY out of power down mode.
o Add suspend/resume support.
o style(9) and code clean up.
Special thanks to Shigeaki Tagashira, the original porter of nfe(4),
who submitted lots of patches, performed an uncountable number of
regression tests and maintained nfe(4) for a long time. Without his
enthusiastic help and support I could never have completed this
overhauling task.
The only weak point of nfe(4) compared to nve(4) is instability of
manual half-duplex media selection on certain hardware (auto sensing
media type should work for all cases, though). This was a long
standing bug of nfe(4) and I still have no idea why it doesn't work
on some hardware.
Obtained from: OpenBSD [1]
Submitted by: Peer Chen < pchen at nvidia dot com > [2], [3]
Reviewed by: Shigeaki Tagashira < shigeaki AT se DOT hiroshima-u DOT ac DOT jp >
Tested by: Shigeaki Tagashira, current
Discussed with: current
Silence from: obrien
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp
Obtained from: TrustedBSD Project
- Allow LRO to be enabled / disabled at runtime
- Fix a double-free at module unload time.
- Only update timestamp in lro merge when it is present in the frame
Sponsored by: Myricom
- Use a separate taskqueue+thread for reset tasks since iwi_ops will
block.
- Return from iwi_ops if the interface has been downed
- The firmware will fail if we are already associated
- Add myself to the copyright
o major overhaul of the way channels are handled: channels are now
fully enumerated and uniquely identify the operating characteristics;
these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
background scanning and roaming
o move scanning support into loadable modules based on the operating
mode to enable different policies and reduce the memory footprint
on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
mode yet)
o significantly speed up sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
we use a simple algorithm to trigger a roam: we threshold the rssi
and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
drivers but is incomplete; it's included now to establish a baseline
for other drivers to be developed and for user applications
o adjust max_linkhdr et al. to reflect 802.11 requirements; this eliminates
prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
encapsulation (note this can be used with any card that can tx+rx
large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
authentication and association failures
o enable the addition of sta mode power save support for drivers that need
net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
index into the rate set; this needs to be extended to deal with
multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing
Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
dynamic turbo (lightly tested), 11n (sniffing only and needs
new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
rough edges)
o ral, ural, rum: add support for bg scanning, calibrate rssi data
o wi: lightly tested
This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others. Much of the scanning work was supported by
Atheros. The 11n work was supported by Marvell.
MCLBYTES for the segment size but it used too many Tx descriptors in
TSO case.
While I'm here adjust maximum size of the sum of all segment lengths
in a given DMA mapping to 65535, the maximum size, in bytes, of an IP
packet.
making the relevant files standard. This avoids duplication and
makes it easier to override/disable unwanted schemes. Since ARM
doesn't have a DEFAULTS configuration file, leave the source
files for the BSD and MBR partitioning schemes in files.arm for
now.
- Add codec id for AD1988B, along with fixing its line-in and other
issues (with proper quirks). [2]
Submitted by: [1] barbara.xxx1975@libero.it
[2] Oliver Brandmueller ob@e-Gitt.NET
MFC after: 3 days
In particular:
- Add an explanatory table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments
Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
be called with an incorrect segment end value. tcp_reass() may
trim segments when they overlap with already existing ones in the
reassembly queue. Instead of saving the segment end value before
the call to tcp_reass() compute it on the fly based on the effective
segment length afterwards.
This bug was not really problematic as no information got lost and
the eventual SACK information computation was correct nonetheless.
MFC after: 1 week
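A minimal sketch of why the segment end has to be computed after the
reassembly call rather than saved before it. The names struct seg,
trim_tail() and seg_end() are hypothetical and only model the trimming
behaviour described above; they are not the tcp_reass() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment covering [start, start + len). */
struct seg {
    uint32_t start;
    uint32_t len;
};

/*
 * Stand-in for the trimming done during reassembly: cut the tail of
 * 's' if it overlaps data already queued from 'next_start' onward.
 * Plain comparisons are used for brevity; real TCP code needs
 * wraparound-safe sequence comparisons.
 */
static void
trim_tail(struct seg *s, uint32_t next_start)
{
    assert(s != NULL);
    if (next_start > s->start && s->start + s->len > next_start)
        s->len = next_start - s->start;
}

/* Compute the end from the effective length, i.e. after trimming. */
static uint32_t
seg_end(const struct seg *s)
{
    return (s->start + s->len);
}
```

For a segment starting at 100 with length 50, an end saved before
trimming against data queued at 120 would be 150, while the effective
end afterwards is 120.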
instead of an authentication function. There are a design reason
and a practical reason for that. First, the module belongs in
account management because it checks availability of the account
and does no authentication. Second, there are existing and potential
PAM consumers that skip PAM authentication for good or for bad.
E.g., sshd(8) just prefers internal routines for public key auth;
OTOH, cron(8) and atrun(8) do implicit authentication when running
a job on behalf of its owner, so their inability to use PAM auth
is fundamental, but they can benefit from PAM account management.
Document this change in the manpage.
Modify /etc/pam.d files accordingly, so that pam_nologin.so is listed
under the "account" function class.
Bump __FreeBSD_version (mostly for ports, as this change should be
invisible to C code outside pam_nologin.)
PR: bin/112574
Approved by: des, re
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.
With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports are pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.
builds had been succeeding if run serially but could fail if run in
parallel because the bge module build might start before ofw_bus_if.h
got created as part of the mainline kernel build.
Diagnosis and patch by: ru
to the build.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
This patch fixes places where they should be called atomically by changing
their locking requirements (both assume the per-proc spinlock is held) and
introducing rufetchcalc, which wraps both calls so that they are performed
atomically.
Reviewed by: jeff
Approved by: jeff (mentor)
in tcp_output(). This is currently not strictly necessary but paves
the way to simplify the entire SYN options handling quite a bit.
Clarify comment. No change in effective behaviour with this commit.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
and simplify handling of the send/receive window scaling. No
change in effective behaviour.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
Noticed by: yar
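The RFC 1323 rule quoted above can be sketched as follows. This is an
illustration, not the tcp_output() code: window_field() and
SKETCH_MAXWIN are hypothetical names, and the clamp to 65535 simply
reflects the width of the 16-bit window field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAXWIN 65535u    /* largest value of the 16-bit window field */

/*
 * Compute the 16-bit window field for an outgoing segment.
 * 'recwin' is the receive window in bytes and 'rcv_scale' is the
 * shift count applied to non-SYN segments.
 */
static uint16_t
window_field(uint32_t recwin, unsigned rcv_scale, bool is_syn)
{
    uint32_t w;

    assert(rcv_scale <= 14);    /* RFC 1323 limits the shift count to 14 */
    if (is_syn)
        w = recwin;             /* never scaled on <SYN> or <SYN,ACK> */
    else
        w = recwin >> rcv_scale;
    if (w > SKETCH_MAXWIN)
        w = SKETCH_MAXWIN;
    return ((uint16_t)w);
}
```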
- Unsafeness on ruadd() in thread_exit()
- Lack of atomicity of thread_exit() in the exit1() operations
This patch addresses these problems by allocating p_fd as part of the
process and modifying the way it is accessed.
A small chunk of this patch resolves a race on p_state in kern_wait(),
since we have to be sure about the zombifying process.
Submitted by: jeff
Approved by: jeff (mentor)
a timer issues a shutdown and a simultaneous close on the socket
happens. This race condition is inherent in the current socket/
inpcb life cycle system but can be handled well.
Reported by: kris
Tested by: kris (on 8-core machine)
- Reorder send failed to be in correct order.
- Fixed calculation of init-ack to be right off
mbuf lengths instead of the precalculated value. This
will fix one 64 bit platform issue.
error doing so. It seems an increasing number of phones have this
quirk, and we're not keeping up. There appears to be nothing bad that
happens for non-quirked phones.
Minor cleanups:
o prefer device_printf over printf
o kill devinfo stuff
o minor other preening.
need to do it at all anymore. Remove it from here. Expand
USB_ATTACH_SETUP inline now that it is one line and we're moving away
from the compat macros. Remove some bzero calls that turn out not to
be necessary.
what we print, don't print it anymore. And don't compute it anymore.
And don't malloc/free memory for it anymore. While I'm here, prefer
device_printf where appropriate.
the value of ph_nhooks to zero, not the address. This removes
extraneous calls to pfil_run_hooks (and an rw lock) from the
network stack's critical path when no pfil hooks are active.
Reviewed by: csjp
Sponsored by: Myricom Inc.
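A small sketch of the value-versus-address distinction described above.
Only ph_nhooks comes from the commit text; struct hook_head, HOOKED(),
run_hooks() and input_path() are hypothetical stand-ins and not the
pfil(9) definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down hook head; only the counter matters here. */
struct hook_head {
    int ph_nhooks;      /* number of registered hooks */
};

/*
 * Test the VALUE of the counter. Testing the ADDRESS of the field,
 * &(p)->ph_nhooks, would be true for every valid head, so the slow
 * path would always be taken.
 */
#define HOOKED(p)   ((p)->ph_nhooks > 0)

static int slow_path_calls;     /* counts trips through the hook runner */

static void
run_hooks(struct hook_head *ph)
{
    assert(ph != NULL);
    slow_path_calls++;  /* a real runner would take a lock and walk a list */
}

static void
input_path(struct hook_head *ph)
{
    if (HOOKED(ph))
        run_hooks(ph);
}
```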
implementing some of them using existing ones.
- Allow to compile ZFS on all archs and use atomic operations surrounded
by global mutex on archs we don't have or can't have all atomic
operations needed by ZFS.
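The fallback described above, atomic operations emulated under one
global lock, can be sketched like this. The emul_* names are
hypothetical, and a C11 spin flag stands in for the global mutex so the
example stays self-contained; it is not the code from this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* One global lock serializes every emulated atomic operation. */
static atomic_flag emul_lock = ATOMIC_FLAG_INIT;

static void
emul_enter(void)
{
    while (atomic_flag_test_and_set_explicit(&emul_lock, memory_order_acquire))
        continue;
}

static void
emul_exit(void)
{
    atomic_flag_clear_explicit(&emul_lock, memory_order_release);
}

/* Emulated 64-bit add that returns the new value. */
static uint64_t
emul_add_64_nv(uint64_t *target, int64_t delta)
{
    uint64_t nv;

    assert(target != NULL);
    emul_enter();
    nv = *target + (uint64_t)delta;
    *target = nv;
    emul_exit();
    return (nv);
}

/* Emulated 64-bit compare-and-swap that returns the old value. */
static uint64_t
emul_cas_64(uint64_t *target, uint64_t cmp, uint64_t newval)
{
    uint64_t old;

    assert(target != NULL);
    emul_enter();
    old = *target;
    if (old == cmp)
        *target = newval;
    emul_exit();
    return (old);
}
```

The emulation is only correct if every access to the shared word goes
through these helpers, since the lock is what provides atomicity.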
algorithm would not go through the proper initialization.
- The initialization was incorrect as well, causing problems in
sat networks with > 1sec RTT
- Get rid of magic numbers in RTT calculations.
A change to dconschat(8) will follow so that it can bomb
this address over FireWire to reset a wedged system.
Though this method is just a hack and far from perfection,
it should be useful if you don't want to go to the machine room
just to reset or to power-cycle a machine without a
remote-managed power supply. And much better than doing:
# fwcontrol -m target-eui64
# dd if=/dev/zero of=/dev/fwmem0.2 bs=1m
Now, it's safe to call the fwohci interrupt(polling) routine while ddb/gdb
is active. After this change, a dcons connection over FireWire can survive
bus resets even in kernel debugger.
This means that it is not too late to plug a FireWire cable after a panic
to investigate the problem.
Actually there is a small window(between a jump to kernel from loader and
initialization of dcons_crom) in which no one can take care of a bus reset.
Except for that window, the firewire console should keep working
from loader to reboot even with a panic and a bus reset.
(as long as you enable LOADER_FIREWIRE_SUPPORT)
embedded storage in struct ucred. This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.
Reviewed by: csjp
Approved by: re (kensmith)
Obtained from: TrustedBSD Project
cache size limit but this bucket row is empty. Normally we want to
recycle the oldest entry in the bucket row. If there isn't any the
TAILQ_REMOVE leads to a panic by trying to remove a non-existing
element. Fix this by just returning NULL and failing the insert.
This is not a problem as the TCP hostcache is only advisory.
Submitted by: jhb
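A minimal sketch of the guard described above, using the queue(3)
macros. The names struct hc_entry, struct hc_row and recycle_oldest()
are hypothetical; the sketch assumes new entries are inserted at the
head of a row, so the oldest entry is the last one.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical cache entry and bucket row. */
struct hc_entry {
    int key;
    TAILQ_ENTRY(hc_entry) link;
};
TAILQ_HEAD(hc_row, hc_entry);

/*
 * Unlink and return the oldest entry of a row for reuse, or NULL if
 * the row is empty. Calling TAILQ_REMOVE() on a NULL element is what
 * has to be avoided.
 */
static struct hc_entry *
recycle_oldest(struct hc_row *row)
{
    struct hc_entry *e;

    assert(row != NULL);
    e = TAILQ_LAST(row, hc_row);
    if (e == NULL)
        return (NULL);  /* nothing to recycle; the caller fails the insert */
    TAILQ_REMOVE(row, e, link);
    return (e);
}
```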
grab sched_lock. This would serialize calls to pmap_switch from
cpu_switch(). With the introduction of thread_lock, this is not
possible anymore, because thread_lock is not a single lock. It
varies. Secondly and most importantly, it's not needed at all. The
only requirement for pmap_switch() is that it's not preempted
while in the middle of updating the CPU and PCPU. In other words,
it's a critical region. No locking required.
This is enabled by default. It should be disabled for
those who are uneasy with peeking/poking from FireWire.
Please note sbp(4) and dcons(4) over FireWire need
this feature.
- Moved BCM5706S/5708S SerDes support to brgphy (since they are not technically
TBI interfaces)
- Added 2.5G support for BCM5708S
Comments:
Since this driver is shared with bge I tested several available controllers
supported by bge and all worked as expected, however the list was not
exhaustive. Need wider testing.
MFC after: 4 weeks
In Rx path it allocates a new mbuf with m_getcl(9) so the length of
the mbuf is MCLBYTES which is greater than a segment size specified by
the dma tag. This segment size mismatch caused a voluntary panic.
Fix the panic by setting the mbuf length to TULIP_DATA_PER_DESC.
Reported by: Arne H Juul <arnej AT yahoo-inc DOT com>
Tested by: Arne H Juul <arnej AT yahoo-inc DOT com>
- lock its own locks and drop Giant.
- create its own taskqueue thread.
- split interrupt routine
- use interrupt filter as a fast interrupt.
- run watchdog timer in taskqueue so that it should be
serialized with the bottom half.
- add extra sanity check for transaction labels.
disable ad-hoc workaround for unknown tlabels.
- add sleep/wakeup synchronization primitives
- don't reset OHCI in fwohci_stop()
when copying out association data.
- Fixes a small bug that prevented the SCTP_UNORDERED indication
from going up to the app on a recv in the sinfo_flags field.
seems not enough to verify its consistencies.
- Define AC97_MIXER_SIZE as SOUND_MIXER_NRDEVICES (25), since we
don't need more than that. Stop making wild and random guesses about
its size since we're strictly bound to it.
application specific SEND_OP_COND (CMD55 + ACMD41), go ahead and allow
100 tries. This gives a timeout of a second rather than the ~100ms
the old style produces.
I've had one old 16MB SD card which needs the extra time. I've now
had reports from the field that other cards need this too.
Originally done at BSDcan 2007 while waiting to give my embedding
madness minitalk.
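A rough sketch of the bounded retry described above. The 10 ms delay
per try is an assumption inferred from "100 tries" giving "a timeout of
a second"; wait_op_cond(), the callback types and the fake card helpers
are hypothetical and are not the driver's interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OP_COND_TRIES     100   /* from the text above */
#define OP_COND_DELAY_MS  10    /* assumed: 100 tries * 10 ms = 1 s */

typedef bool (*op_cond_fn)(void *arg);  /* true once the card is ready */
typedef void (*delay_fn)(int ms);

/* Poll until ready; returns the number of tries used, or -1 on timeout. */
static int
wait_op_cond(op_cond_fn poll, void *arg, delay_fn delay)
{
    int i;

    assert(poll != NULL && delay != NULL);
    for (i = 0; i < OP_COND_TRIES; i++) {
        if (poll(arg))
            return (i + 1);
        delay(OP_COND_DELAY_MS);
    }
    return (-1);
}

/* Fake card for demonstration: becomes ready on the Nth poll. */
struct fake_card {
    int ready_after;
    int polls;
};

static bool
fake_poll(void *arg)
{
    struct fake_card *c = arg;

    return (++c->polls >= c->ready_after);
}

static int total_delay_ms;      /* time "slept" by the fake delay */

static void
fake_delay(int ms)
{
    total_delay_ms += ms;
}
```

A card that needs 30 polls is found ready after about 290 ms of delay
in this model, which a budget of roughly 100 ms would have missed.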
Add the machine-specific definitions for configuring the new physical
memory allocator.
Set the size of phys_avail[] and dump_avail[] using one of these
definitions.
which support a 2.5Gbps mode over fiber using next page extensions during
autonegotiation. Typically only found in blade systems which also include
a Broadcom 2.5Gbps capable switch.
MFC after: 2 weeks
oldthread should point at before we return.
- When cpu_switch() is called the td_lock pointer in the old thread may
point at the blocked lock. This prevents other processors from
switching into this thread while we're still switching out. Wait
until we're done deactivating the vmspace before we release the
thread by assigning to td_lock.
- Before we can activate the new vmspace we must make sure that the new
thread is not assigned to the blocked lock. It may be in the process
of switching out on another cpu. Spin until the new thread is
available.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add a new parameter to cpu_switch() that is used to release the lock on
the outgoing thread and properly acquire the lock on the incoming
thread. This parameter is not required for schedulers that don't do
per-cpu locking and architectures which do not support it may continue
to use the 4BSD scheduler. This feature is presently not supported
on ia64.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- There is no globally visible scheduler lock any longer. For now the
watchdog can only check Giant. This model of checking particular locks
is flawed and should be revisited. Other metrics should be considered.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add new spinlocks to support thread_lock() and adjust ordering.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Attempt to return the ttyinfo() selection algorithm to something sane
as it has been broken and disabled for some time. Adapt this algorithm
in such a way that it does not conflict with per-cpu scheduler locking.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use a global umtx spinlock to protect the sleep queues now that there
is no global scheduler lock.
- Use thread_lock() to protect thread state.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Use a global kse spinlock to protect upcall and thread assignment. The
per-process spinlock can not be used because this lock must be acquired
via mi_switch() where we already hold a thread lock. The kse spinlock
is a leaf lock ordered after the process and thread spinlocks.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Replace the tail-end of fork_exit() with a scheduler specific routine
which can do the appropriate lock manipulations.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Protect the cp_time tick counts with atomics instead of a global lock.
There will only be one atomic per tick and this allows all processors
to execute softclock concurrently.
- In softclock, protect access to rusage and td_*tick data with the
thread_lock(), expanding the scope of the thread lock over the whole
function.
- Do some creative re-arranging in hardclock() to avoid excess locking.
- Protect the p_timer fields with the per-process spinlock.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
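A minimal sketch of per-state tick counters updated with one atomic add
per tick instead of under a global lock, as the cp_time item above
describes. The SK_* names, cp_time_sketch[], tick_account() and
tick_read() are hypothetical and do not reproduce the kernel's
definitions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical CPU states, loosely modelled on the usual categories. */
enum {
    SK_CP_USER,
    SK_CP_NICE,
    SK_CP_SYS,
    SK_CP_INTR,
    SK_CP_IDLE,
    SK_CPUSTATES
};

/* Shared tick counters; static storage starts out zeroed. */
static atomic_long cp_time_sketch[SK_CPUSTATES];

/* Account one tick to 'state' with a single lock-free atomic add. */
static void
tick_account(int state)
{
    assert(state >= 0 && state < SK_CPUSTATES);
    atomic_fetch_add_explicit(&cp_time_sketch[state], 1,
        memory_order_relaxed);
}

/* Read the current count for 'state'. */
static long
tick_read(int state)
{
    assert(state >= 0 && state < SK_CPUSTATES);
    return (atomic_load_explicit(&cp_time_sketch[state],
        memory_order_relaxed));
}
```

Relaxed ordering is enough in this sketch because the counters are pure
statistics and nothing else is published through them.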
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Move some common code into thread_suspend_switch() to handle the
mechanics of suspending a thread. The locking here is incredibly
convoluted and should be simplified.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add a per-turnstile spinlock to solve potential priority propagation
deadlocks that are possible with thread_lock().
- The turnstile lock order is defined as the exact opposite of the
lock order used with the sleep locks they represent. This allows us
to walk in reverse order in priority_propagate and this is the only
place we wish to multiply acquire turnstile locks.
- Use the turnstile_chain lock to protect assigning mutexes to turnstiles.
- Change the turnstile interface to pass back turnstile pointers to the
consumers. This allows us to reduce some locking and makes it easier
to cancel turnstile assignment while the turnstile chain lock is held.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Adapt sleepqueues to the new thread_lock() mechanism.
- Delay assigning the sleep queue spinlock as the thread lock until after
we've checked for signals. It is illegal for a thread to return in
mi_switch() with any lock assigned to td_lock other than the scheduler
locks.
- Change sleepq_catch_signals() to do the switch if necessary to simplify
the callers.
- Simplify timeout handling now that locking a sleeping thread has the
side-effect of locking the sleepqueue. Some previous races are no
longer possible.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Move all scheduler locking into the schedulers utilizing a technique
similar to Solaris's container locking.
- A per-process spinlock is now used to protect the queue of threads,
thread count, suspension count, p_sflags, and other process
related scheduling fields.
- The new thread lock is actually a pointer to a spinlock for the
container that the thread is currently owned by. The container may
be a turnstile, sleepqueue, or run queue.
- thread_lock() is now used to protect access to thread related scheduling
fields. thread_unlock() unlocks the lock and thread_set_lock()
implements the transition from one lock to another.
- A new "blocked_lock" is used in cases where it is not safe to hold the
actual thread's lock yet we must prevent access to the thread.
- sched_throw() and sched_fork_exit() are introduced to allow the
schedulers to fix-up locking at these points.
- Add some minor infrastructure for optionally exporting scheduler
statistics that were invaluable in solving performance problems with
this patch. Generally these statistics allow you to differentiate
between different causes of context switches.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
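The container locking described above can be pictured with a small userland sketch: the thread holds a pointer to the spinlock of whichever container currently owns it, and locking the thread means locking that container and re-checking the pointer. All names with a _sketch suffix, and the spin_lock()/spin_unlock() helpers, are hypothetical; this only illustrates the idea and is not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A toy spinlock standing in for a run queue or sleepqueue lock. */
struct spin_sketch {
	atomic_bool held;
};

/* The thread only stores a pointer to its current container's lock. */
struct thread_sketch {
	struct spin_sketch *_Atomic td_lock;
};

static void
spin_lock(struct spin_sketch *s)
{
	while (atomic_exchange_explicit(&s->held, true, memory_order_acquire))
		;	/* spin until the previous holder releases */
}

static void
spin_unlock(struct spin_sketch *s)
{
	assert(atomic_load(&s->held));
	atomic_store_explicit(&s->held, false, memory_order_release);
}

/*
 * Lock the thread: lock the container it appears to be in, then
 * verify it did not migrate to another container in the meantime.
 */
static struct spin_sketch *
thread_lock_sketch(struct thread_sketch *td)
{
	for (;;) {
		struct spin_sketch *s = atomic_load(&td->td_lock);

		spin_lock(s);
		if (s == atomic_load(&td->td_lock))
			return (s);
		spin_unlock(s);	/* thread moved; retry with the new lock */
	}
}

static void
thread_unlock_sketch(struct thread_sketch *td)
{
	spin_unlock(atomic_load(&td->td_lock));
}

/*
 * Move the thread to a new container.  The caller holds both the
 * old and the new lock; the old one is dropped on return.
 */
static void
thread_set_lock_sketch(struct thread_sketch *td, struct spin_sketch *new)
{
	struct spin_sketch *old = atomic_load(&td->td_lock);

	assert(atomic_load(&new->held));
	atomic_store(&td->td_lock, new);
	spin_unlock(old);
}
```

The re-check after acquiring the lock is what makes a pointer-valued lock safe to use: a waiter that spun on the old container's lock notices the change and retries.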
Now, we assume no more sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed across CPUs).
Reviewed by: alc, bde
Approved by: jeff (mentor)
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This point needs some discussion/work in the coming days.
Reviewed by: alc, bde
Approved by: jeff (mentor)
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.
Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.
In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
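The truncation can be shown with a small userland model. The two export helpers below are hypothetical and only mimic "a handler that always copies sizeof(int) bytes" versus "a handler that copies a 64-bit quantity"; they are not the sysctl(9) handlers themselves:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models an int handler: copies exactly sizeof(int) bytes out. */
static size_t
export_as_int(const void *src, void *dst)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, sizeof(int));
	return (sizeof(int));
}

/* Models a quad handler: copies the full 64-bit value out. */
static size_t
export_as_quad(const void *src, void *dst)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, sizeof(int64_t));
	return (sizeof(int64_t));
}
```

With a 64-bit source such as 0x100000002, the int-sized copy loses half of the value on either byte order, while the quad-sized copy round-trips it.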
- In the ioctl path let the command get queued up and return
when complete _without_ blocking the driver while waiting for
the response. This way the driver doesn't "lock up" for
~30s during a flash command. Submitted by scottl.
- Add a guard so that if a DCMD of 0 is sent down the ioctl
path don't send it to the controller. Return with a
status of OK. This is a little strange since MegaCli
doesn't seem to like something and will issue some DCMD
of 0. This doesn't happen under Linux. So the emulation
needs to be improved but I'm not sure what. Another strange
thing is that when a DCMD of 0 gets issued under i386 the
controller returns OK but in amd64 the context is messed
up.
- Add a guard so the context has to be within the legal
limit so we get a reasonable error assertion versus random
panic.
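The two guards above can be sketched as a pre-check run before a command is handed to the controller. The structure, the enum, the function name, and the order of the two checks are all assumptions made for illustration; the driver's real data structures differ:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a command arriving via ioctl. */
struct ioctl_cmd_sketch {
	uint32_t opcode;	/* DCMD opcode requested by the tool */
	uint32_t context;	/* index into the driver's command table */
};

enum precheck {
	PRECHECK_FORWARD,	/* send the command to the controller */
	PRECHECK_COMPLETE_OK,	/* finish immediately with status OK */
	PRECHECK_REJECT		/* fail with an error instead of panicking */
};

static enum precheck
ioctl_precheck(const struct ioctl_cmd_sketch *cmd, uint32_t max_cmds,
    int *error)
{
	assert(cmd != NULL && error != NULL);
	*error = 0;

	/* Guard: the context must index a valid command slot. */
	if (cmd->context >= max_cmds) {
		*error = EINVAL;
		return (PRECHECK_REJECT);
	}
	/* Guard: a DCMD of 0 is never sent down; report success. */
	if (cmd->opcode == 0)
		return (PRECHECK_COMPLETE_OK);
	return (PRECHECK_FORWARD);
}
```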
It's going to be a challenge to figure out why MegaCli is not totally
happy and then sends some bogus commands. This means that flashing
firmware via the Linux tool won't work since it generates a DCMD of
0 when it should be opening the firmware for a flash update. Without
this problem flashing works fine. This means there is no publicly
available tool to upgrade the RAID firmware under FreeBSD right now.
I plan to MFC all of the mfi changes to 6.X shortly. This might not
include the SCSI pass-through changes.
Submitted by: scottl
Reviewed by: scottl
MFC after: 3 days
1. Pass locking flags to VFS_ROOT().
2. Check v_mountedhere while the vnode is locked.
3. Always return locked vnode on success.
Change 1 fixes the problem reported by Stephen M. Rumble - after the
zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode
unconditionally and traverse() didn't pass the right lock type to
VFS_ROOT(). The result was that the kernel panicked when the .zfs/
directory was accessed via NFS.
playtone() so that it uses times of 1/100ths of a second.
Now 'time echo T60ABC >/dev/speaker' takes ~3 seconds.
MFC after: 2 weeks
Problem noted by: dwmalone
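A worked version of the timing arithmetic, as a sketch. It assumes that the tempo counts quarter notes per minute, that notes default to quarter notes, and that a whole note spans four beats; the constant and function names are hypothetical:

```c
#include <assert.h>

enum {
	SECS_PER_MIN_SKETCH = 60,
	BEATS_PER_WHOLE_NOTE = 4,	/* assumed: tempo counts quarter notes */
	UNITS_PER_SEC = 100		/* durations in 1/100ths of a second */
};

/* Length of a whole note at the given tempo, in 1/100ths of a second. */
static int
whole_note_centisecs(int tempo)
{
	assert(tempo > 0);
	return ((UNITS_PER_SEC * SECS_PER_MIN_SKETCH * BEATS_PER_WHOLE_NOTE) /
	    tempo);
}

/* Length of a 1/value note (value 4 = quarter note). */
static int
note_centisecs(int tempo, int value)
{
	assert(value > 0);
	return (whole_note_centisecs(tempo) / value);
}
```

Under these assumptions, T60 gives a 400-unit whole note and a 100-unit quarter note, so the three notes A, B, and C take 300 units, which matches the ~3 seconds quoted above.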
- removed unused structure members
- fixed a minor bug that the ECN code point may not be restored correctly
Approved by: ume (mentor)
MFC after: 1 week
GEMs is unable to discriminate UDP from TCP packets such that
it can generate 0x0000 checksum value for the UDP datagram. So the
UDP checksum offload was disabled by default. You can enable it
by setting link0 flag with ifconfig(8).
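For background: in UDP over IPv4 a transmitted checksum field of 0x0000 means "no checksum computed", so a sender whose computed checksum comes out as zero must transmit 0xFFFF instead. The sketch below shows that fix-up in software; the function names are hypothetical and the checksum routine is a plain reference implementation, not the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain 16-bit one's-complement Internet checksum over a buffer. */
static uint16_t
inet_cksum_sketch(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(buf != NULL || len == 0);
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len == 1)		/* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)	/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * UDP-specific step that a protocol-unaware offload engine skips:
 * a computed value of zero is sent as all ones.
 */
static uint16_t
udp_cksum_fixup(uint16_t cksum)
{
	return (cksum == 0 ? 0xffff : cksum);
}
```

Hardware that cannot tell UDP from TCP cannot know when to apply the second step, which is why leaving the offload off by default is the conservative choice.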
o bus_dma(9) clean up. It now correctly sets the number of required DMA
segments/size and removes the incorrect use of the BUS_DMA_ALLOCNOW flag
in static allocations done via bus_dmamem_alloc(9).
o Implemented ALTQ(9) support.
o Implemented Tx side bus_dmamap_load_mbuf_sg(9) which can remove
several bookkeeping chores originating from the call-back mechanism.
Therefore gem_txdma_callback() was removed and its functionality
was reimplemented in gem_load_txmbuf().
o Don't set GEM_TD_START_OF_PACKET flag until all remaining mbuf
chains are set. I think it was a long standing bug and it caused
fluctuating interrupts/CPU usage patterns while a netperf test
is in progress. Previously it seems that we raced with the device.
Because I don't have documentation for GEM I'm not sure this is
correct, but almost all other documentation I have states this
implication of setting the SOP mark in the descriptor ring (e.g. hme(4)).
o Borrowed gem_defrag() from ath(4) which is supposed to be much
faster than m_defrag(9) since it does not need to defrag all
mbuf chains.
o gem_load_txmbuf() was changed to allow passed mbuf chains to be freed.
Caller of gem_load_txmbuf() correctly handles freed mbuf chains.
o In gem_start_locked(), added checks for availability of Tx
descriptors before trying to load DMA maps which could save CPU
cycles when the number of available descriptors is low. Also, simplify
IFF_DRV_OACTIVE detection logic.
o Removed hard-coded function names in CTR macros and replaced them
with __func__.
o Moved statistics counter register access to gem_tick() to reduce
number of PCI bus accesses. There is no reason to update statistics
counters in interrupt handler.
o Removed unnecessary call of gem_start_locked() in gem_ioctl().
Reviewed by: grehan (initial version), marius (with improvements and suggestions)
Tested by: grehan (ppc), marius(sparc64)
own entry in the softc. This should allow more of cbb_pci_intr() to
migrate to a new cbb_pci_filt() so that we don't have to run cbb's ISR
in almost every case we get an interrupt. We can't just move
cbb_pci_intr into cbb_pci_filt because it does things that aren't safe
to do from a fast interrupt handler, err I mean from a filter. This is
an important first step.
# I wonder if I need to make cardok volatile or not.
mpt.h:
Add support for reading extended configuration pages.
mpt_cam.c:
Do a top level topology scan on the SAS controller. If any SATA
devices are discovered in this scan, send a passthrough FIS to set
the write cache. This is controllable through the following
tunable at boot:
hw.mpt.enable_sata_wc:
-1 = Do not configure, use the controller default
0 = Disable the write cache
1 = Enable the write cache
The default is -1. This tunable is just a hack and may be
deprecated in the future.
Turning on the write cache alleviates the write performance problems with
SATA that many people have observed. It is not recommended for those who
value data reliability! I cannot stress this strongly enough. However,
it is useful in certain circumstances, and it brings the performance in line
with what a generic SATA controller running under the FreeBSD ATA driver
provides (and the ATA driver has had the WC enabled by default for years).
Things can get ugly without it due to uninitialized class. RELENG_6 needs
a similar, but different, treatment as well.
err.. perhaps we should teach devclass_get_maxunit() to return -1 ?
MFC after: 1 day
o If we don't have a filter, also check to make sure the card is there before
calling the scheduled ISR. This is necessary to help old drivers whose
ISRs can't cope with being called with the hardware missing, which sadly
still exist in the tree. This is the main reason why we have an extra
layer of indirection for cardbus interrupts.
o If the card is no longer present, mark the interrupt as 'handled' rather
than 'stray' because this accounts for why the interrupt happened. Stray
isn't all bad, since there are other filters that would claim it...
o Fix some comments
+ Add comment about why we check for CARD_OK and touch the hardware in both
the filter and ISR.
+ add a note about why we don't care about Giant
+ also note that giant can't be taken out in a filter...
+ Some minor formatting nits on very long comments.
While in the suspend path, this means the idle thread will just return
immediately rather than trying to enter C1-n. This helps in the case where
the chipset is powered down before the rest of the system and reads from
the cpu sleep registers begin returning immediately, causing the logic that
catches bad C2/C3 behavior to kick in. Observed on my Panasonic Y4.
MFC after: 3 days
1.50 to help out with the GCC 2 to GCC 3 transition and it became
obsolete when C flags compatible with GCC 3.x became the default.
With GCC 4 in the tree this variable (i.e. GCC3) is beyond bogus
because it causes confusion when looking for the newly introduced
WITH_GCC3 option that helps the GCC 3 -> GCC 4 bump.
(j/i) was being used and it was being incremented, not decremented as before.
Factor out this code into a common function and call it from both the common
and per-CPU case.
MFC after: 1 day
The global lock is a memory region shared with the BIOS and thus
has some strange behavior like the fact that the sleep is 1 ms max.
We use standard mutexes to synchronize with the SCI so acquiring
the global lock after locking the mutex resulted in a witness
warning.
To deal with this for now, acquire the global lock before all other
locks, similar to Giant. This should fix the witness "sleeping
with mutex held" issue on boot that occurred after the last ACPI-CA
import. In the future, we hope to move to the new mutex interface
in ACPI-CA instead of the pseudo-semaphore version we have now.
Reviewed by: jkim
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
unique across systems (including different VRFs) this makes it easier
to do ifn lookups.
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.
Noted by: rwatson
Pointy hat to: kib
remove associated comments.
Slip audit_file_rotate_wait assignment in audit_rotate_vnode() before
the drop of the global audit mutex.
Obtained from: TrustedBSD Project
concept that is NOT well thought out for a multi-homed transport
protocol. So the useless table-id entries passed around need to
be removed.
- Add a event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
with the largest ssthresh (the compare was wrong).
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. Reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to its exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.
Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)
this change both simplifies the code and plugs a hole where the device
was reset without keeping the management controller at bay :) Second,
the fix for the 82571 LAA reset problem was incomplete; this addition
is necessary.
Just one of those days :)
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.
Discussed with: jhb
- Rework the entire pcm_channel structure:
* Remove rarely used link placeholder, instead, make each pcm_channel
as head/link of each own/each other. Unlock - Lock sequence due to
sleep malloc has been reduced.
* Implement "busy" queue which will contain list of busy/active
channels. This greatly reduces locking contention, for example while
servicing interrupts for hardware with many channels or when virtual
channels reach their peak of 256 channels.
- So I heard you like v chan ... O RLY?
Welcome to Virtual **Record** Channels (vrec, rec vchans, vchans for
recording, Rec-Chan, you decide), the ultimate solutions for your
nagging O_RDWR full-duplex wannabe (note: flash plugins) monopolizing
single record channel causing EBUSY. Vrec works exactly like Vchans
(or, should I rename it to "Vplay" :) , except that it operates on the
opposite direction (recording). Up to 256 vrecs (like vchans) are
possible.
Notes:
* Relocate dev.pcm.%d.{vchans,vchanformat,vchanrate} to each of its
respective node/direction:
dev.pcm.%d.play.* for "play" (cdev = dsp%d.vp%d)
dev.pcm.%d.rec.* for "record" (cdev = dsp%d.vr%d)
* Don't expect that it will magically give you ability to split
"recording source" (eg: 1 channel for cdrom, 1 channel for mic,
etc). Just admit that you only have a *single* recording source /
channel. Please bug your hardware vendor instead :)
- Bump maxautovchans from 4 to 16. For a full-fledged multimedia
desktop/workstation with too many soundservers installed (esound,
artsd, jackd, pulse/polypaudio, ding-dong pling plong mudkip fuh fuh,
etc), 4 seems inadequate. There will be no memory penalty here, since
virtual channels are allocated only on demand.
- Nuke/Rework the entire statically created cdev entries. Everything is
clonable through snd's own clone manager, which is designed to withstand
many kinds of abusive devfs droids such as:
* while : ; do /bin/test -e /dev/dsp ; done
* jot 16777216 0 | while read x ; do ls /dev/dsp0.$x ; done
* hundreds (could be thousands) of concurrent threads/processes opening
"/dev/dsp" (previously, this might result in EBUSY even with just
3 contesting threads/procs).
o Reusable clone objects (instead of creating new ones like there's no
tomorrow) after a certain expiration deadline. The clone allocator will
decide whether to reuse, share, or create a new clone.
o Automatic garbage collector.
- Dynamic unit magic allocator. Maximum attached soundcards can be tuned
using tunable "hw.snd.maxunit" (Default to 512). Minimum is 16, and
maximum is 2048.
- ..other fixes, mostly related to concurrency issues.
joel@ will do the manpage updates on sound(4).
Have fun.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)
properly observe the SB_NOINTR flag in sblock. This restores the
required behavior that lock acquisition be interruptible on the socket
buffer I/O serialization lock to allow threads waiting for I/O to be
signaled even if they aren't the thread currently holding the I/O lock.
With this change, the sblock regression test is again passed.
Reported by: alfred
sx(9) handiwork: attilio
These functions are intended to perform the same actions as sx_xlock() and
sx_slock(), with the difference that they perform an interruptible sleep, so
that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is needed,
but the external API won't be affected at all.
Note: use a "void" cast for "int" returning functions in order to keep tools
like Coverity Prevent from whining.
Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
- Coverity Prevent(tm) CID 1906 a bogus use of bzero where unneeded.
- ICH8 systems autoneg to 100 rather than 1000, this can also be
seen in 82573, the logic was backwards.
- On new 82575 quadports half duplex tx speed is slow... this was due
to overwriting TCTL reg rather than adding bits.
- Fixed a LOR in handling a cookie. Turns out create lock is applied.
And if we abort processing, this causes LOR. Changed to force the
timer to clean up, that way create lock is released.
is expanded, size of expansion was not taken into consideration.
- Fix so vtag hash is 1 bigger so that it modulo's out
correctly, avoids a panic when restart with right modulo happens.
- do not dereference stcb when control->do_not_ref_stcb is set
- Fix up packet logging to not often use a lock and also to
add to options.
- Fix some logging option duplication in the sctputil.h
OpenBSD's if_ral.c.
I didn't make the LINKSYS4 -> CISCOLINKSYS name change, nor did I
include the RALINK RT2573 that's supported by the rum(4) driver. I
didn't merge any code changes either.
"0" cannot be a correct value since when the function is entered at least
one shared holder must be present and since we want the last one "1" is
the correct value.
Note that lock_profiling for sx locks is far from being perfect.
Expect further fixes for that.
Approved by: jeff (mentor)
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before calling kmem_free() (since it acquires blocking
locks inside)
- Solve a deadlock with smp_rendezvous() where another CPU would wait
indefinitely for dt_lock acquisition.
- Add dt_lock in the WITNESS list of spinlocks
While applying these modifications, change the requirement for
user_ldt_free() so that it returns without dt_lock held.
Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)
race seen on smp laptops when suspending where the rx task can be
entered after the interface is detach'd.
NB: use of taskqueue_drain while holding the softc mutex is problematic
Submitted by: ambrisko
MFC after: 1 month
It is disabled by default. You need to put
LOADER_FIREWIRE_SUPPORT=yes in /etc/make.conf
and rebuild loader to enable it.
(cd /sys/boot/i386 && make clean && make && make install)
You can find a short introduction of dcons at
http://wiki.freebsd.org/DebugWithDcons
as to the type of the command argument: int -> u_long.
These types have different widths in the 64-bit world.
Add a note to UPDATING because the change breaks KBI
on 64-bit platforms.
Discussed on: -net, -current
Reviewed by: bms, ru
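Why the width difference matters can be shown with a small sketch: a command value with the top bit of its low 32 bits set does not survive a trip through an int parameter on a platform where u_long is 64 bits wide. The helper name and the sample command values are made up for illustration:

```c
#include <assert.h>

/* The sketch assumes int is never wider than unsigned long. */
static_assert(sizeof(int) <= sizeof(unsigned long),
    "int must not be wider than unsigned long");

/*
 * Returns 1 if cmd is unchanged after being passed through an int
 * parameter and widened back, 0 otherwise.  On LP64 platforms a
 * value with bit 31 set comes back sign-extended, so the comparison
 * against the original command fails.
 */
static int
survives_int_param(unsigned long cmd)
{
	int narrowed = (int)cmd;	/* what an "int cmd" handler receives */

	return ((unsigned long)narrowed == cmd);
}
```

A handler that compares its int argument against u_long command constants can therefore miss commands on 64-bit platforms, and changing the parameter type alters the function signature that modules were compiled against.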
hold a wq lock for the iterator. Panda uses a
silly recursive lock they hold through the timer.
- Add poor man's wireshark compile option..
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.
Reviewed by: gnn
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
Reviewed by: scottl
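A sketch of what "respecting the restrictions of a DMA tag" can look like when sizing one segment. The structure and function below are hypothetical and model only two restrictions, a maximum segment size and a boundary that segments must not cross; the real back-end handles more cases:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the restrictions carried by a DMA tag. */
struct dma_tag_sketch {
	uint64_t maxsegsz;	/* largest segment the device accepts */
	uint64_t boundary;	/* must not be crossed; 0 means none */
};

/* Size of the next segment starting at addr, at most buflen bytes. */
static uint64_t
segment_size(const struct dma_tag_sketch *tag, uint64_t addr,
    uint64_t buflen, uint64_t page_size)
{
	uint64_t sgsize, next_boundary;

	/* Power-of-two sizes keep the masks below valid. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert((tag->boundary & (tag->boundary - 1)) == 0);

	/* Start from the bytes left in the current page ... */
	sgsize = page_size - (addr & (page_size - 1));
	/* ... then apply the tag's limits instead of stopping there. */
	if (sgsize > tag->maxsegsz)
		sgsize = tag->maxsegsz;
	if (sgsize > buflen)
		sgsize = buflen;
	if (tag->boundary != 0) {
		next_boundary = (addr + tag->boundary) &
		    ~(tag->boundary - 1);
		if (sgsize > next_boundary - addr)
			sgsize = next_boundary - addr;
	}
	return (sgsize);
}
```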
arm and powerpc have 64KB as the maximum argument size, so one
cannot run "make delete-old" on arm or powerpc anymore. Stop
special-casing powerpc and give it 256KB of arguments like all
other platforms, but keep arm on 64KB for now. There may be a
purpose to it that doesn't exist for powerpc.
produced incorrect behaviour with the KDB_UNATTENDED option) and call
panic in both the KDB and non-KDB cases. This change is consistent
with rwatson's current kdb/ddb work.
by the subsequent mix_setdevs() and friends.
- Minor style(9) declaration arrangement nit.
Requested by: joeld
Submitted by: pluknet <pluknet@gmail.com>
Note on dcons:
To enable dcons in kernel, put the following lines in /boot/loader.conf.
You may also want to enable dcons in /etc/ttys.
boot_multicons="YES"
#Force dcons to be the high-level console if a firewire bus is present.
#hw.firewire.dcons_crom.force_console=1
FireWire/dcons support in loader will come shortly.
(i386/amd64 only)
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
no chance of an endless loop. Only one call to random per bind
as well.
- fixes so set_peer_primary pre-screens addresses to be
valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
to thus track and use an association value instead of an inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on off not
the timer.. since the timer may still run if unconf address
are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
of the non-pad version. This saves about 92 bytes from each struct
by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
tcb's value.
- The delayed ack time could be under a tick, this fixes so
it bounds it to at least 1 tick for platforms whose tick
is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Deferred stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details
Reviewed by: gnn
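The "at least 1 tick" bound for the delayed ack time mentioned in the list above can be sketched as a conversion helper. The function name is hypothetical, and hz stands for the number of clock ticks per second:

```c
#include <assert.h>

/*
 * Convert milliseconds to clock ticks, never returning less than one
 * tick.  With hz = 100 a tick lasts 10 ms, so any timeout shorter
 * than that would otherwise round down to zero ticks.
 */
static int
msecs_to_ticks_min1(int ms, int hz)
{
	long ticks;

	assert(ms >= 0 && hz > 0);
	ticks = ((long)ms * hz) / 1000;
	return (ticks < 1 ? 1 : (int)ticks);
}
```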
in tcp_input():
o tighten the checks on allowed TCP flags to conform to RFC793 and
tcp-secure
o log check failures to syslog at LOG_DEBUG level
o rearrange the code flow to be easier to follow
o add KASSERTs to validate assumptions of the code flow
Add sysctl net.inet.tcp.syncache.rst_on_sock_fail, enabled by default,
that controls the behavior on socket creation failure for an otherwise
successful 3-way handshake. The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits. The
sysctl allows choosing between two options to deal with this. One is
to send a reset to the other endpoint to notify it about the failure
(default). The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.
Reviewed by: rwatson (in general)
based on individual fields being set. This doesn't work for setattr replay,
because va_type is set there, so we add AT_TYPE flag to va_mask, which won't
be accepted by zfs_setattr().
Reported by: kris
- Double the number of descriptors that a single call to send can use
- Quadruple the number of descriptors that can be reclaimed per pass
- only run reclaim twice per second
- increase coalesce timer from 3.5us to 5us
fix printf warning on 64-bit platforms
Neither Ariff nor I have access to any of this hardware, so all tests
have been made by Konstantin and Artem. Commit message mostly written
by Konstantin.
envy24:
- Add test code to support rear line-in input on 'Terratec DMX 6fire'
audio card. This code is also intended to be used in the future for
support of cards that have I2C-to-GPIO expanders wired between the
control line of the audio codec and the Envy24, however such cards
are too complex and I can't add that support without a hardware sample
of such a board; I've already tried and failed.
envy24ht:
- Add support for 'AudioTrak Prodigy HD2'.
- Add support for 'AudioTrak Prodigy 7.1 XT'.
- Add support for 'ESI Juli@' (Works ok, DAC volume is hard-coded for
the time being, so 'mixer vol ...' doesn't work, only 'mixer pcm
...' works). [1]
- Fix bug in the init data for M-Audio Revolution 5.1, that
results in distorted sound.
- Add software volume control (now 'mixer pcm' works, thanks to Ariff).
- Add support for more sample rates - 176.4kHz and 192kHz.
- Fix problem with the 192kHz sample rate playback when 24.576MHz
crystal is used on the board instead of 49.152MHz crystal.
spicds:
- Add support for Asahi Kasei flagship DAC - AK4396 (used in AudioTrak
Prodigy HD2).
Submitted by: Konstantin Dimitrov <kosio.dimitrov@gmail.com>
Tested by: Artem Antonov [1]
Reviewed by: ariff
debugger is quite capable of handling Giant-free execution at this
point. Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.
actually works. mbp_count() turns out only to be used in debugging code
in if_patm_intr.c, so this bug did not affect much in practice.
Found with: Coverity Prevent(tm)
CID: 1943
existing UMA statistics for pipes, and allows us to get rid of both the
per-pipe dtor and two atomic operations per pipe required to maintain
the counter.
at the credential to be used by the connection. However, the pointer's
value was ignored when actually setting hcp->nc_owner.
(1) Do set nc_owner to the owner pointer value so that the credential is
not discarded after being carefully configured.
(2) In the case where we create a new credential with modified uid, copy
the existing credential to initialize non-uid fields to existing
values, which will lead to a fully initialized MAC label, groups, etc.
Found with: Coverity Prevent(tm)
CID: 2226
the card, panic explicitly if EN_DEBUG is enabled. In the (default)
case of !EN_DEBUG, the driver resets the card. Probably this case
shouldn't exist at all.
debug is turned off, initialize locks with NOWITNESS flag.
At some point I'll get back to them, we would probably need BLESSING
functionality, which is currently turned off by default.
SD Simplified specification, as well as other SD and SDIO
implementations I've examined, suggest this disclaimer may be required.
It is unclear to me exactly what the license would be for, or why it
might be required. Err on the side of caution and include this
disclaimer so anybody deploying this code can judge for themselves. I
have no further information about the details.
parent vnode and relock it after locking child vnode. The problem was that
we always relock it exclusively, even when it was share-locked.
Discussed with: jeff
clusters. This helps quite a bit on my low end machines (improves
performance by about 300Kpps when being blasted by a hardware
packet generator).
- Include one extended f/w counter forgotten in earlier commit
Sponsored by: Myricom Inc.
- upgrade to reflect state of 1.0.0.86
- move from firmware rev 3.2 to 4.0.0
- import driver bits for offload functionality
- remove binary distribution clause from top level files as it
runs counter to the intent of purely supporting the hardware
MFC after: 3 days
back in a simulated resume instead of entering the requested suspend state.
This helps in testing drivers separately from the acpi suspend code. To
test your drivers, set debug.acpi.suspend_bounce=1 and then run
acpiconf -s3 (or 4).
MFC after: 1 day
compiler invocation. This is just to help get over the hump of people
tracking down bugs that may cross the GCC 4.2 upgrade.
It is envisioned that this option goes away after a suitable amount
of time.
release number up to the max. This should eliminate the need to
tweak the default imageid define for later releases that are found
on the Intel web site.
MFC after: 1 month
quirky code: uarts, led, cf/ide, ixpqmgr, npe are now specified with hints.
May want to put some of these devices back in the code and just use hints
to override/specify configuration.
MFC after: 1 month
should setup is the class. This corrects an issue where enabling
uart1 on the avila board caused uart0 to stop working during boot
(no msgs generated by rc scripts were displayed).
Reviewed by: imp
MFC after: 3 weeks
initialization is complete. This fixes some root-on-ZFS
configurations.
Reported by: Bruno Damour <freebsd.ruomad@free.fr>
Tested by: Bruno Damour <freebsd.ruomad@free.fr>
o add the hex output of the th_flags field to the example log
line in comments
o simplify the log line length calculation and make it less
evil
o correct the test for the length panic; the line isn't on
the stack but malloc'ed
This was just wasteful when this was always called before lock_init()
(which overwrote both fields each time), but when
lock_profile_object_init() was moved into lock_init() the clearing of
lo_flags proved fatal (all locks became spin locks to _sleep(), etc.)
Reported by: kris
device's, not the bridge's, softc to be used to check the
PCIB_DISABLE_MSI flag. This resulted in randomly allowing
or denying MSI interrupts based on whatever value the driver
happened to store at sizeof(device_t) bytes into its softc.
I noticed this when I stopped getting MSI interrupts
after slightly re-arranging mxge's softc yesterday.
It's hard to measure performance improvement on my test machine, but the
change won't degrade performance for sure. I can measure a slight improvement
for a debugging kernel and it can also be a win for machines where atomic
operations are more expensive.
Reviewed by: kib
inline it when needed already, and the symbol is also required outside of
audit.c. This silences a new gcc warning on the topic of using __inline__
instead of __inline.
MFC after: 3 days
Implement all futex atomic operations in assembler so as not to depend on
fuword(), which cannot distinguish between a stored value of -1 and a
failure return.
Correctly return 0 from atomic operations on success.
In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007
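The ambiguity of an interface that returns either the fetched word or -1 on failure can be shown with a userland sketch. Both functions are hypothetical; a NULL pointer stands in for an address that would fault:

```c
#include <assert.h>
#include <stddef.h>

/*
 * In-band error style: the fetched word and the failure code share
 * one return value, so a stored -1 looks exactly like a fault.
 */
static long
fetch_word_inband(const long *addr)
{
	if (addr == NULL)	/* stand-in for a faulting address */
		return (-1);
	return (*addr);
}

/*
 * Out-of-band error style: the return value carries only success or
 * failure (0 on success), and the word goes through a separate
 * output parameter.
 */
static int
fetch_word_checked(const long *addr, long *val)
{
	assert(val != NULL);
	if (addr == NULL)
		return (-1);
	*val = *addr;
	return (0);
}
```

With the second style a caller can store, compare, or exchange a value of -1 and still tell a genuine fault apart from it.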
- In rt_check() remove the senderr() macro and the "bad" label. They
used to simplify the code, but no longer do.
- Remove extra RT_LOCK_ASSERT() in rt_setgate(). The RT_REMREF macro
does this.
- In rtfree() convert panics to KASSERTs.
- Tighten the routing API: rtfree() should be called only in a case
when we are completely sure we've got the last reference on the
rtentry. In all other cases RTFREE_LOCKED() macro should be used.
If the reference isn't the last one spit out a warning printf.
Correct the only(?) case for this in rt_check().
- Fix typos in comments.
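The reference-counting rule in the list above might be modelled roughly as
follows. This is a user-space sketch under stated assumptions: struct
model_rt and both functions are hypothetical, and the real struct rtentry,
rtfree() and RTFREE_LOCKED() carry locking that is omitted here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified route entry; the real struct rtentry differs. */
struct model_rt {
	int refcnt;
};

/* Drop one reference; free only if it was the last one. */
static int
model_rtfree_ref(struct model_rt *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (rt->refcnt > 1) {
		rt->refcnt--;
		return (0);		/* entry still alive */
	}
	free(rt);
	return (1);			/* entry released */
}

/* Strict variant: the caller claims to hold the last reference. */
static int
model_rtfree_last(struct model_rt *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (rt->refcnt != 1) {
		/* Not the last reference: warn instead of freeing. */
		fprintf(stderr, "model_rtfree_last: %d other refs remain\n",
		    rt->refcnt - 1);
		rt->refcnt--;
		return (0);
	}
	free(rt);
	return (1);
}
```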
- Remove code to use the special wc_fifo. It has been disabled by default
in our other drivers as it actually slows down transmit by a small amount
- Dynamically determine the amount of space required for the rx_done
ring rather than hardcoding it.
- Compute the number of tx descriptors we are willing to transmit per
frame as the minimum of 128 or 1/4 the tx ring size.
- Fix a typo in the tx dma tag setup which could lead to unnecessary
defragging of TSO packets (and potentially even dropping TSO packets
due to EFBIG being returned).
- Add a counter to keep track of how many times we've needed to
defragment a frame. It should always be zero.
- Export new extended f/w counters via sysctl
Sponsored by: Myricom, Inc.
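The per-frame descriptor limit from the list above ("the minimum of 128 or
1/4 the tx ring size") amounts to the following. The helper name is
hypothetical; the driver's own identifiers are not shown in this log.

```c
/* Hypothetical helper: cap the descriptors one frame may consume. */
static unsigned int
model_max_tx_desc(unsigned int tx_ring_size)
{
	unsigned int quarter = tx_ring_size / 4;

	return (quarter < 128 ? quarter : 128);
}
```

For a 256-entry ring this gives 64 descriptors; for rings of 512 entries
or more the cap of 128 applies.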
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings. Thus, always
reactivating cached pages is a mistake. In principle, cached pages
should only be reactivated by an actual access. Otherwise, the
following misbehavior can occur. On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages. Now, suppose that one or more of the
adjacent pages are never accessed. Ultimately, these unused pages
become cached pages through the efforts of the page daemon. However,
the next activation of the executable reactivates and maps these
unused pages. Consequently, they are never replaced. In effect, they
become pinned in memory.
same way it was enabled for Linux binaries in linuxulator.
This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.
Deal with IPv6 routing headers (see FreeBSD-SA-07:03.ipv6 for background)
Block IPv6 packets with routing headers by default, unless 'allow-opts'
is specified. Block RH0 unconditionally. Deal with ip6_plen 0.
MFC after: 1 week
Discussed with: mlaier
- Update to the latest (1.4.18) f/w. This f/w introduces a new
receive mode which allows us to use FreeBSD's physically discontinuous
MJUM9BYTES clusters.
- Switch the driver from chaining MJUMPAGESIZE clusters to using
MJUM9BYTES clusters to avoid mbuf chaining overheads. Due to this
change, people running obsolete f/w images will be limited to an MTU of
PAGE_SIZE - 16.
- Add (disabled by default) support for Large Receive Offload.
Sponsored by: Myricom, Inc.
processor is to jump to recovery code. This branching behaviour
may not be implemented by the processor and a Speculative Operation
fault is raised. The OS is responsible for emulating the branch.
Implement this, because GCC 4.2 uses advanced loads regularly.
scheduler lock is not involved. sched_lock still protects the sched_clock
call. Another patch will remedy this.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
referenced outside of mp_machdep.c
- Replace a magic 14 with the newly added IDC_ITID_SHIFT macro.
- Remove the global mp_boot_mid variable as it's not really necessary
and just replacing it with PCPU_GET(mid) doesn't have any impact on
performance once booted.
- Replace PCPU_GET(cpuid) with the curcpu shortcut.
- Replace hardcoded function names in panic strings etc with __func__
so they don't need to be updated when renaming the function.
- Use register_t instead of u_long for variables used to hold the
return value of intr_disable() so we don't need to apply any
knowledge about the actual width of that value here.
- Improve the wording of some comments.
- Fix several style(9) bugs.
- Use __FBSDID in identcpu.c.
- Remove #ifndef SUN4V around global cpu_impl variable; it doesn't
hurt on sun4v for now and once setPQL2() is gone sun4v can stop
sharing identcpu.c with sparc64, making the remainder of this file
also sparc64-only again. [1]
Submitted by: kmacy [1]
in the sun4v source in order to be able to compile the source which
is shared between sparc64 and sun4v just #include the sparc64
version here instead of duplicating it.
This is based on the approach taken by pc98 headers in order to
compile the source shared between i386 and pc98.
iommureg.h (which already began to bitrot) and iommuvar.h from the
sun4v source and adjust some of the source which is shared between
sparc64 and sun4v as appropriate.
ignore the size of any headers that were passed with the sendfile(2)
system call. Otherwise the file sent will be truncated by the header
size if the nbytes parameter was provided. The bug doesn't show up
when either nbytes is zero, meaning send the whole file, or no header
iovec is provided.
Resolve a potential aliasing of errors from the VM and sf_buf
parts and the protocol send parts where an error of the latter over-
writes one of the former.
Update comments.
The byte accounting bug wasn't seen earlier because none of the popular
sendfile(2) consumers, Apache, lighttpd and our ftpd(8) use it in modes
that trigger it. The varnish HTTP proxy makes full use of it and exposed
the problem.
Bug found by: phk
Tested by: phk
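The sendfile(2) byte accounting problem described above can be sketched in
user space. Everything here is a hypothetical model, not the kernel code:
"avail" is the number of file bytes remaining from the offset, and the
charge_headers flag selects the old accounting in which header bytes were
counted against nbytes.

```c
#include <stddef.h>

/* Hypothetical model: how many file bytes get sent for one call. */
static size_t
model_file_bytes(size_t avail, size_t nbytes, size_t hdr_len,
    int charge_headers)
{
	size_t budget;

	if (nbytes == 0)
		return (avail);		/* 0 means "send the whole file" */
	budget = nbytes;
	if (charge_headers)		/* old accounting: headers eat into nbytes */
		budget -= (hdr_len < budget ? hdr_len : budget);
	return (budget < avail ? budget : avail);
}
```

With nbytes = 500 and a 100-byte header, the old accounting sends only 400
file bytes. The two modes agree when nbytes is zero or no header is given,
which matches the conditions under which the bug stayed hidden.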
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.
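The figures above can be checked with a little arithmetic. The 8KB page
size is an assumption inferred from the numbers (1024 * 256 * 8KB = 2GB);
it is not stated in the log, and the macro names are hypothetical.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE		UINT64_C(8192)	/* assumption: 8KB pages */
#define MODEL_PTE_PAGES		UINT64_C(1024)	/* PTE pages per directory */
#define MODEL_PTES_PER_PAGE	UINT64_C(256)	/* PTEs per PTE page */
#define MODEL_DIR_PAGES		UINT64_C(1024)	/* 2nd-level directory pages */

/* KVA covered by the old single-level scheme. */
static uint64_t
model_kva_one_level(void)
{
	return (MODEL_PTE_PAGES * MODEL_PTES_PER_PAGE * MODEL_PAGE_SIZE);
}

/* KVA covered once each directory page supports one such region. */
static uint64_t
model_kva_two_level(void)
{
	return (MODEL_DIR_PAGES * model_kva_one_level());
}
```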
Fix the flags argument: M_WAITOK is not a valid flag. Its presence
gives the impression that contigmalloc(9) will not return a NULL
pointer.
The use of contigmalloc(9) in this place is probably not a good idea
given the constraints. It's probably better to lift the constraints
and instead add a permanent mapping to the ITR. It's possible that
the first 256MB of memory is exhausted when we get here.
This fixes a kernel panic on a 16GB rx3600.
of each port and any further packets are blocked. When all the marker frames
have been returned to us from the remote network device, we can be sure
that all interface queues are empty.
This is needed when a port is added to or removed from the aggregation, since
that affects the hash-based distribution: if the queues are not empty, a
packet from an existing connection may be placed on a different interface and
arrive out of order. This was previously achieved by suppressing transmission
for 1 second; now that there is active feedback, this timeout has been
increased to 3 seconds and is used as a fallback.
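Why a change in port count can reorder an existing connection is easy to
see in a simplified model. The function below is a hypothetical sketch of
hash-based port selection (a flow hash reduced modulo the number of active
ports), not the driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: pick an output port from a per-flow hash. */
static unsigned int
model_pick_port(uint32_t flow_hash, unsigned int nports)
{
	assert(nports > 0);
	return (flow_hash % nports);
}
```

A flow whose hash is 5 uses port 1 while two ports are active, but moves
to port 2 once a third port joins. Packets still queued on the old port
could then be overtaken by newer ones sent on the new port.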
Switch ia64 kernels to -fpic. This is likely wrong, but at least gets
ia64 kernels to compile and link with GCC 4.2. The previous -mno-sdata
trick is not working anymore.
for use throughout the tcp subsystem.
It is IPv4 and IPv6 aware and creates a line in the following format:
"TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"
A "\n" is not included at the end. The caller is supposed to add
further information after the standard tcp log header.
The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use. All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.
Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr)) has to be supplied.
Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be cast and passed
as void * pointers.
tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
void *ip6hdr)
Usage example:
struct ip *ip;
char *tcplog;
if ((tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) != NULL) {
    log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
        tcplog, __func__);
    free(tcplog, M_TCPLOG);
}
lock and unlock conditionally, not just set the flag on it conditionally.
In practice, this bug couldn't manifest, as in the current revision of
the code, no callers pass a NULL rep.
CID: 1416
Found with: Coverity Prevent(tm)
While ng_fec called the ioctl to let interfaces in the bundle know
the list of multicast addresses had changed, it never actually
updated that list on the interfaces in the bundle. Consequently,
the multicast filters could be programmed incorrectly.
if_lagg does this correctly, by maintaining a list of addresses
that it has added to interfaces in the bundle. This commit basically
takes the if_lagg code and adds it to ng_fec.
A version of this patch for RELENG_6 has fixed some problems with
IPv6 ND over ng_fec. This is probably the problem in PR 107523.
PR: 107523
Tested by: Rob Gallagher <robert.gallagher@heanet.ie>
Obtained from: if_lagg
MFC after: 3 weeks
function calls are no longer generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
speedup and will be more useful after each gains a spinlock in the
impending thread_lock() commit.
- Move initialization and asserts into init/fini routines. fini routines
are only needed in the INVARIANTS case for now.
Submitted by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the
feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
flags is clearer (i.e., defined specific macro names instead of using
hard-coded values).
Approved by: gnn (mentor)
MFC after: 1 week
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if it's 0. On the assoc
level set we changed it to 0, oops.
- Two more static analysis issues identified by the cisco
tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
input argument.
- When sending failed due to a no route to host, we leaked
the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by: gnn
defined. This restores the old behavior, and eliminates the
dependency on the kernconf.tmpl when INCLUDE_CONFIG_FILE isn't
included in the kernel config. There were many people in the terminal
room that had almost, but not quite, up-to-date config files that this
helps. I don't know if this is the result of skew among the cvsup
servers, or some other more subtle problem. However, this fix should
work for any config of recent vintage (I tested with the latest, and
one before the recent changes, and eye-balled the intermediate
versions).
Reviewed by: the terminal room crew
adapter list still capable, but only PCI-E adapters are now enabled.
The user can enable older PCI-X or PCI adapters using ifconfig.
Secondly, Arthur Hartwig pointed out my MSI change was not working
correctly, changed to something that now does. Thanks Arthur.
There was also a fundamental bug in the 82575 MSIX code, the MSIX
registers had to be mapped, oops :)
Rubber-stamped by: Pdeuskar
the power_nodriver tunable is off. pci_cfg_save() already checks the
tunable internally, and no other callers of pci_cfg_save() check the
tunable.
Reviewed by: imp
- Updated firmware to latest release (v3.4.8) to fix TSO + jumbo frame lockup
- Added MSI (hw.bce.msi_enable) and TSO (hw.bce.tso_enable) sysctls
- Fixed kernel panic when MSI is used and module is unloaded
- Added several new debug routines
- Removed slack space for RX/TX chains since it only covers sloppy coding
- Fixed a potential problem when programming jumbo MTU size in hardware
- Various other comment changes
MFC after: 4 weeks
because on at least my dc based cards there's garbage in there. The
recent changes in the resource code appear to have unmasked this
problem... At least dc now probes/attaches better than it did before.
Also, we no longer need to write to the cfg for the other registers.
different versions of FreeBSD source tree.
Old config(8) can now be used unless you want to use INCLUDE_CONFIG_FILE
option.
Approved by: imp
Reviewed by: imp
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:
tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()
tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()
This is a mechanical move with appropriate renames and making
them static if used only locally.
The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
value in the mbuf with the result of the calculation. Previously,
if we chose to return an ICMP message, the quoted UDP checksum bytes
would be different to what was sent.
PR: 112471
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 3 weeks
legacy codepath match the 82575, without this we were seeing bridging
fail on 82546 adapters. Secondly, I have limited TSO to PCI Express
adapters, I meant to do this and it got dropped in the earlier delta.
Next, I am dropping in the latest shared code from our development
team, consensus was that this should be done frequently, so I am :)
Approved by: pdeuskar
exists and contains the 'C' flag.
o The partition label can be the empty string. It's how labels are
cleared.
o When an action fails, lower permissions when they were raised
in order to allow the action. A failed action will not result
in any uncommitted changes.
o Allow the flags parameter to be present but empty. It's the
equivalent of not being present.
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
overwrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.
MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).
MFC after: 1 week
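The approach described above, sanitizing one limit at query time instead
of overwriting stored limits at exec(), might look roughly like this in a
user-space model. All names and the 32-bit cap are hypothetical; this is
not the kernel's sv_fixlimit() implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a stored resource limit. */
struct model_rlimit {
	uint64_t cur;
	uint64_t max;
};

#define MODEL_ABI32_CAP	UINT64_C(0x7fffffff)	/* hypothetical 32-bit cap */

/* Clamp a single limit to what a 32-bit ABI can represent. */
static void
model_fixlimit32(struct model_rlimit *rl)
{
	if (rl->cur > MODEL_ABI32_CAP)
		rl->cur = MODEL_ABI32_CAP;
	if (rl->max > MODEL_ABI32_CAP)
		rl->max = MODEL_ABI32_CAP;
}

/* Query path: sanitize a copy, leave the stored limit untouched. */
static struct model_rlimit
model_getrlimit(const struct model_rlimit *stored, int is_32bit)
{
	struct model_rlimit view = *stored;

	if (is_32bit)
		model_fixlimit32(&view);
	return (view);
}
```

Because only the returned copy is clamped, a 64-bit child of a 32-bit
process still sees the full stored limit.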
SIGCHLD/kevent(2) notification of process termination and wait(). Now
we no longer drop locks between sending the notification and marking
the process as a zombie. Previously, if another process attempted to do
a wait() with WNOHANG after receiving a SIGCHLD or kevent and locked
the process while the exiting thread was in cpu_exit(), then wait() would
fail to find the process, which is quite astonishing to the process
calling wait().
MFC after: 3 days
option value so that unrecognized options are ignored as specified in RFC2711.
(packets containing an MLD router alert option are passed to the upper layer
as before).
Approved by: gnn (mentor), ume (mentor)
functions from their original place to their own files.
TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait from tcp_subr.c -> tcp_timewait.c
is caused by my latest changes to config(8). You're supposed to install new
config(8) in order to prevent yourself from seeing a warning about old
version of that tool.
You should configure the kernel with a new config(8) then.
Oked by: rwatson, cognet (mentor)
This change lets us have the full configuration of a running kernel
available in sysctl:
sysctl -b kern.conftxt
The same configuration is also contained within the kernel image. It can be
obtained with:
config -x <kernelfile>
Current functionality lets you quickly recover kernel configuration, by
simply redirecting output from commands presented above and starting kernel
build procedure. "include" statements are also honored, which means options
and devices from included files are also included.
Please note that comments from configuration files are not preserved by
default. In order to preserve them, you can use -C flag for config(8). This
will bring configuration file and included files literally; however,
redirection to a file no longer works directly.
This commit was followed by discussion that took place on freebsd-current@.
For more details, look here:
http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.html
http://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html
Development of this patch took place in Perforce, hierarchy:
//depot/user/wkoszek/wkoszek_kconftxt/
Support from: freebsd-current@ (links above)
Reviewed by: imp@
Approved by: imp@
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
passed zero as exit signal.
GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
- All printfs that were surrounded by #ifdef SCTP_DEBUG move to
a macro that does all of this. This removes all printfs from
the code and makes the code more portable and easier to
read.
- Static Analysis (cisco) - found a few bugs, but mostly we
add checks for NULL pointers and such to make the tool
happy. We now pass the Cisco SA tools checks except for
where it does not understand tailq/lists. We still need
to look at the coverity tools output too (this is like
the cisco SA tool) and see if it wants us to fix any other
items. Hopefully this will be the last major churn in the
code other than bug fixes.
This patch does the following:
- Remove un-necessary code that is not even compiling into the driver
under TW_OSL_NON_DMA_MEM_ALLOC_PER_REQUEST defines.
- Remove bundled firmware image and associated "files" entry for tw_cl_fwimg.c
- Remove bundled firmware flashing routines. We now have tw_update userspace
FreeBSD controller flash utility.
- Fix driver crash on load due to shared interrupt.
- Fix 2 lock leaks for Giant lock.
- Fix CCB leak.
- Add support for 9650SE controllers.
Many thanks to 3Ware/AMCC for continuing to support FreeBSD.
time workaround for problems with 82571 adapters and LAAs, one port
getting reset can cause the other to have its RAR[0] also reset,
thus overwriting an LAA. This fix works around it by also keeping
the address in the last array member.
The other bug is specific to the new 575 adapter, its transmit code
logic in handling hwassists was too crude, it broke when doing
bridges. I am much happier with the new logic, we may want to change
the legacy path at some point to something similar.
Reviewed by: pdeuskar
Approved by: pdeuskar
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.
MFC after: 1 week
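Assigning logical CPU IDs up front, as the list above describes, can be
sketched as follows. This is a simplified, hypothetical model: the
constants, the struct and the ordering policy (boot processor first, then
increasing APIC ID) are assumptions made for illustration.

```c
#include <assert.h>

#define MODEL_MAX_APIC_ID	255	/* hypothetical value */
#define MODEL_MAXCPU		4	/* hypothetical value */

/* Hypothetical per-APIC-ID record. */
struct model_cpu_info {
	int present;		/* a local APIC with this ID was found */
	int cpu_id;		/* logical CPU ID, or -1 if unassigned */
};

/*
 * Give the boot processor ID 0, then hand out IDs to the remaining
 * local APICs until MODEL_MAXCPU is reached. Returns the CPU count.
 */
static int
model_assign_cpu_ids(struct model_cpu_info *cpus, int bsp_apic_id)
{
	int apic_id, ncpus;

	assert(bsp_apic_id >= 0 && bsp_apic_id <= MODEL_MAX_APIC_ID);
	assert(cpus[bsp_apic_id].present);
	for (apic_id = 0; apic_id <= MODEL_MAX_APIC_ID; apic_id++)
		cpus[apic_id].cpu_id = -1;
	cpus[bsp_apic_id].cpu_id = 0;
	ncpus = 1;
	for (apic_id = 0; apic_id <= MODEL_MAX_APIC_ID &&
	    ncpus < MODEL_MAXCPU; apic_id++) {
		if (!cpus[apic_id].present || apic_id == bsp_apic_id)
			continue;
		cpus[apic_id].cpu_id = ncpus++;
	}
	return (ncpus);
}
```

The array is indexed by APIC ID and sized by MODEL_MAX_APIC_ID, while the
number of IDs handed out is bounded separately by MODEL_MAXCPU, so a
sparse ID such as 38 is no problem.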
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.
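Replacing a boolean with a handler count, as described in the list above,
lets the code notice the moment a source becomes unused. The sketch below
is a hypothetical user-space model: pic_enabled stands in for the effect
of the PIC enable and disable methods, and no locking is shown.

```c
#include <assert.h>

/* Hypothetical, simplified interrupt source. */
struct model_intsrc {
	int handlers;		/* number of attached handlers */
	int pic_enabled;	/* what the (modelled) PIC has been told */
};

/* The first handler enables the source. */
static void
model_add_handler(struct model_intsrc *is)
{
	if (is->handlers++ == 0)
		is->pic_enabled = 1;
}

/* Removing the last handler tells the PIC the source is unused. */
static void
model_remove_handler(struct model_intsrc *is)
{
	assert(is->handlers > 0);
	if (--is->handlers == 0)
		is->pic_enabled = 0;	/* pic_disable_intr()-like step */
}
```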
119373: o Remove the query verb, along with the request and response
parameters.
o Add the version and output parameters.
119390: [APM,GPT] Properly clear deleted entries.
119394: o Make the alias the standard and use the '!' to prefix
literal partition types.
o Treat schemes and partition types as case insensitive.
119462: [GPT] Fix a page fault caused when modifying a partition entry
without a new partition type.
stack will process from 50 to 15. As this is a sysctl variable it
can be tuned up or down at the user/administrator's whim.
Submitted by: itojun
MFC after: 1 day
to the coverity tool.. may even be the same one.. not sure).
- A bug in the way sctp_abort() and friends were
setting the IP_CLOSE flag.. and NOT passing the
last argument as a (,1)... so that things would
get freed..
- Update to latest (1.4.17) firmware.
- Use the new MXGEFW_CMD_UNALIGNED_TEST (added in firmware 1.4.16) to
have the firmware tell us if the PCIe chipset supports aligned PCIe
completions.
- Remove the hard-to-maintain and frequently out-of-date whitelist of PCIe
chipsets known to produce aligned completions. Its role of selecting
the correct firmware to run is now filled by the use of
MXGEFW_CMD_UNALIGNED_TEST.
- Break the dma test out of mxge_reset() and into its own function
(mxge_dma_test()) so it can be used by both the normal DMA test, and
to run the unaligned test.
- Improved support for enabling ECRCs
Sponsored by: Myricom Inc.
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth
of TSN's (1 byte msgs).. this left the peer hopelessly
out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now
that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for
BSD, but may be other things later when BSD
fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open
side false notifications.
- Fix so sendfile and SENDALL will work properly (missing
flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can
reuse the helpers.
- When an address is added during system init (non-dynamic mode) make
sure that the "defer use" flag is not set.
** it's compiling on XR now :-D **
Reviewed by: gnn
and in_setsockaddr(), containing only stale comments on why they
exist, remove them and initialize the protosw for UDP to directly
reference in_setpeeraddr() and in_setsockaddr().
The entire code is wrapped in #ifdef ... #endif so it won't harm
the actual implementation, but developers are encouraged to test it.
For arm, ia64, ppc, sparc64 and sun4v some work is still
needed, thus arch maintainers are encouraged to bring their arch on par
with respect to i386 and amd64.
Approved by: re (implicit?)
o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.
Approved by: re (implicit?)
and change it to a void function.
We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late
segments arriving for such a connection are handled directly in the TW
code.
and show up with different names: first try to open provider using
remembered name and compare its ident, if equal, this is our provider,
if not equal or there is no provider with such name, find provider with
remembered ident and don't care about the name.
- Locks were not being unlocked when an invalid size chunk is
sent in.
- When a notification comes in, we cannot use it to look up
the fragment interleave stream information since it's not
on a stream.
Seems to work on RELENG_4 through -current and also on sparc64
now. There may still be some issues with the auto attach/detach
code to sort out.
MFC after: 3 days
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194
DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).
DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).
DIOCGIDENT - Get provider's unique and fixed identifier (asks for
GEOM::ident attribute).
First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:
- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
copying whole data from one disk to another should not yield the same
ident for the other disk,
- there could be more than one provider with the same ident, but only if
they point at exactly the same physical storage, this is the case for
multipathing for example,
- GEOM classes that consume single providers and provide single providers,
like geli, gbde, should just attach class name to the ident of the
underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't rely on its presence.
The main purpose for this is that an application can remember a provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.
OK'ed by: phk
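The lookup procedure described above (try the remembered name and compare
idents, otherwise search by ident) could be sketched like this. The table
and function are hypothetical user-space stand-ins; a real consumer would
obtain the ident via DIOCGIDENT and would also have to cope with providers
that have no ident at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a provider: its current name and fixed ident. */
struct model_provider {
	const char *name;
	const char *ident;
};

static const struct model_provider *
model_find_provider(const struct model_provider *tab, size_t n,
    const char *name, const char *ident)
{
	size_t i;

	/* First try the remembered name and verify its ident. */
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0 &&
		    strcmp(tab[i].ident, ident) == 0)
			return (&tab[i]);
	/* Name missing or ident mismatch: search by ident alone. */
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].ident, ident) == 0)
			return (&tab[i]);
	return (NULL);
}
```

If two disks swap names across a reboot, looking up the remembered name
with the remembered ident still returns the right disk under its new name.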
- Make wlan_amrr depend on wlan, so that it can find various symbols in
wlan module if wlan is not compiled into kernel.
Approved by: sam (mentor)
Tested by: kevlo
- http://www.intel.com/design/chipsets/specupdt/245051.htm
AC97 Soft Audio and Soft Modem Master Abort Errata
Issue:
Use of either soft audio or soft modem on an Intel® 82443MX PCISet
based platform running a 100 MHz Processor System Bus and an AC97 codec
may result in failures. The system continues to function normally while
the AC97 hardware may not resume and may require a cold-boot to
recover. As a result of the failure, the Master Abort Status bit will
be set in the audio or modem function PCI header space.
Workaround:
Force uncacheable DMA on both BDL and pcm buffers.
Tested by: Emil Holmström <emil@linux.se>
- Remove explicit call to pmap_change_attr(), since we now have proper
and functional definition of BUS_DMA_NOCACHE.
- Enable PCI(e) bus snooping for non i386/amd64 as an alternative for
uncacheable DMA.
- Codecs changes:
* Analag Device -> Analog Devices, AD1988.
* New codec: VIA VT1708 and VT1709, Realtek ALC262, ALC861-VD and
ALC885.
* Various fixups for Conexant Waikiki, fix recording (read: microphone)
on various Analog Devices codecs due to vendor BIOS mess, various
quirks for several ASUS laptops/boards.
- Fix connection list handling, closely following the specification to
handle range of nids.
- Basic Jack sense polling infrastructure for possible hardware with
broken unsolicited response interrupt.
Ideas/Submitted/Tested by: Andriy Gapon <avg@icyb.net.ua>,
#freebsd-azalia, many.
state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace().
Move to ANSI C function header, correct prototype types so that short TCP
state is no longer promoted to int unnecessarily.
Add comments.
MFC after: 3 weeks
Updated copyright date to 2007.
Tested with BCM5706 A3.
Added ID for BCM5708 B2.
Removed unused driver version string.
Modified BCE_PRINTF macro to automatically fill-in the sc pointer.
Fixed a kernel panic when the driver was loaded as a module from the
command-line because the MII bus pointer was null (i.e. the MII bus
hadn't been enumerated yet).
Added fix proposed by Vladimir Ivanov <wawa@yandex-team.ru> to prevent
driver state corruption when releasing the lock during the ISR in
bce_rx_intr() to send packets up the stack.
Added new TX chain and register read sysctl interfaces for debugging.
Cleaned up formatting for various other debug routines.
Added a new statistic maintained by firmware which tracks the number
of received packets dropped because no receive buffers are available.
correct network drivers with respect to busmaster DMA, go over it
with a duster to make other aspects of it a role model:
Eliminate the pci specific softc, it serves no rational purpose.
Use convenience resource allocation/deallocation functions to save
code and error handling.
Switch from bus_space_{read|write}_%u() to bus_{read|write}_%u()
functions and forget about tags and handles, the resource will know
about those, should they be needed. This also eliminates a number
of inconsistently named local variables.
it was full and a collision occurred, then we would leave
an inp locked. Also fixes a missing inp unlock if IPSEC was
on and it failed during the attach. Bug found by Weongyo Jeong.
as UF_OPENING. Disable closing of those entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index, while the file descriptor is closed by a parallel thread.
Idea by: tegge
Reviewed by: tegge (previous version of patch)
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
in comments for .c and .h files respectively. Jack may want to clean up
style or other aspects once he's up and about again, but this gets the
kernel compiling.
shared code infrastructure that is family specific and
modular. There is also support for our latest gigabit
nic, the 82575 that is MSI/X and multiqueue capable.
The new shared code changes some interfaces to the core
code but testing at Intel has been going on for months,
it is fairly stable.
I have attempted to be careful in retaining any fixes that
CURRENT had and we did not, I apologize in advance if any
thing gets clobbered, I'm sure I'll hear about it :)
Approved by: pdeuskar
on each socket buffer with the socket buffer's mutex. This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O
operations (discovered by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer
I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
- Simplify the amount of work that has to be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
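
The remap-table rule in the last item can be modeled in plain C. This is a
userland sketch only, not the kernel interface: msix_vectors_kept() is a
hypothetical helper that takes the mini-virtual table, checks that any unused
vectors sit at the "end", and reports how many allocated vectors stay in use.
The 64-entry bound is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the PCI bus API.  table[i] names the
 * allocated message vector (1-based) used by MSI-X table entry i; 0 means
 * the entry has no IRQ.  Returns the number of vectors that stay in use,
 * or -1 if the table references a vector that was not allocated or
 * leaves an unused vector in front of a used one.
 */
static int
msix_vectors_kept(const unsigned int *table, size_t len,
    unsigned int allocated)
{
	unsigned char used[64] = { 0 };
	unsigned int highest = 0, v;
	size_t i;

	assert(table != NULL && allocated < sizeof(used));
	for (i = 0; i < len; i++) {
		if (table[i] > allocated)
			return (-1);
		if (table[i] != 0) {
			used[table[i]] = 1;
			if (table[i] > highest)
				highest = table[i];
		}
	}
	/* Unused vectors are only allowed after the last used one. */
	for (v = 1; v <= highest; v++)
		if (!used[v])
			return (-1);
	return ((int)highest);
}
```

With the table { 1, 2, 1, 2 }, the helper reports 2 vectors kept whether 2 or
3 were allocated; in the latter case the third is the one that would be
released back to the system.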
MFC after: 1 month
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.
Obtained from: (some of the Listen changes from Weongyo Jeong)
pointers. A structure is more readable and less error-prone. It
also avoids problems when a function pointer doesn't have the
same width as a void pointer.
functions with CPUs they apply to only, otherwise default to the
plain C functions. This is modeled in a way so that f.e. a Cheetah
version of these functions can be inserted easily.
Not because I admit they are technically wrong and not because of bug
reports (I have received none), but because I have surprisingly met such
strong opposition and resistance that I have lost any desire to continue.
Anyone who is interested in POSIX can dig out what changed and how
through cvs diffs.
the UPA_IMR2 resource is also shared with/a subset of the Schizo PCI
bus B CSR bank. I'm not entirely sure how this previously managed to
escape testing...
consistent with the naming of other structure field members, and
reducing improper grep matches. Clean up and comment structure
fields in structure definition.
sc->mii_anegticks according to whether the respective BGE chip
supports Fast Ethernet only or also Gigabit Ethernet.
- At least the BGE chips I've tested with wedge when isolating them
so document this as the reason for setting MIIF_NOISOLATE and
remove the unused (and partially even #ifdef'ed out) isolation
related code. Add code that panics if we encounter a non-zero MII
instance as generally there's no way a PHY requiring MIIF_NOISOLATE
can be handled gracefully in a multi-PHY configuration (it's ok for
the internal PHY of single-PHY-only-NIC to not support isolation
though).
- Additionally set MIIF_NOLOOP as loopback doesn't seem to work
either and remove the #ifdef'ed out code for adding respective
media. The MIIF_NOLOOP flag currently triggers nothing but
hopefully will be respected by mii_phy_setmedia() later on.
Reviewed by: jkim, yongari
MFC after: 1 month
Blade 2500, Fire V210 and probably some other sparc64 machines.
These chips are typically not fitted with an EEPROM which means
that we have to obtain the MAC address via OFW and that some chip
tests will just always fail.
These changes are based on the respective code found in OpenBSD
with some additional info obtained from OpenSolaris and some style
suggestions by jkim@. They also have the desired side-effect of
respecting the 'local-mac-address?' system configuration variable
for the affected BGEs.
- In bge_attach() factor out calling bge_release_resources() before
going to the fail label into the fail label as well as replace a
magic 6 with ETHER_ADDR_LEN.
Reviewed by: yongari (before style changes), jkim
- Wake up DMA engine after adding a new receive buffer.
- Skip buffers which have unknown state after error.
- More rigid error detection.
MFC after: 1 week
as some combinations of chipset, controller and target do not behave
correctly when DMA is enabled for other commands.
PR: kern/103602
MFC after: 2 weeks
were never freed, but the big ring was freed twice.
- Don't supply rx hw csums for frames which are padded beyond the
length specified in the ip header. If the padding is non-zero,
the hw csum will be incorrect for such frames.
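
A minimal sketch of the padding test described above, assuming an untagged
14-byte Ethernet header. The helper name and the exact comparison are
illustrative; the log does not say how the driver performs the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN_SKETCH 14	/* untagged Ethernet header; an assumption */

/*
 * Hypothetical helper.  Returns 1 if the frame carries exactly the number
 * of bytes the IP header claims, so a checksum computed by hardware over
 * the whole frame can be trusted; returns 0 if the frame is padded (or
 * shorter than the IP header says).
 */
static int
rx_hw_csum_usable(size_t frame_len, uint16_t ip_total_len)
{
	assert(frame_len >= ETHER_HDR_LEN_SKETCH);
	return (frame_len - ETHER_HDR_LEN_SKETCH == ip_total_len);
}
```

A 60-byte minimum-size frame carrying a 40-byte IP packet has 6 bytes of
padding, so the sketch would fall back to a software checksum for it.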
Sponsored by: Myricom
non-mapped data as possible at once and not page-by-page. With this change we
not only combine I/Os, but also save many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK()
operations.
Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23%
performance increase.
This works around the problem in Parallels/VMware where the emulated drivers are
slower, especially with ATA_FLUSHCACHE. The problem appears much more
frequently with ZFS, which uses it a lot more.
Approved by: sos, pjd
- vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has
to be called before we write the data in case someone makes the page dirty after
our write, but before our vm_page_undirty() call.
- Always dmu_write, no matter whether uiomove() succeeded, because it could
have partially succeeded and we would lose some changes.
All good ideas from: ups
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successful. In the second case, the unmount shall be allowed.
Submitted by: sobomax
MFC after: 2 weeks
- Fix for a bug where a close would not wait for all (directio)
dirty buffers to drain. The nfsnode was not marked NMODIFIED
when there were directio dirtied buffers pending, causing this.
- No reason to vhold/vrele the vp when enqueueing DirectIO requests
for the nfsiods. The vnode can't really go away since the close
has to wait for these requests to drain.
MFC after: 1 week
Submitted by: mohans
specific request and thus should first try to be allocated from the
sys_resource pool. This avoids using the sys_resource pool for wildcard
requests that have bounded ranges coming from cbb(4) and Host-PCI pcib(4)
drivers.
Tested by: Andrea Bittau <a.bittau of cs.ucl.ac.uk fame>
Sleuthing by: Andrea Bittau as well
that the MSI mapping window is fixed at 0xfee00000 and the capability
does not include two more dwords used to program the address. Supporting
this mostly results in quieting spurious warnings during boot about
non-default MSI mapping windows.
- HT 2.00b also added a new HT capability type, so support that in pciconf.
MFC after: 3 days
Tested by: jmg
It seems that valid pause frames (Tx flow control) cause GMAC to hang,
resulting in a watchdog timeout. As a workaround, don't
flush Rx MAC FIFO if we've received pause frames.
Tested by: Harald Schmalzbauer (h DOT schmalzbauer AT omnisec DOT de)
Under certain circumstances, if TSO is active, Yukon II generates
corrupted IP packets. All corrupted IP packets I noticed were the
last segmented packet in a TSO request. The corrupted packet resulted
in retransmission of the damaged packet which in turn decreased network
performance dramatically.
Unfortunately it seems that there is no way to work around this bug
as TSO is completely handled in hardware. Disable TSO until we find a
working workaround or a new silicon revision that doesn't have this
hardware bug.
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.
The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger a int 3 breakpoint to force an interrupt and
kick off a triple fault.
MFC after: 3 days
in all other file systems on FreeBSD (instead of from the inactive() method).
A nice side-effect of this change, besides speeding up the file system
when mmapped files are often opened/closed, is that it makes FreeBSD's
namecache work:)
This fixes slow operations on mmaped files, because without this fix,
pages were written to disk multiple times.
Anyone looking for an even greater speedup for such operations can
disable ZIL (by setting vfs.zfs.zil_disable to 1 in /boot/loader.conf).
Disabling ZIL makes fsx run ~9 times faster.
supports software encrypt/decrypt.
The nuked code itself is quite problematic, as pointed out by sam@ ---
wk->wk_keyix should be replaced by the loop count.
Tested with WEP/TKIP/CCMP/no-protection.
Approved by: sam@ (mentor)
Noticed by: Hans Petter Selasky <hselasky@c2i.net>
o Fix linewrap issues.
o Fix two typos (s/Recomended/Recommended/ and s/tunning/tuning/)
o Remove a couple of extra instances of the word "of".
o Update names of kmem_size variables.
Approved by: pjd
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
by Philippe Biondi and Arnaud Ebalard. This is a temporary fix
until more discussion can be had on the exact risks involved in
allowing source routing in IPv6.
Submitted by: itojun
Reviewed by: jinmei
MFC after: 1 day
- Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c
and keep original functions as similar to vendor's code as possible.
- Add various includes back, now that we have them.
macro, as za_first_integer field also contains type. This should be fixed in
ZFS itself, but this bug is not visible on Solaris, because there, type is
not stored in za_first_integer. On the other hand it will be visible on
MacOS X.
Reported by: Barry Pederson <bp@barryp.org>
variable name conventions for arguments passed into the framework --
for example, name network interfaces 'ifp', sockets 'so', mounts 'mp',
mbufs 'm', processes 'p', etc, wherever possible. Previously there
was significant variation in this regard.
Normalize copyright lists to ranges where sensible.
labels: the mount label (label of the mountpoint) and the fs label (label
of the file system). In practice, policies appear to only ever use one,
and the distinction is not helpful.
Combine mnt_mntlabel and mnt_fslabel into a single mnt_label, and
eliminate extra machinery required to maintain the additional label.
Update policies to reflect removal of extra entry points and label.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting. These entry points exactly aligned with privileges and
provided no additional security context:
- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()
Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies. These mostly, but not
entirely, align with the set of privileges granted in jails.
Obtained from: TrustedBSD Project
- Redistribute counter declarations to where they are used, rather than at
the file header, so it's more clear where we do (and don't) have
counters.
- Add many more counters, one per policy entry point, so that many
individual access controls and object life cycle events are tracked.
- Perform counter increments for label destruction explicitly in entry
point functions rather than in LABEL_DESTROY().
- Use LABEL_INIT() instead of SLOT_SET() directly in label init functions
to be symmetric with destruction.
- Align counter names more carefully with entry point names.
- More constant and variable name normalization.
Obtained from: TrustedBSD Project
- Add a more detailed comment describing the mac_test policy.
- Add COUNTER_DECL() and COUNTER_INC() macros to declare and manage
various test counters, reducing the verbosity of the test policy
quite a bit.
- Add LABEL_CHECK() macro to abbreviate normal validation of labels.
Unlike the previous check macros, this checks for a NULL label and
doesn't test NULL labels. This means that optionally passed labels
will now be handled automatically, although in the case of optional
credentials, NULL-checks are still required.
- Add LABEL_DESTROY() macro to abbreviate the handling of label
validation and tear-down.
- Add LABEL_NOTFREE() macro to abbreviate check for non-free labels.
- Normalize the names of counters, magic values.
- Remove unused policy "enabled" flag.
Obtained from: TrustedBSD Project
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave.
calls. Add MAC Framework entry points and MAC policy entry points for
audit(), auditctl(), auditon(), setaudit(), and setauid().
MAC Framework entry points are only added for audit system calls where
additional argument context may be useful for policy decision-making; other
audit system calls without arguments may be controlled via the priv(9)
entry points.
Update various policy modules to implement audit-related checks, and in
some cases, other missing system-related checks.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.
- Use PRIV_NFS_DAEMON in the NFS server.
- In the NFS client, move the privilege check from nfslockdans(), which
occurs every time a write is performed on /dev/nfslock, and instead do it
in nfslock_open() just once. This allows us to avoid checking the saved
uid for root, and just use the effective uid on open. Use PRIV_NFS_LOCKD.
@118370 Correct typo.
@118371 Integrate changes from vendor.
@118491 Show backtrace on unexpected code paths.
@118494 Integrate changes from vendor.
@118504 Fix sendfile(2). I had two ways of fixing it:
1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of
hacking around with vn_rdwr(UIO_NOCOPY), which was suggested
by ups.
2. Modify ZFS behaviour to handle this special case.
Although 1 is more correct, I've chosen 2, because the hack from 1
has a side-effect of being faster - it reads ahead MAXBSIZE
bytes instead of reading page by page. This is not easy to implement
with VOP_GETPAGES(), at least not for me in this very moment.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
@118525 Reorganize the code to reduce diff.
@118526 This code path is expected. It is simply when file is opened with
O_FSYNC flag.
Reported by: kris
Reported by: Michal Suszko <dry@dry.pl>
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.
Approved by: njl (mentor)
Reviewed by: alc
from the incoming SYN handling section of tcp_input().
Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed. It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake. It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.
Change return value of syncache_add() to void. No status communication
is required.
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies. True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.
For both cases an RST is sent back in tcp_input().
A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.
Reported by: kris (the panic)
when one of the links is inactive and has a stale sequence number. To avoid
this, the sequence numbers of all links are updated on every
successful packet reassembly.
- ng_ppp_bump_mseq function created to simplify code.
- ng_ppp_frag_drop function separated from ng_ppp_frag_process to
simplify code.
Reviewed by: archie
Approved by: glebius (mentor)
which lead to ineffective multilink packet distribution plans.
- Changed bytesInQueue calculation math to have more precise information
about links utilization.
- Taken rough account of the link overhead. Better way to do it could be to
get exact overhead from user-level, but I have not done it to keep
binary compatibility.
Reviewed by: archie
Approved by: glebius (mentor)
be applied to dev entries. This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check. It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps. It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.
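
The threshold test can be sketched as follows. This is a userland
illustration: fix_timestamp() and its boot_sec argument are hypothetical, and
substituting the boot time for an unset timestamp is an assumption about what
the surrounding code does.

```c
#include <assert.h>
#include <time.h>

#define BOOT_FUZZ_SECS 3600	/* the "<= 3600" threshold from the text */

/*
 * Hypothetical helper: a timestamp within an hour of the Epoch is treated
 * as "never set" and replaced, where the old check only caught exactly 0.
 */
static void
fix_timestamp(time_t *sec, time_t boot_sec)
{
	assert(sec != NULL);
	if (*sec <= BOOT_FUZZ_SECS)
		*sec = boot_sec;
}
```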
Discussed with: bde
"Ok with me": phk
and new SCBs were allocated on demand later if needed. This has two
problems. First, allocating SCBs involves allocating contiguous memory,
and if memory is exhausted then the VM will try to page out to satisfy
the request, leading to recursion and deadlock. The second problem is
that it can cause lock order reversals due to parts of the VM still being
under Giant.
Fix the problem by allocating the full pool at driver attach, when it is
safe to do so.
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.
MFC-After: 1 month
and should only be applied on certain specific card / vendor, hence the
addition of ac97_getsubvendor().
- Fix low volume issue on several MSI laptops through ALC655 quirk.
Reported/Tested by: Christian Mueller
<raptor-freebsd-multimedia@xpls.de>
MFC after: 1 week
- For ural(4):
o Fix node leakage in ural_start(), if ural_tx_mgt() fails.
o Fix mbuf leakage in ural_tx_{mgt,data}(), if usbd_transfer() fails.
o In ural_tx_{mgt,data}(), set ural_tx_data.{m,ni} to NULL, if
usbd_transfer() fails, so they will not be freed again in ural_stop().
Approved by: sam (mentor)
- Removed free-oqueue cache.
- Fix counter for sq entries
- Increased the amount of information retained
on ASOC_TSN logging on the association.
- Made it so with the ASOC_TSN logging on
sending or receiving an abort we dump the log.
- Went through and added invariants around some
panics that needed them.
- decrements went to atomic_subtract_int instead of add -1
- Removed residual count increment that threw off a
strm oq count.
- Track and complain if we don't have a LAST fragment and
clean up the sp structure.
- Track a new stat that counts number of abandoned msgs that
happen if you close without reading.
- Fix lookup of frag point to be aware of a 0 assoc-id.
Reviewed by: gnn
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):
- leaf spin mutexes---only one of these should be held at a time,
so these mutexes are specified as belonging to a single witness
type "pmc-leaf".
- `struct pmc_owner' descriptors are protected by a spin mutex of
witness type "pmc-owner-proc". Since we call wakeup_one() while
holding these mutexes, the witness type of these mutexes needs
to dominate that of "sleepq chain" mutexes.
- logger threads use a sleep mutex, of type "pmc-sleep".
Submitted by: wkoszek (earlier patch)
When nbytes=0, sendfile(2) should use file size. Because of the bug, it
was sending half of a file. The bug is that 'off' variable can't be used
for size calculation, because it changes inside the loop, so we should
use uap->offset instead.
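
The effect of the bug can be reproduced with a small userland model. This is
not the sendfile(2) code: bytes_sent() is a hypothetical function that mimics
a loop computing the remaining size from either the moving offset (buggy) or
the fixed starting offset (fixed).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of a send loop for the nbytes == 0 case.  'off' advances as data
 * is sent; 'offset' is the caller's starting offset and never changes.
 * With buggy != 0 the remaining size is derived from 'off', so the bytes
 * already sent are subtracted twice.
 */
static size_t
bytes_sent(size_t file_size, size_t offset, size_t chunk, int buggy)
{
	size_t off = offset;
	size_t sent = 0;

	assert(chunk > 0);
	for (;;) {
		size_t base = buggy ? off : offset;
		size_t done = base + sent;
		size_t rem = file_size > done ? file_size - done : 0;
		size_t n = rem < chunk ? rem : chunk;

		if (n == 0)
			break;
		sent += n;
		off += n;
	}
	return (sent);
}
```

For a 1000-byte file sent in 100-byte steps from offset 0, the buggy variant
stops after 500 bytes while the fixed one sends all 1000.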
contigmalloc2() was always testing the first physical page for PG_ZERO,
not the current page of interest.
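
A sketch of the corrected per-page test, using a plain array of flag words in
place of the kernel's page structures. PG_ZERO_SKETCH and
pages_needing_zero() are stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PG_ZERO_SKETCH 0x1	/* stand-in flag value, not the kernel's */

/*
 * Count the pages that still have to be zeroed.  Each page's own flag
 * word is consulted; the bug described above amounts to testing flags[0]
 * on every iteration instead of flags[i].
 */
static size_t
pages_needing_zero(const unsigned int *flags, size_t npages)
{
	size_t i, n = 0;

	assert(flags != NULL || npages == 0);
	for (i = 0; i < npages; i++)
		if ((flags[i] & PG_ZERO_SKETCH) == 0)
			n++;
	return (n);
}
```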
Submitted by: Michael Plass
PR: 81301
MFC after: 1 week
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk. Change the print logic to print once per second
when the storm is occurring instead of only once. Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.
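
The once-per-second behaviour can be sketched with a simple timestamp
comparison. storm_should_log() is a hypothetical userland helper; the kernel
change may well use a different mechanism.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 the first time it is called in any given
 * second and 0 for further calls within that second, so a continuing storm
 * is reported repeatedly but never more than once per second.
 */
static int
storm_should_log(time_t now, time_t *last_logged)
{
	assert(last_logged != NULL);
	if (now <= *last_logged)
		return (0);
	*last_logged = now;
	return (1);
}
```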
Reviewed by: jhb
MFC after: 1 week
on a snapshot directory:
- Remove PRIV_VFS_MOUNT check - regular users can mount snapshots
via lookups on snapshot directory.
- Reset mount credential to kcred, so user won't be able to unmount
the snapshot.
- Reset owner uid.
- Unlock vnode in case of a failure.
Reported by: simokawa
Previously, whenever PROMISC mode was turned on/off, link renegotiation
occurred, which could result in network unavailability for several
seconds. (Depending on switch STP settings it could last several tens of
seconds.)
Reported by: Prokofiev S.P. < proks AT logos DOT uptel DOT net >
Tested by: Prokofiev S.P. < proks AT logos DOT uptel DOT net >
This fixes strange panics when listing the .zfs/snapshot/ directory for me.
Reported by: simokawa
Reported by: Johan Hendriks <Johan@double-l.nl>
- Hide cache_purge() under FREEBSD_NAMECACHE like in other files.
- Protect mnt_flag with mount interlock.
to free the oldest entry in the current bucket row. The global
entry limit may be smaller than the bucket rows and their limit
combined however. Thus only try to free a syncache entry if we
found one in this bucket row.
Reported by: kris
to move up the start address until the allocation succeeds. If the
alignment of the resource was 0, then the code would keep trying the same
request in an infinite loop and hang. Force the request to always move
start up by at least 1 byte each time through the loop.
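
A minimal sketch of the guard, assuming the retry loop advances the start
address by the alignment on each pass. next_start() is a hypothetical helper
and the real stepping logic may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the next start address to try.  An
 * alignment of 0 would leave the address unchanged and the caller would
 * retry the same request forever, so the step is forced to at least 1.
 */
static uint64_t
next_start(uint64_t start, uint64_t align)
{
	uint64_t step = align > 0 ? align : 1;

	assert(start <= UINT64_MAX - step);
	return (start + step);
}
```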
The 6105M and 6102 do not have the DWORD alignment problem, so
don't m_defrag() every packet in the transmit path for those.
More stringent usage of tx-descriptor ring and its flags.
Tested on 6102 and 6105M, other chips may also be able to run
without the m_defrag() but I have neither hardware nor docs to
find out.
Sponsored by: Soekris Engineering
"zone", which is generally not present in zone names. This reduces the
incidence of line-wrapping in "vmstat -z" using 80-column displays.
MFC after: 3 days
The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.
The lagg(4) driver provides link aggregation, failover and fault tolerance.
Discussed on: current@
unload instead of returning EBUSY. This check tells if there are mounted
ZFS file systems or not. We can't unload if there are mounted file systems.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
requests where uio_offset is not 0 to begin with. This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.
MFC after: 3 weeks
- Fix bug that prevented EEOR mode from working
and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did
this especially for EEOR mode, still need to look at
why I need a lock when removing from the tailq and the
->next is NOT null. A lock fixes it but it implies a
bug yet exists.
- Activated Andre's proposed changes to better use the mbuf
infrastructure.
- Fixed places that were not using the aloc macro's to take
advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to
get the right data structures in place (suggested by Max Laier).
use to synchronize and protect all data objects that are used for that
SIM. Drivers that are not yet MPSAFE register Giant and operate as
usual. Right now, no drivers are MPSAFE, though a few will be changed
in the coming week as this work settles down.
The driver API has changed, so all CAM drivers will need to be recompiled.
The userland API has not changed, so tools like camcontrol do not need to
be recompiled.
implement robust version of m_collapse
add support for sf_buf
add fix for m_iovappend
add calls to m_sanity under INVARIANTS
fix m_freem_vec to correctly traverse the mbuf iovec chain
The pfs_info mutex is only needed to lock pi_unrhdr. Everything else
in struct pfs_info is modified only while Giant is held (during
vfs_init() / vfs_uninit()); add assertions to that effect.
Simplify pfs_destroy somewhat.
Remove superfluous arguments from pfs_fileno_{alloc,free}(), and the
assertions which were added in the previous commit to ensure they were
consistent.
Assert that Giant is held while the vnode cache is initialized and
destroyed. Also assert that the cache is empty when it is destroyed.
Rename the vnode cache mutex for consistency.
Fix a long-standing bug in pfs_getattr(): it would uncritically return
the node's pn_fileno as st_ino. This would result in st_ino being 0
if the node had not previously been visited by readdir(), and also in
an incorrect st_ino for process directories and any files contained
therein. Correct this by abstracting the fileno manipulations
previously done in pfs_readdir() into a new function, pfs_fileno(),
which is used by both pfs_getattr() and pfs_readdir().
- Reduce default number of spa_zio_* threads to N*spa_zio_issue
plus N*spa_zio_intr threads per ZIO type, where N is the number
of CPUs.
- Put ZIO type number in thread's name.
sendmsg() while using a 0-length msg_controllen. This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL. This allows Linux ping to work.
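
The normalization can be sketched on a plain struct msghdr. This assumes the
case being handled is a non-NULL msg_control paired with a zero
msg_controllen; normalize_control() is a hypothetical name, not a function in
the Linux compatibility layer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: if the caller supplied a control buffer pointer but
 * a zero length, drop the pointer so later code sees "no control data".
 */
static void
normalize_control(struct msghdr *msg)
{
	assert(msg != NULL);
	if (msg->msg_controllen == 0)
		msg->msg_control = NULL;
}
```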
Submitted by: rdivacky
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
if we failed to get init/or/cookie to bring up an assoc. Change
so we don't just give a generic "comm lost" but look at actual
states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truly portable
for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
during window probes on the sent queue. This would then
cause an out-of-order problem and assure that the flight
size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
overruns, flight size off as well as false retransmissions.
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
was active, fix to make strict sacks a bit smarter about what
the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
commit of at least m_last().. need to adjust for 6.2 as
well.. since m_last won't exist.
Reviewed by: gnn
which has already been freed by in_ifdetach(). With this cumulative change,
the removal of a member interface will not cause a panic in pfsync(4).
Requested by: yar
PR: 86848
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
for jailed root from different jails, etc.
OK'ed by: rwatson
than 2GB of RAM. This was because our physmem is long and 'physmem*PAGESIZE'
can be negative for more than 2GB of memory.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
It is not yet tested by Andrey, so there can be other problems, but this
was definitely a bug, so I'm committing a fix now.
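The overflow can be avoided by widening before the multiplication. The sketch below assumes a 4096-byte page and uses a hypothetical helper name; it is not the code that was committed.

```c
#include <stdint.h>

#define SK_PAGE_SIZE	4096	/* assumed page size, for illustration only */

/*
 * Convert a page count to bytes.  Casting to a 64-bit unsigned type
 * before multiplying keeps the product from wrapping (and going
 * negative) once it no longer fits in a 32-bit long.
 */
static uint64_t
sk_physmem_bytes(long pages)
{
	return ((uint64_t)pages * SK_PAGE_SIZE);
}
```

For 786432 pages (3GB) the product is 3221225472, which does not fit in a signed 32-bit value.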
tokens. Currently, we do not support the set{get}audit_addr(2) system
calls which allow processes like sshd to set extended or ip6
information for subject tokens.
The approach that was taken was to change the process audit state
slightly to use an extended terminal ID in the kernel. This allows
us to store both IPv4 and IPv6 addresses. In the case that an IPv4
address is in use, we convert the terminal ID from a struct
auditinfo_addr to a struct auditinfo.
If getaudit(2) is called when the subject is bound to an ip6 address,
we return E2BIG.
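The narrowing described above can be sketched as follows. The structures and constants are simplified, hypothetical stand-ins for the BSM types, not the kernel definitions.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the real BSM definitions (assumptions). */
#define SK_AU_IPv4	4
#define SK_AU_IPv6	16

struct sk_tid {			/* classic terminal ID */
	uint32_t port;
	uint32_t machine;	/* IPv4 address, network byte order */
};

struct sk_tid_addr {		/* extended terminal ID */
	uint32_t at_port;
	uint32_t at_type;	/* SK_AU_IPv4 or SK_AU_IPv6 */
	uint32_t at_addr[4];	/* address words, network byte order */
};

/*
 * Narrow an extended terminal ID to the classic form.  This only
 * succeeds for IPv4; an IPv6 address does not fit in the classic
 * structure, so the caller gets E2BIG.
 */
static int
sk_tid_narrow(const struct sk_tid_addr *ext, struct sk_tid *out)
{
	if (ext->at_type != SK_AU_IPv4)
		return (E2BIG);
	memset(out, 0, sizeof(*out));
	out->port = ext->at_port;
	out->machine = ext->at_addr[0];
	return (0);
}
```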
- Change the internal audit record to store an extended terminal ID
- Introduce ARG_TERMID_ADDR
- Change the kaudit <-> BSM conversion process so that we are using
the appropriate subject token. If the address associated with the
subject is IPv4, we use the standard subject32 token. If the subject
has an IPv6 address associated with them, we use an extended subject32
token.
- Fix a couple of endian issues where we do a couple of byte swaps when
we shouldn't be. IP addresses are already in the correct byte order,
so reading the ip6 address 4 bytes at a time and swapping them results
in incorrect address data. It should be noted that the same issue was
found in the openbsm library and it has been changed there too on the
vendor branch.
- Change A_GETPINFO to use the appropriate structures
- Implement A_GETPINFO_ADDR which basically does what A_GETPINFO does,
but can also handle ip6 addresses
- Adjust get{set}audit(2) syscalls to convert the data
auditinfo <-> auditinfo_addr
- Fully implement set{get}audit_addr(2)
NOTE: This adds the ability for processes to correctly set extended subject
information. The appropriate userspace utilities still need to be updated.
MFC after: 1 month
Reviewed by: rwatson
Obtained from: TrustedBSD
- Tune number of namecache entries better (based on desiredvnodes).
- Handle vfs_lowvnodes event by releasing requested number of name cache
entries, but no less than 5%.
Reported by: simokawa
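The "requested number, but no less than 5%" rule can be sketched as below. The function name and the clamp to the current cache size are assumptions for illustration, not the committed code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many name cache entries to release on a
 * low-vnodes event.  Release what was asked for, but never less than
 * 5% of the entries currently cached, and never more than exist.
 */
static size_t
sk_entries_to_release(size_t cached, size_t requested)
{
	size_t min_release = cached / 20;	/* 5% of current entries */
	size_t n = requested > min_release ? requested : min_release;

	return (n > cached ? cached : n);
}
```

With 1000 cached entries, a request for 10 is raised to 50, while a request for 200 is honoured as is.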
specific nodes when the process exits)
Move the vnode-cache-walking loop which was duplicated in pfs_exit() and
pfs_disable() into its own function, pfs_purge(), which looks for vnodes
marked as dead and / or belonging to the specified pfs_node and reclaims
them. Note that this loop is still extremely inefficient.
Add a comment in pfs_vncache_alloc() explaining why we have to purge the
vnode from the vnode cache before returning, in case anyone should be
tempted to remove the call to cache_purge().
Move the special handling for pfstype_root nodes into pfs_fileno_alloc()
and pfs_fileno_free() (the root node's fileno must always be 2). This
also fixes a bug where pfs_fileno_free() would reclaim the root node's
fileno, triggering a panic in the unr code, as that fileno was never
allocated from unr to begin with.
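A minimal sketch of the root-node special case, using a toy counter in place of the unr allocator. All names here are hypothetical and only illustrate why the root fileno must not be handed back to the allocator.

```c
#include <stdint.h>

#define SK_ROOT_FILENO	2	/* the root node's fileno must always be 2 */

enum sk_type { SK_ROOT, SK_FILE };

struct sk_node {
	enum sk_type type;
	uint32_t     fileno;
};

/* Toy allocator standing in for unr: hands out 3, 4, 5, ... */
struct sk_alloc {
	uint32_t next;
	int	 outstanding;	/* numbers handed out and not yet returned */
};

static void
sk_fileno_alloc(struct sk_alloc *a, struct sk_node *n)
{
	if (n->type == SK_ROOT) {
		/* Fixed value, never taken from the allocator. */
		n->fileno = SK_ROOT_FILENO;
		return;
	}
	n->fileno = a->next++;
	a->outstanding++;
}

static void
sk_fileno_free(struct sk_alloc *a, struct sk_node *n)
{
	/* Only return numbers that were actually allocated. */
	if (n->type != SK_ROOT)
		a->outstanding--;
	n->fileno = 0;
}
```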
When destroying a pfs_node, release its fileno and purge it from the
vnode cache. I wish we could put off the call to pfs_purge() until
after the entire tree had been destroyed, but then we'd have vnodes
referencing freed pfs nodes. This probably doesn't matter while we're
still under Giant, but might become an issue later.
When destroying a pseudofs instance, destroy the tree before tearing
down the fileno allocator.
In pfs_mount(), acquire the mountpoint interlock when required.
MFC after: 3 weeks
a single conditional. The two operations are linked, but since the link
is not very direct, Coverity can't see it. Humans might miss the
link as well. So, this isn't fixing any actual bugs, just improving
readability.
CID: 1787 (likely others as well)
Found by: Coverity Prevent (tm)
directly to a merged model where only one callout, the next to fire,
is registered.
Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.
The single new callout is a mutex callout on inpcb simplifying the
locking a bit.
tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.
Reviewed by: rwatson (earlier version)
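The merged model can be sketched as a table of per-timer deadlines plus one record of which timer is currently armed. This is a simplified illustration with hypothetical names and an assumed timer count; it does not use callout(9) and leaves out locking and race handling.

```c
#include <stdint.h>

#define SK_NTIMERS	5	/* assumed number of per-connection timers */

struct sk_timers {
	uint64_t deadline[SK_NTIMERS];	/* absolute tick, 0 = inactive */
	int	 next;			/* timer the single callout is armed for, -1 = none */
};

/* Pick the active timer with the earliest deadline, or -1 if none. */
static int
sk_next_timer(const struct sk_timers *t)
{
	int best = -1;

	for (int i = 0; i < SK_NTIMERS; i++) {
		if (t->deadline[i] == 0)
			continue;
		if (best == -1 || t->deadline[i] < t->deadline[best])
			best = i;
	}
	return (best);
}

/*
 * Start (delta > 0) or stop (delta == 0) one timer, then re-arm the
 * single callout for whichever timer now fires first.
 */
static void
sk_timer_activate(struct sk_timers *t, int which, uint64_t now,
    uint64_t delta)
{
	t->deadline[which] = (delta == 0) ? 0 : now + delta;
	t->next = sk_next_timer(t);
}
```

Callers only ever start or stop a logical timer; which one is actually registered is decided in one place.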
Yukon II generated corrupted TCP checksums for short TCP packets
that are less than 60 bytes in size (e.g. window probe packets, pure
ACK packets etc). Padding the frame with zeros to reach the minimum
ethernet frame size didn't work at all. Instead of dropping Tx
checksum offload support we calculate the TCP checksum in software
when we encounter short TCP frames.
Fortunately short UDP datagrams appear to be handled
correctly by Yukon II.
While I'm here simplify ethernet/VLAN header size calculation logic.
PR: 111384
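The software fallback can be sketched as a length check plus the standard Internet checksum (RFC 1071). The names are hypothetical, the 60-byte threshold is taken from the text above, and the TCP pseudo-header that a real TCP checksum also covers is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_SHORT_LEN	60	/* threshold from the commit text */

/* Decide whether a TCP frame is short enough to need a S/W checksum. */
static int
sk_needs_sw_cksum(size_t frame_len)
{
	return (frame_len < SK_SHORT_LEN);
}

/*
 * Internet checksum (RFC 1071): one's-complement sum of 16-bit
 * big-endian words with the carries folded back in, then inverted.
 * A trailing odd byte is treated as the high byte of a final word.
 */
static uint16_t
sk_in_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```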
popular names. Hence:
- comment current index() and rindex() functions, as these serve the same
functionality as, respectively, strchr() and strrchr() from userland;
- add inlined version of strchr() and strrchr(), as we tend to use them more
often;
- remove str[r]chr() definitions from ZFS code;
Reviewed by: pjd
Approved by: cognet (mentor)
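For reference, inlined versions along these lines could look like the sketch below. The sk_ prefix marks these as illustrative; they are not the definitions that were added to the kernel headers.

```c
#include <stddef.h>

/* Return a pointer to the first occurrence of ch in p, or NULL. */
static inline char *
sk_strchr(const char *p, int ch)
{
	for (;; p++) {
		if (*p == (char)ch)
			return ((char *)p);
		if (*p == '\0')
			return (NULL);
	}
}

/* Return a pointer to the last occurrence of ch in p, or NULL. */
static inline char *
sk_strrchr(const char *p, int ch)
{
	char *save = NULL;

	for (;; p++) {
		if (*p == (char)ch)
			save = (char *)p;
		if (*p == '\0')
			return (save);
	}
}
```

As with the userland functions, searching for '\0' returns a pointer to the terminator.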