global audit trail configuration. This allows applications consuming
audit trails to specify parameters for which audit records are of
interest, including selecting records not required by the global trail.
Allowing application interest specification without changing the global
configuration allows intrusion detection systems to run without
interfering with global auditing or each other (if multiple are
present). To implement this:
- Kernel audit records now carry a flag to indicate whether they have
been selected by the global trail or by the audit pipe subsystem,
set during record commit, so that this information is available
after BSM conversion when delivering the BSM to the trail and audit
pipes in the audit worker thread asynchronously. Preselection by
either record target will cause the record to be kept.
- Similar changes to preselection when the audit record is created
when the system call is entering: consult both the global trail and
pipes.
- au_preselect() now accepts the class in order to avoid repeatedly
looking up the mask for each preselection test.
- Define a series of ioctls that allow applications to specify whether
they want to track the global trail, or program their own
preselection parameters: they may specify their own flags and naflags
masks, similar to the global masks of the same name, as well as a set
of per-auid masks. They also set a per-pipe mode specifying whether
they track the global trail, or user their own -- the door is left
open for future additional modes. A new ioctl is defined to allow a
user process to flush the current audit pipe queue, which can be used
after reprogramming pre-selection to make sure that only records of
interest are received in future reads.
- Audit pipe data structures are extended to hold the additional fields
necessary to support preselection. By default, audit pipes track the
global trail, so "praudit /dev/auditpipe" will track the global audit
trail even though praudit doesn't program the audit pipe selection
model.
- Comment about the complexities of potentially adding partial read
support to audit pipes.
By using a set of ioctls, applications can select which records are of
interest, and toggle the preselection mode.
Obtained from: TrustedBSD Project
device went away while open or if you tried to change the config
number while devices were open. Based on the patch from the PR with
a number of changes as discussed with the submitter.
PR: usb/97271
Submitted by: Anish Mistry
knowledge of user vs. kernel audit records into
audit_worker_process_record(). This largely confines vnode
knowledge to audit_record_write(), but avoids that logic knowing
about BSM as opposed to byte streams. This will allow us to
improve our ability to support real-time audit stream processing
by audit pipe consumers while auditing is disabled, but this
support is not yet complete.
Obtained from: TrustedBSD Project
Break out logic to call audit_record_write() and handle error
conditions into audit_worker_process_record(). This will be the
future home of some logic now present in audit_record_write()
also.
Obtained from: TrustedBSD Project
worker.
Rename audit_commit_cv to audit_watermark_cv, since it is there to
wake up threads waiting on hitting the low watermark. Describe
properly in comment.
Obtained from: TrustedBSD Project
src/sys/security/audit:
- Clarify and clean up AUR_ types to match Solaris.
- Clean up use of host vs. network byte order for IP addresses.
- Remove combined user/kernel implementations of some token creation
calls, such as au_to_file(), header calls, etc.
Obtained from: TrustedBSD Project
- Cleanup of AUR_ data types.
- Comment fixes.
- au_close_token() definition.
- Break out of kernel vs. user space token interfaces for headers.
Note: this may briefly break the kernel build until other kernel files are
updated to match.
Obtained from: TrustedBSD Project
Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().
Reduce the scope of the page queues lock in pmap_remove_pages().
Before the change if a hardware crypto driver was loaded after
the software crypto driver, calling crypto_newsession() with
hard=0, will always choose software crypto.
By using a pointer to struct dos_partition, we implicitly tell the
compiler that the pointer is 4-bytes aligned, even though we know
that's not the case. The fact that we only dereference the pointer
to access a byte-wide field (field dp_ptyp) is not a guarantee that
the compiler will in fact use a byte-wide load. On some platforms
it's more efficient to use long word or quad word loads and use
bit-shifting and bit-masking to get the intended byte. On those
platforms an misaligned load will be the result.
The fix is to use byte-wide pointer arithmetic based on sizeof() and
offsetof() to avoid invalid casts which avoids that the compiler
makes invalid assumptions.
Backtrace provided by: wilko@
MFC after: 1 week
axe_cmd() calls. Without this the device can get confused if multiple
threads attempt these operations concurrently. The problem was
easily reproducible by running "ifconfig axe0" in a loop because
eventually it would conflict with axe_tick_task().
A similar approach is probably required in all USB ethernet drivers.
- Add defines with block length for each HMAC algorithm.
- Add AES_BLOCK_LEN define which is an alias for RIJNDAEL128_BLOCK_LEN.
- Add NULL_BLOCK_LEN define.
Move the code for printing timer statistics into a test function instead of
an ifdef (accessible via the debug.acpi.hpet_test tunable). Also use defines
for register offsets instead of magic values.
Courtesy of: slow flight to HK
bread() the UFS superblock. Should eliminate crashes when trying
to do: mount -t ufs on an audio CD.
PR: kern/85893
Reported by: Russell Francis <rfrancis at ev dot net>
MFC after: 1 week
non-intuitive for the ~ to be built into the mask. All the users now
explicitly ~ the mask. In addition, add MTX_UNOWNED to the mask even
though it technically isn't a flag. This should unbreak mtx_owner().
Quickly spotted by: kris
it. We just moved it to be pci specific, so this was causing compile
problems (linking problems, so I didn't notice since I unwisely just
built the module).
vendor-specific device ids across vendors.
- Include the revision in the dc_devs[] array instead of special casing
the revid handling in dc_devtype().
- Use PCI bus accessors to read registers instead of pci_read_config()
where possible.
- Use an 8-bit write to update the latency timer.
- Use PCIR_xxx constants and remove unused DC_xxx related to standard
PCI config registers.
MFC after: 1 week
dropped. This prevents a bug introduced during the socket/pcb refcounting
work from occuring, in which occasionally the retransmit timer may fire
after a connection has been reset, resulting in the resulting R|A TCP
packet having a source port of 0, as the port reservation has been
released.
While here, fixing up some RUNLOCK->WUNLOCK bugs.
MFC after: 1 month
resulting in some build failures. Instead, to fix the problem of bpf not
being present, check the pointer before dereferencing it.
This is a temporary bandaid until we can decide on how we want to handle
the bpf code not being present. This will be fixed shortly.
forget to unbusy file system before its destruction.
This fixes the following warning on mount failure:
Mount point <X> had 1 dangling refs
Tested by: wkoszek
(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom
This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.
Summary of changes:
- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
bpf interface structure. Once this is done, ifp->if_bpf should never be
NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
a lockless read bpf peer list associated with the interface. It should
be noted that the bpf code will pickup the bpf_interface lock before adding
or removing bpf peers. This should serialize the access to the bpf descriptor
list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present
Now what happens is:
(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pickup the bpf interface lock
(4) Hand packet off to process
From the attach/detach side:
(1) Pickup the bpf interface lock
(2) Add/remove from bpf descriptor list
Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).
[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
not we have any bpf peers that might be interested in receiving packets.
In collaboration with: sam@
MFC after: 1 month
fixing speed negotiation.
Also fix the mpt_execute_req function to actually
match mpt_execute_req_a64. This may explain why
i386 users were having more grief.
result, raw_uabort() now needs to call raw_detach() directly. As
raw_uabort() is never called, and raw_disconnect() is probably not ever
actually called in practice, this is likely not a functional change, but
improves congruence between protocols, and avoids a NULL raw cb pointer
after disconnect, which could result in a panic.
MFC after: 1 month
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.
Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter. In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary. The printf should
be sufficient motivation to get the driver glitch fixed.
- remove call to getmntopts(), and just pass -o options to
nmount(). This removes some confusion as to what options
msdosfs can parse, by pushing the responsibility of option parsing
to the VFS and FS specific code in the kernel.
msdosfs_vfsops.c:
- add "force" and "sync" to msdosfs_opts. They used to be specified
in mount_msdosfs.c, so move them here. It's not clear whethere these
options should be placed into global_opts in vfs_mount.c or not.
Motivated by: marcus
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.
This change allows actions to taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if a ifconfig_<int>=""
line exists in rc.conf.
Reviewed by: brooks
No objections on: net
These pages are allocated from the direct map, and were not previous
tracked. This included the vm_page_array and the early UMA bootstrap
pages.
Reviewed by: peter
Correct a bug in the handling of backslash characters in smbfs which can
allow an attacker to escape from a chroot(2). [2]
Security: FreeBSD-SA-06:15.ypserv [1]
Security: FreeBSD-SA-06:16.smbfs [2]
the first and last cache line in PREREAD, and just invalidate the cache
lines in POSTREAD, instead of write-back/invalidating in POSTREAD, which
could lead to stale data overriding what has been transfered by DMA.
a) were incorrectly written and therefore never compiled into
assertions, and
b) were incorrectly specified and when compiled resulted in a
failed assertion.
for file types other than VREG, VDIR and shared memory objects.
We already handle VREG, VLNK and VDIR cases. Silently ignore
truncate requests for all the rest. Adjust comments.
PR: kern/98064
Submitted by: bde
Security: local DoS
Regress. test: regression/fifo/fifo_misc
MFC after: 2 weeks
fixes filesystem corruption when nextboot.conf is located after
cylinder 1023. The bug appears to have been introduced at the time
bd_read was copied to create bd_write.
PR: bin/98005
Reported by: yar
MFC after: 1 week
usage as of SPC2r20. Specifically, handle the BQueue
flag which will indicate that a device supports the
Basic Queueing model (no Head of Queue or Ordered tags).
When this flag is set, SID_CmdQueue is clear. This has
causes FreeBSD to assume that the device did not support
tagged operations.
MFC after: 1 month
for IOCTLs where casting data to intptr_t * isn't the right thing to do
as _IO() isn't used for them but _IOR(..., int)/_IOW(..., int) are (i.e.
for all IOCTLs except VMIO_SIOCSIFFLAGS), fixing tap(4) on big-endian
LP64 machines.
PR: sparc64/98084
OK'ed by: emax
MFC after: 1 week
`-Wundef'
Warn whenever an identifier which is not a macro is encountered in
an `#if' directive, outside of `defined'. Such identifiers are
replaced with zero.
enabled. It has been commented out for a reason I forgot but I suspect
does not apply anymore.
Technically speaking it's not required to do it, has the data and the
instruction cache have been disabled in _start(). However, it may change
in the future, so I don't want to rely on this behavior.
Submitted by: kevlo
use a different mechanism for setting warning flags, and using
WARNS here only has null or negative effects.
Submitted by: bde (I think it means "submitted")
vmspace_exitfree() and vmspace_free() which could result in the same
vmspace being freed twice.
Factor out part of exit1() into new function vmspace_exit(). Attach
to vmspace0 to allow old vmspace to be freed earlier.
Add new function, vmspace_acquire_ref(), for obtaining a vmspace
reference for a vmspace belonging to another process. Avoid changing
vmspace refcount from 0 to 1 since that could also lead to the same
vmspace being freed twice.
Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
Reviewed by: alc
can see the results of SPI negotiation w/o being overwhelmed
with other crap).
+ For U320 devices, check against both Settings *and* DV flags before
deciding whether we need to skip actual SPI settings for a device.
+ Go back to creating a 'physical disk' side of a raid/passthru bus that
is limited to the number of maximum physical disks. Actually, this isn't
probably *quite* right yet for one RAID volume, and if we ever end up
with finding a device that supports more than one RAID volume (not likely),
it probably won't quite be right either.
The problem here is that the creating of this 'physical' passthru sim is
just a cheap way to leverage off the CAM midlayer to do our negotiation
for us on the subentities that make up a RAID volume. It almost causes
more trouble than it is worth because we have to remember which side
we're talking to in terms of forming commands and which target ids are
real and so on. Bleah.
+ Skip trying to actually do SPI settings for the RAID volumes on the
real side of the raid/passthru bus pair- this just confuses the issue.
The underlying real physical devices will have the negotiation performed
and the Raid volume will inherit the resultant settings. At the sime time,
non-RAID devices can be on the same real bus, so *do* perform negotiations
with them.
+ At the end of doing all of the settings twiddling, *ahem*, remember to
go update the settings on the card itself (dunno how this got nuked).
At this point, negotiations *seem* to be being done (again) correctly for
both RAID volumes and their subentities. And they seem to be *mostly*
now right for other non-RAID entities on the same bus (I ended up with
3 out of 8 other disks still at narror/async- haven't the slightest
idea why yes).
Finally, negotiations on a normal bus seem to work (again).
There's still more work coming into this area, but we're in the
final stretch.
the passed target id is one of the RAID VolumeID. This result
is used to decide whether to try and do actual SPI negotiations
on the real side of the raid/passthru bus pair. The reason we
check this is that we can have both RAID volumes and real devices
on the same bus.
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
POSIX (susv3) requires this, but it is unclear what should be inherited,
duplicating whole 387 stack for new thread seems to be unnecessary and
dangerous. Revert to previous code, force a new thread to be started with
clean FP state.
USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
a transfer. This means that the various alloc_sqtd_chain functions
will always construct a transfer, so it is safe to modify the
allocated descriptors on return. Previously there were cases where
a zero length transfer would cause a NULL dereference.
Reported by: bp
will allow the NFS server to call vfs_stdcheckexp() on the exported nullfs
filesystem, not the underlying filesystem being nullfs mounted.
If the lower filesystem was not NFS exported, then the NFS exported
null filesystem would not work.
Pointed out by: scottl
PR: kern/87906
MFC after: 1 week
- Reduce the number of RX and TX buffers bfe uses so that it does not use more
bounce buffers than busdma is willing to allow it to use
See if_bfe.c for comments on why this is now safe to do.
Also use BUS_DMA_ALLOCNOW to be on the safe side.
2. Look for the Descriptor Error, and Descriptor Protocol Error flags from
the card, and down the interface if we detect either.
#1 (along with fixes to busdma) makes sure that this card works in all
memory situations. Prior to this change, it was just luck that 512 count
RX/TX lists were properly aligned. Now we can use any size of RX/TX lists
and still have them properly aligned.
#2 ensures that we don't get into an endless interrupt storm if busdma fails
us. Descriptor Protocol Error would occur if we misaligned the TX/RX rings,
and Descriptor Error would occur if we tried to give the card descriptors
or rings with addresses > 1G. Trying to reinitialize the card isn't going
to fix these errors, hence we don't try.
Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.
Add a panic to detect cases where we've still failed to properly align.
lookup, rename, strategy, islocked
The missing % sign meant that the lines were processed as plain
comments and the corresponding assertions were never generated.
the high 16 bits is non-zero, fxrstor instruction will generate GP fault,
resulting kernel crash, this bug can be triggered by setcontext and
ptrace(PT_SETXMMREGS).
host controllers to avoid the need to allocate any multi-page
physically contiguous memory blocks. This makes it possible to use
USB devices reliably on low-memory systems or when memory is too
fragmented for contiguous allocations to succeed.
The USB subsystem now uses bus_dmamap_load() directly on the buffers
supplied by USB peripheral drivers, so this also avoids having to
copy data back and forth before and after transfers. The ehci and
ohci controllers support scatter/gather as long as the buffer is
contiguous in the virtual address space. For uhci the hardware
cannot handle a physical address discontinuity within a USB packet,
so it is necessary to copy small memory fragments at times.
lost one set to a peninsula power failure last night. After
this, I can see both submembers and the raid volumes again,
but speed negotiation is still broken.
Add a mpt_raid_free_mem function to centralize the resource
reclaim and fixed a small memory leak.
Remove restriction on number of targets for systems with IM enabled-
you can have setups that have both IM volumes as well as other devices.
Fix target id selection for passthru and nonpastrhu cases.
Move complete command dumpt to MPT_PRT_DEBUG1 level so that just
setting debug level gets mostly informative albeit less verbose
dumping.
but large parts are rewritten by matk and tanimura.
This is old code, it's not maintained since 2003. We also don't have a
maintainer for this! Yuriy Tsibizov took it and uses it in his emu10kx
driver. Since the emu10kx driver will enter the tree "soon" (some bugs
have to be fixed after Yuriy return from his holidays), I add it here
already.
This also contains some changes to emu10k1 and cmi, so if you're lucky,
you can now make some kind of use of midi with those soundcards.
To all those poor souls which don't have such a card: feel free to send
patches, we don't have a maintainer for this.
To those which miss a specific feature in the midi code: feel free to
submit patches, we don't have a maintainer for this.
Oh, did I already told that it would be nice if someone would take care
of it? Maintainer with midi equipment wanted! :-)
If you get LOR's, submit a PR and notify multimedia@ please. If you get
panics, submit a PR with a backtrace (compile the sound system into your
kernel instead of using modules in this case) and notify multimedia@
please.
Written by: matk, tanimura
Submitted by: "Yuriy Tsibizov" <Yuriy.Tsibizov@gfk.ru>
Based upon: code from NetBSD
but large parts are rewritten by matk and tanimura.
This is old code, it's not maintained since 2003. We also don't have a
maintainer for this! Yuriy Tsibizov took it and uses it in his emu10kx
driver. Since the emu10kx driver will enter the tree "soon" (some bugs
have to be fixed after Yuriy return from his holidays), I add it here
already.
This also contains some changes to emu10k1 and cmi, so if you're lucky,
you can now make some kind of use of midi with those soundcards.
To all those poor souls which don't have such a card: feel free to send
patches, we don't have a maintainer for this.
To those which miss a specific feature in the midi code: feel free to
submit patches, we don't have a maintainer for this.
Oh, did I already told that it would be nice if someone would take care
of it? Maintainer with midi equipment wanted! :-)
If you get LOR's, submit a PR and notify multimedia@ please. If you get
panics, submit a PR with a backtrace (compile the sound system into your
kernel instead of using modules in this case) and notify multimedia@
please.
Written by: matk, tanimura
Submitted by: "Yuriy Tsibizov" <Yuriy.Tsibizov@gfk.ru>
Based upon: code from NetBSD
the LIST checks). Races may lead to list corruption, which can be
difficult to unravel in a post-mortem analysis. These checks verify
that the prev and next pointers are consistent when inserting or
removing elements, thus catching any corruption earlier.
It uses doxygen to generate the API documentation. For each subsystem
a very small (about 20 lines with comments) subsystem specific Doxyfile
has to be written (have a look at the README for more). All common doxygen
options are specified in a separate file.
The framework is configured to not only generate the HTML version, but also
a PDF version (the paper size is hardcoded to DIN A4 currently and depending
on the subsystem you have to increase some limits in the latex configuration
of your system, the README tells more about this).
It also allows cross-references between the subsystems (it generates doxygen
tag files).
Currently the docs are generated in OBJDIR, but this may change after
coordination with doc@. The makefile is prepared to generate/move various
parts of the generated docs to different destinations.
TARGET_ARCH is respected and some env-vars are set for architecture specific
handling of the source (the README tells more).
Subsystems for which docs are generated:
- cam - crypto - dev_pci
- dev_sound - dev_usb - geom
- i4b - kern - libkern
- linux - net80211 - netgraph
- netinet - netinet6 - netipsec
- opencrypto - vm
Requested by: gnn
This used to make syscons switch to vty0 when we entered DDB but this
was lost in the KDB shuffle. We may want to bring it back down the road
but it should be done by calling cn_init_t/cn_term_t instead, possibly
with a flag argument saying "Debugger!"
other timeouts could not happen while suspending, including timeouts
for things like msleep. This caused the system to hang on suspend
when the cbb was enabled, since its suspend path powered down the
socket which used a timeout to wait for it to be done.
APM now creates a thread when it is enabled, and deletes the thread
when it is disabled. This thread takes the place of the timeout by
doing its polling every ~.9s. When the thread is disabled, it will
wakeup early, otherwise it times out and polls the varius things the
old timeout polled (APM events, suspend delays, etc).
This makes my Sony VAIO 505TS suspend/resume correctly when APM is
enabled (ACPI is black listed on my 505TS).
This will likely fix other problems with the suspend path where
drivers would sleep with msleep and/or do other timeouts. Maybe
there's some special case code that would use DELAY while suspending
and msleep otherwise that can be revisited and removed.
This was also tested by glebius@, who pointed out that in the patch I
sent him, I'd forgotten apm_saver.c
MFC After: 3 weeks
sendfile(). This causes sendfile() to use the file descriptor
reference to the socket instead of bumping the socket reference
count, which avoids an additional refcount operation, as well as a
potential expensive socket refcount drop, which can lead to
contention on the accept mutex. This change also has the side
effect of further reducing the number of cases where an in-progress
I/O operation can occur on a socket after close, as using the file
descriptor refcount prevents the socket from closing while in use.
MFC after: 3 months
If B_NOCACHE is set the pages of vm backed buffers will be invalidated.
However clean buffers can be backed by dirty VM pages so invalidating them
can lead to data loss.
Add support for flush dirty page in the data invalidation function
of some network file systems.
This fixes data losses during vnode recycling (and other code paths
using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@
Reviewed by: tegge@
MFC after: 7 days
stopped before adjusting their priority and setting them on the run
q so they cannot race for resources (pointed out by njl).
While here add a console printf on thread create fails; otherwise
noone may notice (e.g. return value is always 0 and caller has no
way to verify).
Reviewed by: jhb, scottl
MFC after: 2 weeks
client into the kernel by default, and many users won't use NFS,
don't start an extra 4 kernel threads that are unused. Once NFS
becomes active, it will start nfsiod's as it needs them.
We might consider mandating a minimum iod's equal to the number of
active NFS mounts (truncated to some value), which would force some
to remain available without having to create a new one if the file
system is mostly inactive.
PR: 70880
MFC after: 2 weeks
Prodded by: cel
Head nod: peter
Pointed out by: Joe <fbsd_user at a1poweruser dot com>
was done, I believe, to work around some cards having issues in the
suspend case. I think that this helped my Sony VAIO TS505 work better
when it had certain wireless cards in it and I did a apm -z. I've not
tested suspend/resume on other laptops in a long time, so I hope this
doesn't cause greif. Please let me know if it does.
of cases where we didn't take out the lock before setting or clearing
a bit. This apparently can lead to a race at kldunload time (at least
on my Turion64 laptop, never saw it on my Sony Vaio).
This version of scsi_target.c removes all SMP locking until
we have a lock-aware CAM stack. This allows us to use KNOTE
without a panic at least.
It's not yet clear whether target mode is working yet or not.
Discussed with: Scott, Ken, Nate, Justin
return to user space w/o waiting for I/O to complete.
I tried to get several folks who know this code better than me to review it
with no luck. I *do* know that w/o this code, using the SCSI target driver
panics in userret (if it doesn't panic in knote first).
if a process's uid or gid has changed, but the /proc/<PID> directory
itself was also set to mode 0. Assuming this doesn't open any
security holes, open access to the /proc/<PID> directory for users
other than root to read or search the directory.
Reviewed by: des (back in February)
MFC after: 3 weeks
Since tags are kept while packet resides in kernelspace, it's possible to
use other kernel facilities (like netgraph nodes) for altering those tags.
Submitted by: Andrey Elsukov <bu7cher at yandex dot ru>
Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru>
Approved by: glebius (mentor)
Idea from: OpenBSD PF
MFC after: 1 month
EHCI spec for linking in new qTDs into an asynchronous QH. This
requires that there is a qTD marked as not active and not halted
at the start of the QH's list, and the hardware will know to re-fetch
the qTD on each pass rather than just looking at the overlay qTD:
"The host controller must be able to advance the queue from the
Fetch QH state in order to avoid all hardware/software race
conditions. This simple mechanism allows software to simply link
qTDs to the queue head and activate them, then the host controller
will always find them if/when they are reachable."
This is achieved by keeping an "inactivesqtd" entry on the QH list,
and re-using it each time as the start of the next transfer, and
allocating a new qTD to become the next inactivesqtd. Then a new
transfer can be activated by just setting its "active" flag, which
avoids all the previous messing with overlay qTD state in
ehci_set_qh_qtd().
mimicing the NFS reference implementation.
NFS over TCP does not need fast retransmit timeouts, since network loss
and congestion are managed by the transport (TCP), unlike with NFS over
UDP. A long timeout prevents the unnecessary retransmission of non-
idempotent NFS requests.
Reviewed by: mohans, silby, rees?
Sponsored by: Network Appliance, Incorporated
the estimator to be more easily tuned and maintained.
There should be no functional change except there is now a lower limit
on the retransmit timeout to prevent the client from retransmitting
faster than the server's disks can fill requests, and an upper limit
to prevent the estimator from taking to long to retransmit during a
server outage.
Reviewed by: mohan, kris, silby
Sponsored by: Network Appliance, Incorporated
before starting exploring (4 seconds), and extend the wait period
if new USB buses are attached while waiting.
This works around a problem seen when there is more than one EHCI
controller in the system and you kldload usb.ko after the system
has booted. The problem is that usb.ko contains 3 separate PCI
drivers which get initialised one by one (uhci, ohci, ehci), and
when each driver is initialised, all PCI buses are re-probed after
just the addition of that driver. This means that there can be a
significant delay between the attaching of a companion controller
and the subsequent EHCI attach, so it is possible for the companion
controller's USB 1.x bus to be scanned before the EHCI driver gets
a chance to check if there is really a USB 2.x device connected.
- Rename REG_DL to REG_DLL and REG_DLH.
- Always treat DLL and DLH as two separate 8-bit registers instead of one
16-bit register.
Additionally, remove the probe for the high 4 bits of IER being 0 and don't
assume we can always read/write 0 to/from those bits.
These changes allow uart(4) to drive the UARTs on the Intel XScale PXA255.
Reviewed by: marcel
- Skip PnP devices as some wedge when trying to probe them as C-NET(98)S.
This fix makes le(4) actually work with the C-NET(98)S.
Reviewed by: marius
Tested by: Watanabe Kazuhiro < CQG00620 at nifty dot ne dot jp >
Checking if the queues are empty is not enough for the crypto_proc thread
(it is enough for the crypto_ret_thread), because drivers can be marked
as blocked. In a situation where we have operations related to different
crypto drivers in the queue, it is possible that one driver is marked as
blocked. In this case, the queue will not be empty and we won't wakeup
the crypto_proc thread to execute operations for the others drivers.
Simply setting a global variable to 1 when we goes to sleep and setting
it back to 0 when we wake up is sufficient. The variable is protected
with the queue lock.
Before the change if the thread was working on symmetric operation, we
would send unnecessary wakeup after adding asymmetric operation (when
asym queue was empty) and vice versa.
twice if we call crypto_kinvoke() from crypto_proc thread.
This change also removes unprotected access to cc_kqblocked field
(CRYPTO_Q_LOCK() should be used for protection).
where crypto_invoke() returns ERESTART and before we set cc_qblocked to 1,
crypto_unblock() is called and sets it to 0. This way we mark device as
blocked forever.
Fix it by not setting cc_qblocked in the fast path and by protecting
crypto_invoke() in the crypto_proc thread with CRYPTO_Q_LOCK().
This won't slow things down, because there is no contention - we have
only one crypto thread. Actually it can be slightly faster, because we
save two atomic ops per crypto request.
The fast code path remains lock-less.
Be cognizant as to whether we're running 2KLogin f/w in target mode and
do the appropriate loopid load based upon that.
Do a first cut (seems to work, at least for amd64) at 64 bit target
mode for fibre channel cards. We could probably also do it for SPI
cards, but that's not supported right now.
report this as an allocation failure for the item type. The failure
will be separately recorded with the bucket type. This my eliminate
high mbuf allocation failure counts under some circumstances, which
can be alarming in appearance, but not actually a problem in
practice.
MFC after: 2 weeks
Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>,
OxY <oxy at field dot hu>,
Gabor MICSKO <gmicskoa at szintezis dot hu>
"Why didn't he use SECASVAR_LOCK()/SECASVAR_UNLOCK() macros to
synchronize access to the secasvar structure's fields?" one may ask.
There were two reasons:
1. refcount(9) is faster then mutex(9) synchronization (one atomic
operation instead of two).
2. Those macros are not used now at all, so at some point we may decide
to remove them entirely.
OK'ed by: gnn
MFC after: 2 weeks
- Linux ioctl support, with the other Linux changes MegaCli
will run if you mount linprocfs & linsysfs then set
sysctl compat.linux.osrelease=2.6.12 or similar. This works
on i386. It should work on amd64 but not well tested yet.
StoreLib may or may not work. Remember to kldload mfi_linux.
- Add in AEN (Async Event Notification) support so we can
get messages from the firmware when something happens.
Not all messages are in defined in event detail. Use
event_log to try to figure out what happened.
- Try to implement something like SIGIO for StoreLib. Since
mrmonitor doesn't work right I can't fully test it. StoreLib
works best with the rh9 base. In theory mrmonitor isn't
needed due to native driver support of AEN :-)
Now we can configure and monitor the RAID better.
Submitted by: IronPort Systems.
full, kick the binary blob to force it to complete any pending tx
completions.
- In the watchdog routine, only reset the chip if the blob doesn't complete
any pending tx completions rather than requiring it to complete all of
the pending tx completions.
Submitted by: Nathan Whitehorn <nathanw@uchicago.edu>
MFC after: 2 weeks
a defensive programming measure.
Note that whilst these members are not used by the ip_output()
path, we are passing an instance of struct ip_moptions here
which is declared on the stack (which could be considered a
bad thing).
ip_output() does not consume struct ip_moptions, but in case it
does in future, declare an in_multi vector on the stack too to
behave more like ip_findmoptions() does.
lnc(4) on PC98 and i386. The ISA front-end supports the same non-PNP
network cards as lnc(4) did and additionally a couple of PNP ones.
Like lnc(4), the C-bus front-end of le(4) only supports C-NET(98)S
and is untested due to lack of such hardware, but given that's it's
based on the respective lnc(4) and not too different from the ISA
front-end it should be highly likely to work.
- Remove the descriptions of le(4), which where converted from lnc(4),
from sys/i386/conf/NOTES and sys/pc98/conf/NOTES as there's a common
one in sys/conf/NOTES.
entry to the PCI NICs section so it's in the same spot in all GENERIC
config files.
- Add a note to the description of pcn(4) informing that is has precedence
over le(4).
or SHA512, the blocksize is 128 bytes, not 64 bytes as anywhere else.
The bug also exists in NetBSD, OpenBSD and various other independed
implementations I look at.
- We cannot decide which hash function to use for HMAC based on the key
length, because any HMAC function can use any key length.
To fix it split CRYPTO_SHA2_HMAC into three algorithm:
CRYPTO_SHA2_256_HMAC, CRYPTO_SHA2_384_HMAC and CRYPTO_SHA2_512_HMAC.
Those names are consistent with OpenBSD's naming.
- Remove authsize field from auth_hash structure.
- Allow consumer to define size of hash he wants to receive.
This allows to use HMAC not only for IPsec, where 96 bits MAC is requested.
The size of requested MAC is defined at newsession time in the cri_mlen
field - when 0, entire MAC will be returned.
- Add swcr_authprepare() function which prepares authentication key.
- Allow to provide key for every authentication operation, not only at
newsession time by honoring CRD_F_KEY_EXPLICIT flag.
- Make giving key at newsession time optional - don't try to operate on it
if its NULL.
- Extend COPYBACK()/COPYDATA() macros to handle CRYPTO_BUF_CONTIG buffer
type as well.
- Accept CRYPTO_BUF_IOV buffer type in swcr_authcompute() as we have
cuio_apply() now.
- 16 bits for key length (SW_klen) is more than enough.
Reviewed by: sam
crypto_invoke(). This allows to serve multiple crypto requests in
parallel and not bached requests are served lock-less.
Drivers should not depend on the queue lock beeing held around
crypto_invoke() and if they do, that's an error in the driver - it
should do its own synchronization.
- Don't forget to wakeup the crypto thread when new requests is
queued and only if both symmetric and asymmetric queues are empty.
- Symmetric requests use sessions and there is no way driver can
disappear when there is an active session, so we don't need to check
this, but assert this. This is also safe to not use the driver lock
in this case.
- Assymetric requests don't use sessions, so don't check the driver
in crypto_kinvoke().
- Protect assymetric operation with the driver lock, because if there
is no symmetric session, driver can disappear.
- Don't send assymetric request to the driver if it is marked as
blocked.
- Add an XXX comment, because I don't think migration to another driver
is safe when there are pending requests using freed session.
- Remove 'hint' argument from crypto_kinvoke(), as it serves no purpose.
- Don't hold the driver lock around kprocess method call, instead use
cc_koperations to track number of in-progress requests.
- Cleanup register/unregister code a bit.
- Other small simplifications and cleanups.
Reviewed by: sam
- Implement CUIO_SKIP() macro which is only responsible for skipping the given
number of bytes from iovec list. This allows to avoid duplicating the same
code in three functions.
Reviewed by: sam
actually destroyed. Also move call to knlist_init() into tapcreate(). This
should fix panic described in kern/95357.
PR: kern/95357
No response from: freebsd-current@
MFC after: 3 days
use this ioctl to obtain the list of HCI nodes. User-space application
is expected to preallocate 'ng_btsocket_hci_raw_node_list_names' structure
and set limit in 'num_nodes' field. The 'nodes' field should be allocated
as well and it should have space for at least 'num_nodes' elements.
The SIOC_HCI_RAW_NODE_LIST_NAMES should be issued on bound raw HCI socket.
It does not really really matter what HCI name the socket is bound to, as
long as it is not empty.
MFC after: 1 week
still should return BUS_PROBE_LOW_PRIORITY instead of BUS_PROBE_DEFAULT
in order to give pcn(4) a chance to attach in case it probes after le(4).
- Rearrange the code related to RX interrupt handling so that ownership of
RX descriptors is immediately returned to the NIC after we have copied
the data of the hardware, allowing the NIC to already reuse the descriptor
while we are processing the data in ifp->if_input(). This results in a
small but measurable increase in RX throughput.
As a side-effect, this moves the workaround for the LANCE revision C bug
to am7900.c (still off by default as I doubt we will actually encounter
such an old chip in a machine running FreeBSD) and the workaround for the
bug in the VMware PCnet-PCI emulation to am79000.c, which is now also
only compiled on i386 (resulting in a small increase in RX throughput on
the other platforms).
- Change the RX interrupt handlers so that the descriptor error bits are
only check once in case there was no error instead of twice (inspired
by the NetBSD pcn(4), which additionally predicts the error branch as
false).
- Fix the debugging output of the RX and TX interrupt handlers; while
looping through the descriptors print info about the currently processed
one instead of always the previously last used one; remove pointless
printing of info about the RX descriptor bits after their values were
reset.
- Create the DMA tags used to allocate the memory for the init block,
descriptors and packet buffers with the alignment the respective NIC
actually requires rather than using PAGE_SIZE unconditionally. This might
as well fix the alignment of the memory as it seems we do not inherit
the alignment constraint from the parent DMA tag.
- For the PCI variants double the number of RX descriptors and buffers
from 8 to 16 as this minimizes the number of RX overflows im seeing with
one NIC-mainboard combination. Nevertheless move reporting of overflows
under debugging as they seem unavoidable with some crappy hardware.
- Set the software style of the PCI variants to ILACC rather than PCnet-PCI
as the former is was am79000.c actually implements. Should not make a
difference for this driver though.
- Fix the driver name part in the MODULE_DEPEND of the PCI front-end for
ether.
- Use different device descriptions for PCnet-Home and PCnet-PCI.
- Fix some 0/NULL confusion in lance_get().
- Use bus_addr_t for sc_addr and bus_size_t for sc_memsize as these are
more appropriate than u_long for these.
- Remove the unused LE_DRIVER_NAME macro.
- Add a comment describing why we are taking the LE_HTOLE* etc approach
instead of using byteorder(9) functions directly.
- Improve some comments and fix some wording.
MFC after: 2 weeks
gateways which are unreachable except through the default router. For
example, assuming there is a default route configured, and inserting
a route
"route add 64.102.54.0/24 60.80.1.1"
is currently allowed even when 60.80.1.1 is only reachable through
the default route. However, an error is thrown when this route is
utilized, say,
"ping 64.102.54.1" will return an error
This type of route insertion should be disallowed becasue:
1) Let's say that somehow our code allowed this packet to flow to
the default router, and the default router knows the next hop is
60.80.1.1, then the question is why bother inserting this route in
the 1st place, just simply use the default route.
2) Since we're not talking about source routing here, the default
router could very well choose a different path than using 60.80.1.1
for the next hop, again it defeats the purpose of adding this route.
Reviewed by: ru, gnn, bz
Approved by: andre
Current code does not report link loss correctly - when link goes down,
mii_phy_tick() will notice that with up to mii_anegticks delay.
If link goes up during this delay then link flapping will be unnoticed
by driver.
2) mii_phy_add_media(): initialize sc->mii_anegticks for 10/100 media
3) Use MII_ANEGTICKS/MII_ANEGTICKS_GIGE defines instead of hardcoded values.
Approved by: glebius (mentor)
MFC after: 1 month
mount(2) system call:
* Add cmount hook to fdescfs and pseudofs (and, by extension, procfs and
linprocfs). This (mostly) restores the ability to mount these
filesystems using the old mount(2) system call (see below for the
rest of the fix).
* Remove not-NULL check for the data argument from the mount(2) entry
point. Per the mount(2) man page, it is up to the individual
filesystem being mounted to verify data. Or, in the case of procfs,
etc. the filesystem is free to ignore the data parameter if it does
not use it. Enforcing data to be not-NULL in the mount(2) system call
entry point prevented passing NULL to filesystems which ignored the
data pointer value. Apparently, passing NULL was common practice
in such cases, as even our own mount_std(8) used to do it in the
pre-nmount(2) world.
All userland programs in the tree were converted to nmount(2) long ago,
but I've found at least one external program which broke due to this
(presumably unintentional) mount(2) API change. One could argue that
external programs should also be converted to nmount(2), but then there
isn't much point in keeping the mount(2) interface for backward
compatibility if it isn't backward compatible.
as not connected. In soclose() case rip_detach() will kill inpcb for
us later.
It makes rawconnect regression test do not panic a system.
Reviewed by: rwatson
X-MFC after: with all 1th April inpcb changes
connections and get rid of the flow_id as it is not guaranteed to be stable
some (most?) current implementations seem to just zero it out.
PR: kern/88664
Reported by: jylefort
Submitted by: Joost Bekkers (w/ changes)
Tested by "regisr" <regisrApoboxDcom>
By making the imo_membership array a dynamically allocated vector,
this minimizes disruption to existing IPv4 multicast code. This
change breaks the ABI for the kernel module ip_mroute.ko, and may
cause a small amount of churn for folks working on the IGMPv3 merge.
Previously, sockets were subject to a compile-time limitation on
the number of IPv4 group memberships, which was hard-coded to 20.
The imo_membership relationship, however, is 1:1 with regards to
a tuple of multicast group address and interface address. Users who
ran routing protocols such as OSPF ran into this limitation on machines
with a large system interface tree.
Add a new option, SKYEYE_WORKAROUNDS, which as the name suggests adds
workarounds for things skyeye doesn't simulate. Specifically :
- Use USART0 instead of DBGU as the console, make it not use DMA, and manually provoke an interrupt when we're done in the transmit function.
- Skyeye maintains an internal counter for clock, but apparently there's
no way to access it, so hack the timecounter code to return a value which
is increased at every clock interrupts. This is gross, but I didn't find a
better way to implement timecounters without hacking Skyeye to get the
counter value.
- Force the write-back of PTEs once we're done writing them, even if they
are supposed to be write-through. I don't know why I have to do that.
for nfsclient and nfs4client in order to prevent local root users
from panicing the system.
PR: kern/77463
Submitted by: Wojciech A. Koszek
Reviewed by: cel, rees
MFC after: 2 weeks
Security: Local root users can panic the system at will
divisor. This allows us to set the line speed to the maximum
of 1/4 of the device clock.
o Disable the baudrate generator before programming the line
settings, including baudrate, and enable it afterwards.
was not checked at all. There is only one case when sc_clean_up()
can fail, because of wait_scrn_saver_stop(), but it doesn't hurt
to check anyway.
Reviewed by: rodrigc
Found by: Coverity Prevent
seperately. Also use pfil hook/unhook instead of keeping the check
functions in pfil just to return there based on the sysctl. While here fix
some whitespace on a nearby SYSCTL_ macro.
When porting FreeBSD to a new platform, one of the more useful things to do is
get mi_startup() to let you know which SYSINIT it's up to. Most people tend to
whack a printf in the SYSINIT loop to print the address of the function it's
about to call. Going one better, jhb made a version that uses DDB to look up
the name of the function and print that instead. This version is essentially
his with the addition of some ifdeffery to make it optional and to allow it to
work (although using only the function address, not the symbol) if you forgot
to enable DDB.
All the cool bits by: jhb
Approved by: scottl, rink, cognet, imp
Remove an unnecessary check of the table's bus clock. CPUs that
support this feature export only the high/low settings via the MSR,
packed into 32 bits.
Hardware from: Centaur Technologies
MFC after: 1 week
vm_ksubmap_init() calls pmap_copy_page(), which uses the mini data cache
to do the copy, but we're running uncaching before cpu_setup().
For some reason it hasn't been a problem so far, but it is for the
PXA255.
Spotted out by: benno
the linux module, since it is not cross-platform
- move linprocfs from "files" and "options" to architecture specific files,
since it only makes sense to build this for those architectures, where we
also have a linuxolator
- disable the build of the linuxolator on our tier-2 architecture "Alpha":
* we don't have a linux_base port which supports Alpha and at the
same time is not outdated/obsoleted upstream/in a good condition/
currently working
* the upcomming new default linux base port is based upon Fedora
Core 3 (security support via http://www.fedoralegacy.org), which
isn't available for Alpha (like the current default linux base
port which is based upon Red Hat 8)
* nobody answered my request for testing it ~1 month ago on
current@ and alpha@ (it doesn't surprises me, see above)
* a SoC student wouldn't have to waste time on something which
nobody is willing to test
This does not remove the alpha specific MD files of the linuxolator yet.
Discussed on: arch (mostly silence)
Spiritual support by: scottl
If the embedded controller exists before the sysresource devices, for
example, it will be attached first. Instead, let the normal device
order function work as we first desired. [1]
There still remained a problem where we couldn't allocate resources in
acpi0 that were passed up by the sysresource pseudo-devices. These
devices had to probe/attach first to give their resources to acpi, then
acpi would allocate them before probing/attaching other devices. To
work around this, we attach them from acpi_sysres_alloc(). A better
approach would be to implement multi-pass probe/attach in newbus but
that's a much bigger task.
Suggested by: jhb [1]
Hardware from: Centaur Technologies
MFC after: 1 week
to ensure we match the type signature; we cannot assume HAL_BUS_TAG
and HAL_BUS_HANDLE correspond to bus_space_tag_t and bus_space_handle_t
(should probably do this for HAL_SOFTC too but leave that for now)
MFC after: 1 month
assuming them to be inflight write buffers. This is not always the case.
bufdaemon might hold the buffer lock and give up writing the buffer due to it
having dependencies, the file system being suspended or the vnode lock being
held by another thread. When bufdaemon decides to write the buffer there is
still a window before bufobj_wref() has been called, allowing other threads to
believe that the vnode has no dirty buffers or inflight writes.
Try harder to flush first block of new subdirectory to get rid of MKDIR_BODY
dependency.
for signicantly optimized UDP socket I/O when using a single UDP
socket from many threads or processes that share it, by avoiding
significant locking and other overhead in the general sosend()
path that isn't necessary for simple datagram sockets. Specifically,
this change results in a significant performance improvement for
threaded name service in BIND9 under load.
Suggested by: Jinmei_Tatsuya at isc dot org
Add back in a scheme to emulate old type major/minor numbers via hooks into
stat, linprocfs to return major/minors that Linux app's expect. Currently
only /dev/null is always registered. Drivers can register via the Linux
type shim similar to the ioctl shim but by using
linux_device_register_handler/linux_device_unregister_handler functions.
The structure is:
struct linux_device_handler {
char *bsd_driver_name;
char *linux_driver_name;
char *bsd_device_name;
char *linux_device_name;
int linux_major;
int linux_minor;
int linux_char_device;
};
Linprocfs uses this to display the major number of the driver. The
soon to be available linsysfs will use it to fill in the driver name.
Linux_stat uses it to translate the major/minor into Linux type values.
Note major numbers are dynamically assigned via passing in a -1 for
the major number so we don't need to keep track of them.
This is somewhat needed due to us switching to our devfs. MegaCli
will not run until I add in the linsysfs and mfi Linux compat changes.
Sponsored by: IronPort Systems
after ipsec4_output processing else KAME IPSec using the handbook
configuration with gif(4) will panic the kernel.
Problem reported by: t. patterson <tp lot.org>
Tested by: t. patterson <tp lot.org>
return NULL. In principle this shouldn't change the behavior, but
avoids returning a potentially invalid/inappropriate pointer to
the caller.
Found with: Coverity Prevent (tm)
Submitted by: pjd
MFC after: 3 months
functions not yet asserting it but working on global ip6_forward_rt
route cache which is not locked and perhaps should go away in the
future though cache hit/miss ration wasn't bad.
It's #if 0ed in frag6 because the code working on ip6_forward_rt is.
destination. These checks are needed so we do not install
a route looking like this:
(0) 192.0.2.200 UH tun0 =>
When removing this route the kernel will start to walk
the address space which looks like a hang on 64bit platforms
because it'll take ages while on 32bit you should see a panic
when kernel debugging options are turned on.
The problem is in rtrequest1:
if (netmask) {
rt_maskedcopy(dst, ndst, netmask);
} else
bcopy(dst, ndst, dst->sa_len);
In both cases the len might be 0 if the application forgot to
set it. If so ndst will be all-zero leading to above
mentioned strange routes.
This is an application error but we must not fail/hang/panic
because of this.
Looks ok: gnn
No objections: net@ (silence)
MFC after: 8 weeks
two places where g_io_request() is called. g_io_request() can free bio
structure so we can't reference it after and G_RAID3_FOREACH_BIO() macro
was doing this.
Found by: Coverity Prevent analysis tool (with my new models)
MFC after: 1 day
- Fix bfe_encap so that it will pass the address of the mbuf back up to its
caller if/when it modifies it, as it does when doing a m_defrag on a mbuf chain.
- Make sure to unload the dmamap for ALL fragments of a packet, not just the first
- Use BUS_DMA_NOWAIT for all bus_dmamap_load calls so that the allocation of the
map is not delayed - this driver is not set up to handle such delays.
- Reduce the number of RX and TX buffers bfe uses so that it does not use more
bounce buffers than busdma is willing to allow it to use
With these changes, the driver now works properly for a user with a 2GB system,
and it also works on my system when the acceptable address range is lowered to 128MB.
Previously, both of these setups would act up after a few minutes of activity.
from the amr_linux. This simplifies the amr_linux shim and puts the
smarts into amr.c.
I tested this with 2 amr controllers in one box. It seems to work
okay with them.
selection and not always beeping on startup. The two bytes for the extra
'jmp' instruction were obtained by removing recognition of BSD/OS
partitions.
Requested by: many
Tested by: subset of many
Head nod: imp, keramida
MFC after: 2 weeks
conformance with the mbuf and uio load routines. ENOMEM can only happen
with BUS_DMA_NOWAIT is passed in, thus the deferals are disabled. I don't
like doing this, but fixing this fixes assumptions in other important drivers,
which is a net benefit for now.
same time as it is changed back into a normal file. The locker would
get the shared "snaplk" lock which would no longer be the correct lock
for the vnode.
into its own function, udp6_append(). This mirrors a similar structure
in udp_input() and udp_append(), and makes the whole thing a lot more
readable.
While here, add missing inpcb locking in UDP6 input path.
Reviewed by: bz
MFC after: 3 months
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.
Idea from: ups
vn_start_write() is always called earlier in the code path and calling
the function recursively may lead to a deadlock.
Confirmed by: tegge
MFC after: 2 weeks
The packet filter may reassemble the ip fragments and return a packet that is
larger than the MTU of the sending interface. There is no check for DF or icmp
replies as we can only get a large packet to fragment by reassembling a
previous fragment, and this only happens after a call to pfil(9).
Obtained from: OpenBSD (mostly)
Glanced at by: mlaier
MFC after: 1 month
vn_finished_write() should also be called only then.
BTW. I fixed two functions here: vn_rdwr() and vn_write(). The latter seems
to be unused.
MFC after: 3 weeks
o Properly use rman(9) to manage resources. This eliminates the
need to puc-specific hacks to rman. It also allows devinfo(8)
to be used to find out the specific assignment of resources to
serial/parallel ports.
o Compress the PCI device "database" by optimizing for the common
case and to use a procedural interface to handle the exceptions.
The procedural interface also generalizes the need to setup the
hardware (program chipsets, program clock frequencies).
o Eliminate the need for PUC_FASTINTR. Serdev devices are fast by
default and non-serdev devices are handled by the bus.
o Use the serdev I/F to collect interrupt status and to handle
interrupts across ports in priority order.
o Sync the PCI device configuration to include devices found in
NetBSD and not yet merged to FreeBSD.
o Add support for Quatech 2, 4 and 8 port UARTs.
o Add support for a couple dozen Timedia serial cards as found
in Linux.
OS dependent layer. Thus, the watchdog timer can go off when the tx
engine is working fine but the OS dependent layer just hasn't been called
to cleanup finished tx transactions. To workaround this, when the watchdog
fires, poke the binary blob to force it to flush any pending tx
completions. If this drops the pending tx count to zero then just return
without logging a message or resetting the chip.
This reportedly fixes the 'device timeout()' errors with at least several
NF4 nve(4) parts.
Submitted by: Nathan Alexander Whitehorn <nathanw@uchicago.edu> (code)
Submitted by: dg (inspiration for comment and explanation)
MFC after: 1 week
stations that are associated by making ieee80211_find_txnode return
NULL when a unicast frame is to be delivered to an unassociated
station. This will be handled differently in the future but for
now putting the check here allows all drivers to immediately do
the right thing.
Reviewed by: avatar
MFC after: 1 week
Remove the code to dyanmically change the pv_entry limits. Go back
to a single fixed kva reservation for pv entries, like was done
before when using the uma zone. Go back to never freeing pages
back to the free pool after they are no longer used, just like
before.
This stops the lock order reversal due to aquiring the kernel map
lock while pmap was locked.
This fixes the recursive panic if invariants are enabled.
The problem was that allocating/freeing kva causes vm_map_entry
nodes to be allocated/freed. That can recurse back into pmap as
new pages are hooked up to kvm and hence all the problem.
Allocating/freeing kva indirectly allocate/frees memory.
So, by going back to a single fixed size kva block and an index,
we avoid the recursion panics and the LOR.
The problem is that now with a linear block of kva, we have no
mechanism to track holes once pages are freed. UMA has the same
problem when using custom object for a zone and a fixed reservation
of kva. Simple solutions like having a bitmap would work, but would
be very inefficient when there are hundreds of thousands of bits
in the map. A first-free pointer is similarly flawed because pages
can be freed at random and the first-free pointer would be rewinding
huge amounts. If we could allocate memory for tree strucures or
an external freelist, that would work. Except we cannot allocate/free
memory here because we cannot allocate/free address space to use
it in. Anyway, my change here reverts back to the UMA behavior of
not freeing pages for now, thereby avoiding holes in the map.
ups@ had a truely evil idea that I'll investigate. It should allow
freeing unused pages again by giving us a no-cost way to track the
holes in the kva block. But in the meantime, this should get people
booting with witness and/or invariants again.
Footnote: amd64 doesn't have this problem because of the direct map
access method. I'd done all my witness/invariants testing there. I'd
never considered that the harmless-looking kmem_alloc/kmem_free calls
would cause such a problem and it didn't show up on the boot test.
- Prevent possible live-lock in case of memory problems by freeing
already completed requests first.
Reported and tested by: markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.
MFC after: 2 weeks
POSTWRITE to POSTREAD.
No guarantee that all busdma is usage is perfect, but this change (in
addition to scott's last two commits) makes if_bfe work with > 1GB of
memory in my laptop.
OpenBSD changes. With these changes, PHY part of the driver becomes
functional (it senses media changes and negotiates speed just fine),
previously it just hang with no PHY message, but no data goes through
interface (error message is "can not stop transfer of Tx/Rx descriptor).
Hopefully somebody with more clue/free time will be able to pick up
after me.
buffers to go on the buf daemon's DIRTYGIANT queue.
- Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler
runs in the context of the buf daemon and may require Giant.
than trying to optimize it into a single lock. This adds more calls to
lock giant with non smpsafe filesystems but is the only way to reliably
hold the correct lock.
- Remove an invalid assert in the mountedhere case in lookup and fix the
code to properly deal with the scenario. We can actually have a lookup
that returns dp == dvp with mountedhere set with certain unmount races.
Tested by: kris
Reported by: kris/mohans
Changelog towards if_iwi.c 1.26 (some changes have been committed separately
in the mean time):
- add led support
- add firmware loading on demand
- auto-restart firmware when it crashes
- serialize operations sent to the firmware to reduce firmware crashes
- add power save operation support
- remove incorrect specification of tx power control capability
- add radio on/off switch support
- improve net80211 state machine operation
- recognize and handle beacon miss
- handle authentication and association failures better
- add shared key authentication
- fix ibss mode (many changes)
- fix wme (many changes)
- correct radiotap support (many changes)
- correct bus dma setup of s/g
- correct various locking issues
- fix monitor mode
- fix scanning (many changes)
- recover from wedged scan requests
- respect active channel list
- eliminate cases where interface was marked down on error
- don't treat parity errors as fatal
- reclaim mgt frames immediately from tx queue
- correct interrupt handling, ack early (from NetBSD)
- fix short/long preamble handling
Committed with RELENG_6 compat #if's, should compile in RELENG_6. Requires
net/iwi-firmware-kmod to function.
Much work done by: sam
Tested by: many (freebsd-net), ume, luigi
MFC after: 4 weeks
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().
Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().
Reviewed by: jhb
This driver was generously developed and donated by Highpoint.
It is enabled for i386 only at the moment. I will enable it for amd64
shortly.
Obtained from: HighPoint Technologies, Inc.
- MPSAFE. No more recursive lock required.
- bus_dma(9) conversion. I think it should work on all architectures.
- optimized Rx handler for each normal and jumbo frames. Previously
sk(4) used jumbo frame management code to handle normal sized
frames. As the handler needs an additional lock to protect jumbo
frame management structure from races, it used two lock operations
for each received packet. Now sk(4) uses single lock operation for
normal frame.(Jumbo frame still needs two lock operations as before.)
The hardware supports DMA scatter operations for Rx descriptors such
that it's possible to take advantagee of m_cljget(9) for jumbo frames.
However, due to a unknown reasons it resulted in poor performance on
sparc64. So I dropped m_cljget(9) approach. This should be revisited
since it would reduce one lock operation for jumbo frame handling.
- Tx TCP/Rx IP checksum offload support. According to the data sheet
of SK-NET GENESIS the hardware supports Rx IP/TCP/UDP offload.
But I couldn't make it work on my Yukon hardware. So Rx TCP/UDP was
disabled at the moment. It seems that newer Yukon chips can support
Tx UDP checksum offload too. But I need more documentation first.
- Added more wait time in reading VPD data. It seems that ASUS LOM
takes a very long time to respond VPD read signal.
- Added an additional lock for MII register access callbacks.
- Added more strict received packet validation routine. Previously it
passed corrupted packets to upper layers under certain conditions.
- A new function sk_yukon_tick() to handle auto-negotiation properly.
- Interrupt handler now checks shared interrupt source and protects
the interrupt handler from NULL pointer dereference which was caused
by odd status word value. The status word can returns 0xffffffff if
cable is unplugged while Rx/Tx/auto-negotiation is in progress.
- suspend/resume support(not tested).
- Added Rx/Tx FIFO flush routine for Yukon
- Activate Tx descriptor poll timer in order to protect possible loss
of SK_TXBMU_TX_START command. Previously the driver continuously issued
SK_TXBMU_TX_START when it notices pending Tx descriptors not processed
yet in interrupt handler. That approach would add additional PCI
write access overhead under high Tx load situations and it might fail
if the first SK_TXBMU_TX_START was lost and no interrupt is generated
from the first SK_TXBMU_TX_START command.
- s/printf/if_printf/, s/printf/device_printf/, Axe sk_unit in softc.
- Setting multicast/station address is now safe on strict-alignment
architectures.
- Fix long standing bug in VLAN header length setup.
- Added/corrected register definitions for Yukon.
(Register information from Linux skge driver.)
- Added Rx status definition for Marvell Yukon/XaQti XMAC.
(Rx status register information from Linux skge driver.)
- Update if_oerrors if we encounter watchdog error.
- callout(9) conversion
Special thanks to jkim who let me know RX status differences between
Yukon and XaQti XMAC.
It seems that there is still occasional watchdog timeout error but I
couldn't reproduce it and need more information to analyze it from
users.
Tested by: bz(amd64), me(i386, sparc64), current ML
Frank Behrens frank ! pinky ( sax $ de
(i.e. no keyboard controller present), try two other common methods for
resetting i386 machine - pci reset and port 0x92 fast reset. Only if neither
works warn user and resort to "unmap entire address space and hope for good"
hack. This makes my MacBook Pro rebooting just fine and should also help
other legacy-free hardware out there.
Also, disable interrupts unconditionally in cpu_reset_real(), since we don't
want any interference.
MFC after: 1 week
Lower the minimum for memory mapped I/O from 32 bytes to 16 bytes.
This fixes bus enumeration on ia64 now that the Diva auxiliary
serial port is attached to.
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process. This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly. The pointer to
the pmap can be eliminated by doing address arithmetic to find the metadata
on the page headers to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.
This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
on i386 it has to use sched_pin and temporary mappings.
Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.
This is important because of the following scenario. If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB. A 157MB saving is significant.
Test-run by: scottl (Thanks!)
channel number since we're not ready at the net80211 layer to deal with them;
note this mapping has to match what's done in ieee80211_mhz2ieee
MFC after: 3 days
pointer prototypes from it into their own typedefs. No functional or
ABI change. This allows policies to declare their own function
prototypes based on a common definition from mac_policy.h rather than
duplicating these definitions.
Obtained from: SEDarwin, SPARTA
MFC after: 1 month
controller as we use in boot blocks (querying status register until
bit 1 goes off). If that doesn't happed during reasonable period assume
that the hardware doesn't have AT-style keyboard controller. This makes
FreeBSD working almost OOB on MacBook Pro (still there are issues with
putting second CPU core on-line, but since installation CD comes with
UP kernel with this change one should be able to install FreeBSD without
playing tricks with hints). Other legacy-free hardware (e.g. IBM NetVista
S40) should benefit from this as well, but since I don't have any I can't
verify.
It should make no difference on the ordinary i386 hardware (since in
that case that hardware already would be having an issues with A20
routines in boot blocks). I don't know much about AT-style keyboard
controller on other platforms (and don't have dedicated access to one),
therefore, the code is restricted to i386 for now. I suspect that amd64
may need this as well, but I would rather leave this decision to someone
who knows better about the platform(s) in question.
I have tested this change on as many "ordinary i386 boxes" as I can get
my hands on, and it doesn't create any false negatives on hardware with
AT-style keyboard present.
MFC after: 1 month
now back to using fixed-size columns for output and each line of output
should fit in 80 columns on both 32-bit and 64-bit architectures. In
general the output is close to that of the userland ps(1) with the
exception that the 'wmesg' field is mostly similar to the "state" field
in top(1) in that it will show either a wmesg, a lock name (prefixed with
an *), "CPU xx" (for a running thread), or nothing if none of those three
conditions are true. It also respects td_name when listing threads in
a multithreaded process. There is a somewhat evilly-defined PTR64 macro
I use to make account for the change in the size of the 'wchan' column
in the formatted output (wchan is now the only pointer in the ps output
and is available so it can be passed to 'show sleepq', 'show turnstile',
or 'show lock').
- Add two new commands "show proc [process]" and "show thread [thread]"
that show details about the specified process or thread (specified
either by pid/tid or pointer), respectively. If an address it not
specified, it uses the current kdb thread.
problems in ddb:
- "show threadchain [thread]" will start with the specified thread (or the
current kdb thread by default) and show it's state. If it is blocked on
a lock, it will find the owner of the lock and show its state, etc.
- "show allchains" will find all of the threads that are blocked on a
lock (but do not have any threads blocked on a lock they hold) and show
the resulting thread chain.
- "show lockchain <lock>" takes a pointer to a lock_object (such as a
mutex or rwlock). If there is a turnstile for that lock, then it will
display all the threads blocked on the lock. In addition, for each
thread blocked on the lock, it will display any contested locks they
hold, and recurse on those locks to show any threads blocked on those
locks, etc.
take the addr value passed to a ddb command and attempt to use it to
lookup a struct thread * or struct proc *, respectively. Each function
first reparses the passed in value as if it was an ID entered in base 10.
For threads the ID is treated as a thread ID, for proceses the ID is
treated as a PID. If a thread or proc matching the ID is found, it is
returned. For db_lookup_thread(), if the check_pid argument is true and
it didn't find a thread with a matching thread ID, it will treat the ID as
a PID and look for a matching process. If it finds one it returns the
first thread in the process. If none of the ID lookups succeeded, then
the functions assume that the passed in address is a thread or proc
pointer, respectively. This allows one to use tids, pids, or structure
pointers interchangeably in ddb functions that want to lookup threads or
processes if desired.
sampling_interval) fields in netflow v5 header. We do not use
them but some netflow tools show garbage.
PR: kern/96296
Submitted by: David Duchscher
Approved by: glebius
MFC after: 1 week
the fact that the loop through inpcb's in udp_input() tracks the
last inpcb while looping. We keep that name in the calling loop
but not in the delivery routine itself.
MFC after: 3 months
This allows one to change the behavior of the driver pre-boot.
NOTE: This patch was made for DragonFly BSD by Sepherosa Ziehau.
PR: kern/94833
Submitted by: Devon H. O'Dell
Obtained from: DragonFly
MFC after: 1 month
even if we're going to return an argument-based error.
Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since
they walk pcbinfo inpcb lists.
Assert inpcb and pcbinfo locks in in6_pcbsetport(), since
port reservations are changing.
MFC after: 3 months
file lock, in the style of fgetsock().
Modify accept1() to use getsock() instead of fgetsock(), relying on the
file descriptor reference rather than an acquired socket reference to
prevent the listen socket from being destroyed during accept(). This
avoids additional reference count operations, which should improve
performance, and also avoids accept1() operating on a socket whose file
descriptor has been torn down, which may have resulted in protocol
shutdown starting.
MFC after: 3 months
into in_pcbdrop(). Expand logic to detach the inpcb from its bound
address/port so that dropping a TCP connection releases the inpcb resource
reservation, which since the introduction of socket/pcb reference count
updates, has been persisting until the socket closed rather than being
released implicitly due to prior freeing of the inpcb on TCP drop.
MFC after: 3 months
end for isa(4).
o Add a seperate bus frontend for acpi(4) and allow ISA DMA for
it when ISA is configured in the kernel. This allows acpi(4)
attachments in non-ISA configurations, as is possible for ia64.
o Add a seperate bus frontend for pci(4) and detect known single
port parallel cards.
o Merge PC98 specific changes under pc98/cbus into the MI driver.
The changes are minor enough for conditional compilation and
in this form invites better abstraction.
o Have ppc(4) usabled on all platforms, now that ISA specifics
are untangled enough.
caches are dangerous" to "a shared L1 data cache is dangerous". This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.
As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.
Discussed with: jhb, scottl
Security: See FreeBSD-SA-05:09.htt for background material.
common pcb tear-down logic into tcp_detach(), which is called from
either. Invoke tcp_drop() from the tcp_usr_abort() path rather than
tcp_disconnect(), as we want to drop it immediately not perform a
FIN sequence. This is one reason why some people were experiencing
panics in sodealloc(), as the netisr and aborting thread were
simultaneously trying to tear down the socket. This bug could often
be reproduced using repeated runs of the listenclose regression test.
MFC after: 3 months
PR: 96090
Reported by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris
Tested by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris
subject: ranges of uid, ranges of gid, jail id
objects: ranges of uid, ranges of gid, filesystem,
object is suid, object is sgid, object matches subject uid/gid
object type
We can also negate individual conditions. The ruleset language is
a superset of the previous language, so old rules should continue
to work.
These changes require a change to the API between libugidfw and the
mac_bsdextended module. Add a version number, so we can tell if
we're running mismatched versions.
Update man pages to reflect changes, add extra test cases to
test_ugidfw.c and add a shell script that checks that the the
module seems to do what we expect.
Suggestions from: rwatson, trhodes
Reviewed by: trhodes
MFC after: 2 months
the same number or fewer lines of code.
Don't cast using caddr_t.
Remember to unlock the natm lock in some error cases where it was leaked
previously.
Annotate two cases where we'd like to hold the natm subsystem lock over
ioctls into the device driver.
Hold the natm lock longer in natm_usr_connect() so we can copy the npcb
fields while holding the mutex.
MFC after: 3 months
mutex is no longer required to ensure that so_pcb is valid.
Make sure to free (control) in natm_usr_send() when there M_PREPEND()
frees (m).
MFC after: 3 months
function along with the remainder of the reference checking code. Move
comment from body to header with remainder of comments. Inclusion of a
socket in a completed connection queue counts as a true reference, and
should not be handled as an under-documented edge case.
MFC after: 3 months
- Depend on opt_ddb.h, since npcb_dump() is ifdef'd DDB.
- Include ddb/ddb.h so we can call db_printf() and use DB_SHOW_COMMAND().
- Don't test results of malloc() under DIAGNOSTIC, let the memory allocator
take care of its own invariants.
MFC after: 1 month
list head structure; this improves congruence to IPv4, and also allows
in6_pcbpurgeif0() to lock the pcbinfo. Modify in6_pcbpurgeif0() to lock
the pcbinfo before iterating the pcb list, use queue(9)'s LIST_FOREACH()
for the iteration, and to lock individual inpcb's while manipulating
them.
MFC after: 3 months
date: 2006/04/12 04:22:50; author: alc; state: Exp; lines: +14 -41
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
Reviewed by: tegge
number state, rather than re-using pcbinfo. This introduces some
additional mutex operations during isn query, but avoids hitting the TCP
pcbinfo lock out of yet another frequently firing TCP timer.
MFC after: 3 months
holding the inpcb lock is sufficient to prevent races in reading
the address and port, as both the inpcb lock and pcbinfo lock are
required to change the address/port.
Improve consistency of spelling in assertions about inp != NULL.
MFC after: 3 months
A slight difference of this chip from its previous siblings is that
it need a gentle "wake up" on every (full) DMA buffer completion to
avoid stalled interrupt handler.
Thanks to George Hartzell for permission on doing remote debugging.
Prime MFC candidate for 6.1-RELEASE. Please reply to this commit if
there are any objections (so I won't bug re@), since the changes
are too small and only specific to VT8251.
PR: i386/95949
Tested by: [1] George Hartzel
myself (remotely)
MFC after: 3 days
[1] http://lists.freebsd.org/pipermail/freebsd-multimedia/2006-April/004003.html
Pull in some target mode changes from a private branch.
Pull in some more RELENG_4 compilation changes.
A lot of lines changed, but not much content change yet.
Make this compile, assuming that you have linux installed in a
sensible place. tag_list is disabled by default, since we don't
distribute linux, but it is desirable to allow the boot loader to boot
Linux or FreeBSD (mostly for testing).
enables multilabel, or any option for that matter, most likely they have
a reason. This will allow users to see that mulilabel is enabled via an
issued "mount" command and remove an annoying warning - printed only when
a MAC kernel is not installed - on boot up.
Discussed with: green, brueffer, Samy Al Bahra.
Probably ran past: csjp (though I can't remember).
xmodem download. Then download the image you want in the flash.
This will burn the image into the flash. You must then reset the
unit and the new flash image will be used for booting...
xmodem download. Then download the image you want in the eeprom.
This will burn the image into the eeprom. You must then reset the
unit and the new eeprom image will be used for booting...
Major differences:
* since there is no direct map region, there is no custom uma memory
allocator to modify to include its pages in the dumps.
* Various data entries are reduced from 64 bit to 32 bit to match the
native size.
dump_add_page() and dump_drop_page() are still present in case one wants to
arrange for arbitary pages to be dumped. This is of marginal use though
because libkvm+kgdb cannot address physical memory that isn't mapped into
kvm.
via the debug.minidump sysctl and tunable.
Traditional dumps store all physical memory. This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm. These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.
Minidumps invert the process. Instead of dumping physical memory in
in order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.
amd64 has a direct map region that things like UMA use. Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump. Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.
Dumps are a custom format. At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into a 4K page mappings), then the sparse physical pages
according to the bitmap. libkvm can now conveniently access the kvm
page table entries.
Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump. While this is a best case, I expect minidumps
to be in the 100MB-500MB range. Obviously, never larger than physical
memory of course.
minidumps are on by default. It would want be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.
Both minidumps and regular dumps are supported on the same machine.
o Use a directory layout that is more akin to the i386 boot layout.
o Create a libat91 for library routines that are used by one or more
of the boot loaders.
o Create bootiic for booting from an iic part.
o Create bootspi for booting from an spi part.
o Optimize the size of many of these routines (especially emac.c). Except
for the emac.c optimizations, all these have been tested.
o eliminate the inc directory, libat91 superceeds it.
o Move linker.cfg up a layer to allow it to be shared.
state structure. This field is only for CCBs that are associated with
actions that are occurring on the HBA (i.e., XPT_CONT_IO actions).
This way we also don't get confused when the upstream listener stalls
try and look at a CCB which has already been freed (by CAM).
to reduce the pv_entry_count counter. This was found by Tor Egge. In the
same email, Tor also pointed out the pv_stats problem in the previous
commit, but I'd forgotten about it until I went looking for this email
about this allocation problem.
locked. In general the adaptive spinning is similar to the same code
for mutexes with some extra trickiness in rw_wunlock_hard(). Specifically,
even though both wait bits might be set and we might have a turnstile with
at least one waiting thread, there might not be any threads blocked on the
queue we are not waking up (they might all be spinning), and we should
only preserve the waiting flag for the queue we aren't waking up if there
are in fact threads blocked on that queue. Secondly, there might not be
any threads blocked on the queue we have chosen to waken threads from
(there might only be threads blocked on the other queue and the threads
for this queue are all spinning) in which case we disown the turnstile
instead of doing a braodcast and unpend.
stored in metadata instead of an offset in single disk.
After reboot/crash synchronization process started from a wrong offset
skipping (not synchronizing) part of the component which can lead to data
corrutpion (when synchronization process was interrupted on initial
synchronization) or other strange situations like 'graid3 status' showing
value more than 100%.
Reported, reviewed and tested by: ru
Reported by: Dmitry Morozovsky <marck@rinet.ru>
MFC after: 1 day
as pcf_ebus and pcf_isa, they should probably be fixed back to pcf),
and bti2c doesn't exist, bktr has smbus or iicbb as children..
Brought to you by: http://people.FreeBSD.org/~jmg/driver.pdf
use it in places that only care about the write owner instead of
rw_owner() as a baby step towards limited read-lock owner.
- Tidy the code that sets the WAITER flag bits to not duplicate a test
around the atomic operation and the KTR trace in both of the lock
functions.
above what's used for fast interrupts, only interrupts with the level of
the interrupt which led to calling intr_fast() (which is used with both
fast and ithread interrupts) are blocked while in that function. Thus
intr_fast() can be preempted by a fast interrupt (which are of a higher
level than ithread interrupts) while servicing an ithread interrupt. This
can lead to a stale pointer to the head of the active interrupt requests
list when back in the ithread interrupt invocation of intr_fast(), in turn
resulting in corruption of the interrupt request lists and consequently
in a panic. Solve this be turning off interrupts in intr_fast() before
reading the pointer to the head of the active list rather than after. [1]
- Add a KASSERT in intr_fast() which asserts that ir_func is non-zero before
calling it. [1]
- Increment interrupt stats after calling the handlers rather than before.
This reduces the delay until direct and fast handlers are serviced, in my
testings by 30% on average for the direct tick interrupt handler, in turn
resulting in less clock drift.
PR: 94778 [1]
Submitted by: Andrew Belashov [1]
MFC after: 2 weeks
with a given module_t. I use this in some the MOD_LOAD event handler for
some test kernel modules to ask the kernel linker to look up the linker
sets in my test modules. (I use linker sets to generate the list of
possible events that I then signal to execute via a sysctl. On non-amd64,
ld(8) would resolve the entire linker set, but on amd64 I have to ask the
kernel linker to do it for me, and having the kernel linker do it works on
all archs.)
if the specified priority is zero. This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread. I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.
The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.
have not been passed to the h/w yet. This remedies watchdog timeout
of buffered multicast frames in hostap mode.
While here eliminate an extraneous check; ieee80211_beacon_update sets
the tim bit based on ncabq != 0 so there's no reason to check it too.
Noticed by: Christophe Prevotaux
compiler doesn't decide to cache td_state. Cachine the state would cause
the spinning thread to not notice when the owning thread stopped executing
(if it was preempted for example) which could result in livelock.
than keeping it locked until we exit the function to optimize the case
where the lock would be dropped and later reacquired. The optimization
was broken when kevent's were moved from UFS to VFS and the knote list
lock for a vnode kevent became the lockmgr vnode lock. If one tried
to use a kqueue that contained events for a kqueue fd followed by a vnode,
then the kq global lock would end up being held when the vnode lock was
acquired which could result in sleeping with a mutex held (and subsequent
panics) if the vnode lock was contested.
Reviewed by: jmg
Tested by: ps (on 6.x)
MFC after: 3 days
not need to clear it now, this should fix panic when msleep is recursivly
called. Patch is slightly adjusted after review.
Reviewed by: jhb
Tested by: Csaba Henk, csaba-ml at creo.hu
MFC after: 3 days
Strong candidate for backport to 6.x.
When allocating new blocks, the search for block group beginnings
would fail with a segfault. There was a side-effect read access with
an off-by-one errors. The results were not used in the error case so
the code worked in the past. But now the FreeBSD kernel has tighter
mappings and the word accessed is not mapped (for me).
The Linux kernel has rewritten most of the allocation strategy by now.
Also, the Linux kernel cleaned up the integration of these files and
it look feasable to wrap the original Linux files in wrapper that
provides their favorite arguments instead of dragging around our own
code.
For 32-bit SDRAM systems, enable D16 to D31 in the PIO controller.
Otherwise they read back as 0xffff.
Shave 8 bytes from the object size by using AT91C_BASE_PIOA directly
and by not assigning PIO_BSR to 0 in the DBGU init. That's a nop in
two ways (everything defaults to peripheral A, and writing 0 changes
nothing).
Many places used #define FOO ((unsigned int) 0x23) where a simpler
#define FOO 0x23u would have sufficed. This practice is overly
verbose and has the disadvantage that you can't say
#if FOO == BAR
#endif
because the extra "unsigned int" tokens choke cpp's little brain.
Migrate to the latter style to allow use in preprocessor statements.
The two are the same semantically anyway in a C context (at least for
the uses they are put to presently, C gurus can explain to me how they
differ).
via xmodem to the DBGU port when the AT91 comes up in recovery mode.
The recovery loader will then load your program via xmodem into SDRAM
at 1MB which can do its things. It needs to be tweaked to the
specific board one is using, but it fits in < 1kB (all of Atmel's ARM
products have at least 8kb of SRAM that I can tell, so this should
work for them all).
Parts of this code were provided by Kwikbyte with copyright
specifically disclaimed. I heavily modified it to act as a recovery
loader (before it was a bootstrap loader) and to optimize for size
(before I started the size was closer to 8k).
Bootstrap loaders for SPI and IIC to follow.
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
Reviewed by: tegge
UDPv6 delivery.
Lock the inpcb of the UDP connection being delivered to before
processing IPSEC policy and other delivery activities.
MFC after: 3 months
Otherwise, we could match on a filename that had the wrong last character
(such as /boot/loaded instead of /boot/loader).
PR: kern/95625
Submitted by: Oliver Fromme <olli@secnetix.de>
MFC after: 1 month
+ Add boatloads of KASSERTs and *really* check out more locking
issues (to catch recursions when we actually go to real locking
in CAM soon). The KASSERTs also caught lots of other issues like
using commands that were put back on free lists, etc.
+ Target mode: role setting is derived directly from port capabilities.
There is no need to set a role any more. Some target mode resources
are allocated early on (ELS), but target command buffer allocation
is deferred until the first lun enable.
+ Fix some breakages I introduced with target mode in that some commands
are *repeating* commands. That is, the reply shows up but the command
isn't really done (we don't free it). We still need to take it off the
pending list because when we resubmit it, bad things then happen.
+ Fix more of the way that timed out commands and bus reset is done. The
actual TMF response code was being ignored.
+ For SPI, honor BIOS settings. This doesn't quite fix the problems we've
seen where we can't seem to (re)negotiate U320 on all drives but avoids
it instead by letting us honor the BIOS settings. I'm sure this is not
quite right and will have to change again soon.
controller to get ready (65K x ISA access time, visually around 1 second).
If we have wait more than that amount it's likely that the hardware is a
legacy-free one and simply doesn't have keyboard controller and doesn't
require enabling A20 at all.
This makes cdboot working for MacBook Pro with Boot Camp.
MFC after: 1 day
doesn't appear to be protecting anything. Most of consumers funsetownlst(9)
do not appear to be picking up Giant anywhere. This was originally a part
of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea
to leave it out until we were able to analyze it better.
Tested by: kris
MFC after: 3 weeks
which means that devices will be destroyed on last close.
This fixes destruction order problems when, eg. RAID3 array is build on
top of RAID1 arrays.
Requested, reviewed and tested by: ru
MFC after: 2 weeks
o Implement the remove verb to remove a partition entry.
o Improve error reporting by first checking that the verb is valid.
o Add an entry parameter to the add verb. this parameter can be
both read-only as welll as read-write and specifies the entry
number of the newly added partition.
o Make sure that the provider is alive when passed to us. It may
be withering away.
o When adding a new partition entry, test for overlaps with existing
partitions.
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.
Radeon memmap code, which with a new DDX driver and DRI drivers should fix
long-term stability issues with Radeons. Also adds support for r200's
ATI_fragment_shader, r300 texrect support and texture caching fixes, i915
vblank support and bugfixes, and new PCI IDs.
net.inet.ipsec.test_replay - When set to 1, IPsec will send packets with
the same sequence number. This allows to verify if the other side
has proper replay attacks detection.
net.inet.ipsec.test_integrity - When set 1, IPsec will send packets with
corrupted HMAC. This allows to verify if the other side properly
detects modified packets.
I used the first one to discover that we don't have proper replay attacks
detection in ESP (in fast_ipsec(4)).
natm_usr_detach(), which actually does the right thing. This code has
never worked properly, but also was never invoked since we only abort
connections associated with listen sockets, and netnam doesn't support
listen sockets.
MFC after: 3 months
reference. For now, we allow the possibility that the in_ppcb
pointer in the inpcb may be NULL if a timewait socket has had its
tcptw structure recycled. This allows tcp_timewait() to
consistently unlock the inpcb.
Reported by: Kazuaki Oda <kaakun at highway dot ne dot jp>
MFC after: 3 months
as being undocumented in Stevens, and was broken in 1997 during network
stack infrastructure work. It is the one remaining (and incorrect)
direct protocol reference to raw_usrreq.pru_attach; this is incorrect
because the raw socket code assumes that raw_uattach is called only after
the protocol has allocated a PCB.
MFC after: 3 months
protocols invoke after allocating a PCB, so so_pcb should be non-NULL.
It is only used by the two IPSEC implementations, so I didn't hit it in
my testing.
Reported by: pjd
MFC after: 3 months
The real problem was that ioctl handlers needed to call amr_wait_command()
with the list lock held. This not only solves the completion race, it also
prevents bounce buffer corruption that could arise from amr_start() being
called without the proper locks held.
Discussed with: ps
MFC After: 3 days
the completion of the command can occur before tsleep is called and
the command ends up blocking forever since the wakeup has already
been called.
Submitted by: ups
error on the request. Add a wrapper, gctl_set_param_err(), that
sets the error on the request from the error returned by
gctl_set_param() and update current callers of gctl_set_param()
to call gctl_set_param_err() instead.
This makes gctl_set_param() much more usable in situations where
the caller knows better what to do with certain (apparent) error
conditions and setting an error on the request is not one of the
things that need to be done.
credential: mac_associate_nfsd_label()
This entry point can be utilized by various Mandatory Access Control policies
so they can properly initialize the label of files which get created
as a result of an NFS operation. This work will be useful for fixing kernel
panics associated with accessing un-initialized or invalid vnode labels.
The implementation of these entry points will come shortly.
Obtained from: TrustedBSD
Requested by: mdodd
MFC after: 3 weeks
(tcp_sack_output_debug checks cached hints aginst computed values by walking the
scoreboard and reports discrepancies). The sack hinting code has been stable for
many months now so it is time for the debug code to go. Leaving tcp_sack_output_debug
ifdef'ed out in case we need to resurrect it at a later point.
earlier in cpu_setregs().
- If we know this CPU has a FPU via cpuid, then just assume the INT16
interface and make the npx device quiet to not clutter the dmesg. This
is true for all Pentium and later CPUs and even some of the later 486dx
CPUs.
Reviewed by: bde
Tested by: ps
MFC after: 1 week
new chips and improves support for already supported ones.
Some details, important for future merges:
- if_em.c merged manually, viewing diff between new vendor
driver and previous one.
- if_em_hw.h dropped in from vendor, and then restored revisions
1.16, 1.17, 1.18.
- if_em_hw.c dropped in from vendor, and then two liner change made,
that restores support for two rare chips.
the wire. This increases the speed considerably. Start to put
infrastructure in place to do RX side, but that requires more study
before it can be done.
so that we only have to do an ioapic_write() instead of an ioapic_read()
followed by an ioapic_write() every time we mask and unmask level triggered
interrupts. This cuts the execution time for these operations roughly in
half.
Profiled by: Paolo Pisati <p.pisati@oltrelinux.com>
MFC after: 1 week
tcp_timewait(). This corrects a bug (or lack of fixing of a bug)
in tcp_input.c:1.295.
Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp>
MFC after: 3 months
PCI devices apparently was changed from a special deferred trap with TPC
pointing to the membar #Sync following the failing load/store instruction
to a precise trap with TPC pointing to the failing load/store instruction.
Thus remove the check the check whether TPC points to a membar #Sync in
case of a data access trap as it's off-by-one for USIII CPUs and it should
be sufficient to check whether the trap happend while in fasword*() to
properly detect traps caused by peeking/poking. This also corresponds to
what other OSs do. Note that also only the USIIi manual suggests to check
the TPC for such traps while the USII one doesn't (in the public USIII
manual device peeking/poking isn't mentioned at all).
NULL. We currently do allow this to happen, but may want to remove that
possibility in the future. This case can occur when a socket is left
open after TCP wraps up, and the timewait state is recycled. This will
be cleaned up in the future.
Found by: Kazuaki Oda <kaakun at highway dot ne dot jp>
MFC after: 3 months
recycling for an unrelated filesystem. I really don't like potentially
acquiring giant in the context of a giantless filesystem but there
are reasonable objections to removing the recycling from this path.
Sponsored by: Isilon Systems, Inc.
o use atomic operations to fiddle with stopped_cpus and started_cpus.
o disable interrupts while we're waiting to be started.
o remove logic relating to cpustop_restartfunc as it's not used.
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses). Each pv entry was linked to a vm_page_t and
a process's address space (pmap). It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system. A page ("pv chunk") is allocated and this provides
168 pv entries for that process. We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process. We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer. After overheads from the pv chunk
bitmap and tailq linkage, this works out that each pv entry has an
effective size of 24.38 bytes.
Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
the chunk pages and free any spares. The stats (vm.pmap.*) show that
this doesn't seem to be that much of a problem, but it can be done if
needed.
* running low on pv entries is now a much bigger problem. The old
get_pv_entry() routine just needed to reclaim one other pv entry.
Now, since they are per-process, we can only use pv entries that are
assigned to our current process, or by stealing an entire page worth
from another process. Under normal circumstances, the pmap_collect()
code should be able to dislodge some pv entries from the current
process. But if needed, it can still reclaim entire pv chunk pages
from other processes.
* This should port to i386 really easily, except there it would reduce
pv entries from 24 bytes to about 12 bytes.
(I have integrated Alan's recent changes.)
- Use FBSDID in trap.c
- Make the global trap_sig[] static as it's not used outside of trap.c.
- In sendsig() remove an unused variable.
- In trap() sync with the other archs; for fast data access MMU miss and
data access protection traps set ksi_addr to the SFAR reg which contains
the faulting address and otherwise to the TPC reg. Generally the TCP reg
contains the address of the instruction that caused the exception, except
for fast instruction access traps (and some others; more refinement may
be needed here) it also contains the faulting address.
Previously sendsig() always set si_addr to the SFAR reg which is wrong
for most traps.
- In sendsig() add support for FreeBSD old-style signals.
These changes are inspired by kmacy's sun4v changes and allow libsigsegv
to build on FreeBSD/sparc64, but it doesn't pass all checks and tests it
actually should, yet.
MFC after: 5 days
intr_disable() and intr_restore() resp. Previously, critical
regions would have interrupts disabled, but that was changed.
Consequently, the debugger could run with interrupts enabled.
This could cause problems for the low-level console code where
received characters would trigger an interrupt that causes
the interrupt handler to read the character instead of the
cngetc() function.
The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT
checks appear to have always been required, but not been there, which
is/was a bug. This avoids unconditionally casting of in_ppcb to a tcpcb,
when it may be a twtcb, which may have resulted in obscure ICMP-related
panics in earlier releases.
MFC after: 3 months
casts.
Consistently use intotw() to cast inp_ppcb pointers to struct tcptw *
pointers.
Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb *
pointers.
Don't assign tp to the results to intotcpcb() during variable declation
at the top of functions, as that is before the asserts relating to
locking have been performed. Do this later in the function after
appropriate assertions have run to allow that operation to be conisdered
safe.
MFC after: 3 months
immediately rather than jumping to the normal output handling, which
assumes we've pulled out the inpcb, which hasn't happened at this
point (and isn't necessary).
Return ECONNABORTED instead of EINVAL when the inpcb has entered
INP_TIMEWAIT or INP_DROPPED, as this is the documented error value.
This may correct the panic seen by Ganbold.
MFC after: 1 month
Reported by: Ganbold <ganbold at micom dot mng dot net>
the NS8250 class driver. The UART has FIFOs if sc_rxfifosz>1, so
test for that instead.
While here properly initialize sc_rxfifosz and sc_txfifosz in the
case the UART doesn't have FIFOs.
disconnect for fully connected sockets was dropped, meaning that if
the socket was closed while the connection was alive, it would be
leaked. Structure tcp_usr_detach() so that there are two clear
parts: initiating disconnect, and reclaiming state, and reintroduce
the tcp_disconnect() call in the first part.
MFC after: 3 months
a pv entry if the number of entries is below the high water mark for pv
entries.
Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().
Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.
Reviewed by: tegge
Reported by: kris
MFC after: 5 days
The following bug was just identified in OpenBSD and it looks like the same
bug exists in the other BSDen NFS servers.
A Linux client (don't know which version, but you can look at
http://bugzilla.kernel.org/show_bug.cgi?id=6256)
does a Setattr of mtime to the server's time, where the file is mode 0664 and
the client user has group access (ie. caller is not the file owner).
The BSD servers fail the Setattr with EPERM, since the VA_UTIMES_NULL flag
isn't set before doing the VOP_SETATTR.
It seems to me that this should be allowed, since it is allowed for a local
utimes(2). If so, the fix is to set VA_UTIMES_NULL for the
"set-time-to-server-time" cases of setting atime and/or mtime.
Submitted by: rick@snowhite.cis.uoguelph.ca
Reviewed by: cel
Approved by: silby
MFC after: 1 week
socket can have a tcp connection that has entered time wait
attached to it, in the event that shutdown() is called on the
socket and the FINs properly exchange before close(). In this
case we don't detach or free the inpcb, just leave the tcptw
detached and freed, but we must release the inpcb lock (which we
didn't previously).
MFC after: 3 months
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, the receive
code no longer requires the pcbinfo lock, and the send code only
requires it if building a new connection on an otherwise unconnected
socket triggered via sendto() with an address. This should
significnatly reduce tcbinfo lock contention in the receive and send
cases.
- In order to support the invariant that so_pcb != NULL, it is now
necessary for the TCP code to not discard the tcpcb any time a
connection is dropped, but instead leave the tcpcb until the socket
is shutdown. This case is handled by setting INP_DROPPED, to
substitute for using a NULL so_pcb to indicate that the connection
has been dropped. This requires the inpcb lock, but not the pcbinfo
lock.
- Unlike all other protocols in the tree, TCP may need to retain access
to the socket after the file descriptor has been closed. Set
SS_PROTOREF in tcp_detach() in order to prevent the socket from being
freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether
or not it needs to free the socket when the connection finally does
close. The typical case where this occurs is if close() is called on
a TCP socket before all sent data in the send socket buffer has been
transmitted or acknowledged. If INP_SOCKREF is found when the
connection is dropped, we release the inpcb, tcpcb, and socket instead
of flagging INP_DROPPED.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Annotate the existence of a long-standing race in the TCP timer code,
in which timers are stopped but not drained when the socket is freed,
as waiting for drain may lead to deadlocks, or have to occur in a
context where waiting is not permitted. This race has been handled
by testing to see if the tcpcb pointer in the inpcb is NULL (and vice
versa), which is not normally permitted, but may be true of a inpcb
and tcpcb have been freed. Add a counter to test how often this race
has actually occurred, and a large comment for each instance where
we compare potentially freed memory with NULL. This will have to be
fixed in the near future, but requires is to further address how to
handle the timer shutdown shutdown issue.
- Several TCP calls no longer potentially free the passed inpcb/tcpcb,
so no longer need to return a pointer to indicate whether the argument
passed in is still valid.
- Un-macroize debugging and locking setup for various protocol switch
methods for TCP, as it lead to more obscurity, and as locking becomes
more customized to the methods, offers less benefit.
- Assert copyright on tcp_usrreq.c due to significant modifications that
have been made as part of this work.
These changes significantly modify the memory management and connection
logic of our TCP implementation, and are (as such) High Risk Changes,
and likely to contain serious bugs. Please report problems to the
current@ mailing list ASAP, ideally with simple test cases, and
optionally, packet traces.
MFC after: 3 months
pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, in protocol
shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the
detached in_pcb structure for a socket.
MFC after: 3 months
- in_pcbdetach(), which removes the link between an inpcb and its
socket.
- in_pcbfree(), which frees a detached pcb.
Unlike the previous in_pcbdetach(), neither of these functions will
attempt to conditionally free the socket, as they are responsible only
for managing in_pcb memory. Mirror these changes into in6_pcbdetach()
by breaking it into in6_pcbdetach() and in6_pcbfree().
While here, eliminate undesired checks for NULL inpcb pointers in
sockets, as we will now have as an invariant that sockets will always
have valid so_pcb pointers.
MFC after: 3 months
the so_pcb pointer on the socket is always non-NULL. This eliminates
countless unnecessary error checks, replacing them with assertions.
MFC after: 3 months
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.
MFC after: 3 months
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.
This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.
MFC after: 3 months
the file descriptor reference, rather than paying additional lock
operations to acquire a socket reference from the file descriptor.
This will also help to ensure that file descriptor based socket
requests are not delivered to a socket after close. Most consumers
have already been converted to this model.
MFC after: 3 months
be present at this point. We will eventually remove this assert because
the socket layer should never look at so_pcb, but for now it's a useful
debugging tool.
MFC after: 3 months
socket calls relating to the creation and destruction of sockets. This
will eventually form the foundation of socket(9), but is currently in too
much flux to do so.
MFC after: 3 months
There's something strange going on with async events. They seem
to be be treated differently for different Fusion implementations.
Some will really tell you when it's okay to free the request that
started them. Some won't. Very disconcerting.
This is particularily bad when the chip (FC in this case) tells you
in the reply that it's not a continuation reply, which means you
can free the request that its associated with. However, if you do
that, I've found that additional async event replies come back for
that message context after you freed it. Very Bad Things Happen.
Put in a reply register debounce. Warn about out of range context
indices. Use more MPILIB defines where possible. Replace bzero with
memset. Add tons more KASSERTS. Do a *lot* more request free list
auditting and serial number usages. Get rid of the warning about
the short IOC Facts Reply. Go back to 16 bits of context index.
Do a lot more target state auditting as well. Make a tag out
of not only the ioindex but the request index as well and worry
less about keeping a full serial number.
a different register shift and is fed by a different clock than
we use for UltraSPARC hardware. To deal with this, the regshft and
rclk fields in the class structure are removed and bus frontends
now pass the right regshft and rclk to the probe function where
they're put in the BAS and passed in to subordinate drivers.
vnode after vflush() has succeeded. This would cause a dangling vnode
panic at unmount time otherwise. Other filesystems may have this problem
via their VFS_VGET() routines.
Found by: kris
Sponsored by: Isilon Systems, Inc.
--------------------
- Seal the fate of long standing memory leak (4 years, 7 months) during
pcm_unregister(). While destroying cdevs, scan / detect possible
children and free its SLIST placeholder properly.
- Optimize channel allocation / numbering even further. Do brute cyclic
checking only if the channel numbering screwed.
- Mega vchan create/destroy cleanup:
o Implement pcm_setvchans() so everybody can use it freely instead
of implementing their own, be it through sysctl or channel auto
allocation.
o Increase vchan creation/destruction resiliency:
+ it's possible to increase/decrease total vchans even during
busy playback/recording. Busy channel will be left alone, untouched.
Abusive test sample:
# play whatever...
#
while : ; do
sysctl hw.snd.pcm0.vchans=1
sysctl hw.snd.pcm0.vchans=10
sysctl hw.snd.pcm0.vchans=100
sysctl hw.snd.pcm0.vchans=200
done
# Play something else, leave above loop running frantically.
+ Seal another 4 years old bug where it is possible to destroy (virtual)
channel even when its cdevs being referenced by other process.
The "First Come First Served" nature of dsp_clone() is the main
culprit of this issue, and usually manifest itself as dangling
channel <-> process association. Ensure that all of its cdevs
are free from being referenced before destroying it (through
ORPHAN_CDEVT() macross).
All these fixes (including previous fixes) will be MFCed, later.
to avoid possible device unregister race (impossible to reproduce, yet
possible).
- Extra sanity check to ensure proper parent channel is being selected.
- Reset parent channel once all of its children gone.
called.
- vfs_getvfs has to return a reference to prevent the returned mountpoint
from changing identities.
- Release references acquired via vfs_getvfs.
Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.
mount memory from being reclaimed. This resolves a number of race
conditions described in vfs_default.c and introduced with the
VFS_LOCK_GIANT macros.
- Let the mtx and lock remain valid after the mount structure has been
freed by using init and fini calls. Technically fini will never be
called but is included for completeness.
- Consistently use lockmgr directly rather than lockmgr to lock and
vfs_unbusy to unlock.
Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.
- Move the vn_lock of the dvp until after we've unbusied the filesystem
to avoid a LOR with the mount point lock.
- In the v_mountedhere while loop we acquire a new instance of giant each
time through without releasing the first. This would cause us to leak
Giant.
Sponsored by: Isilon Systems, Inc.
requires Giant. It is set in bgetvp and cleared in brelvp.
- Create QUEUE_DIRTY_GIANT for dirty buffers that require giant.
- In the buf daemon, only grab giant when processing QUEUE_DIRTY_GIANT and
only if we think there are buffers in that queue.
Sponsored by: Isilon Systems, Inc.
failing, print a message when we fail for some reason as most callers do
not check the return value (e.g. 'cuz they're called from SYSINIT)
Reviewed by: scottl
MFC after: 1 week
Vararg functions have a different calling convention than regular
functions on amd64. Casting a varag function to a regular one to
match the function pointer declaration will hide the varargs from
the caller and we will end up with an incorrectly setup stack.
Entirely remove the varargs from these functions and change the
functions to match the declaration of the function pointers.
Remove the now unnecessary casts.
Also change static struct ipprotosw[] to two independent
protosw/ip6protosw definitions to remove an unnecessary cast.
PR: amd64/95008
Submitted and tested by: Mats Palmgren
Reviewed by: rwatson
MFC after: 3 days
o Add the scc(4) manpage to the build.
o Update the uart(4) manpage to account for scc(4).
o Update the uart(4) module build to include support for scc(4).
controllers typically have multiple channels and support a number
of serial communications protocols. The scc(4) driver is itself
an umbrella driver that delegates the control over each channel
and mode to a subordinate driver (like uart(4)).
The scc(4) driver supports the Siemens SAB 82532 and the Zilog
Z8530 and replaces puc(4) for these devices.
case panic on sparc64.
The problem is in MD5(9) implementation. The Encode() function takes
'unsigned char *output' as its first argument, which is then assigned to
'u_int32_t *op'. If the 'output' argument is not 4 byte aligned (and in
geli(8) case it is not), sparc64 machine will panic.
I don't know how to fix MD5(9) in a clean way, so I'm implementing a
work-around in geli(8).
Reported by: brueffer
MFC after: 3 days
in the ISR doesn't read the actual socket event register, but instead
reads garbage (usually 0xffffffff, but other times other things).
This totally violates the PCI spec, but happens rarely enough that a
workaround is in order. This adds one test when we have a real
interrupt to service (which is very rare), and doesn't affect the
usualy 'nothing to see here' case at all.
Problem reported by many, but sam@ gave me this workaround after
diagnosing the problem.
a lock's priority to a sleeping thread. When we panic, dump a stack
trace of the thread that is asleep if DDB is compiled into the kernel
just before calling panic(). This is much more informative and useful
for debugging than the current behavior of getting a page fault and not
having an easy way of determining which thread caused the original problem.
MFC after: 1 week
a race where data could come in before we clear the INFLUX flag, and get
skipped over by knote (and hence never be activated, though it should of
been)...
Found by: glebius & co.
Reviewed by: glebius
MFC after: 3 days
some systems were designed so that AML writes to various resources shared
with OS drivers, including the RTC, PIC, PCI, etc. These writes could
collide with writes by the OS and should never be performed. For now, we
print a message if such an access occurs, but do not block it. To block
the access, the tunable "debug.acpi.block_bad_io" can be set to 1. In the
future, we will flip the switch and this will become the default.
Information about this problem was found in Microsoft KB 283649. They
block IO accesses if the BIOS indicates via _OSI that it is Windows 2001
or higher. They always block accesses to the PIC, cascaded PIC, and ELCRs,
no matter how old the BIOS.
systems (blade servers). On most systems, this is implemented as an IO
write to the SMI port and the BIOS generates the actual reset.
PR: kern/94939
Submitted by: dodell@ixsystems.com
Reviewed by: jhb
MFC after: 3 weeks
foreign per-CPU pages in cpu_ipi_send() in order to get the module IDs
of the other CPUs can cause a page fault. If this happens when doing a
TLB shootdown while dealing with another page fault this causes a panic
due to the recursive page fault. As I don't spot other code that assumes
or requires that accessing foreign per-CPU pages must not page fault
solve this by adding a statically allocated (and therefore locked in the
kernel pages) array which establishes a FreeBSD CPU ID -> module ID
relation and use that in cpu_ipi_selected() (instead of statically
allocating the per-CPU pages which would just waste memory on say a dual
CPU machine as sun4u theoretically supports up to 128 CPUs or wasting
dTLB slots for the foreign per-CPU pages). [1]
- Fix a potential race in cpu_ipi_send(); as we don't serialize the access
to cpu_ipi_selected() between MI and MD use (only MI-MI and MD-MD) we
might catch the NACK bit caused by sending another IPI. Solve this by
checking the NACK bit in the contents of the interrupt dispatch status
reg read while interrupts were still turned off instead of reading that
reg anew after interrupts were turned on again. This is also what the
CPU docs suggest to do.
- Add a workaround for the SpitFire erratum #54 bug (affecting interrupt
dispatch). While public info regarding what this CPU bug actually causes
is not available testing shows that with the workaround in place it's
less likely to get a "couldn't send ipi" panic, it doesn't solve these
panics entirely though. [2]
Reported by: kris [1]
Some clue from: kmacy [1]
Info from: Linux, OpenSolaris [2]
Additional testing by: kris
MFC after: 3 days
Saab for helping to track this down. Fix a error with 32bit DMA size
calculation that seemed to be harmless. Add a few micro-optimizations while
I'm here.
generating a coredump as the result of a signal.
- Fix a bug where we could leak a Giant lock if vn_start_write() failed
in coredump().
Reported by: jmg (2)
mddestroy() only if the file is from a non-MPSAFE VFS.
- No longer unconditionally hold Giant in the md kthread for vnode-backed
kthreads.
- Improve the handling of the thread exit race when destroying an md
device.
and use that instead of testing fdidx against -1 to determine if it should
release Giant if Giant was locked due to the requested file residing on a
non-MPSAFE VFS.
Discussed with: jeff
get_cyclecount() as that results in a saner value and makes schedgraph
much happier on Alpha. (schedgraph doesn't handle the fact that the
counters are out of sync though)
as we have to call tick_init() before cninit() in order to provide the
low-level console drivers with a working DELAY() which in turn means we
cannot use panic() in tick_init().
- s,to high, too high, in the panic string
Inspired by: kmacy's sun4v changes
MFC after: 3 days
probably never fully applied to IPv6. Over time it has become more
stale, so replace it with something more up to date.
Reviewed by: ume
MFC after: 1 month
ipsec_copypkt(), as this is already handled by the call to M_MOVE_PKTHDR(),
which also knows how to correctly handle MAC m_tags. This corrects a panic
when running with MAC and KAME IPSEC.
PR: kern/94599
Submitted by: zhouyi zhou <zhouyi04 at ios dot cn>
Reviewed by: bz
MFC after: 3 days
arguments. The first one is never used (all callers pass in 0); the
second is sometimes used to pass in a struct timespec * which is used as
a timeout and never modified. Constify that argument so callers can pass
a const struct timespec * without jumping through hoops.
back to using the RSDT instead. ACPI-CA already follows this same strategy
as a workaround for yet another instance of brain-damaged BIOS writers.
PR: i386/93963
Submitted by: Masayuki FUKUI <fukui.FreeBSD@fanet.net>
now just make it clear station statistics (could read
a stat block and assign to caller can do partial changes)
Reviewed by: avatar (previous version)
MFC after: 1 week
acquiring Giant in kern_sendfile().
Guard against the forced reclamation of a vnode in kern_sendfile().
Discussed with: jeff
Reviewed by: tegge
MFC after: 3 weeks
ipxpcb mutex. Contrary to the comment, even in 4.x this was unsafe,
as parallel use of the socket by another process would result in pcb
corruption if the mbuf allocation slept.
MFC after: 1 month
a problem with listing large number of md(4) devices. Either 'list' or
'query' mode uses XML.
Additionally, new functionality was introduced. It's possible to pass
multiple devices to -u:
# ./mdconfig -l -u md0,md1
Approved by: cognet (mentor)
REGRESSION is enabled, allows user space to dictate that sonewconn()
should skip it's "skip the hard work" check to see if the listen
queue is full, and instead proceed with allocation of a socket and
trimming of the overflowed queue. This makes it easier to test the
queue overflow logic.
MFC after: 1 month
IPXP_DROPPED before continuing, and return EINVAL or ECONNRESET if
it is flagged. It's unclear why each situation should be one or
the other, but it is copied from netinet which has the same bugs.
MFC after: 1 month
as belonging to SPX. This replaces the implicit assumption that the cb
pointer for non-SPX pcb's will be NULL. This isn't required in TCP/IP
as different pcb lists are maintained for different IP protocols; IPX
stores all pcbs on the same global ipxpcb_list.
Foot provided by: gnn
MFC after: 1 month
Kernel changes:
Inform hwpmc of executable objects brought into the system by
kldload() and mmap(), and of their removal by kldunload() and
munmap(). A helper function linker_hwpmc_list_objects() has been
added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
the list of currently loaded kernel modules.
The unused `MAPPINGCHANGE' event has been deprecated in favour
of separate `MAP_IN' and `MAP_OUT' events; this change reduces
space wastage in the log.
Bump the hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to
handle the map change callbacks.
Change the default per-cpu sample buffer size to hold
32 samples (up from 16).
Increment __FreeBSD_version.
libpmc(3) changes:
Update libpmc(3) to deal with the new events in the log file; bring
the pmclog(3) manual page in sync with the code.
pmcstat(8) changes:
Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
(mapfile name), "-q"/"-v" (verbosity control). Option "-k" now
takes a kernel directory as its argument but will also work with
the older invocation syntax.
Rework string handling in pmcstat(8) to use an opaque type for
interned strings. Clean up ELF parsing code and add support for
tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).
Report statistics at the end of a log conversion run depending
on the requested verbosity level.
Reviewed by: jhb, dds (kernel parts of an earlier patch)
Tested by: gallatin (earlier patch)
reason, seems to be where new flags are getting defined:
INP_DROPPED - The protocol has terminated this connection and the socket
is not reusable: when the socket code enters the protocol,
an error is immediately returned. This will substitute for
NULLing the so_pcb socket field, helping to implement the
invariant that all valid sockets have valid pcb's in TCP.
INP_SOCKREF - The protocol has become the owner of the socket reference,
and will need to free it when freeing the pcb, which will
be used when a TCP socket is closed but still has queued
data.
MFC after: 1 month
the error on sparc64 hadn't changed since the last checkin, pass
LINT on other platforms and mpt doesn't work on sparc64 anyway
and the tinderbox build didn't work for me in a cross build case
on my main build machine (which runs RELENG_6). Sigh. Still
need to try harder.
- Introduce invariant that all IPX/SPX sockets will have valid so_pcb
pointers to ipxpcb structures, and that for SPX, the control block
pointer will always be valid. Don't attempt to free the socket or
pcb at various odd points, such as disconnect.
- Add a new ipxpcb flag, IPXP_DROPPED, which will be set in place of
freeing PCB's so that this invariant can be maintained. This flag
is now checked instead of a NULL check in various socket protocol
calls.
- Introduce many assertions that this invariant holds.
- Various pieces of code, such as the SPX timer code, no longer needs
to jump through hoops in case it frees a PCB while running.
- Break out ipx_pcbfree() from ipx_pcbdetach(). Likewise
spx_pcbdetach().
- Comment on some SMP-related limitations to the SPX code.
- Update copyrights.
MFC after: 1 month
of its allocations fails. Allocate the ipxp last so as to avoid having
to free it if another allocation goes wrong.
Normalize retrieval of ipxp and cb from socket in spx_sp_attach(), and
add assertions.
MFC after: 1 month
especially reads of spx header structures, which will now be cached
in the stack until they can be copied out after releasing the lock.
Panic if a bad socket option direction is passed in by the caller.
MFC after: 1 month
Make the kernel side of FAST_IPSEC not depend on the shared
structures defined in /usr/include/net/pfkeyv2.h The kernel now
defines all the necessary in kernel structures in sys/netipsec/keydb.h
and does the proper massaging when moving messages around.
Sponsored By: Secure Computing
A) Fibre Channel Target Mode support mostly works
(SAS/SPI won't be too far behind). I'd say that
this probably works just about as well as isp(4)
does right now. Still, it and isp(4) and the whole
target mode stack need a bit of tightening.
B) The startup sequence has been changed so that
after all attaches are done, a set of enable functions
are called. The idea here is that the attaches do
whatever needs to be done *prior* to a port being
enabled and the enables do what need to be done for
enabling stuff for a port after it's been enabled.
This means that we also have events handled by their
proper handlers as we start up.
C) Conditional code that means that this driver goes
back all the way to RELENG_4 in terms of support.
D) Quite a lot of little nitty bug fixes- some discovered
by doing RELENG_4 support. We've been living under Giant
*waaaayyyyy* too long and it's made some of us (me) sloppy.
E) Some shutdown hook stuff that makes sure we don't blow
up during a reboot (like by the arrival of a new command
from an initiator).
There's been some testing and LINT checking, but not as
complete as would be liked. Regression testing with Fusion
RAID instances has not been possible. Caveat Emptor.
Sponsored by: LSI-Logic.
is derived from the phrase 'MegaRAID Firmware Interface' used by LSI. This
driver provides a block interface to logical disks on the card and a minimal
management device. It is MPSAFE, INTR_FAST, and 64-bit capable.
Thanks to Dell for providing hardware to test with and IronPort for
sponsoring the work.
Sponsored by: Dell, Ironport
MFC After: 3 days
being committed:
- Wrap comments more evenly on right border.
- Clean up braces.
Also, along similar lines:
- Assert some pointers are non-NULL before dereferencing them.
- Remove one assertion that looks, on face value, poor.
MFC after: 1 month
socket also supports the voltage. Some XV cards have appeared on the
scene (or cards that report they support XV), and in older machines
that have sockets that do not support XV, we were bogusly trying to
power them at XV rather than at 3.3V. Now, power up the card at the
lowest voltage supported by both the card and the socket.
MFC After: 3 days
with Giant, as there is current unsafety in the IPX tunneled over IP
code. There have been no reports of trouble, but there probably would
be if anyone were running this code at high speed on SMP systems.
MFC after: 3 days
The bug was that earlier, if a request was retransmitted,
we would do subsequent retransmits every 10 msecs.
This can cause data corruption under moderate loads by reordering
operations as seen by the client NFS attribute cache, and on the
server side when the retransmission occurs after the original request
has left the duplicate cache, since the operation will be committed
for a second time.
Further work on retransmission handling is needed (e.g. they are still
being done sent too often since they are scaled by HZ, and the size of
the dup cache is too small and easily overwhelmed on busy servers).
Submitted by: mohans
details see PR kern/94448.
PR: kern/94448
Original patch: Eygene A. Ryabinkin <rea-fbsd at rea dot mbslab dot kiae dot ru>Final patch: thompsa@
Tested by: thompsa@, Eygene A. Ryabinkin
MFC after: 7 days
variable on the spx_input() stack. It's not very large, and this will
avoid parallelism issues when spx_input() runs in more than one thread at
a time.
MFC after: 1 month
- [1] Make the driver friendly towards kernel without PREEMPTION.
Use msleep(9) instead of simple unlock-check_variable-lock mechanisme
since the later not really effective in non-preemptible kernel
(especially during codec detection routine).
- Free most driver resources in a sane manner to avoid possible
double free and panics especially during device detach and codec
detection failure.
MFC after: 3 days
[1] http://lists.freebsd.org/pipermail/freebsd-questions/2006-March/116515.html
relocate it), do not attempt to call pmap_vac_me_harder() on the page.
At this point m will be NULL, and we know we won't have any cache
issues with this page.
Correctly identify the user running opiepasswd(1) when the login name
differs from the account name. [2]
Security: FreeBSD-SA-06:11.ipsec [1]
Security: FreeBSD-SA-06:12.opie [2]
that have the specified kind, instead of assuming that there is
only one report of the right kind in the report descriptor.
Submitted by: Morten Johansen
Obtained from: NetBSD (indirectly)
PR: usb/77604
VFS_LOCK_GIANT/VFS_UNLOCK_GIANT calls. This completely removes Giant
acquisition in the syscall path for ffs.
Bug fix to kern_fhstatfs from: Todd Miller <Todd.Miller@sparta.com>
Sponsored by: Isilon Systems, Inc.
- rename some file local structure definitions, the names clash with
autogenerated names
- on !alpha add some compatibility defines for those renamed structures
- make some functions globally visible on alpha
o don't send management frames if the IFF_DRV_RUNNING flag is not set.
this prevents the timeout watchdog from being potentially re-armed
when the interface is brought down.
fixes a crash that occurs with RT2661 based adapters.
reported by Arnaud Lacombe.
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.
Reviewed by: tegge
multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
- Stops callouts.
- Decrements carp_suppress_preempt, if needed.
- Downs interface and sets CARP state to INIT.
- Calls carp_multicast_cleanup().
- Detaches softc from carp_if and if we are the last frees
the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all
interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it
from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer
to parent. [1]
Obtained from: OpenBSD [1]
Submitted by: Dan Lukes <dan obluda.cz> [2]
PR: kern/82908 [2]
- Determine open direction using 'flags', not 'mode'. This bug exist since
past 4 years.
- Don't allow opening the same device twice, be it in a same or different
direction.
- O_RDWR is allowed, provided that it is done by a single open (for example
by mixer(8)) and the underlying hardware support true full-duplex operation.
- Do various paranoid checking in case other process/thread trying to hijack
the same device twice (or more).
MFC after: 5 days
- <netipx> headers [1]
- IPX library (libipx)
- IPX support in ifconfig(8)
- IPXrouted(8)
- new MK_NCP option
New MK_NCP build option controls:
- <netncp> and <fs/nwfs> headers
- NCP library (libncp)
- ncplist(1) and ncplogin(1)
- mount_nwfs(8)
- ncp and nwfs kernel modules
User knobs: WITHOUT_IPX, WITHOUT_IPX_SUPPORT, WITHOUT_NCP.
[1] <netsmb/netbios.h> unconditionally uses <netipx> headers
so they are still installed. This needs to be dealt with.
"fdinit() fails to initialize newfdp->fd_fd.fd_lastfile to -1. This breaks
fdcopy() which will incorrectly set newfdp->fd_freefile to 1 if no files are
open and the last file descriptor marked as unused for fdp was 0. This later
causes descriptor 0 to be unavailable in newfdp when the optimization is
enabled.
When the last file descriptor previously marked as used is nonzero and marked
as unused, fdunused() incorrectly sets fdp->fd_lastfile to fd - 1 due to
fd_last_used() returning (size - 1). This hides the problem that breaks the
optimization."
This allows us to keep the optimization, while un-breaking it.
This is a RELENG_6 candidate.
PR: kern/87208
MFC after: 1 week
Submitted by: tegge
the target directory or file. This case should fail in the filesystem
anyway and perhaps kern_rename() should catch it.
Sponsored by: Isilon Systems, Inc.
branch:
Integrate audit.c to audit_worker.c, so as to migrate the worker
thread implementation to its own .c file.
Populate audit_worker.c using parts now removed from audit.c:
- Move audit rotation global variables.
- Move audit_record_write(), audit_worker_rotate(),
audit_worker_drain(), audit_worker(), audit_rotate_vnode().
- Create audit_worker_init() from relevant parts of audit_init(),
which now calls this routine.
- Recreate audit_free(), which wraps uma_zfree() so that
audit_record_zone can be static to audit.c.
- Unstaticize various types and variables relating to the audit
record queue so that audit_worker can get to them. We may want
to wrap these in accessor methods at some point.
- Move AUDIT_PRINTF() to audit_private.h.
Addition of audit_worker.c to kernel configuration, missed in
earlier submit.
Obtained from: TrustedBSD Project
Add ioctls to audit pipes in order to allow querying of the current
record queue state, setting of the queue limit, and querying of pipe
statistics.
Obtained from: TrustedBSD Project
net.inet.ip.portrange.reservedlow apply to IPv6 aswell as IPv4.
We could have made new sysctls for IPv6, but that potentially makes
things complicated for mapped addresses. This seems like the least
confusing option and least likely to cause obscure problems in the
future.
This change makes the mac_portacl module useful with IPv6 apps.
Reviewed by: ume
MFC after: 1 month
really breaking things. Simple "close(0); dup(fd)" does not return descriptor
"0" in some cases. Further, this change also breaks some MAC interactions with
mac_execve_will_transition(). Under certain circumstances, fdcheckstd() can
be called in execve(2) causing an assertion that checks to make sure that
stdin, stdout and stderr reside at indexes 0, 1 and 2 in the process fd table
to fail, resulting in a kernel panic when INVARIANTS is on.
This should also kill the "dup(2) regression on 6.x" show stopper item on the
6.1-RELEASE TODO list.
This is a RELENG_6 candidate.
PR: kern/87208
Silence from: des
MFC after: 1 week
Change send_trigger() prototype to return an int, so that user
space callers can tell if the message was successfully placed
in the trigger queue. This isn't quite the same as it being
successfully received, but is close enough that we can generate
a more useful warning message in audit(8).
Obtained from: TrustedBSD Project
transfers. This fixes some cases where the software toggle tracking
was not doing the right thing. For example, a short transfer that
transferred 0 bytes of the requested qTD transfer size does cause
a toggle change, but the existing code was assuming it didn't.
Reported and tested by: pav
MFC after: 2 weeks
Add bus attachment for the ohci device on this chip. The bus and hub
are detected correctly, but the children devices aren't detected
correctly for reasons unknown.
o update TODO list
o Better use of busdma
o mark RX dtors as COHERENT. This helps performance a lot by not requiring
so many EXPENSIVE cache flushes. The cost of accessing it non-cached
is much smaller.
o Copy data from Rx buffers to make IP header 4 byte aligned.
o CRC length included in reported length, so cope
o Don't free TX buffer twice
o Manage TX buffers better.
o Enable just the interrupts we want.
o Manage OACTIVE better
# Some of these done by cognet
# These changes let us get to # via NFS root.
o Add memory barrier to bus space
o Allow for up to 3 IRQs per device
o Move to table driven population of children devices.
o Add support for usb ohci memory mapped controller resource allocation.
o Clean up a bunch of extra writes to disable interrupts that are now
done elsewhere.
o Force all system interrupt handlers be fast. We get deadlock if they
aren't.
o Disable all interrupts that the ST can generate until we have an ISR
to service them.
o Correct clock calculation to make DELAY the right length...
Submitted by: cognet (#2)
request, the FreeBSD NFS client will quickly back off to a excessively
long wait (days, then weeks) before retrying the request.
Change the behavior of the FreeBSD NFS client to match the behavior of
the reference NFS client implementation (Solaris). This provides a fixed
delay of 10 seconds between each retry by default. A sysctl, called
nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris,
the sysctl value on FreeBSD is in seconds, rather than in HZ.
Sponsored by: Network Appliance, Incorporated
Reviewed by: rick
Approved by: silby
MFC after: 3 days
sockets as long as the sockets have not been aborted or detached. Do
not try to free the socket in pru_detach(), since sofree() will do so,
if needed, once pru_detach() returns.
Annotate a bug in ddp_abort(), which fails to free the socket; this
is probably OK as ddp_abort() should never be called, so should
instead be deleted.
pcb's allocated:
- Universally ensure (and assert) that so_pcb is not NULL, removing lots
of checks and error cases. Don't free the pcb without clearing the
so_pcb pointer.
- Don't try to free the socket in pru_detach(), since the caller will
immediately free the socket.
- Do retain the sotryfree() in pru_abort() for now, although eventually
the caller will do it unconditionally.
defined for an in-use socket. This allows us to eliminate countless tests
of whether so_pcb is non-NULL, eliminating dozens of error cases. For
now, retain the call to sotryfree() in the uipc_abort() path, but this
will eventually move to soabort().
These new assumptions should be largely correct, and will become more so
as the socket/pcb reference model is fixed. Removing the notion that
so_pcb can be non-NULL is a critical step towards further fine-graining
of the UNIX domain socket locking, as the so_pcb reference no longer
needs to be protected using locks, instead it is a property of the socket
life cycle.
K&R style function declarations with ANSI style. Also fix endian bugs
accessing ioctl arguments that are passed by value.
PR: kern/93897
Submitted by: Vilmos Nebehaj < vili at huwico dot hu >
MFC after: 1 week
debugging message if the flag PMC_F_OLDVALUE was specified in the
PMC_OP_RW request being acted upon. This should fix Coverity bug
CID 671.
Found by: Coverity Prevent
MFC after: 3 weeks
userland. The comment indicated that something in userland needed them, but
make universe can't seem to find any traces of it.
Move <sys/queue.h> include up.
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.
especially for vchans. It turns out that channel numbering always depend
on d->devcount counter (which keep increasing), while PCMMKMINOR() truncate
everything to 8bit length. At some point the truncation cause the newly
created character device overlapped with the existence one, causing erratic
overall system behaviour and panic. Easily reproduce with something like:
(Luckily, only root can reproduce this)
while : ; do
sysctl hw.snd.pcm0.vchans=200
sysctl hw.snd.pcm0.vchans=100
done
- Enforce channel/chardev numbering within 8bit boundary. Return E2BIG
if necessary.
- Traverse d->channels SLIST and try to reclaim "free" counter during channel
creation. Don't rely on d->devcount at all.
- Destroy vchans in reverse order.
Anyway, this is not the fault of vchans. It is just that vchans are so cute
and begging to be abused ;) . Don't blame her.
Old, hidden bugs.. sigh..
MFC after: 3 days
this corrects problems with drivers that rely on the host to do
crypto (iwi, ipw, ral, ural, wi (hostap), awi)
Hard work by: luigi, mlaier
Reviewed by: luigi, mlaier
MFC after: 1 week
applications.
Open[BGP|OSPF]D make use of this to determine the link status of
interfaces to make the right routing descisions.
Obtained from: OpenBSD
MFC after: 3 days
- Allow RTM_CHANGE to change a number of route flags as specified by
RTF_FMASK.
- The unused rtm_use field in struct rt_msghdr is redesignated as
rtm_fmask field to communicate route flag changes in RTM_CHANGE
messages from userland. The use count of a route was moved to
rtm_rmx a long time ago. For source code compatibility reasons
a define of rtm_use to rtm_fmask is provided.
These changes faciliate running of multiple cooperating routing
daemons at the same time without causing undesired interference.
Open[BGP|OSPF]D make use of these features to have IGP routes
override EGP ones.
Obtained from: OpenBSD (claudio@)
MFC after: 3 days
protocol to the socket. Normally protocol references are weak: that is,
the socket layer can tear down the socket (and hence protocol state)
when it finds convenient. This flag will allow the protocol to
explicitly declare to the socket layer that it is maintaining a
strong reference, rather than the current implicit model associated
with so_pcb pointer values and repeated attempts to possibly free the
socket.
rather than panicking later. This can occur if the kernel calls
vn_open() on a fifo, as there will be no associated file descriptor,
and therefore the file descriptor operations cannot be modified to
point to the fifo operation set.
MFC after: 3 days
Reported by: Martin <nakal at nurfuerspam dot de>
PR: 94278
Previously, we tried to allow this only for root. However, we were calling
suser() on the *target* process rather than the current process. This
means that if you can ptrace() a process running as root you can set a
hardware watch point in the kernel. In practice I think you probably have
to be root in order to pass the p_candebug() checks in ptrace() to attach
to a process running as root anyway. Rather than fix the suser(), I just
axed the entire idea, as I can't think of any good reason _at all_ for
userland to set hardware watch points for KVM.
MFC after: 3 days
Also thinks hardware watch points on KVM from userland are bad: bde, rwatson
processing, this actually means there's a double slash recorded in the
symbolic link's path name. We used to start over from / then, which
caused link targets like ../../bsdi.1.0/include//pathnames.h to be
interpreted as /pathnahes.h. This is both contradictionary to our
conventional slash interpretation, as well as potentially dangerous.
The right thing to do is (obviously) to just ignore that element.
bde once pointed out that mistake when he noticed it on the
4.4BSD-Lite2 CD-ROM, and asked me for help.
Reviewed by: bde (about half a year ago)
MFC after: 3 days
specified, the rightmost option takes effect." Fix code to obey
this. This makes e.g. "mount -r /usr" or "mount -ar" actually
mount file systems read-only.
to be 'long' instead of 'int' so that sysctl(8) correctly displays
the 8 returned bytes as a single 'long' instead of two 'int' values.
Submitted by: peter
Submitted by: green
- Speed up synchronization process by using configurable number of I/O
requests in parallel.
+ Add kern.geom.raid3.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.raid3.reqs_per_sync and kern.geom.raid3.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement raid3's data synchronization - do not use the topology lock
for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.
Tested by: Mike Tancsa <mike@sentex.net>
MFC after: 3 days
requests in parallel.
+ Add kern.geom.mirror.sync_requests tunable which defines how many parallel
I/O requests should be used.
+ Retire kern.geom.mirror.reqs_per_sync and kern.geom.mirror.syncs_per_sec
sysctls.
- Fix race between regular and synchronization requests.
- Reimplement mirror's data synchronization - do not use the topology lock
for this purpose, as it may case deadlocks.
- Stop synchronization from pre-sync hook.
- Fix some other minor issues.
MFC after: 3 days
o call firmware_put() early to release the firmware module
o on firmware panics or watchdog timeouts, schedule a task to reinitialize
the interface (we may sleep in iwi_init())
o discard oversized rx frames
This does not do what I wanted as all dirty buffers must be flushed
by the call to ffs_sync and any remaining dependency work would mean
that this failed.
Pointed out by: tegge
This does not do what I wanted as all dirty buffers must be flushed
by the call to ffs_sync and any remaining dependency work would mean
that this failed.
Pointed out by: tegge
Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.
Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).
Perform deferred inactive handling after the file system is resumed.
flag isn't enough - the filter needs to be set up too, or no multicast frames
are accepted.
Sponsored by: Philips Industrial Applications (indirectly)
MFC after: 3 days
o stop processing interrupts after a firmware fatal error or a radio kill
o clarify the possible values for the 'antenna' sysctl.
o by default, let the firmware do antenna diversity.
the firmware will periodically switch to another antenna to evaluate the
signal quality.
ATA framework. Mainly written to be able to use USB Flash keys.
This is work in progress so use with care :)
Doesn't need CAM and cannot coexist with umass.c
means that old problem was triggered (when two providers end at the same
offset, eg. ad0 and ad0s1 and the wrong was is picked up by gmirror/graid3).
Reported by: Michal Suszko <dry@dry.pl>
MFC after: 3 days
Use 'BOOT_SENSITIVE_INFO=YES' variable to turn them on.
- Use 'uint*_t' instead of 'u_int*_t', correct compilation warnings, and
update copyright while I am here.
In at least one benchmark this showed around a 20% performance increase.
If other workloads do benefit from having hyperthreads service interrupts,
we can always make this a loader tunable.
MFC after: 3 days
Tested by: ps
which is normal for own files of a device driver.
DEV_FOO should be used if an unrelated kernel file needs to know of
the `foo' driver's static presence. Obviously, module source files
should never use DEV_*.
triggers.
This should eliminate all the trivial messages which result from minor
increases in cpu_tick frequency.
Machines which don't du cpu clock fiddling shouldn't issue "backwards"
messages now.
Laptops and other machines where the initial estimate of cputicks may be
waaaay off will still issue warnings.
replacement for vn_write_suspend_wait() to better account for secondary write
processing.
Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.
Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.
actually is an mbuf to process. This catches the missing mbuf before it
would otherwise causes a NULL pointer dereference, which could be
triggered by a 0 length RPC record before the check for such records was
added in rev 1.97.
Approved by: cperciva (mentor)
The client's READDIRPLUS logic skips the attributes and
filehandle of the ".." entry. If the server doesn't send
attributes but does send a filehandle for "..", the
client's logic doesn't account for the extra "value
follows" field that indicates whether the filehandle is
present, causing the remaining entries in the reply
to be ignored.
Sponsored by: Network Appliance, Inc.
Reviewed by: rick, mohans
Approved by: silby
MFC after: 2 weeks
table. Previously, the ddb code knew of each linker set of auxiliary
commands and which explicit command list they were tied to. These changes
add a simple command_table struct that contains both the static list of
commands and the pointers for any auxiliary linker set of additional
commands. This also makes it possible for other arbitrary command tables
to be defined in other parts of the kernel w/o having to edit ddb itself.
The DB_SET macro has also been trimmed down to just creating an entry in
a linker set. A new DB_FUNC macro does what the old DB_SET did which is
to not only add an entry to the linker set but also to include a function
prototype for the function being added. With these changes, it's now also
possible to create aliases for ddb functions using DB_SET() directly if
desired.
slightly more tricky than on x86 as although the PCC is 64-bits, it is not
a simple 64-bit counter like the TSC. Instead, the upper 32-bits have
PAL-defined behavior and the lower 32-bits run as a free-running 32-bit
counter. To handle this, we detect overflows by maintaining a small amount
of per-cpu state and use this to simulate the upper 32-bits of the counter
providing a full 64-bit counter to the consumers of cpu_ticks().
an interrupt handler for the i8254. (Our clock interrupts come from
elsewhere.) Instead, use the same algo that i386 uses when the lapic
timer is in use. This lets us remove a lot of cruft that tried to handle
the i8254 interrupts that we weren't even using or setting up a handler
for.
- G/C a bunch of unused cruft while I'm here.
- Fix the code to not use the rpcc timecounter (similar to TSC) on SMP
machines to only disable that timecounter if more than one CPU is in
use by the kernel. Previously, a UP kernel on a machine with multiple
CPUs would needlessly disable this timecounter.
MFC after: 1 week
whether or not to allocate a full mbuf cluster rather than just a plain
mbuf when adding on additional mbufs in m_getm(). In practice, there wasn't
any resulting mem trashing since m_getm() doesn't ever allocate an mbuf with
a packet header, and MINCLSIZE is the available payload in an mbuf with a
header rather than the available payload in a plain mbuf.
Discussed with: andre (lightly)
will remain in the disabled state until another link event happens in the
future (if at all). Add a timer to periodically check the interface state and
recover.
Reported by: Nik Lam <freebsdnik j2d.lam.net.au>
MFC after: 3 days
enabled by default in NETSMB and smbfs.ko.
With the most of modern SMB providers requiring encryption by
default, there is little sense left in keeping the crypto part
of NETSMB optional at the build time.
This will also return smbfs.ko to its former properties users
are rather accustomed to.
Discussed with: freebsd-stable, re (scottl)
Not objected by: bp, tjr (silence)
MFC after: 5 days
generations of 802.11abg chipsets from Ralink Technology.
Get rid of the pccard front-end while I'm here since all adapters are
cardbus ones.
Obtained from: OpenBSD
vnode and a mode and checks if a given access mode is permitted.
This centralises the mac_bsdextended_enabled check and the GETATTR
calls and makes the implementation of the mac policy methods simple.
This should make it easier for us to match vnodes on more complex
attributes than just uid and gid in the future, but for now there
should be no functional change.
Approved/Reviewed by: rwatson, trhodes
MFC after: 1 month
sysinstall(8) still bogusly puts first partition at offset 0 instead of 16,
so glabel/ufs will find file system on slice instead of partition.
Before sysinstall is fixed, we must keep this code, which means that we
wont't be able to detect UFS file systems created with 'newfs -s ...'.
PS. bsdlabel(8) creates partitions properly.
MFC after: 3 days
- Include audit_internal.h to get definition of internal audit record
structures, as it's no longer in audit.h. Forward declare au_record
in audit_private.h as not all audit_private.h consumers care about
it.
- Remove __APPLE__ compatibility bits that are subsumed by configure
for user space.
- Don't expose in6_addr internals (non-portable, but also cleaner
looking).
- Avoid nested include of audit.h in audit_private.h.
Obtained from: TrustedBSD Project
- Add new comments.
- Move private data structures from public audit.h to audit_internal.h to
avoid exposing queue.h macros to undesiring consumers.
Obtained from: TrustedBSD Project
Releasing items from the mt_zone can not be done by a simple
uma_zfree() call since mt_zone is allocated with the UMA_ZONE_MALLOC
flag. Use uma_zfree_arg instead and supply the slab.
This bug caused panics in low memory situations on unloading kernel
modules containing MALLOC_DEFINE(..) statements.
Submitted by: ups
into a separate module. Accordingly, convert the option into a device
named similarly.
Note for MFC: Perhaps the option should stay in RELENG_6 for POLA reasons.
Suggested by: scottl
Reviewed by: cokane
MFC after: 5 days
re-organizing the monitor return logic. We perform interface monitoring
checks after we have determined if the CRC is still on the packet, if
it is, m_adj() is called which will adjust the packet length. This
ensures that we are not including CRC lengths in the byte counters for
each packet.
Discussed with: andre, glebius
that we might have address collisions, so make sure that this hardware address
isn't already in use on another bridge.
Submitted by: csjp
MFC after: 1 month
destination interface as a member of our bridge or this is a unicast packet,
push it through the bpf(4) machinery.
For broadcast or multicast packets, don't bother with the bpf(4) because it will
be re-injected into ether_input. We do this before we pass the packets through
the pfil(9) framework, as it is possible that pfil(9) will drop the packet or
possibly modify it, making it very difficult to debug firewall issues on the
bridge.
Further, implemented IFF_MONITOR for bridge interfaces. This does much the same
thing that it does for regular network interfaces: it pushes the packet to any
bpf(4) peers and then returns. This bypasses all of the bridge machinery,
saving mutex acquisitions, list traversals, and other operations performed by
the bridging code.
This change to the bridging code is useful in situations where individuals use a
bridge to multiplex RX/TX signals from two interfaces, as is required by some
network taps for de-multiplexing links and transmitting the RX/TX signals
out through two separate interfaces. This behaviour is quite common for network
taps monitoring links, especially for certain manufacturers.
Reviewed by: thompsa
MFC after: 1 month
Sponsored by: Seccuris Labs
be called without any vnode locks held. Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
has many positive effects including improved smp locking, reducing
interdependencies between mounts that can lead to deadlocks, etc.
- Add the softdep worklist and various counters to the ufsmnt structure.
- Add a mount pointer to the workitem and remove mount pointers from the
various structures derived from the workitem as they are now redundant.
- Remove the poor-man's semaphore protecting softdep_process_worklist and
softdep_flushworklist. Several threads may now process the list
simultaneously.
- Add softdep_waitidle() to block the thread until all pending
dependencies being operated on by other threads have been flushed.
- Use softdep_waitidle() in unmount and snapshots to block either
operation until the fs is stable.
- Remove softdep worklist processing from the syncer and move it into the
softdep_flush() thread. This thread processes all softdep mounts
once each second and when it is called via the new softdep_speedup()
when there is a resource shortage. This removes the softdep hook
from the kernel and various hacks in header files to support it.
Reviewed by/Discussed with: tegge, truckman, mckusick
Tested by: kris
speeds to perform below the desired bitrate and throughput will be erratic.
This makes queueing work on the Geode SC1100, K5 model 0 and IDT WinChip C6
processors.
MFC after: 3 days
with malloc() or contigmalloc() as usual, but try to re-map the allocated
memory into a VA outside the KVA, non-cached, thus making the calls to
bus_dmamap_sync() for these buffers useless.
to call back for completition and something else is holding the taskqueue
waiting for ATA to return data.
This should clear up the "semaphore timeout !! DANGER Will Robinson !!"
in most situations, and log "taskqueue timeout - completing request directly"
instead, with a delayed "WARNING - freeing taskqueue zombie request" when
the taskqueue finally calls us back with the now stale request.
(It would have been nice if there was a way to remove a scheduled item from
a taskqueue, but that is not currently implemented in the kernel).
A real fix for this is in the works but wont make it to 6.1RELEASE
definite MFC candidate.
- Don't use a common buffer in the softc to store per-command data. Reserve
a buffer in the command itself.
- Don't allocate DMA memory for the kernel command structures when all you
really need is DMA memory for the scratch buffer embedded in them. Instead
allocate a slab for the scratch buffers and divide it up as needed.
- Call bus_dmamap_unload() at the completion of commands.
- Preserve and clear the CAM CCB status flags at completion.
- Reorder some low-level command operations to try to close races.
- Limit the simq to 32 commands for now. There are some serious problems
with the driver under load that are not well understood, so keeping the
simq lower helps avoid this. It has been tested at a higher value, but
this is a safe value that doesn't show much performance degredation.
These changes allow the driver to work reliably with >4GB of memory on i386
and amd64, and also work around deadlocks seen under very high load in
certain situations. The work-around is far from ideal, but without and
documentation it is hard to know what the right fix is.
MFC candidate
By default syscons(4) will look for the kbdmux(4) keyboard first, and then,
if not found, look for any keyboard.
Current kbd code is modified so if kbdmux(4) is the current keyboard, all
new keyboards are automatically added to the kbdmux(4).
Switch to kbdmux(4) can be done at boot time, by loading kbdmux module at
the loader prompt, or at runtime, by kldload'ing the kbdmux module and
releasing current active keyboard.
If, for whatever reason, kbdmux(4) is not required/desired then just do
not load it and everything should work as before. It is also possible to
kldunload kbdmux at runtime and syscons(4) will automatically switch to
the first available keyboard.
No response from: freebsd-current@
MFC after: 1 day
right from the beginning and partly clean up the differences in handling
between SYN_SENT and SYN_RCVD (syncache).
Further changes to this code to come. This is a first incremental step
to a general overhaul and streamlining of the TCP code.
PR: kern/15095
PR: kern/92690 (partly)
Reviewed by: qingli (and tested with ANVL)
Sponsored by: TCP/IP Optimization Fundraise 2005
- Throw out all of the logical APIC ID stuff. The Intel docs are somewhat
ambiguous, but it seems that the "flat" cluster model we are currently
using is only supported on Pentium and P6 family CPUs. The other
"hierarchy" cluster model that is supported on all Intel CPUs with
local APICs is severely underdocumented. For example, it's not clear
if the OS needs to glean the topology of the APIC hierarchy from
somewhere (neither ACPI nor MP Table include it) and setup the logical
clusters based on the physical hierarchy or not. Not only that, but on
certain Intel chipsets, even though there were 4 CPUs in a logical
cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
the local APIC IDs. This code has also moved out of the ioapic PIC
driver and into the common interrupt source code so that it can be
shared with MSI interrupt sources since MSI is addressed to APICs the
same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are avaiable to
handle interrupts in a simpler and more intuitive manner. For one thing,
it means we could now choose to not route interrupts to HT cores if we
wanted to (this code is currently in place in fact, but under an #if 0
for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
interrupt handler just as before, with the change that IRQs are now
bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
be easier to manage this mapping from higher levels. For example, we
could allow drivers to specify a CPU affinity map for their interrupts,
or we could allow a userland tool to bind IRQs to specific CPUs.
The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).
MFC after: 1 week
to allow PHOLD()'s of processes that have P_WEXIT set as once that flag
is set we aren't guaranteed to block in exit1() waiting for the PRELE()
(we might already be past the wait). However, curproc is a bit of a
special case. By the time P_WEXIT is set, the process is single-threaded,
so the only thread for which can do a PHOLD(curproc) is the thread
executing in exit1(). The fact that this thread is executing ensures
that the process won't go away before the current hold is released via
PRELE(). This fixes some panics due to kicking off softupdate operations
inside of exit1() after the recent PHOLD changes to fix ptrace/procfs vs
exit races.
MFC after: 1 week
Tested by: pho
mpt_soft_reset more than once. And to wait for MPT_DB_STATE_READY
twice. I mean, this is crucial- give the IOC a chance to get
ready.
If mpt_reset is called to reinit things, and we succeed, make
sure to re-enable interrupts. This is what has mostly led to
system lockup after having to hard reset the chip. Also, if
we think that interrupts aren't function in mpt_cam_timeout,
for goodness sake, turn them on again.
In read_cfg_header, return distinguishing errnos so the caller
can decide what's an error. It's *not* an error to fail to
read a RAID page from a non-RAID capable device like the FC929X.
Some whitespace fixes (removing spaces from ends of lines).
Developed with: Norbert Koch < NKoch at demig dot de >
No response from: freebsd-current@
Tested by: Norbert Koch < NKoch at demig dot de >
MFC after: 1 day
- use the cu_bridge_id rather than the cu_rootid for the bridge address [1]
- the memcmp return value is not signed so the wrong interface may have been
selected
- fix up the calculation of sc_bridge_id
PR: kern/93909 [1]
MFC after: 3 days
and want to have crypto support loaded as KLD. By moving zlib to separate
module and adding MODULE_DEPEND directives, it is possible to use such
configuration without complication. Otherwise, since IPSEC is linked with
zlib (just like crypto.ko) you'll get following error:
interface zlib.1 already present in the KLD 'kernel'!
Approved by: cognet (mentor)
Otherwise a kernel build would break in the coda5 module if the main
kernel conf file enabled CODA_COMPAT_5, too. Redefined symbols are
strictly disallowed by -Werror.
To overcome this issue, introduce a different symbol indicating coda5
build, CODA5_MODULE, and translate it to CODA_COMPAT_5 appropriately
in /sys/coda/coda.h.
MFC after: 3 days
so other threads can not see it if we unlock the proc
lock (this can happen in knlist_delete). Don't do wakeup,
it is not necessary.
2. Decrease kaio_buffer_count in biohelper rather than
doing it in aio_bio_done_notify.
3. In aio_bio_done_notify, don't send notification if KAIO_RUNDOWN
was set, because the process is already in single thread mode.
4. Use assignment to initialize aiothreadflags.
5. AIOCBLIST_RUNDOWN is not useful, axe the code using it.
6. use LIO_NOP instead of zero.
not hang the system for 5 seconds. If a TMF doesn't complete within,
oh, say 500ms, that's enough.
Put in a printout to catch mpt_recover_commands being activated with
no commands.
*both* SAS and FC, not just SAS.
b) Don't tell the chip we want it to do FIFO signalling if we actually
don't set up the address where the FIFO signal is supposed to be written
(oops).
is closed and then reopened. This appears to be necessary now that
we no longer clear endpoint stalls every time a pipe is opened.
Previously we could assume an initial toggle value of zero because
the clear-stall operation resets the device's toggle state.
Reported by: Holger Kipp
MFC after: 3 days
(atkbd(4)) and PS/2 mouse (psm(4)) are used together.
Turns out that atkbd(4) check_char() method may return "true" while
read_char() method returns NOKEY. When this happens kbdmux(4) was
simply stuck in the dead loop. Avoid dead loop in kbdmux(4) by breaking
out of the loop if read_char() method returns NOKEY.
It almost seems like a bug in atkkbd(4), atkbd_check_char() calls
kbdc_data_ready(), and, the later will return "true" if there are
pending data in either kbd or aux queue. However, because both aux
and kbd are on the same controller, I'm not sure if this is a bug
or feature.
Tested by: markus
MFC after: 1 day
is finished with it; this may only occur when the tx queue is setup as
dba-gated but since the fix is cheap apply it to all queues
while here make the queue depth signed for use in assertions
Reviewed by: apatti
MFC after: 2 weeks
inside !if defined(KERNBUILDDIR).
Utilize the fact the module will support all frames by default --
it needs no ETHER_* options unless some frames need to be disabled.
Fix the comment respectively.
Don't forget to create fake opt_ef.h if no ETHER_* are set.
MFC after: 3 days
o Add defines for the 5 interrupt sources typical for serial devices.
These defines can be used for more finegrained interrupt handling
between drivers that cooperatively handle multiple serial ports.
o Add defines for the various bitmasks applicable when all information
is passed between drivers as a single integral.
Return BUS_PROBE_LOW_PRIORITY for a successful probe. This is in
preparation of the introduction of scc(4), which is going to handle
SCCs in the near future.
simultaneous open. Both the bug and the patch were verified using the
ANVL test suite.
PR: kern/74935
Submitted by: qingli (before I became committer)
Reviewed by: andre
MFC after: 5 days
callout_drain() logic. We no longer need a separate non-spin mutex to
do sleep/wakeup with, instead we can now just use the one spin mutex to
manage all the callout functionality.
mutex.
- Don't use callout_drain() to stop the toffhandle callout while holding the
fdc mutex (this could deadlock) in functions called from softclock
(callouts aren't allowed to do voluntary sleeps). Instead, use
callout_stop(). Note that since we hold the associated mutex and are now
using callout_init_mtx(), callout_stop() is just as effective as
callout_drain(). (Though callout_drain() is still needed in detach to
make sure softclock isn't contesting on our mutex before we destroy the
mutex.)
- Remove unused callout 'tohandle' from softc.
MFC after: 1 week
the last reference is dropped. I forgot that vnodes can stick around
for a very long time until processes discover that they are dead. This
means that a vnode reference is not sufficient to keep the mount
referenced and even more code will be required to ref mount points.
Discovered by: kris
- Reorder the events in exit(2) slightly so that we trigger the S_EXIT
stop event earlier. After we have signalled that, we set P_WEXIT and
then wait for any processes with a hold on the vmspace via PHOLD to
release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is
invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops
to zero.
- Change proc_rwmem() to require that the processing read from has its
vmspace held via PHOLD by the caller and get rid of all the junk to
screw around with the vmspace reference count as we no longer need it.
- In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it
doesn't exist.
- Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers
FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem()
to clear an earlier single-step simualted via a breakpoint). We only
do one to avoid races. Also, by making the EINVAL error for unknown
requests be part of the default: case in the switch, the various
switch cases can now just break out to return which removes a _lot_ of
duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug
where a LWP ptrace command could return EINVAL with the proc lock still
held.
- Changed the locking for ptrace_single_step(), ptrace_set_pc(), and
ptrace_clear_single_step() to always be called with the proc lock
held (it was a mixed bag previously). Alpha and arm have to drop
the lock while the mess around with breakpoints, but other archs
avoid extra lock release/acquires in ptrace(). I did have to fix a
couple of other consumers in kern_kse and a few other places to
hold the proc lock and PHOLD.
Tested by: ps (1 mostly, but some bits of 2-4 as well)
MFC after: 1 week
interrupt handlers rather than BUS_SETUP_INTR() and BUS_TEARDOWN_INTR().
Uses of the BUS_*() versions in the implementation of foo_intr methods
in bus drivers were not changed. Mostly this just means that some
drivers might start printing diagnostic messages like [FAST] when
appropriate as well as honoring mpsafenet=0.
- Fix two more of the ppbus drivers' identify routines to function
correctly in the mythical case of a machine with more than one ppbus.
associated with the passed in pfs_node. If it does return a pointer, it
keeps the process locked. This allows a lot of places that were calling
pfind() again right after pfs_visible() to not have to do that and avoids
races since we don't drop the proc lock just to turn around and lock it
again. This will become more important with future changes to fix races
between procfs/ptrace and exit(2). Also, removed a duplicate pfs_visible()
call in pfs_getextattr().
Reviewed by: des
MFC after: 1 week
modules prior to looking up the directory which we will cover to avoid
this problem in mount.
- We must hold the coveredvp locked before we can busy the mountpoint to
prevent a lock order reversal with the vfs_busy() in lookup which holds
the directory lock prior to doing a vfs_busy(). The directory lock is
required to safely clear the v_mountedhere field on the directory.
MFC After: 1 week
prevent the mount point from going away while we're waiting on the lock.
The ref does not need to persist once we have the lock because the
lock prevents the mount point from being unmounted.
MFC After: 1 week
the VFS_STATFS call to prevent the mount from disappearing while we're
stating.
- Convert these routines to use MPSAFE namei semantics.
MFC After: 1 week
vop_lock_post do not trigger.
- Rearrange null_inactive to null_hashrem earlier so there is no chance
of finding the null node on the hash list after the locks have been
switched.
- We should never have a NULL lowervp in null_reclaim() so there is
no need to handle this situation. panic instead.
MFC After: 1 week
- Simplify the logic dealing with recycled vnodes in null_hashget() and
null_hashins(). Since we hold the lower node locked in both cases
the null node can not be undergoing recycling unless reclaim somehow
called null_nodeget(). The logic that was in place was not safe and
was essentially dead code.
MFC After: 1 week
object that requires Giant in vm_object_deallocate(). This is somewhat
hairy in that if we can't obtain Giant directly, we have to drop the
object lock, then lock Giant, then relock the object lock and verify that
we still need Giant. If we don't (because the object changed to OBJT_DEAD
for example), then we drop Giant before continuing.
Reviewed by: alc
Tested by: kris
a calcru() wrapper that passes a local rusage_ext on the stack that is
a snapshot to do the calculations on. Now we can pass p->p_crux to
calcru1() in calccru() again which fixes the issues with runtime going
backwards messages when dead processes are harvested by init.
Reviewed by: phk
Tested by: Stefan Ehmann shoesoft at gmx dot net
processing the interrupt events. If we clear them afterwards we
can completely miss some events as the NIC can change the source
flags while we're in the handler. In order to not get another
interrupt while we're in ifp->if_input() with the driver lock
dropped we now turn off NIC interrupts while in the interrupt
handler. Previously this was meant to be achieved by clearing the
interrupt source flags after processing the interrupt events but
didn't really work as clearing these flags doesn't actually
acknowledge and re-enable the events.
This fixes the device timeouts seen with the VMware LANCE.
- Relax the watchdog timer somewhat; don't enable it until the last
packet is enqueued and if there is a TX interrupt but there are
still outstanding ones reload the timer.
Reported and tested by: Morten Rodal <morten@rodal.no>
MFC after: 3 days
if ksocket is connected to an interface-type node somewhere later
in the graph (e.g., ng_eiface or ng_iface), the csum_data may be
applied to a wrong packet (if we encapsulate Ethernet or IP).
MFC after: 3 days
The ed driver on pc98 was broken by if_edvar.h rev1.31.
Reported by: Kaho Toshikazu (kaho at elam kais kyoto-u ac jp)
Tested by: Eiji Kato (ekato at a1 mbn or jp)
MFC after: 3 days
More info regarding these nics can be found at http://www.myri.com.
Please note that the files
sys/dev/myri10ge/{mcp_gen_header.h,myri10ge_mcp.h} are internally
shared between all our drivers (solaris, macosx, windows, linux, etc).
I'd like to keep these files unchanged, so I can just import newer
versions of them when the firmware API/ABI changes. This means I'm
stuck with some of the crazy-long #define names, and possibly
non-style(9) characteristics of these files.
Many thanks to mlaier for doing firmware(9) just as I
needed it, and to scottl for his helpful review.
Reviewed by: scottl, glebius
Sponsored by: Myricom Inc.
when option DEBUG_LOCKS is used. Trap frames are determined by checking
whether the caller was one of the tl0_*() or tl1_*() asm functions via
a newly added pair of dummy symbols in exception.S which mark the begin
and end of these functions. The tl_trap_* pair marks those in the special
.trap section and the tl_text_* in the regular .text section. Because
of their performance penalty db_search_symbol()/db_symbol_values() and
linker_ddb_search_symbol()/linker_ddb_symbol_values() aren't used here
for determining the caller, with db_search_symbol()/db_symbol_values()
additionally not being reentrant.
- For consistency, change db_backtrace() to also use the new markers for
determining the tl0_*() and tl1_*() asm functions instead of bcmp()'ing
the symbol name.
- Use FBSDID in db_trace.c.
PR: 93226
Based on a patch by: Antoine Brodin <antoine.brodin@laposte.net>
Ok'ed by: jhb
Without CODA_COMPAT_5, it would be equivalent to the plain coda
module. Therefore just add -DCODA_COMPAT_5 to CFLAGS instead of
fiddling with opt_coda.h. This is particularly important when
the module is built along with the kernel and CODA_COMPAT_5 isn't
in the kernel conf file (and so not in opt_coda.h either).
MFC after: 3 days
nfs_diskless.c if NFS_ROOT is in effect, e.g., present
in the kernel config file. Otherwise the built module
won't load due to an undefined reference to nfs_setup_diskless.
MFC after: 3 days
can't be changed from userland. Make them read-only and provide
descriptions.
kern.ipc.max_datalen must never be less than one byte. Enforce this
with a panic in net_init_domain().
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
forcing DMA alignment to default buffer size.
- Make sure DMA pointer properly aligned to avoid being truncated by caller
which causing severe underruns and random popping (especially in 32bit
playback / recording).
- Add AC97 inverted external amplifier quirk for Maxselect x710s
- http://maxselect.ru/
MFC after: 1 week
current directory to allow user rules to create the firmware (e.g. from a
uuencoded blob). make's version of if is evaluated too early to catch this.
Found-by: gallatin
the SMI/TCO address space. Switch the bus space I/O to the
one specific for either the SMI or TCO space. Re-calibrate
the tick. Add some more device id's, 82801FBR submitted by des.
This makes it work on the platforms I've tested with.
Go ahead by: des
should fix strange link state behaviour reported for bcm5721 & bcm5704c
2) Clear bge_link flag in bge_stop()
3) Force link state check after bge_ifmedia_upd(). Otherwise we can miss link
event if PHY changes it's state fast enough.
Tested by: phk (bcm5704c)
Approved by: glebius (mentor)
MFC after: 1 week
jumbo mbuf clusters. To make the variable size clear they are named
MJUMPAGESIZE.
Having jumbo clusters with the native PAGE_SIZE is more useful than
a fixed 4k size according the device driver writers using this API.
The 9k and 16k jumbo mbuf clusters remain unchanged.
Requested by: glebius, gallatin
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
Acknowledgement should definitly go to JMicron Technology for providing full
docs on the metadata format as the only vendor so far, big thanks from here.
threshold. Inflight doesn't make sense on a LAN as it has
trouble figuring out the maximal bandwidth because of the coarse
tick granularity.
The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold
in milliseconds below which inflight will disengage. It defaults
to 10ms.
Tested by: Joao Barros <joao.barros-at-gmail.com>,
Rich Murphey <rich-at-whiteoaklabs.com>
Sponsored by: TCP/IP Optimization Fundraise 2005
suspension code. When a thread A is going to sleep, it calls
sleepq_catch_signals() to detect any pending signals or thread
suspension request, if nothing happens, it returns without
holding process lock or scheduler lock, this opens a race
window which allows thread B to come in and do process
suspension work, however since A is still at running state,
thread B can do nothing to A, thread A continues, and puts
itself into actually sleeping state, but B has never seen it,
and it sits there forever until B is woken up by other threads
sometimes later(this can be very long delay or never
happen). Fix this bug by forcing sleepq_catch_signals to
return with scheduler lock held.
Fix sleepq_abort() by passing it an interrupted code, previously,
it worked as wakeup_one(), and the interruption can not be
identified correctly by sleep queue code when the sleeping
thread is resumed.
Let thread_suspend_check() returns EINTR or ERESTART, so sleep
queue no longer has to use SIGSTOP as a hack to build a return
value.
Reviewed by: jhb
MFC after: 1 week
and it has not plenty of free pages it tries to free pages in the cache queue.
Unfortunately freeing a cached page requires the locking of the object that
owns the page. However in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails the cache page can not be freed and is activated to move it out of the way
so that we may try to free other cache pages.
If all pages in the cache belong to objects that are currently locked the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:
1) vm_page_alloc always failed allocation when it tried freeing pages from
the cache queue and failed to do so. However if there are more than
cnt.v_interrupt_free_min pages on the free list it should return pages
when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
resource exhaustion deadlocks.
2) Threads than need to allocate pages spend a lot of time cleaning up the
page queue without really getting anything done while the pagedaemon
needs to work overtime to refill the cache.
This change fixes the first problem. (1)
Reviewed by: tegge@
- In em_attach() remove em_check_for_link(). Not needed here, since
already done in em_hardware_init().
- In em_attach() replace the printing block with call to
em_update_link_status().
- Remove modification of sc->link_state from em_hardware_init() and
from em_media_status(). This makes em_update_link_status() a
single point of change. Call em_update_link_status() where needed.
to getting rid u_int for uint and so on).
b) Turn back on 64 bit DAC support. Cheeze it a bit in that we have two
DMA callback functions- one when we have bus_addr_t > 4 bits in width and
the other which should be normal. Even Cheezier in that we turn off setting
up DMA maps to be BUS_SPACE_MAXADDR if we're in ISP_TARGET_MODE. More work
on this in a week or so.
c) Tested under amd64 and 1MB DFLTPHYS, sparc64, i386 (PAE, but insufficient
memory to really test > 4GB). LINT check under amd64.
MFC after: 1 month
with methods: instead of honoring non-zero values expect drivers
to write their own values on return from ieee80211_ifattach
o add a define for the default h/w bmiss count
MFC after: 2 weeks
pages, not a count of bytes. The sysctl handler for hw.realmem already
uses ctob() to convert realmem from pages to bytes. Thus, on archs that
were storing a byte count in the realmem variable, hw.realmem was inflated.
Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha)
MFC after: 3 days
it so that ip_id etc. don't get overwritten. This fixes forwarding
of fragmented IP packets through a dummynet pipe -- fragments came
out with modified and different(!) ip_id's, making it impossible to
reassemble a datagram at the receiver side.
Submitted by: Alexander Karptsov (reworked by me)
MFC after: 3 days
remote CPU. While here, abstract thread suspension code into a function
called sig_suspend_threads, the function is called when a process received
a STOP signal.
save the MCA state of the AP. Saving the MCA state of the AP requires
us to allocate memory, which uses sleep locks.
Now that we correct the spinlock nesting of the AP without having
schedlock, avoid calling spinlock_exit(). Instead call critical_exit()
and manually clear the MD spinlock count.
MFC after: 3 days
to preserve currect behaviour). When set to 0, components are not
disconnected - graid3 will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'graid3 list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.
Prodded by: ru
MFC after: 3 days
to preserve currect behaviour). When set to 0, components are not
disconnected - gmirror will try to still use them (only first error will
be logged). This is helpful when we have two broken components, but in
different places, so actually all data is available.
Such buggy component will be visible in 'gmirror list' output with flag
BROKEN.
- Never disconnect the last valid component. If we detect errors there we
will just pass them up. This wasn't reasonable to deny access to the
whole provider because of one broken sector.
Prodded by: ru
MFC after: 3 days
An example entries for loader.conf to make it possible:
geli_da0_keyfile0_load="YES"
geli_da0_keyfile0_type="da0:geli_keyfile0"
geli_da0_keyfile0_name="/boot/keys/da0.key0"
geli_da0_keyfile1_load="YES"
geli_da0_keyfile1_type="da0:geli_keyfile1"
geli_da0_keyfile1_name="/boot/keys/da0.key1"
geli_da0_keyfile2_load="YES"
geli_da0_keyfile2_type="da0:geli_keyfile2"
geli_da0_keyfile2_name="/boot/keys/da0.key2"
geli_da1s3a_keyfile0_load="YES"
geli_da1s3a_keyfile0_type="da1s3a:geli_keyfile0"
geli_da1s3a_keyfile0_name="/boot/keys/da1s3a.key"
Thanks for jhb and kan who showed me the right direction.
MFC after: 3 days
Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.
Don't reach across CPUs in calcru().
Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).
Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.
Use 27MHz counter on i386/Geode.
Use TSC on amd64 & i386 if present.
Use tick counter on sparc64
automatically both SATA and SAS drives. The async SAS event handling we catch
but ignore at present (so automagic attach/detach isn't hooked up yet).
Do 64 bit PCI support- we can now work on systems with > 4GB of memory.
Do large transfer support- we now can support up to reported chain depth, or
the length of our request area. We simply allocate additional request elements
when we would run out of room for chain lists.
Tested on Ultra320, FC and SAS controllers on AMD64 and i386 platforms.
There were no RAID cards available for me to regression test.
The error recovery for this driver still is pretty bad.
to work with ipmitools. It works with other tools that have an OpenIPMI
driver interface. The port will need to get updated to used this.
I have not implemented the IPMB mode yet so ioctl's for that don't
really do much otherwise it should work like the OpenIPMI version.
The ipmi.h definitions was derived from the ipmitool header file.
The bus attachments are done for smbios and pci/smbios. Differences
in bus probe order for modules/static are delt with. ACPI attachment
should be done.
This drivers registers with the watchdod(4) interface
Work to do:
- BT interface
- IPMB mode
This has been tested on Dell PE2850, PE2650 & PE850 with i386 & amd64
kernel.
I will link this into the build on next week.
Tom Rhodes, helped me with the man page.
Sponsored by: IronPort Systems Inc.
Inspired from: ipmitool & Linux
o add dfs+radar hooks; DFS is presently disabled in the hal
o channel and mode handling changes
o various api changes
o be more aggressive about iq calibration settling so ap mode
operation is better immediately after startup
o rfkill/rfsilent sysctl support
o tpc ack/cts sysctl support
MFC after: 2 weeks
o new chip support
o new platforms: powerpc-be-elf, sparc64-be-elf, and alpha-elf
(alpha is untested, others are known to work)
o many fixes and improvements
MFC after: 2 weeks
zero in this case.) A kernel driver has IFF_DRV_RUNNING
at its full disposal while IFF_UP may be toggled only by
humans or their daemonic deputies from the userland.
MFC after: 3 days
o assume all data frames have been classified so there's no need
to check if QoS is being used, just fetch the wme priority from
the mbuf
o fix double counting of noack frames
o fix nearby comment
MFC after: 2 weeks
o pull nexttbtt forward in adhoc mode too
o resync beacon timers on joining a bss or ibss as the tstamp we
collected while scanning is almost certainly out of date
Note we may need to refine the ibss mode check in ath_recv_mgmt.
Reviewed by: avatar, dyoung
Obtained from: atheros
MFC after: 2 weeks
frame and if we get a beacon miss interrupt ignore it if we've received
a frame within the beacon miss interval. This should never trigger
and the handling at the net80211 layer should likewise deal with this
but it doesn't hurt and can suppress extranous probe request frames.
Note that we can legtimately get a bmiss when under heavy load.
MFC after: 2 weeks
- Restart watchdog if we *did* processed any descriptors. [1]
- Log the watchdog event if the link is *up*. [2]
PR: kern/92948 [1]
Submitted by: Mihail Balikov <mihail.balikov interbgc.com> [1]
PR: kern/92895 [2]
Submitted by: Vladimir Ivanov <wawa yandex-team.ru> [2]
it. The former code used to hang older Intel CPUs by trying to get
non-existent TLB info 2^32 times.
Reduce code duplication around the calls to CPUID 0x02 by using
do-while loops.
PR: i386/92977
Tested by: cy
driver; this closes a race where a response could be processed
before the timer was started and cause a RUN->SCAN state change
when operating in station mode
Reviewed by: avatar, dyoung
MFC after: 1 week
Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.
Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.
In the future, we may want to acquire the lock early in the function and
hold it across calls to vn_rdwr(), etc, to avoid multiple acquires.
Spotted by: kris (bugmagnet)
Obtained from: TrustedBSD Project
Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.
For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)
On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.
did not stop at the right node. Change the backtracking check from
smaller-than to smaller-or-equal to prevent this from happening.
While here fix one additional problem where the insertion of the
default route traversed the entire tree.
PR: kern/38752
Submitted by: qingli (before I became committer)
Reviewed by: andre
MFC after: 3 days
in syncache_lookup() is not cleared and may lead to an arbitrary and
bogus rtentry pointer which later gets free'd.
Reviewed by: andre
MFC after: 3 days
for acctwatch() runs to be negative or zero as this could result in either
a possible hang (or panic if INVARIANTS is on). Previously the accounting
code handled the <= 0 case by calling acctwatch on every clock tick (eww!)
due to an implementation detail of callout_reset(). (Tick counts of
<= 0 are converted to 1).
MFC after: 3 days
miibus (like today). If you want a subset, choose device mii and zero
or more phy to include. We always include unkphy. We make use of
the | functionality that ruslan recently added to config.
This allowed me to trim 57k from my KB9202 kernel.
than just deleting them. Also add comments about why we do this.
Given the current behavior of delete_child, I don't think this changes
anything. It just feels cleaner.
instead of calling acctwatch() from softclock. The acctwatch() function
needs to hold an sx lock and also makes a VFS call, and neither of these
are good things (or safe) to do from a callout. The kthread only exists
and is running when accounting is turned on; it is started and stopped
as needed. I didn't run acctwatch() via the thread taskqueue at Robert's
request as he was worried that if the accounting file was over NFS the
VFS_STAT() calls might stall other work on the taskqueue.
- Add an acct_disable() function to take care of closing the accounting
vnode and cleaning up so we don't duplicate the same code in two
different places.
MFC after: 3 days
applications to insert a "tee" in the live audit event stream. Records
are inserted into a per-clone queue so that user processes can pull
discreet records out of the queue. Unlike delivery to disk, audit pipes
are "lossy", dropping records in low memory conditions or when the
process falls behind real-time events. This mechanism is appropriate
for use by live monitoring systems, host-based intrusion detection, etc,
and avoids applications having to dig through active on-disk trails that
are owned by the audit daemon.
Obtained from: TrustedBSD Project
initialization routines into a ctor, tear-down to a dtor, cleaning
up, etc. This will allow audit records to be allocated from
per-cpu caches.
On recent FreeBSD, dropping the audit_mtx around freeing to UMA is
no longer required (at one point it was possible to acquire Giant
on that path), so a mutex-free thread-local drain is no longer
required.
Obtained from: TrustedBSD Project
hack where it assumes the first field of the driver softc is the struct
ifnet, and it copies its value in mii_phy_probe().
- In the interrupt handler, set the mbuf m_len field on packet receive.
the callers if the exec either succeeds or fails early.
- Move the code to call exit1() if the exec fails after the vmspace is
gone to the bottom of kern_execve() to cut down on some code duplication.
- Run send queue down to completion, not just one packet.
It has been observed to cause a stall queue otherwise.
- Prevent queueing multiple function calls to a node.
MFC after: 3 days
vfs_mount_destroy waiting for this ref to hit 0. We don't print an
error if we are rebooting as the root mount always retains some refernces
by init proc.
- Acquire a mnt ref for every vnode allocated to a mount point. Drop this
ref only once vdestroy() has been called and the mount has been freed.
- No longer NULL the v_mount pointer in delmntque() so that we may release
the ref after vgone() has been called. This allows us to guarantee
that the mount point structure will be valid until the last vnode has
lost its last ref.
- Fix a few places that rely on checking v_mount to detect recycling.
Sponsored by: Isilon Systems, Inc.
MFC After: 1 week
lock also protects this flag so it is not necessary.
- Don't rely on v_mount to detect whether or not we've been recycled, use
the more appropriate VI_DOOMED instead.
Sponsored by: Isilon Systems, Inc.
MFC After: 1 week
over from the Darwin implementation.
When we implement a system call as a wrapper to sysctl(), audit it as
AUE_SYSCTL. This leads to greater compatibility with Solaris audit
trails as sysctl() argument tokens are not the same as the ones for
the originaly system calls (i.e., setdomainname()).
Replace references to AUE_ events that are equivilent to AUE_NULL with
AUE_NULL. In the case of process signal configuration, this is
because these events do not require auditing.
Move from the Darwin spelling of getsockopt() to the FreeBSD/Solaris
one.
Audit nmount().
Obtained from: TrustedBSD Project
audit thread exit, but should that happen, this will prevent
unhappiness, as the thread exit system call will never return, and
hence not commit the record.
Pointed out by/with: cognet
Obtained from: TrustedBSD Project
to sys/bsm:
- Correct error in definition of audit event for Linux setfsgid().
- Add audit event identifier for sysarch().
Obtained from: TrustedBSD Project
is assigned to xenix_eaccess() instead of AUE_ACCESS, as that is the
intended meaning of the system call. xenix_eaccess() should be
reimplemented using our native eaccess() implementation so that it
works as intended.
Obtained from: TrustedBSD Project
This should not happen, but with this assert, brueffer and I would
not have spent 45 minutes trying to figure out why he wasn't
seeing audit records with the audit version in CVS.
Obtained from: TrustedBSD Project
dereferencing) since a NULL value would be a bug here.
Note: Both affected functions look very similar. A refactoring may
be beneficial.
CID: 483, 485
Found with: Coverity Prevent(tm)
Discussed with: ariff
MFC after: 5 days
needed here, except there's a bug which results in detaching the device
twice.
Move the NULL pointer check to the beginning of the function and convert
it into a KASSERT.
CID: 420
Found with: Coverity Prevent(tm)
Discussed with: ariff
MFC after: 5 days
and vnode attribute information for looked up vnodes during the lookup
operation. This will allow consumers of namei() to specify that this
information be added to the in-process audit record.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
is a ARM920T based CPU with a bunch of built-in peripherals. The
inital import supports the SPI bus, the TWI bus (although iicbus
integration is not complete), the uarts, the system timer and the
onboard ethernet. Support for the Kwikbyte KB9202
(http://www.kwikbyte.com) board is also included, although there's no
reason why the 9200 and the 9201 wouldn't also work. Primitive
support for running under the skyeye emulator is also provided
(although skyeye's support for the AT91RM9200 is a little weak).
The code has been structured so that other members of Atmel's arm family can
be supported in the future. The AT91SAM9260 is not presently supported
due to lack of hardware. The arm7tdmi families are also not supported
becasue they lack an MMU.
Many thanks to cognet@ for his help and assistance in bringing up this
board. He did much of the vm work and wrote parts of the uart and
system timer code as well as the bus space implementation.
The system boots to single user w/o problem, although the serial
console is a little slow and the ethernet driver is still in flux.
This work was sponsored by Timing Solutions, Corporation. I am
grateful to their support of the FreeBSD project in this manner.
Control) devices as console. These are microcontrollers which are either
on-board or part of an add-on card and provide terminal server, remote
power switch and monitoring functionality. For console usage these are
connected to the rest of the system via a SCC or an UART. This commit adds
support for the following variants (corresponds to what 'input-device' and
'output-device' have to be set to):
rsc found on-board in E250 and supposedly some Netra, connected
via a SAB82532, com. parameters can be determined via OFW
rsc-console RSC card found in E280R, Fire V4x0, Fire V8x0, connected
via a NS16550, hardwired to 115200 8N1
lom-console LOMlite2 card found in Netra 20/T4, connected via a NS16550,
hardwired to 9600 8N1
- Add my copyright to uart_cpu_sparc64.c as I've rewritten about one third
of that file over time.
Tested on: E250, E280R
Thanks to: dwhite@ for providing access to an E280R
OK'ed by: marcel
MFC after: 1 week
casting a structure to a uint32_t *. Many drivers in the tree do this, but
I'll not update them until these changes can be reviewed by the pedantic
standards folks.
in order to query the underlying Windows driver for the station address
and some other properties. There is a slim chance that the card may
receive a packet and indicate it up to us before ndis_attach() can call
ndis_halt_nic(). This is bad, because both the softc structure and
the ifnet structure aren't fully initialized yet: many pointers are
still NULL, so if we make it into ndis_rxeof(), we will panic.
To fix this, we need to do the following:
- Move the calls to IoAllocateWorkItem() to before the call to ndis_init_nic().
- Move the initialization of the RX DPC and status callback function pointers
to before ndis_init_nic() as well.
- Modify ndis_rxeof() to check if the IFF_DRV_RUNNING flag is set. If it
isn't, we return any supplied NDIS_PACKETs to the NIC without processing
them.
This fixes a crash than can occur when activating a wireless NIC in
close proximity to a very busy wireless network, reported by Ryan
Beasley (ryan%^$!ATgoddamnbastard-****!!!DOTorg.
MFC after: 3 days
revision 1.288
date: 2006/02/04 14:11:33; author: wsalamon; state: Exp; lines: +4 -1
Hook up the audit system to system call entry and exit. System calls will
now be audited.
Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)
The patch crudely forces the NIC out of operating mode before the nve(4)
driver can initialize it; this is required to properly initialize the NIC.
It is XBox-specific, as this condition can only occur on XBoxes (Most loaders
will simply leave the NIC running, forcing us to use a crude workaround like
this to get it in a workable condition). Due to the XBox-only aspect, this has
been solved in XBox-specific initialization code and not within nve(4).
Reviewed by: imp
Approved by: imp (mentor)
No objection: bz@, obrien@, q@ontheweb.com.au
are bus_addr_t, not bus_size_t.
In any case, turn off DAC support entirely until it is revamped to actually
work *correctly* for 64 bit platforms (not using a PAE definition and for
both initiator and target mode).
Treat SIOCADDMULTI and SIOCDELMULTI the same, since they had the same code
Remove redundant assignment to error
Convert to using the altq interface completely.
we have another PCB which is bound to 0.0.0.0. If a PCB has the
INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME
MFC after: 1 week
an incompatible conversion from a 64-bit pointer to a 32-bit integer on
64-bit platforms. We will investigate whether Solaris uses a 64-bit
token here, or a new record here, in order to avoid truncating user
pointers that are 64-bit. However, in the mean time, truncation is fine
as these are rarely/never used fields in audit records.
Obtained from: TrustedBSD Project
vnode is from a file system that is not MPSAFE, as vrele() expects
Giant to be held when it is called on a non-MPSAFE vnode.
Spotted by: kris
Tested by: glebius
to sbus_mdvec.dv_clock as sbus_mdvec.dv_clock is meant to be specified
in MHz. While this was a bug it shouldn't have affected FreeBSD/sparc64
as sbus_mdvec.dv_clock is used to limit the clock rate of chips when
a machine isn't able to support them at maximum speed which isn't the
case for sun4u machines.
- Remove the code that checks whether the clock frequency returned by
sbus_get_clockfreq() is 0 and falls back to 25MHz if it is as that's
already done in sbus(4).
Approved by: mjacob
MFC after: 3 days
store some pipe pointers on stack. If user reconfigures dummynet
in the interlock gap, we can work with freed pipes after relock.
To fix this, we decided not to send packets in transmit_event(),
but fill a queue. At the end of dummynet() and dummynet_io(),
after the lock is dropped, if there is something in the queue
we run dummynet_send() to process the queue.
In collaboration with: ru
dedicated to storing pv entries, originally so that kva didn't have to be
allocated at inconvenient times. For amd64, we can get the same effect by
using the direct map area. Allocating pages is the same as with the object
backed method, but now we can just lookup the page in the direct map area.
Thus, no more pageable kva is reserved. This is the single largest
consumer of kva on our work machines and this change should help conserve
the fixed size 2GB pageable kva on the amd64 kernel.
There are a pair of sysctl nodes introduced, named the same as their
tunable counterparts. vm.pmap.shpgperproc and vm.pmap.pv_entry_max
They work just like the tunables of the same path, except the values are
linked. The pv entry cap is now dynamically changeable.
I didn't make them totally unlimited because we need some sort of safety
limit still. One could consume all physical memory without a cap.
the system when brelse() was called with B_RELBUF set on the buffer. This
could be a problem when the system was low on memory, had many buffers on
QUEUE_EMPTYKVA and started to traverse directories. For each getnewbuf(),
pages were allocated from the system, driving the free reserve downwards.
For each brelse(), the system put the buffer on QUEUE_CLEAN, with B_INVAL
set.
This commit changes the semantics of B_RELBUF to also free pages from
non-VMIO buffers.
Reviewed by: alc
routines.
- Add or replace cpu_spinwait() with DELAY(1) to a few of the busy
loops when reading from the controller to work around firmware bugs
which can crash the controller.
with pseudo header for tcp/udp packets). This could save one in_pseudo() call
per incoming tcp/udp packet.
Approved by: glebius (mentor)
MFC after: 3 weeks
filtering mechanisms to use the new rwlock(9) locking API:
- Drop the variables stored in the phil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be ran, and returns if not. This check is already performed by the
IP stacks when they call:
if (!PFIL_HOOKED(ph))
goto skip_hooks;
- Drop in assertion which makes sure that the number of hooks never drops
below 0 for good measure. This in theory should never happen, and if it
does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros
Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API
- td_ar to struct thread, which holds the in-progress audit record during
a system call.
- p_au to struct proc, which holds per-process audit state, such as the
audit identifier, audit terminal, and process audit masks.
In the earlier implementation, td_ar was added to the zero'd section of
struct thread. In order to facilitate merging to RELENG_6, it has been
moved to the end of the data structure, requiring explicit
initalization in the thread constructor.
Much help from: wsalamon
Obtained from: TrustedBSD Project
option. We always build audit_syscalls.c so that the system call
stubs can return ENOSYS rather than the system call code
generating SIGSYS for the system calls. We are not yet ready to
add AUDIT to LINT, as the prototypes for system call arguments
won't be there until after the system calls for audit are added.
Much work from: wsalamon
Obtained from: TrustedBSD Project
- Management of audit state on processes.
- Audit system calls to configure process and system audit state.
- Reliable audit record queue implementation, audit_worker kernel
thread to asynchronously store records on disk.
- Audit event argument.
- Internal audit data structure -> BSM audit trail conversion library.
- Audit event pre-selection.
- Audit pseudo-device permitting kernel->user upcalls to notify auditd
of kernel audit events.
Much work by: wsalamon
Obtained from: TrustedBSD Project, Apple Computer, Inc.
couple of FreeBSD-specific modifications that may be merged out
later). These include files define the basic audit data
structures, types, and definitions use by the kernel, or shared
by the kernel and user space.
Obtained from: TrustedBSD Project, Apple Computer, Inc.
capability is present as not all devices supported by the agp_i810 driver
(such as i915) have the AGP capability. Instead, add an identify routine
to the agp_i810 driver that uses the PCI ID to determine if it should
create an agp child device.
to process. It could give us [significant?] perfomance increase if there is big
difference between RX/TX flows.
Submitted by: Mihail Balikov <mihail.balikov AT interbgc DOT com>
Approved by: glebius (mentor)
MFC after: 3 days
synchronized on every call of bge_poll_locked().
Suggested by: Mihail Balikov <mihail.balikov AT interbgc DOT com>
Approved by: glebius (mentor)
MFC after: 3 days
2) add missing bus_dmamap_sync() call in bge_intr()
Tested by: Husnu Demir <hdemir AT metu DOT edu DOT tr>
Approved by: glebius (mentor)
MFC after: 3 days
and signifincantly improve the readability of ip_input() and
ip_output() again.
The resulting IPSEC hooks in ip_input() and ip_output() may be
used later on for making IPSEC loadable.
This move is mostly mechanical and should preserve current IPSEC
behaviour as-is. Nothing shall prevent improvements in the way
IPSEC interacts with the IPv4 stack.
Discussed with: bz, gnn, rwatson; (earlier version)
The former type, size_t, was causing truncation to 32 bits on i386,
which immediately led to undersizing of VM objects backed by
files >4GB. In particular, sendfile(2) was broken for such files.
PR: kern/92243
MFC after: 5 days
without Giant held. Do this by tracking the vfslocked state for
the directory seperate from the child. This is only important
in the case where we cross a mountpoint.
Sponsored by: Isilon Systems, Inc.
MFC After: 3 days
on a lock held the last usecount ref on a vnode and the lock failed we
would not call INACTIVE. Solve this by only holding a holdcnt to prevent
the vnode from disappearing while we wait on vn_lock. Other callers
may now VOP_INACTIVE while we are waiting on the lock, however this race
is acceptable, while losing INACTIVE is not.
Discussed with: kan, pjd
Tested by: kkenn
Sponsored by: Isilon Systems, Inc.
MFC After: 1 week
directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.
MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn
in order to support the on-board LANCE in Ultra 1 and to the MI NOTES as
it should work just fine with the AMD PCnet family of chips on all archs
but is not yet meant to replace lnc(4). If a kernel includes all of le(4),
lnc(4) and pcn(4) precedence is given to lnc(4)/pcn(4) for now.
will be sent if there is an address on the bridge. Exclude the bridge from the
special arp handling.
This has been tested with all combinations of addresses on the bridge and members.
Pointed out by: Michal Mertl
- code expects memcmp() to return a signed value, our memcmp() returns 0 if
args are equal and > 0 if not.
- It's possible to hijack interface for static entry. If bridge recieves
packet from interface marked as learning it will replace the bridge_rtnode
entry for the source address even if such entry marked as static.
Submitted by: Gleb Kurtsov <k-gleb yandex.ru>
MFC after: 3 days
try to use the registrant for numbers in this file, not the OEM that
put their label on it. Use PNY's real number 0x154b. Add another PNY
atachmate with quirks from a PR filed a while ago, but that I can't
seem to find now...
This logic change was introduced in revision 1.74:
Correct an oversight in jail() that allowed processes in jail to access
ptys in ways that might be unethical, especially towards processes not in
jail, or in other jails.
It should be fine to allow root in the host environment to do this. This
allows for more effective monitoring of prisons from the host environment.
Discussed with: rwatson
MFC after: 1 week
beginning and simply refuse to attach to a parent without either
flag.
Our network stack cannot handle well IFF_BROADCAST or IFF_MULTICAST
on an interface changing on the fly. E.g., IP will or won't assign
a broadcast address to an interface and join the all-hosts multicast
group on it depending on its IFF_BROADCAST and IFF_MULTICAST settings.
Should the flags alter later, IP will miss the change and keep using
bogus settings. This can lead to evil things like supplying an
invalid broadcast address or trying to leave a multicast group that
hasn't been joined. So just avoid touching the flags since an
interface was created. This has no practical purpose.
Discussed with: -net, glebius, oleg
MFC after: 1 week
from NetBSD. This driver actually can replace lnc(4). Advantages over
lnc(4) are:
- Cleaner and more flexible regarding MD needs.
- Endian-clean and MPSAFE.
- Supports ALTQ, VLAN_MTU, ifmedia.
- Uses 32bit DMA for the PCI variants.
This commit includes front-ends for the dma(4) pseudo-bus found on SBus-
based sparc64 machines (thus supports the on-board LANCE in Sun Ultra 1)
and PCI. In order to actually replace lnc(4), the front-ends for ISA and
the PC98 CBUS would have to be added but for which I don't have hardware
to test.
Reviewed and some improvements by: yongari
Tested on: i386, sparc64
- Like lsi64854_scsi_intr() return -1 in case there was a DMA error so
the caller can distinguish it from a normal interrupt and leave the
reset of the DMA engine to the caller so we don't kill any state there.
- Move the static 'dodrain' flag to struct lsi64854_softc as there can
be more than one LSI64854 used for a LANCE in a system and reset it
again once draining the E-cache is done so we don't keep draining the
cache with every interrupt.
- Remove calling sc->sc_intrchain(), we will call lsi64854_enet_intr()
via sc->intr() in the interrupt handler of the LANCE driver and not
use it in chained mode.
o lsi64854_pp_intr():
- Like lsi64854_scsi_intr() return -1 in case there was a DMA error so
the caller can distinguish it from a normal interrupt.
o Remove the no longer used sc_intrchain* from struct lsi64854_softc.
o Make lsi64854_reset(), lsi64854_setup*() and lsi64854_*_intr() static
to lsi64854.c as we do and will only call them via the respective
function pointers in struct lsi64854_softc.
o While here fix style(9) bugs (variable definition inside a nested scope).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.
Tested by: kris
interrupt handler for the LANCE devices and remove dma_setup_intr(). We
just can't completely ignore the DMA engine in a LANCE driver anyway and
calling the DMA engine interrupt handler in the LANCE driver directly
allows to cover it by the LANCE driver lock.
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.
The most important changes:
o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.
In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
this is more consistent with the placement of slaves in /dev/pts. The
actual name doesn't matter as it's not part of the exposed API or used by
libc. In some sense, it would be nice if these device nodes didn't have to
have names in devfs at all.
Suggested by: Stephen McKay <smckay at internode dot on dot net>
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:
- Add per-softc mutex.
- Hold the mutex on output.
The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.
Reported by: many
Tested by: Alexander Shiryaev <aixp mail.ru>
support. Which reminds me that I'm not even sure if this works on _any_
laptop at all. :-o
PR: kern/90607
Submitted by: "Wojciech A. Koszek" <dunstan -at- freebsd.czest.pl>
MFC after: 3 days
This is supposed to fix some Coverity Prevent errors (Ariff didn't
looked at the CID's (ENOTIME), I just told him that there are some problems
in function dsp_ioctl()).
CID: 215-218
Found with: Coverity Prevent(tm)
Submitted by: ariff
MFC after: 5 days
o Change MEM_READ_1/MEM_READ_4 into macros (move them to if_iwireg.h)
o Add support for association LED
o Silently discard f/w notifications that are unknown (fixes spurious
"unknown notification 15" in logs with latest firmware)
o Fix scanning of 5GHz channels
of physical RAM instead of the bottom was a sound idea, but the implementation
left a lot to be desired. Scans would spend considerable time looking at
pages that are above of the address range given by the caller, and multiple
calls (like what happens in busdma) would spend more time on top of that
rescanning the same pages over and over.
Solve this, at least for now, with two simple optimizations. The first is
to not bother scanning high ordered pages that are outside of the provided
address range. Second is to cache the page index from the last successful
operation so that subsequent scans don't have to restart from the top. This
is conditional on the numpages argument being the same or greater between
calls.
MFC After: 2 weeks
specially crafted module. There are several handrolled sollutions to this
problem in the tree already which will be replaced with this. They include
iwi(4), ipw(4), ispfw(4) and digi(4).
No objection from: arch
MFC after: 2 weeks
X-MFC after: some drivers have been converted
kern_prot.c. This API handles reference counting among many other things.
Notably, if MAC is compiled into the kernel, it will properly initialize the
MAC labels when the ucred is allocated.
This work is in preparation for a new MAC entry point which will be responsible
for properly initializing policy specific labels for the NFS server credential.
Utilization of the crfree/crget APIs reduce the complexity associated with
this label's management.
Submitted by: green (with changes) [1]
Obtained from: TrustedBSD Project
Discussed with: rwatson, alfred
[1] I moved the ucred allocation outside the scope of the NFS server lock to
prevent M_WAIKOK allocations from occurring with non-sleep-able locks held.
Additionally, to reduce complexity, the ucred persist as long as the NFS
server descriptor.
implementation is by no means perfect as far as some of the algorithms
that it uses and the fact that it is missing some functionality (try
locks and upgrades/downgrades are not there yet), however it does seem
to work in my local testing. There is more detail in the comments in the
code, but the short version follows.
A reader/writer lock is very much like a regular mutex: it cannot be held
across a voluntary sleep; it can be acquired in an interrupt thread; if
the lock is held by a writer then the priority of any threads that block
on the lock will be lent to the owner; the simple case lock operations all
are done in a single atomic op. It also shares some similiarities
with sx locks: it supports reader/writer semantics (multiple readers,
but single writers); readers are allowed to recurse, but writers are not.
We can extend this implementation further by either improving algorithms
or adding new functionality, but this should at least give us a base to
work with now.
Reviewed by: arch (in theory)
Tested on: i386 (4 cpu box with a kernel module that used 4 threads
that randomly chose between read locks and write locks
that ran w/o panicing for over a day solid. It usually
panic'd within a few seconds when there were bugs during
testing. :) The kernel module source is available on
request.)
queues in turnstiles. Add a new thread member td_tsqueue which contains
the sub-queue of a turnstile that a thread is on when it is blocked on a
turnstile.
each turnstile. Also, allow for the owner thread pointer of a turnstile
to be NULL. This is needed for the upcoming reader/writer lock
implementation.
- Add a new ddb command 'show turnstile' that will look up the turnstile
associated with the given lock argument and display useful information
like the list of threads blocked on each queue, etc. If there isn't an
active turnstile for a lock at the specified address, then the function
will see if there is an active turnstile at the specified address and
display info about it if so.
- Adjust the mutex code to handle the turnstile API changes.
Tested on: i386 (all), alpha, amd64, sparc64 (1 and 3)
argument and looks for a sleep queue associated with that wait channel.
If it finds one it will display information such as the list of threads
sleeping on that queue. If it can't find a sleep queue for that wait
channel, then it will see if that address matches any of the active
sleep queues. If so, it will display information about the sleepq at the
specified address.
is a fatal fault if we are holding any non-sleepable locks. This should
cut down on the number of bogus LORs we currently get when the kernel
panics due to a NULL (or bogus) pointer dereference that goes wandering
off into the VM system which tries to acquire locks and then kicks off
the spurious LORs. This should probably be ported to all the archs at
some point.
Tested on: i386
The difference between WITNESS_CHECK() and WITNESS_WARN() is that
WITNESS_CHECK() should be used in the places that the return value of
witness_warn() is checked, whereas WITNESS_WARN() should be used in places
where the return value is ignored. Specifically, in a kernel without
WITNESS enabled, WITNESS_WARN() evaluates to an empty string where as
WITNESS_CHECK evaluates to 0. I also updated the one place that was
checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
sysctl then it will clear the KTR buffer. Note that if you have active
KTR traces at the same time as a clear operation the behavior is undefined,
though it shouldn't panic.
they are. They should be NULL at this point, except if we're coming from
swapdev_strategy().
It should only affect the case where we're swapping directly on a file over
NFS.
into the card's memory.
# this eliminates a more of the ifdef soup in if_ed and if_edvar
# I've fixed the cbus drivers, but can't test them all easily.
If I've broken anything, please let me know.
by NGM_PPPOE_SETMODE message. When D-Link compat mode is on, we will
broadcast PADI with empty Service-Name to all listening hooks.
o Rewrite the compatibility options. Before we had two modes - standard
and non-standard (aka 3Com). Now we have standard mode and two compat
flags, that can be combined.
o Be consistent and do s/STUPID/3COM/g. I don't say that 3Com mode isn't
stupid, just want to make code easier to read.
to properly configure the right interface to use.
Also call the mediachg function when we set flags UP and are already
running. If this were a pure ifmedia driver, we'd not need to do this
since we'd be ignoring the linkX flags.
This reduces the number of ifdefs to support sub-devices a little as a
nice side effect. It also reduces the number of hpp interfaces
exposed by 33%.
man page that the ifconfig option link2 is used to disable the AUI
transceiver on the 3com boards (should also say HP PC Lan+). This
makes the connection clearer.
Add a note about why we set this flag prior to attaching the device.
We never set or clear the flag later, only test it. There can be no
races here, but this might be asthetically displeasing to some. Also
note that we may no longer need to have this knob at all as we may be
able to do it with the more sophisticated rc.d scripts we have today I
think the only reason it is there is because we didn't used to allow
its proper setting when configured to get the IP address via DHCP.
I'll note that this would be better handled by using ifmedia for all
ed cards, not just those with a miibus...
This is important with MegaLib, when issuing a GET_REBUILD_PROG since
it returns an error if the drive is not in rebuild state.
This will be MFC'ed shortly.
Submitted by: ps
Reviewed by: scottl
Found by: ambrisko
and resume methods so these events propagate through the device driver
hierarchy.
- In dma(4) enable the chaining of the DMA engine interrupt handler for
the LANCE devices via a dma_setup_intr(). This was commented out before
as I was unsure whether I'd use it but this is probably cleaner than
fiddling with the DMA engine interrupt in the LANCE driver directly.
- In ebus_setup_dinfo() free 'intrs' instead of 'reg' twice in case
setting up a child fails due to routing one of its interrupts fails. [1]
Found by: Coverity Prevent [1]
MFC after: 3 days
expands to the GCC format_arg attribute if supported.
This fixes a syntax error in <nl_types.h> for compilers/tools not
implementing the GCC __attribute__ extensions.
system LED on or off. Unlike the EBus LED AUXIO register where the
remaining bits are unused the upper bits of the SBus AUXIO register
are used to control other things like the link test enable pin of
the on-board NIC which we don't want to change as a side-effect.
- Remove the superfluous bzero()'ing of the softc obtained from
device_get_softc().
Reviewed by: yongari
MFC after: 3 days
PPPoE AC, servicing a specific Service-Name, when client sends a PADI
with an empty Service-Name. Should it reply with all available service
names or should it be silent? Our implementation had chosen the latter,
while some other had chosen the former (they say Linux and Cisco). Now
some PPPoE clients appear, that rely on the assumption that AC will
send all names in a PADO reply to a PADI with wildcard Service-Name.
These clients can't connect to FreeBSD AC.
I have requested comments from authors of RFC2516 via email, but
received no reply.
This change makes FreeBSD AC compatible with D-Link DI-614+ and
D-Link DI-624+ SOHO routers, and probably others.
Big thanks to D-Link's Russian office, namely Victor Platov, for
assistance and support in investigation and testing of this change.
Details:
o Split pppoe_match_svc() into three different functions serving
different purposes:
- pppoe_match_svc() - match non-empty Service-Name tag from PADI
against all available hooks in listening state.
- pppoe_find_svc() - check that given Service-Name is not yet
registered.
- pppoe_broadcast_padi() - send a copy of PADI packet with empty
Service-Name tag to all listening hooks.
o For NGM_PPPOE_LISTEN message use pppoe_find_svc().
o In ng_pppoe_rcvdata() in a PADI case use pppoe_match_svc() for
a non-empty Service-Name tag, and pppoe_broadcast_padi() in
either case.
A side effect from the above changes is that now pppoed(8) and mpd
will reply to a empty Service-Name PADI sending a PADO with two
Service-Name tags - an empty one and correct one. This is not fatal,
and will be corrected in pppoed(8) and mpd later. No need to update
node interface version.
Supported by: D-Link
linux_ioctl.[ch] : Implement LINUX_TIOCGPTN, which returns the pty number
linux_stats.c :
- Return the magic number for devfs.
- In various stats()-related functions, check that we're stating a
file in /dev/pts, and if so, change the st_rdev field to match what linux
expects to be there for a slave pty device. The glibc checks for this, and
their openpty() fails if it is no correct.
It should play nicely with the existing BSD ptys.
By default, the system will use the BSD ptys, one can set the sysctl
kern.pts.enable to 1 to make it use the new pts system.
The max number of pty that can be allocated on a system can be changed with the
sysctl kern.pts.max. It defaults to 1000, and can be increased, but it is not
recommanded, as any pty with a number > 999 won't be handled by whatever uses
utmp(5).
the resident page count matches the object size. We know it fully backs
its parent in this case.
Reviewed by: acl, tegge
Sponsored by: Isilon Systems, Inc.
statement. Specifically, a break statement that previously broke out of
the enclosing switch was not changed. Consequently, the enclosing loop
terminated prematurely.
This could result in "vm_page_insert: page already inserted" panics.
Submitted by: tegge
modified bit emulation traps on Alpha while holding locks in the
sysctl handler.
A better solution would be to pass a hint to the Alpha pmap code to
tell mark these pages as modified when they as they are being wired,
but that appears to be more difficult to implement.
Suggested by: jhb
MFC after: 3 days
placeholder similar to KTR_DEV. Explain the use of KTR_DEV and
KTR_SUBSYS in a comment as well.
- Retire KTR_WITNESS and instead have KTR_WITNESS default to off but use
KTR_SUBSYS if it is enabled.
Linux LSI MegaRaid tools can run on FreeBSD until Linux emulation.
Add in the Linux IOCTL shim and create the megadev0 device so
Linux LSI MegaRaid tools can run on FreeBSD until Linux emulation.
Add glue to build the modules but don't tie it into the build
yet until I test it from the CVS repo. via the mirror on an
amd64 machine.
Tie this into the Linux32 emulation on amd64 so the tools can
run on amd64 kernel.
Cleaned up by: ps (amr_linux.c)
ip_forward() would report back a zero MTU in ICMP needfrag messages
because on a IPSEC SP lookup failure no MTU got computed.
Fix this by changing the logic to compute a new MTU in any case if
IPSEC didn't do it.
Change MTU computation logic to use egress interface MTU if available
or the next smaller MTU compared to the current packet size instead
of falling back to a very small fixed MTU.
Fix associated comment.
PR: kern/91412
MFC after: 3 days
ia_hash only if it actually is an AF_INET address. All other places
test for sa_family == AF_INET but this one.
PR: kern/92091
Submitted by: Seth Kingsley <sethk-at-meowfishies.com>
MFC after: 3 days
If net.link.ether.inet.useloopback=1 and we send broadcast packet using our
own source ip address it may be rejected by uRPF rules.
Same bug was fixed for IPv6 in rev. 1.115 by suz.
PR: kern/76971
Approved by: glebius (mentor)
MFC after: 3 days
side effect that legacy ATA controllers at irq14 and irq15 cannot share
interrupts with anything else without major problems.
This fixes the ATAPI DMA problems some systems/devices have seen.
1) unregsiter kqueue filter for EVFILT_LIO.
2) free uma_zones.
3) call setsid directly to enter another session rather than
implementing by itself.
Submitted by: jhb
(1) Fix DMA alignment, based on bytes per sample.
feeder_rate.c:
Handle strayed bytes (mostly caused by #1) better.
This DMA alignment issues are extremely hard to reproduce unless
the user happen to have a 32bit capable soundcards (ATI IXP) and
knowledgeable enough to force it to operate under pure 32bit
operations on both record and play directions.
The success of the cluster allocation is checked through a field in the
mbuf structure. This change is non-functional but properly silences code
inspection tools.
Found by: Coverity Prevent(tm)
Coverity ID: CID807
Sponsored by: TCP/IP Optimization Fundraise 2005
up to date. Principle changes for this reelase is to support 2K Port Login
firmware. This allows us to support the 2322 (and 2422 4Gb) cards which only
come with the 2K Port Login firmware. The 2322 should now work- but we don't
have firmware sets for it in ispfw (as the change to load 2K Port Login f/w
hasn't been made- that f/w is so big it has to be loaded in more than one
chunk).
Other changes are the beginnings of cleaning up some long standing target
mode issues. The next changes here will incorporate a lot of bug fixes
from others.
Finally, some copyright cleanup and attempts to make the parts of the
driver that are FreeBSD specific start conforming more to FreeBSD style.
MFC after: 1 month
lookup() instead of EPERM when a DELETE or RENAME operation is
attempted on "..".
In kern_unlink(), remap EINVAL errors returned from namei() to EPERM
to match existing (and POSIX required) behaviour.
Discussed with: bde
MFC after: 3 days
feeder.h:
feeder.c:
- Implement scoring mechanisme to select best format for conversion.
This is actually part of newer format chaining procedures which
will be commited someday. Confusion during chaining process solved
by this scoring since it will try to reduce list of from/to formats
to a single, best format.
Related PR: kern/91683
channel.c:
- Simplify feeder building process since we have smarter format
chaining.
feeder_fmt.c:
- Add few more sign conversion feeders for 24 and 32 bit format.
feeder_rate.c:
- Force buffer / bytes allignment. Unaligned buffer may cause
panics during recording on pure 32bit sample format if it
involves feeder_rate as part of feeders chain.
Tested on: ATI IXP, force 32bit recording.
MFC after: 5 days
used by utilities to reset moused(8), for example. The syntax is:
!system=kern subsystem=power type=resume
Note that it would be nice to have notification of suspend, but it's more
difficult since there would have to be a method of doing request/ack
to userland and automatically timing out if no response. apm(4) has a
similar mechanism.
MFC after: 2 weeks
An executable contains at most one PT_INTERP program header. Therefore,
the loop that searches for it can terminate after it is found rather than
iterating over the entire set of program headers.
Eliminate an unneeded initialization.
Reviewed by: tegge
the last component of the path name is "..". This keeps VOP_LOOKUP()
from locking vnodes in reverse order.
Tested by: Denis Shaposhnikov <dsh AT vlink DOT ru>
MFC after: 3 days
prototypes, as the majority of new functions added have been in this
style. Changing prototype style now results in gcc noticing that the
implementation of vn_pollrecord() has a 'short' argument instead of
'int' as prototyped in vnode.h, so correct that definition. In practice
this didn't matter as only poll flags in the lower 16 bits are used.
MFC after: 1 week
Vararg functions have a different calling convention than regular
functions on amd64. Casting a varag function to a regular one to
match the function pointer declaration will hide the varargs from
the caller and we will end up with an incorrectly setup stack.
Entirely remove the varargs from these functions and change the
functions to match the declaration of the function pointers.
Remove the now unnecessary casts.
Lots of explanations and help from: peter
Reviewed by: peter
PR: amd64/89261
MFC after: 6 days
o fix contention window
o silently discard received frames that are too short
o simplify lookup of 802.11a channels (we know they exist)
o fix short preamble support
o add short slot support
o fix eifs settings
o many consistency tweaks
share devclass pointers, a mistake I've encouraged in the past) and
move the declaration of the pci_driver kobj class from cardbus.c to
pci_private.h so that other drivers can inherit from pci_driver.
devclass's parent pointer if the two drivers share the same devclass. This
can happen if the drivers use the same new-bus name. For example, we
currently have 3 drivers that use the name "pci": the generic PCI bus
driver, the ACPI PCI bus driver, and the OpenFirmware PCI bus driver. If
the ACPI PCI bus driver was defined as a subclass of the generic PCI bus
driver, then without this check the "pci" devclass would point to itself
as its parent and device_probe_child() would spin forever when it
encountered the first PCI device that did have a matching driver.
Reviewed by: dfr, imp, new-bus@
went away in the generated files? This didn't happen on my amd64
test machine but did when I committed it on my other i386 machine.
I need to figure this out since a regen on the amd64 doesn't fix it
now. For now make the build work again. Matt caught this before
my local mirror caught up.
This fixes reconnect after, for example, tcp idle disconnection.
Previously this would fail if a normal user tried to bind to a privileged
port.
Submitted by: cel@citi.umich.edu
MFC after: 1 week
hacking:
- Remove all spaces at eol.
- Improve style(9) in most frequently edited functions.
- In em_encap() push variables for 82544 workaround in the block
where they are only used.
- In em_get_buf() remove unused variable.
errors from rn_inithead back to the ipfw initialization function.
- Check return value of rn_inithead for failure, if table allocation has
failed for any reason, free up any tables we have created and return ENOMEM
- In ipfw_init check the return value of init_tables and free up any mutexes or
UMA zones which may have been created.
- Assert that the supplied table is not NULL before attempting to dereference.
This fixes panics which were a result of invalid memory accesses due to failed
table allocation. This is an issue mainly because the R_Zalloc function is a
malloc(M_NOWAIT) wrapper, thus making it possible for allocations to fail.
Found by: Coverity Prevent (tm)
Coverity ID: CID79
MFC after: 1 week
but not 'fragment reassemble'), which can cause some fragments to get
inserted into the cache twice, thereby violating an invariant, and panic-
ing the system subsequently.
Reviewed by: mlaier
MFC after: 1 day
o lock the list walk
o malloc a results buffer instead of copying out one result at a time
using an on-stack buffer
o fix definition of ieee80211req_scan_result so size of variable-length
information elements is large enough to hold all possible ie's
(still only return wpa+wme, at some point may return all)
o make rssi+noise data signed; they should've been so all along
o add a bit more padding for future additions while we're here
o define a new ioctl for new api and add compat code for old ioctl
under COMPAT_FREEBSD6 (temporarily enabled local to the file)
Reviewed by: Scott Long
MFC after: 2 weeks
to have a value that's not been used before; this fixes the problem
where the first traversal of the scan list did nothing because the
entries were initialized with the current generation number (a
separate issue)
MFC after: 1 week
This fixes a bug in the previous commit.
Found by: Coverity Prevent(tm)
Coverity ID: CID253
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
change the mbuf pointer and we don't have any way of passing
it back to the callers. Instead just fail silently without
updating the checksum but leaving the mbuf+chain intact.
A search in our GNATS database did not turn up any match for
the existing warning message when this case is encountered.
Found by: Coverity Prevent(tm)
Coverity ID: CID779
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
that currently can't be triggered. But better be safe than sorry
later on. Additionally it properly silences Coverity Prevent for
future tests.
Found by: Coverity Prevent(tm)
Coverity ID: CID802
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
route MTU.
This bug is very difficult to reach and not remotely exploitable.
Found by: Coverity Prevent(tm)
Coverity ID: CID162
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
may have changed by m_pullup() during fastforward processing.
While this is a bug it is actually never triggered in real world
situations and it is not remotely exploitable.
Found by: Coverity Prevent(tm)
Coverity ID: CID780
Sponsored by: TCP/IP Optimization Fundraise 2005
as input/output interface errors.
- Keep values of rx/tx discards & tx collisions inside struct bge_softc.
So we can keep statistic across ifconfig down/up runs (cause bringing
bge up will reset chip).
Approved by: glebius (mentor)
MFC after: 1 week
equal to NULL several times later. p_ucred "should probably not" be NULL
if the process isn't PRS_NEW anyway. This is strongly reinforced by the fact
that we don't see frequent crashes here. Remove the checks after p_cansee and
add a KASSERT right before it.
Found by: Coverity Prevent (tm)
Also trim one nearby trailing space.
last few days. I tracked it down to the fact that nfs_reclaim()
is setting vp->v_data to NULL _before_ calling vnode_destroy_object().
After silence from the mailing list I checked further and discovered
that ufs_reclaim() is unique among FreeBSD filesystems for calling
vnode_destroy_object() early, long before tossing v_data or much
of anything else, for that matter. The rest, including NFS, appear
to be identical, as if they were just clones of one original routine.
The enclosed patch fixes all file systems in essentially the same
way, by moving the call to vnode_destroy_object() to early in the
routine (before the call to vfs_hash_remove(), if any). I have
only tested NFS, but I've now run for over eighteen hours with the
patch where I wouldn't get past four or five without it.
Submitted by: Frank Mayhar
Requested by: Mohan Srinivasan
MFC After: 1 week
lock_obj objects:
- Add new lock_init() and lock_destroy() functions to setup and teardown
lock_object objects including KTR logging and registering with WITNESS.
- Move all the handling of LO_INITIALIZED out of witness and the various
lock init functions into lock_init() and lock_destroy().
- Remove the constants for static indices into the lock_classes[] array
and change the code outside of subr_lock.c to use LOCK_CLASS to compare
against a known lock class.
- Move the 'show lock' ddb function and lock_classes[] array out of
kern_mutex.c over to subr_lock.c.
ATI EHCI controllers exhibit simmilar stall issues and require
this dropped interrupts workaround. Be verbose about it.
ehci.c:
ehcivar.h:
Slight change in comments to note about issues surrounding both
VIA and ATI EHCI controllers.
Approved by: iedowse
long the string is in userspace, afterwards we call malloc(M_WAITOK),
which could sleep for an unknown amount of time. Check the return
value of copyin(9) just to be sure that nothing has changed during that
time.
Found with: Coverity Prevent (tm)
MFC after: 1 week
an interrupt appears to occur before the transfer has been marked
as completed. This caused umass transfers to get stuck, especially
when writing large files. The workaround sets up a timer that
rechecks for missed completed transfers if some operations are still
pending. Other suggested workarounds, such as performing a PCI read
immediately after acknowledging the interrupts, do not appear to
help.
Obtained from: OpenBSD
Since we are using vfs_busy() on a freshly allocated mount structure, use
(void) to show that we do not care about the return value.
Found with: Coverity Prevent (tm)
MFC after: 2 weeks
ipq_zone, to allocate fragment headers from, rather than using cast mbuf
storage. This was one of the few remaining uses of mbuf storage for
local data structures that relied on dtom(). Implement the resource
limit on ipq's using UMA zone limits, but preserve current sysctl
semantics using a sysctl proc.
MFC after: 3 weeks
same behavior to be controlled by the sysctls, hw.syscons.kbd_kbdkey
and hw.syscons.kbd_reboot respectively.
Apologies to the submitter for taking so long to commit this simple
change.
PR: kern/72728
Submitted by: Luca Morettoni <morettoni at libero dot it>
MFC After: 3 days
I couldn't find the ID for the EG1064 anywhere in our sources
so I removed the reference for now.
Pointed out by: Robert Huff <roberthuffi at rcn dot com> [1]
Reviewed by: simon
an application to upon a tape (yea, even the non-control device) even if
it cannot establish a mount session. If the open cannot establish a mount
session and O_NONBLOCK was specified, the tape becomes 'open pending mount'.
All I/O operations that would require access to a tape thereafter until
a close attempt to initiate the mount session. If the mount session succeeds,
the tape driver transitions to full open state, else returns an appropriate
I/O error (ENXIO).
At the same time, add a change that remembers whether tape is being opened
read-only. If so, disallow 'write' operations like writing filemarks that
bypass the normal read-only filtering operations that happen in the write(2)
syscall.
Reviewed by: ken, justin, grog
MFC after: 2 weeks
Suggested by: The Bacula Team
resulted in deadcode, as 'error' could never be 0. What this logic
was originally meant to handle is not clear -- it's been this way
(broken) since at least RELENG_4.
Found with: Coverity Prevent(tm)
MFC after: 3 days
restored when its removed from the bridge.
At the moment we only clear IFCAP_TXCSUM. Since a locally generated packet on
the bridge may be sent out any one or more interfaces it cant be assumed that
every card does hardware csums. Most bridges don't generate a lot of traffic
themselves so turning off offloading won't hurt, bridged packets are
unaffected.
Tested by: Bruce Walker (bmw borderware.com)
MFC after: 5 days
taskqueue_start_threads(struct taskqueue **, int count, int pri,
const char *name, ...);
This allows the creation of 1 or more threads that will service a single
taskqueue. Also rework the taskqueue_create() API to remove the API change
that was introduced a while back. Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads(). Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.
The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work. Thus, recompiling drivers is not a strict requirement.
- Add support for adjusting the fan speed if the fan control mode is manual
Documentation for the relevant embedded controller register was obtained from
http://www.thinkwiki.org/wiki/Patch_for_controlling_fan_speed
Tested on: R51 by Fabian Keil
T41p by markus
Requested by: many
Approved by: philip
MFC after: 1 week
ipxpcb is NULL or not: in attach it will be, and on detach it won't be.
If for any reason these invariants don't hold true, panicking is a good
idea.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
may be a stale pointer at this point, and we're interested in whether or
not m_pulldown() failed.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
are the contents of the forwarded mbuf ever copied into mcopy, so there's
no need to have mcopy, conditionally look at mcopy, or conditionally free
it.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
ef_clone(); we were testing the original ifnet, not the one allocated.
When aborting ef_clone() due to if_alloc() failing, free the allocated
efnet structure rather than leaking it.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
SLIST_FOREACH_SAFE() rather than SLIST_FOREACH(), as elements are
freed on each iteration of the loop. This prevents use-after-free.
Noticed by: Coverity Prevent analysis tool
MFC after: 3 days
intended for use solely with atomic datagram socket types, and relies
on the previous break-out of sosend_copyin(). Changes to allow UDP to
optionally use this instead of sosend() will be committed as a
follow-up.
2) use more robust way of link state handling for BCM5700 rev.B2 chip
3) workaround bug of some BCM570x chips which cause spurious "link up" messages
4) fix bug: some BCM570x chips was unable to detect link state changes after
ifconfig down/up sequence until any 'non-link related' interrupt generated.
(this happened due to pending internal link state attention which blocked
interrupt generation)
Approved by: glebius (mentor)
MFC after: 1 week
the the interface has been configured. I'm not sure how this could ever
have worked before, but it should be fixed now. Also break out the interrupt
degresitration function into it's own step.
in flags. When sending export datagram from expiry thread, then
use default zero flags. This removes unpleasant contention of the
interrupt thread on mutexes (usually ng_ksocket's socket buffer
mutex).
before. The symptom is that the battery inform us its charge and discharge
at the same time...
* fix bst.rate to correctly output the (dis)charging rate. We'll use
the current average over one minute command and not the at_rate command.
Note that this method is not correct if the capacity_mode is set, but
since we don't set it ourself, it is not a problem.
The at_rate do not give the actual rate but is used to compute the
estimated time for (dis)charging a battery. We should actually
write an estimation of the actual rate using at_rate cmd and then
perform a read to the various estimators.
Approved by: njl
MFC after: 2 days
had been replied, the reply was always delivered to the originator
synchronously.
With introduction of netgraph item callbacks and a wait channel with
mutex in ng_socket(4), we have fixed the problem with ngctl(8) returning
earlier than the command has been proceeded by target node. But still
ngctl(8) can return prior to the reply has arrived to its node.
To fix this:
- Introduce a new flag for netgraph(4) messages - NGM_HASREPLY.
This flag is or'ed with message like NGM_READONLY.
- In netgraph userland library if we have sent a message with
NGM_HASREPLY flag, then select(2) until reply comes.
- Mark appropriate generic commands with NGM_HASREPLY flag,
gathering them into one enum {}. Bump generic cookie.
when checking whether it's greater than a struct stat st_size in order
to also catch the case when st_size is -1. Previously this check didn't
trigger on sparc64 when st_size is -1 (as it's the case for a file on
a bzipfs, TFTP server etc.), causing the content of the linker hints
file to be copied to memory referenced by a null-pointer.
PR: 91231
MFC after: 1 week
operands are consumed so use the appropriate constraint modifier.
Before this change GCC used one register for both an input and an
unrelated output operand of in_addword(), causing the input to be
overwritten before it was consumed and thus breaking in_addword().
For in_cksum_hdr() and in_pseudo() this change is more or less
cosmetic.
- Fix a misspelling in a nearby comment.
Reported & tested by: yongari
MFC after: 1 week
The minimum / maximum speed was way too low / high!
minspeed = 2000 - is this for real ?
maxspeed = 767999 - is this for real ?????
Wrap everything into 8000 - 48000 boundary, just to be safe.
MFC after: 3 days
Correct insecure temporary file usage in ee. [06:02]
Correct a race condition when setting file permissions, sanitize file
names by default, and fix a buffer overflow when handling files
larger than 4GB in cpio. [06:03]
Fix an error in the handling of IP fragments in ipfw which can cause
a kernel panic. [06:04]
Security: FreeBSD-SA-06:01.texindex
Security: FreeBSD-SA-06:02.ee
Security: FreeBSD-SA-06:03.cpio
Security: FreeBSD-SA-06:04.ipfw
- Mark MPSAFE since most of the locking procedures already implemented.
- Turn on inverted external amplifier sense flag for selected boards.
Tested by: bland
MFC after: 1 week
attempted to cast a struct ifnet to a struct fw_com which resulted in
data corruption.
PR: kern/91307
Submitted by: Alex Semenyaka <alex at semenyaka do ru>
MFC After: 6 days
operations before returning. Point the bus at a dummy cam_sim
structure so that any CCBs will complete immediately with a
CAM_DEV_NOT_THERE status, and ensure that any xpt_schedule() calls
on the bus's devices will immediately call the peripheral's
periph_start() routine. Also repeat the async messages because
devices that were part of the way through being probed may appear
after the original AC_LOST_DEVICE was sent, and would otherwise
never go away.
These changes make it possible to deregister a bus and free the SIM
at most stages during bus probing without the usual crashes in
camisr(). In particular, plugging in a umass device and then
unplugging it as soon as the first probe messages appeared would
almost always result in a crash. Now the device just goes away with
a few CAM errors and all references to the CAM bus, target and
device are dropped correctly.
- Only update the rx ring consumer pointer after running through the rx loop,
not with each iteration through the loop.
- If possible, use a fast interupt handler instead of an ithread handler. Use
the interrupt handler to check and squelch the interrupt, then schedule a
taskqueue to do the actual work. This has three benefits:
- Eliminates the 'interrupt aliasing' problem found in many chipsets by
allowing the driver to mask the interrupt in the NIC instead of the
OS masking the interrupt in the APIC.
- Allows the driver to control the amount of work done in the interrupt
handler. This results in what I call 'adaptive polling', where you get
the latency benefits of a quick response to interrupts with the
interrupt mitigation and work partitioning of polling. Polling is still
an option in the driver, but I consider it orthogonal to this work.
- Don't hold the driver lock in the RX handler. The handler and all data
associated is effectively serialized already. This eliminates the cost of
dropping and reaquiring the lock for every receieved packet. The result
is much lower contention for the driver lock, resulting in lower CPU usage
and lower latency for interactive workloads.
The amount of work done in the taskqueue is controlled by the sysctl
dev.em.N.rx_processing_limit
and tunable
hw.em.rx_process_limit
Setting these to -1 effectively removes the limit.
The fast interrupt and taskqueue can be disabled by defining NO_EM_FASTINTR.
This work has been shown to increase fast-forwarding from ~570 kpps to
~750 kpps (note that the same NIC hardware seems unable to transmit more than
800 kpps, so this increase appears to be limited almost solely by the
hardware). Gains have been shown in other workloads, ranging from better
performance to elimination of over-saturation livelocks.
Thanks to Andre Opperman for his time and resources from his network
performance project in performing much of the testing. Thanks to Gleb
Smirnoff and Danny Braniss for their help in testing also.
to COMPAT_43TTY.
Add COMPAT_43TTY to NOTES and */conf/GENERIC
Compile tty_compat.c only under the new option.
Spit out
#warning "Old BSD tty API used, please upgrade."
if ioctl_compat.h gets #included from userland.
Instead of dragging the entire ICH4/82801DB into this mess, select
only few boards based on pci subdevice / subvendor.
Tested by: Daisuke Orikasa <luxury-acura-3.5rl at nifty.com>
MFC after: 3 days
fast taskqueues. The following have been added:
TASKQUEUE_FAST_DEFINE() - create a global task queue.
an arbitrary execution context.
TASKQUEUE_FAST_DEFINE_THREAD() - create a global taskqueue that uses a
dedicated kthread.
taskqueue_create_fast() - create a local/private taskqueue.
These are all complimentary of the standard taskqueue functions. They are
primarily useful for fast interrupt handlers that can only use spinlock for
synchronization.
I personally think that the taskqueue API is starting to get too narrow and
hairy, but fixing it will require a major redesign on the API. Such a
redesign would be good but would break compatibility with FreeBSD 6.x, so
it really isn't desirable at this time.
Submitted by: sam
This is based on MCPC USB mobile phone guide line (MCPC-GL005)
Some other 3G system or so will work with this driver.
Kyocera PHS terminal (a.k.a. Kyopon) is known to work, which
is now supported by umodem(4) driver.
o record tsf in tx+rx frames
o switch from raw rssi to dbm for signal data and record both
signal and noise floor data (hacked for now to assume a fixed
noise floor; is correct with new hal)
o add monpass sysctl to control which rx'd frames are passed
up with errors; especially useful to see frames with CRC errors
o mark 'd packets w/ a CRC error with radiotap's BADFCS flag
Also add placeholder code for calibrating the noise floor when
using newer hals.
Reviewed by: avatar
MFC after: 1 week
param.h. Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef. I've not renamed anything yet, but
may since we don't need the __.
Submitted by: bde, jhb, scottl, many others.
CAM_LUN_INVALID or CAM_TID_INVALID. Retries were being triggered
here when a umass device was unplugged, and while the retries
themselves are probably harmless, they complicated finding the real
SIM removal problems.
the names of directories to include in the base ldconfig script.
This will eliminate the need for each port to install its own
boot script which does nothing but ldocnfig a given directory.
This code was developed by flz (ports committer), discussed on
freebsd-rc@, and modified slightly by me.
Submitted by: flz
Reviewed by: brooks
better, I discovered sn doing too many pointer dereferences. This
driver would do silly things like:
sn_foo(struct ifnet *ifp)
{
struct sn_softc *sc = ifp->if_softc;
sc->ifp->mumble
/* Other stuff */
}
while /* other stuff */ usually needed sc, the extra deref isn't
needed. Eliminate a few dozen of them.
and subsequently broke the build. This change is supposed to fix the
case where doing a mtx_destroy() off a spin mutex while you hold it fails.
If it had been tested I would just leave it in, but it hasn't been tested
yet, so it will have to wait until later.
to old-style signals, to be the DAR register for DSI miss exceptions.
This gives the address of the access rather than the instruction
address. The behaviour is now the same as on i386.
Found by: libsigsegv tests
defined to return an int, but on LP64 platforms the return value of
FD_ISSET() for file descriptors with a bit-index larger than 31 would
not fit an int (due to __fd_mask being defined as an unsigned long).
The fix is to explicitly test against 0.
PR: ia64/91421
Submitted by: Tanaka Akira (akr at m17n dot org)
MFC after: 1 week
modules would have overlapping names.
- Only create /dev/si_control for unit 0.
Tested by: Joerg Lehners Joerg dot Lehners at informatik dot
uni-oldenburg dot de (on 6.x)
MFC after: 1 week
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.
struct sx). Instead of storing a direct pointer to a our lock_class
struct in lock_object, reserve 4 bits in the lo_flags field to serve as an
index into a global lock_classes array that contains pointers to the lock
classes. Only debugging code such as WITNESS or INVARIANTS checks and KTR
logging need to access the lock_class member, so this shouldn't add any
overhead to production kernels. It might add some slight overhead to
kernels using those debug options however.
As with the previous set of changes to lock_object, this is going to
completely obliterate the kernel ABI, so be sure to recompile all your
modules.
doesn't have any actual interrupts is listed in a _PRT entry, only print
a warning rather than panic'ing when we walk the _PRT's to build up count
of entries that reference a given link (the counts are used as weights so
that we can attempt to balance the load across IRQs used by link devices).
Instead, only panic if we attempt to use the _PRT entry to route an
interrupt for a device.
PR: i386/89545
Tested by: anders
- MPSAFE
- Fix / reorganize attach routine. Device specific initialization must
be done after generic bus / DMA setup. At last, Virtual Channels
(vchan) works as expected.
Note: Recent commit / fix against this driver proves that major enhancements
on the generic sound layer does indeed help to expose flaw within
device specific code. There are probably other drivers that need to
be addressed as well.
Tested by: barner
MFC after: 1 week
that NetBSD implemented it independently of them (don't know which one
was actually first). This saves about 24k for those times you don't
need snapshot support (like when running off a ram disk, or in an
embedded environment where size matters).
returns EBADF. That errno is correct and is mandated by POSIX. It also
goes back to revision 1.1 of our CVS history (i.e. 4.4BSD).
The _fget() function should probably also be upated as it currently returns
EINVAL in that case rather than EBADF. (It does return EBADF for reads
on a write-only descriptor without any XXX comments oddly enough.)
Discussed with: scottl, grog, mjacob, bde
ifm_status and ifm_active. IFM_10_T gets set in the ifm_active field,
not in the ifm_status field, as far as I can tell.
Note: this was to enable a workaround that's rarely enabled. I don't know
how to corrupt my eeprom to test it, and would rather not know...
- Remove a conditional in the AMD cache detection, it's always false. [2]
- Don't try to detect a cache if only compiled for i386.
Analyzed by: Antoine Brodin <antoine.brodin@laposte.net> [1]
Submitted by: Antoine Brodin <antoine.brodin@laposte.net> [2]
that a file's atime and mtime are only set to correct fractional
second values (0-999999000ns with the current interface).
Prior to this change users could create files with values outside
that range. Moreover, on 32-bit machines tv_usec offsets larger than
4.3s would result in an unnormalized AND wrong timestamp value,
due to overflow.
MFC after: 1 week
success instead of EOPNOTSUPP when being loaded. Secondly, if there are no
ibcs2 processes running when a MOD_UNLOAD request is made, break out to
return success instead of falling through into the default case which
returns EOPNOTSUPP. With these fixes, I can now kldload and subsequently
kldunload the ibcs2 module.
PR: kern/82026 (and several duplicates)
Reported by: lots of folks
MFC after: 1 week
signal is received during the msleep, the msleep is retried
indefinitely as it just keeps returning ERESTART because of
the pending signal.
Instead, just don't PCATCH - the signal can wait.
Sponsored by: Sophos/ActiveState
requiried to keep consistent softc state before/after callback function
invocation and supposed to be sligntly faster than previous one as it
wouldn't incur callback overhead. With this change callback function
was gone.
- Decrease TI_MAXTXSEGS to 32 from 128. It seems that most mbuf chain
length is less than 32 and it would be re-packed with m_defrag(9) if
its chain length is larger than TI_MAXTXSEGS. This would protect ti(4)
against possible kernel stack overflow when txsegs[] is put on stack.
Alternatively, we can embed the txsegs[] into softc. However, that
would waste memory and make Tx/Rx speration hard when we want to
sperate Tx/Rx handlers to optimize locking.
- Fix dma map tracking used in Tx path. Previously it used the dma map
of the last mbuf chain in ti_txeof() which was incorrect as ti(4)
used dma map of the first mbuf chain when it loads a mbuf chain with
bus_dmamap_load_mbuf(9). Correct the bug by introducing queues that
keep track of active/inactive dma maps/mbuf chain.
- Use ti_txcnt to check whether driver need to set watchdog timer instead
of blidnly clearing the timer in ti_txeof().
- Remove the 3rd arg. of ti_encap(). Since ti(4) now caches the last
descriptor index(ti_tx_saved_prodidx) used in Tx there is no need to
pass it as a fuction arg.
- Change data type of producer/consumer index to int from u_int16_t in
order to remove implicit type conversions in Tx/Rx handlers.
- Check interface queue before getting a mbuf chain to reduce locking
overhead.
- Check number of available Tx descriptores to be 16 or higher in
ti_start(). This wouldn't protect Tx descriptor shortage but it would
reduce number of bus_dmamap_unload(9) calls in ti_encap() when we are
about to running out of Tx descriptors.
- Command NIC to send packets ony when the driver really has packets
enqueued. Previously it always set TI_MB_SENDPROD_IDX which would
command NIC to DMA Tx descriptors into NIC local memory regardless
of Tx descriptor changes.
Reviewed by: scottl
allocating a resource that's in the card itself.
Remove more now-redundant resource_list_add, and now-redunant code
that lives in the pci layer.
# This fixes the atheros card that I have which had its CIS in one of
# the BARs. Don't know yet if this fixes the amd64 issues reported.
particular this fixes use of wme in adhoc demo mode, it wasn't possible
to set the txop limit because the aggressive mode logic would override
Reviewed by: apatti
MFC after: 2 weeks
the year bump.
# If we behaved like book publishers, we'd do this in July. I can't find
# a good reference for why they do it then, but it has been explained to
# me that copyrights in the last 1/2 of the year expire as if they were
# published in the following year. I can't confirm this info, but if you
# have a pointer, please send it to me.
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)
MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)
security.mac.biba.interfaces_equal
If non-zero, all network interfaces be created with the label:
biba/equal(equal-equal)
This is useful where programs which initialize network interfaces
do not have any labeling support. This includes dhclient and ppp. A
long term solution is to add labeling support into dhclient(8)
and ppp(8), and remove this variable.
It should be noted that this behavior is different then setting the:
security.mac.biba.trust_all_interfaces
sysctl variable, as this will create interfaces with a biba/high label.
Lower integrity processes are not able to write to the interface in this
event. The security.mac.biba.interfaces_equal will override
trust_all_interfaces.
The security.mac.biba.interfaces_equal variable will be set to zero
or disabled by default.
MFC after: 2 weeks
USB HID device that allows to plug two PS2 controllers. This specific
device doesn't work yet but will as soon as we support devices with
multiple report IDs.
MFC after: 3 days
broken report descriptor. While I'm here, make all the other report
descriptors const to match the newly added one.
Obtained from: NetBSD
MFC after: 1 week
lack a report descriptor and don't use the standard interface class.
This patch works around these deficiencies so that the uhid(4) driver
can recognize and use those broken devices.
PR: usb/90141
Submitted by: Ed Schouten <ed@fxq.nl> (with minor mods from me)
MFC after: 1 week
force allocation of unallocated BARs (cardbus uses this to preallocate
everything). Add a prefetchmask to allow for busses that get prefetch
hints to set them. Addjust pci_add_map and pci_ata_maps to take a new
force flag which pci_add_resources will pass in. Implement 'force' in
pci_add_map. Write new value of allocated resource into the bar, if
the allocation succeeded (we should have done this before, but with
the new force the bug was very obvious).
- Provide tunable vm.memguard.desc, so one can specify memory type without
changing the code and recompiling the kernel.
- Allow to use memguard for kernel modules by providing sysctl
vm.memguard.desc, which can be changed to short description of memory
type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
doesn't exist (will be loaded later) or when no memory is allocated yet.
If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of memory types comparsion and make safer/slower the
default.
This should reduce huge playback / recording latency for
applications that try to act smarter and manage their own
buffering (XMMS, Skype, etc.).
Note to Skype + via8xxx users: Remove previous hackish
"hint.pcm.<unit>.via_dxs_disabled" from kernel hint and see
whether this changes cure all those annoying sound issues.
but broke handling of the turboG channel; since we aren't ready to revamp
the channel list just check for turboA channels for now so channel 6 is
considered in auto mode
Noticed by: gibbs
try very hard to be perfect. However, these attempts broke down when
there were large numbers of resources. We'd not be able to map them all.
Instead, accept that we might pass more range to thse subbus than
might be optimal be able to compute. However, there's little harm in
this and it allows us to pass greater resources through.
# it has been suggested that we allocate a fixed amount of resources
# on attach and give it out upon request. This might not be a bad idea...
The race is very real, but conditions needed for triggering it are rather
hard to meet now.
When gjournal will be committed (where it is quite easy to trigger) we need
to fix it.
For now, verify if it is really hard to trigger.
Discussed with: kan
of msleep(). msleep_spin() doesn't support changing the priority of the
thread while it is asleep nor does it support interruptible sleeps (PCATCH)
or the PDROP flag. It does support timeouts however. It differs from
msleep() in that the passed in mutex is a spin mutex. This means one can
use msleep_spin() and wakeup() with a spin mutex similar to msleep() and
wakeup() with a regular mutex. Note that the spin mutex in question needs
to come before sched_lock and the sleepq locks in lock order.
spin locks that are not in the static order list. It is not safe to call
printf while holding the witness spin mutex since the console drivers that
back printf may need to use their own spin locks which would try to talk
to witness when they were locked. Given this, it is possible for one
CPU to lock a console driver lock (such as sio) which then tries to lock
the witness lock while another CPU is doing the printf while holding the
witness lock. Fix this by moving the printf outside of the witness lock.
All other printf's in witness are already correct.
MFC after: 3 days
ETHERTYPE_IPV6 frames. Change this to be a sysctl knob so that is able to still
bridge non-IP packets if desired.
Also return early if all pfil_* sysctls are turned off, the user obviously does
not want to filter on the bridge.
cardbus_cis.c to this file, some code was not merged and thus resource
list entries were invalid. They didn't have a resources attached to
them.
However, the problem was masked for some time later, because newer
resources list entries were added to the head of the list, and
resource_list_find() always returned the first matching resource list
entry. Usually the underlying driver allocated a valid resource and
added it to the head of the list, and invalid one wasn't used.
In rev. 1.174 of subr_bus.c the sorting of resource list entries was
reversed demasking the problem in cardbus_alloc_resources().
This commit fixes the problem returning back some code from
cardbus_cis.c, pre-1.49 revisions.
PR: kern/87114
PR: kern/90441
Hardware provided by: Vasily Olekhov <olekhov yandex.ru>
Reviewed by: imp
. remove unnecessay header files after Scott's bus_dma(9) commit.
. remove global variable tis which was introduced at the time of
zero_copy(9) changes. The variable tis was not used at all. The
same applyes to ti_links in softc so axe it.
. deregister variables.
. axe ti_vhandle and switch to use explicit register access for
accessing NIC local memory. Creates three variants of ti_mem to
read/write NIC local memory(ti_mem_read, ti_mem_write) and clearing
NIC local memory(ti_mem_zero). This greatly enhances code
readability and have ti(4) drop using shared memory scheme for
Tigon 1. As Tigon 1 switched to use explicit register access for Tx,
axe ti_tx_ring_nic/ti_cmd_ring in softc.(Tigon 2 used to host ring
scheme which means there is no need to access NIC local memory via
register access for Tx and NIC would DMA the modified Tx rings into
its local memory.) [1]
. introduce new macro TI_EVENT_*/TI_CMD_* to handle NIC envent/command.
Instead of using bit fields assginment for accessing the event, use
shift operations to set/get it. [1]
. add additional check for valid DMA tags in ti_free_dmamaps().
. add missing bus_dmamap_sync/bus_dmamap_unload in ti_free_*_ring_*.
. fix locking nits(MTX_RECURSE mutex) and make ti(4) MPSAFE.
. change data type of ti_rdata_phys to bus_addr_t and don't blindly
cast to uint32_t.
. rearrange detach path and make ti(4) survive during device detach.
. for Tigon 1, use explicit register access for checking Tx descriptors
in ti_encap()/ti_txeof(). [1]
. properly call bus_dmamap_sync(9) for updating statistics.
. remove extra semicolon in ti_encap()
. rewrite loading MAC address to work on strict-alignment architectures.
. move TI_RD_OFF macro to if_tireg.h
. axe ETHER_ALIGN as it's already defined in <net/ethernet.h>.
. make macros immuine from expansion by adding parenthesis and do-while.
. remove alpha specific hack as vtophys(9) is no longer used in ti(4)
after Scott's bus_dma(9) fix.
Reviewed by: scottl
Obtained from: OpenBSD [1]
UMA_SLAB_MALLOC flag.
In some circumstances (I observed it when I was doing a lot of reallocs)
UMA_SLAB_MALLOC can be set even if us_keg != NULL.
If this is the case we have wonderful, silent data corruption, because less
data is copied to the newly allocated region than should be.
I'm not sure when this bug was introduced, it could be there undetected
for years now, as we don't have a lot of realloc(9) consumers and it was
hard to reproduce it...
...but what I know for sure, is that I don't want to know who introduce
the bug:) It took me two/three days to track it down (of course most of
the time I was looking for the bug in my own code).
amd64_set_watch() as 'unsigned int' and 'unsigned int' is 32bit long on amd64.
Even with that fix hardware watchpoint don't work for me on amd64, ie. when
I set the watchpoint and write a byte there, nothing happens.
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.
PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>
"established" state.
Similar to OpenBSD's rev. 1.499 by joel but not breaking ABI.
Obtained from: OpenBSD (with changes)
Reported by: Bruno Afonso
MFC after: 3 days
X-MFC: together with local_flags
POSIX. This also makes the struct correct we ever implement an i386-time64
architecture. Not that we need too.
Reviewed by: imp, brooks
Approved by: njl (acpica), des (no objects, touches procfs)
Tested with: make universe
Specifically, it is required for the I/O that may be performed by
elfN_load_section().
Avoid an obscure deadlock in the a.out, elf, and gzip image
activators. Add a comment describing why the deadlock does not occur
in the common case and how it might occur in less usual circumstances.
Eliminate an unused variable from exec_aout_imgact().
In collaboration with: tegge
by debugger, e.g process is dumping core. Only access p_xthread if
P_STOPPED_TRACE is set, this means thread is ready to exchange signal
with debugger, print a warning if P_STOPPED_TRACE is not set due to
some bugs in other code, if there is.
The patch has been tested by Anish Mistry mistry.7 at osu dot edu, and
is slightly adjusted.
I'm holding off on building on sparc64 and others because I don't know
if this driver has had all the inb/outb removed (I think it has). Nor
do I know if there are byte ordering issues. There are very few word
operations on an NE2000, but I've not had time to audit them all.
Suggested by: Daniel O'Connor
handling code so the stack trace unwinders don't start trying to
go into user-space.
Found by trying to create core dumps with a KTR_COMPILE/KTR_GEOM
kernel, which results in a stack_save() call in the ast() coredump
path - this created a panic, and then calling 'trace' in ddb resulted
in the black screen of death after printing out most of the backtrace.
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.
Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)
if we need a valid MAC address (for probing the media for example) before
ether_ifattach() has been called since IF_LLADDR() is NULL then.
Tested by: tisco
except for BGE_CHIPID_BCM5700_B0, which is buggy.
- All bge(4) supported hardware, has a bug that produces incorrect checksums
on Ethernet runts. However, in case of a transmitted packet, the latter can
be padded with zeroes, and the checksum would be correct. (Probably chip
includes the pad data into checksum). In case of receive, we just don't
trust checksum data in received runts.
Obtained from: NetBSD (jonathan) via Mihail Balikov
Previously it always returned 0 which means success regardless of
EEPROM status.
While here, add a check whether EEPROM read is successful.
Submitted by: jkim
- removed unused funtion bge_handle_events().
- removed bus_dmamap_destroy(9) calls for DMA maps created by
bus_dmamem_alloc(9). This should fix panics seen on sparc64
in device detach.
- added check for parent DMA tag creation.
- switched to use __NO_STRICT_ALIGNMENT as bge(4) supports all
architectures.
- added missing bus_dmamap_sync(9) in bge_txeof().
- added missing bus_dmamap_sync(9) in bge_encap().
- corrected memory synchronization operation on status block.
As the driver just read status block that was DMAed by NIC it
should use BUS_DMASYNC_POSTREAD. Likewise the driver does not
need to write status block back, so remove unnecessary
bus_dmamap_sync(9) calls in bge_intr().
- corrected memory synchronization operation on RX return ring.
The driver only read the block so remove unnecessary
bus_dmamap_sync(9) in bge_rxeof().
- force bus_dmamap_sync(9) for only modified descriptors. Blindly
synching all desciptor rings would reduce performance.
- call bus_dmamap_sync(9) for DMA maps that were modified in bge_rxeof().
Reviewed by: jkim(initial version)
Tested by: glebius(i386), jkim(amd64 initial version)
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.
Obtained from: NetBSD
MFC after: 2 weeks
it and reacquiring it in vrele(). Consequently, there is no reason to
increase the reference count on the vm object caching the file's pages.
Reviewed by: tegge
Eliminate unused parameters to elfN_load_file().
3MB of physical memory for heap instead of range between 1MB and 4MB.
This makes this feature working with PAE and amd64 kernels, which are
loaded at 2MB. Teach i386_copyin() to avoid using range allocated by
heap in such case, so that it won't trash heap in the low memory
conditions.
This should make loading bzip2-compressed kernels/modules/mfs images
generally useable, so that re@ team is welcome to evaluate merits
of using this feature in the installation CDs.
Valuable suggestions by: jhb
the addition of pci_find_extcap().
- Change the drm drivers to attach to vgapci. This is #ifdef'd so the
code can be shared across branches.
- Use pci_find_extcap() to look for AGP and PCIE capabilities in drm.
- GC all the drmsub stuff for i810/i830/i915. The agp and drm devices are
now both children of vgapci.
as a bus so that other drivers such as drm(4), acpi_video(4), and agp(4)
can attach to it thus allowing multiple drivers for the same device. It
also removes the need for the drmsub hack for the i8[13]0/i915 drm and agp
drivers.
as a bus so that other drivers such as drm(4), acpi_video(4), and agp(4)
can attach to it thus allowing multiple drivers for the same device. It
also removes the need for the drmsub hack for the i8[13]0/i915 drm and agp
drivers.
attach to the hostb driver instead. This means that agp can now be loaded
at runtime (in theory at least). Also, the drivers no longer have to
explicity call device_verbose() to cancel out any earlier calls to
device_quiet() by the hostb(4) driver (this shows a limitation in new-bus,
drivers really shouldn't be doing device_quiet() until they know they are
going to drive that device, i.e. in attach).
duplicated anyways) and into a single MI driver. Extend the driver a bit
to implement the bus and PCI kobj interfaces such that other drivers can
attach to it and transparently act as if their parent device is the PCI
bus (for the most part).
ELF trampoline build its own target, "trampoline".
It makes it possible to construct a bootable gzipped kernel without having
to build in the same process.
drivers already map sections into KVA as needed anyway. Note that this
will probably break the nvidia driver, but I will coordinate to get that
fixed.
MFC after: 2 weeks
to search for a specific extended capability. If the specified capability
is found for the given device, then the function returns success and
optionally returns the offset of that capability. If the capability is
not found, the function returns an error.
The purpose of this change is consistency (not performance improvement:)),
as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes
under the Giant and sometimes without it.
Glanced at by: ssouhlal, kan
special handling when zero. This caused no PFSYNC_ACT_DEL message and thus
disfunction of pfflowd and state synchronisation in general.
Discovered by: thompsa
Good catch by: thompsa
MFC after: 7 days
provide enough room for decompression (up to 2.5MB is necessary). This
should be safe to do since we load i386 kernels after 8MB mark now, so
that 16MB is the minimum amount of RAM necessary to even boot FreeBSD.
This makes bzip2-support practically useable.
memory directly available to loader(8) and friends was limited to 640K on i386.
Those times have passed long time ago and now loader(8) can directly access
up to 4GB of RAM at least theoretically. At the same time, there are several
places where it's assumed that malloc() will only allocate memory within
first megabyte.
Remove that assumption by allocating appropriate bounce buffers for BIOS
calls on stack where necessary.
This allows using memory above first megabyte for heap if necessary.
1. The ELF-64 typedefs are now standardized, so that the libelf port
(devel/libelf) does not need to compensate for not having the
Elf64_Xword and Elf64_Sxword types.
2. ELF Symbol versioning support has been added. This also affects
the libelf port (though configure should detect this correctly).
we can cache its value in the softc. Eliminates one PCI register
write per call to bge_start().
A 1.8% speedup for UDP_RR test on my old box.
Obtained from: NetBSD(jonathan) via delphij
to be compatible with symbol versioning support as implemented by
GNU libc and documented by http://people.redhat.com/~drepper/symbol-versioning
and LSB 3.0.
Implement dlvsym() function to allow lookups for a specific version of
a given symbol.
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
MFC after: 2 weeks
o Remove the unused and non-standard SHT_NUM, PT_COUNT and DT_COUNT.
o Add the STV_DEFAULT, STV_INTERNAL, STV_HIDDEN and STV_PROTECTED
symbol visibility constants.
o Add the ELF32_ST_VISIBILITY and ELF64_ST_VISIBILITY macros to
get the symbol visibility from the st_other field.
o Add the ELFOSABI_AIX, ELFOSABI_OPENVMS and ELFOSABI_NSK constants.
o Add the ET_LOOS, ET_HIOS, ET_LOPROC and ET_HIPROC constants.
o Further flesh out the list of machine types. Note that EM_ALPHA
remains non-standard. The standard value for EM_ALPHA is given
by EM_ALPHA_STD (which is a non-standard name :-)
o Add the SHN_LOOS, SHN_HIOS and SHN_XINDEX constants.
o Add the SHT_INIT_ARRAY, SHT_FINI_ARRAY, SHT_PREINIT_ARRAY, SHT_GROUP
and SHT_SYMTAB_SHNDX constants.
o Add the SHF_MERGE, SHF_STRINGS, SHF_INFO_LINK, SHF_LINK_ORDER,
SHF_OS_NONCONFORMING, SHF_GROUP and SHF_MASKOS constants.
o Add the PF_MASKOS and PF_MASKPROC constants.
o Add the STB_LOOS andf STB_HIOS constants.
o Add the STT_COMMON, STT_LOOS and STT_HIOS constants.
MFC after: 1 week
32-bit entity. Also, don't cast the resulting symbol type value to
a datatype smaller than the st_info field type as a quick way to
mask off the upper bits as it may cause inconsistent behaviour when
the macro is used (without explicit casting) on varargs functions.
MFC after: 1 week
span ports when they disappear. The span port does not have a pointer to the
softc so revert r1.31 and bring back the softc linked-list.
MFC after: 2 weeks
and KTR_IO as they were never used. Remove KTR_CLK since it was only
used for hardclock firing and use KTR_INTR there instead. Remove
KTR_CRITICAL since it was only used for crit enter/exit and use
KTR_CONTENTION instead.
AMD-8111 SMBus 2.0 controller) are all SMBus 2.0 controllers,
and need another implementation of SMBus access methods, while
this driver supports AMD-756 SMBus 1.0 controller and clones,
including AMD-8111 SMBus 1.0 controller.
Tested by: Vladimir Timofeev (0x006410de),
mezz (0x008410de),
ru (0x00d410de)
All of us got the same(!) nonsense when running ``mbmon -S'',
repeated every four rows.
really should be a fptrdiff_t if we had that) in profclock().
- Don't try to profile kernel pc's that are >= the kernel lowpc to avoid
underflows when computing a profiling index.
- Use the PC_TO_I() macro to compute the kernel profiling index rather than
doing it inline.
Discussed with: bde
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space. There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail. When it fails, the nascent process is aborted
(SIGABRT). Whereas, this reimplementation using sf_buf_alloc()
sleeps. (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster. Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.
Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks
SMBus 1.0 and not SMBus 2.0.
AMD-8111 hub (datasheet is publically available) implements both SMBus
2.0 (a separate PCI device) and SMBus 1.0 (a subfunction of the System
Management Controller device with the base I/O address is accessible
through the CSR 0x58). This driver only supports AMD-756 SMBus 1.0
compatible devices.
With the patched sysutils/xmbmon port (to also fix PCI ID and to enable
smb(4) support), I now get:
pciconf:
none0@pci0:7:2: class=0x0c0500 card=0x746a1022 chip=0x746a1022 rev=0x02 hdr=0x00
vendor = 'Advanced Micro Devices (AMD)'
device = 'AMD-8111 SMBus 2.0 Controller'
class = serial bus
subclass = SMBus
amdpm0@pci0:7:3: class=0x068000 card=0x746b1022 chip=0x746b1022 rev=0x05 hdr=0x00
vendor = 'Advanced Micro Devices (AMD)'
device = 'AMD-8111 ACPI System Management Controller'
class = bridge
dmesg:
amdpm0: <AMD 756/766/768/8111 Power Management Controller> port 0x10e0-0x10ff at device 7.3 on pci0
smbus0: <System Management Bus> on amdpm0
# mbmon -A -d
Summary of Detection:
* SMB monitor(s)[ioctl:AMD8111]:
** Winbond Chip W83627HF/THF/THF-A found at slave address: 0x50.
** Analog Dev. Chip ADM1027 found at slave address: 0x5C.
* ISA monitor(s):
** Winbond Chip W83627HF/THF/THF-A found.
I think the confusion comes from the fact that nobody really tried
SMBus with xmbmon :-), since sysutils/xmbmon port doesn't come with
SMBus support enabled, neither in FreeBSD 4, nor in later versions,
so mbmon(1) was just showing the values from the Winbond sensors
accessible through the ISA I/O method (mbmon -I), for me anyway.
On my test machine, the amdpm(4) didn't even attach due to I/O port
allocation failure (who knows what the hell it read from CSR 0x58
of the SMBus 2.0 device :-), which isn't in the CSR space).
I've also checked that lm_sensors.org uses correct PCI ID for SMBus
1.0 of AMD-8111:
i2c-amd756.c: {PCI_VENDOR_ID_AMD, 0x746B, PCI_ANY_ID, PCI_ANY_ID, 0, 0, AMD8111 },
This driver is analogous to our amdpm.c which supports SMBus 1.0
AMD-756 and compatible devices, including SMBus 1.0 on AMD-8111.
i2c-amd8111.c: { 0x1022, 0x746a, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
This driver is analogous to nForce-2/3/4, i2c-nforce2.c, which
supports SMBus 2.0, and which our amdpm.c does NOT support
(SMBus 2.0 uses a different, ACPI-unified, API to talk to SMBus).
At least I know for sure it doesn't work with my nForce3. :-)
(The xmbmon port will be fixed to correct the PCI ID too and to
enable the smb(4) support.)
command. This fixes some weird booting issues on newer versions
of the firmware on the MSA20.
Reported by: Philippe Pegon <Philippe dot Pegon at crc dot u-strasbg dot fr>
which existed to cleanup the linux_osname mutex. Now that MTX_SYSINIT()
has grown a SYSUNINIT to destroy mutexes on unload, the extra destroy here
was redundant and resulted in panics in debug kernels.
MFC after: 1 week
Reported by: Goran Gajic ggajic at afrodita dot rcub dot bg dot ac dot yu
- Give up endianess support and switch to native-endian format for
accessing hardware structures. In fact embedded processor for
BCM57xx is big-endian architure(MIPS) and it requires native-endian
format for NIC structures.The NIC performs necessary byte/word
swapping depending on programmed endian type.
- With above changes all htole16/htole32 calls were gone.
- Remove bge_vhandle member in softc and changed to use explicit
register access. This may add additional performance penalty
that than that of previous memory access. But most of the access
is performed on initialization phase(e.g. RCB setup), it would be
negligible.
Due to incorrect use of bus_dma(9) in bge(4) it still panics sparc64
system in device detach path. The issue would be fixed in next patch.
Reviewed by: jkim (initial version)
Silence from: ps
Tested by: glebius
Obtained from: NetBSD via OpenBSD
mbuf chain that starts with a cluster containing just MHLEN bytes. This
happened because m_dup called m_get or m_getcl depending on the amount of
data to copy, but then always set the size available in the first mbuf to
MHLEN.
Submitted by: Matt Koivisto <mkoivisto at sandvine dot com>
Approved by: jmg
Silence from: rwatson (mentor)
SMBus busses. Because of limitations in smbus_if.m, the second smbus is
attached to an amdpm1 device that is a child of amdpm0.
Submitted by: Artemiev Igor ai (at) bmc dot brk dot ru
1. Implement a large set of ioctl shims so that the Linux management apps
from LSI will work. This includes infrastructure to support adding, deleting
and rescanning arrays at runtime. This is based on work from Doug Ambrosko,
heavily augmented by LSI and Yahoo.
2. Implement full 64-bit DMA support. Systems with more than 4GB of RAM
can now operate without the cost of bounce buffers. Cards that cannot do
64-bit DMA will automatically revert to using bounce buffers. This option
can be forced off by setting the 'hw.amr.force_sg32" tunable in the loader.
It should only be turned off for debugging purposes. This work was sponsored
by Yahoo.
3. Streamline the command delivery and interrupt handler paths after
much discussion with Dell and LSI. The logic now closely matches the
intended design, making it both more robust and much faster. Certain
i/o failures under heavy load should be fixed with this.
4. Optimize the locking. In the interrupt handler, the card can be checked
for completed commands without any locks held, due to the handler being
implicitely serialized and there being no need to look at any shared data.
Only grab the lock to return the command structure to the free pool. A
small optimization can still be made to collect all of the completions
together and then free them together under a single lock.
Items 3 and 4 significantly increase the performance of the driver. On an
LSI 320-2X card, transactions per second went from 13,000 to 31,000 in my
testing with these changes. However, these changes are still fairly
experimental and shouldn't be merged to 6.x until there is more testing.
Thanks to Doug Ambrosko, LSI, Dell, and Yahoo for contributing towards
this.
to use busdma. Unlike most of the other drivers, but similar to the
if_em driver, pre-allocate the dmamaps at init time instead of allocating
them on the fly when descriptors need to be filled. This isn't ideal right
now because a map is allocated for every descriptor slot in the tx, rx, mini,
and jumbo rings (which is a lot!) in order to simplify the bookkeeping, even
though the driver might support filling only a subset of those slots.
Luckily, maps are typically NULL on i386 and amd64, so the cost isn't
very high. It could be an issue with sparc64, but the driver isn't endian
clean either, and that is a much bigger problem to solve first.
Note that jumbo frame support is under-tested, and I'm not even sure if
it till really works correctly given the evil VM magic that is does.
The changes here attempt to preserve the existing semanitcs.
Thanks to Martin Nillson for contributing the Netgear card for this work.
MFC-After: 3 weeks
class, then it displays various information about the lock and calls a
new function pointer in lock_class (lc_ddb_show) to dump class-specific
information about the lock as well (such as the owner of a mutex or
xlock'ed sx lock). This is easier than staring at hex dumps of locks to
figure out who owns the lock, etc. Note that extending lock_class doesn't
affect the ABI for any kernel modules as the only code that deals with
lock_class structures directly is kern_mutex.c, kern_sx.c, and witness.
MFC after: 1 week
originally thought. The BIOS that cleared CPUID_APIC actually managed
to disable the local APIC entirely and even Windows 64 doesn't boot on
it.
Reported by: bz
they are passed by reference. Handle the difference within the
linux_ioctl_termio on the LINUX_TCFLSH path.
Submitted by: Jaroslav Drzik <jaro_AT_coop-voz_dot_sk>
if the boot CPU has a local APIC because some BIOS vendors are not
competent enough to set this bit. Instead, just assume that we always have
a local APIC on amd64. For i386 the check is a bit more subtle. FreeBSD
requires either an MP Table or an ACPI MADT table to enumerate APICs. The
only systems that have one of those tables that don't have local APICs are
some presumably rare (and old) SMP 486 systems using external APICs. Thus,
instead of checking the CPUID_APIC flag, check the CPU class and abort if
we are running on a 486.
MFC after: 1 week
Reported by: bz
action argument with the value obtained from table lookup. The feature
is now applicable only to "pipe", "queue", "divert", "tee", "netgraph"
and "ngtee" rules.
An example usage:
ipfw pipe 1000 config bw 1000Kbyte/s
ipfw pipe 4000 config bw 4000Kbyte/s
ipfw table 1 add x.x.x.x 1000
ipfw table 1 add x.x.x.y 4000
ipfw pipe tablearg ip from table(1) to any
In the example above the rule will throw different packets to different pipes.
TODO:
- Support "skipto" action, but without searching all rules.
- Improve parser, so that it warns about bad rules. These are:
- "tablearg" argument to action, but no "table" in the rule. All
traffic will be blocked.
- "tablearg" argument to action, but "table" searches for entry with
a specific value. All traffic will be blocked.
- "tablearg" argument to action, and two "table" looks - for src and
for dst. The last lookup will match.
changes DELAY to use the TSC once it has been calibrated. This does NOT
use the TSC for long-term timekeeping. It only uses it to bound the
DELAY() spinloop. This should not be affected by the Athlon64 X2 TSC
quirks because the cpu is not halted while we use DELAY().
compilation of kernels without ns8250 support but using the uart framework.
These kernels will be for machines where size matters more, so including code
that can never be executed is undesriable...
this can happen under certain conditions when scanning. This logic
will eventually go away with the new scanning code.
While here de-inline the routine.
MFC after: 1 week
in the 802.11 layer: we send a directed probe request frame to the
current ap bmiss_max times (w/o answer) before scanning for a new ap.
MFC after: 2 weeks
working at all and only saw "nve0: device timeout (N)" messages.
- Setup PHY before handing control to NVidia API setting
speed, duplex, enabling interrupts, etc.
- Add restriction of MAXADDR_32BIT for high address to contigmalloc
to make the driver work on machines with 4+GB of memory.
PR: kern/85583, kern/88045
Tested by: scottl, others earlier version
MFC after: 10 days
- Implement cv_wait_unlock() method which has semantics compatible
with the sv_wait() method in IRIX. For cv_wait_unlock(), the lock
must be held before entering the function, but is not held when the
function is exited.
- Implement the existing cv_wait() function in terms of cv_wait_unlock().
Submitted by: kan
Feedback from: jhb, trhodes, Christoph Hellwig <hch at infradead dot org>
being hold by current thread or ignored by current process,
otherwise, it is very possible the thread will enter an infinite loop
and lead to an administrator's nightmare.
- number of read I/O requests,
- number of write I/O requests,
- number of read bytes,
- number of written bytes.
Add 'reset' subcommand for resetting statistics.
transmitted bits was between 8.6180us and 8.6200us when we used a RCLK
of 16.500MHz. This is a little low (should be 8.6805us). This error
is exactly the error one would expect if it actually had a 16.384MHz
watch oscillator (as suggested by garrett) instead of using the PCI
RCLK. Assume that the pci clock therefore wasn't really used, but
instead the cheap 16.384MH watch quartz oscillator. This gives bits
in the 8.6800us to 8.6810us ranage, which matches theoretical.
Submitted by: garrett
- Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in
atpic entry points.
- Move PCPU_* asm macros out of the middle of the asm profiling macros.
- Pass IRQ vector argument as an int rather than void * to reduce diffs
with i386.
- EOI the lapic in C for the lapic timer handler.
- GC unused Xcpuast function.
- Split IPI_STOP handling code of ipi_nmi_handler() out into a
cpustop_handler() function and call it from Xcpustop rather than
duplicating all the logic in assembly.
- Fixup the list of symbols with interrupt frames in ddb traces.
Xatpic_fastintr* have never existed on amd64, and the lapic timer
handler and various IPI handlers were missing.
- Use trapframe instead of intrframe for interrupt entry points (on amd64
the interrupt vector was already a separate argument, so the two frames
were already identical) and GC intrframe.
Submitted by: peter (3)
cluster allocator, that wasn't MPSAFE. Instead, utilize our new generic
UMA jumbo cluster allocator. Since UMA gives us a 9k piece that is contigous
in virtual memory, but isn't contigous in physical memory we need to handle
a few segments. To deal with this we utilize Tigon chip feature - extended
RX descriptors, that can handle up to four DMA segments for one frame.
Details:
o Remove bge_alloc_jumbo_mem(), bge_free_jumbo_mem(),
bge_jalloc(), bge_jfree() functions.
o Remove SLIST heads, bge_jumbo_tag, bge_jumbo_map from softc.
o Use extended RX BDs for Jumbo receive producer ring, and
initialize it appropriately.
o New bge_newbuf_jumbo():
- Allocate an mbuf with Jumbo cluster with help of m_cljget().
- Load the cluster for DMA with help of bus_dmamap_load_mbuf_sg().
- Assert that we got 3 segments in the DMA mapping.
- Fill in these 3 segments into the extended RX descriptor.
2) rework link state detection code & use it in POLLING mode
3) fix 2 bugs in link state detection code:
a) driver unable to detect link loss on bcm5721
b) on bcm570x chips (tested on bcm5700 bcm5701 bcm5702) driver fails
to detect link loss with probability 1/6 (solved in brgphy.c)
Devices working in TBI mode should not be affected by this change.
Approved by: glebius (mentor)
MFC after: 1 month
4k clusters in addition to 9k and 16k ones.
struct mbuf *m_getjcl(int how, short type, int flags, int size)
void *m_cljget(struct mbuf *m, int how, int size)
m_getjcl() returns an mbuf with a cluster of the specified size attached
like m_getcl() does for 2k clusters.
m_cljget() is different from m_clget() as it can allocate clusters
without attaching them to an mbuf. In that case the return value
is the pointer to the cluster of the requested size. If an mbuf was
specified, it gets the cluster attached to it and the return value
can be safely ignored.
For size both take MCLBYTES, MJUM4BYTES, MJUM9BYTES, MJUM16BYTES.
Reviewed by: glebius
Tested by: glebius
Sponsored by: TCP/IP Optimization Fundraise 2005
"done" method so that for non-repeat operations we have completely
finished with the transfer by the time the callback is invoked.
This makes it possible to recycle a transfer from within the callback
routine for the same transfer. Previously this almost worked, but
with OHCI controllers calling the "done" method after the callback
would zero out some important fields needed by the recycled transfer.
Only some usb peripheral drivers such as ucom appear to rely on the
ability to reuse a transfer from its callback.
MFC after: 1 week
- Move vtophys() macros next to vtopte() where vtopte() exists to match
comments above vtopte().
- Remove references to the alternate address space in the comment above
vtopte(). amd64 never had the alternate address space, and i386 lost it
prior to PAE support being added.
- s/entires/entries/ in comments.
Reviewed by: alc
KTR_* class macros via genassym.c. Together with sys/sys/ktr.h
rev. 1.34 this has the desired side-effect of providing a default
value for KTR_COMPILE. Thus this fixes warnings from -Wundef
regarding KTR_COMPILE not being defined for .S files.
Requested by: ru
Reviewed by: ru
ktr_tracepoint() and the macros using it. This allows this header
to be included in .S files for obtaining the KTR_* class macros
directly and providing a default value for KTR_COMPILE in case it's
not specified in the kernel config file including defining it to 0
when not using 'options KTR' at all.
Requested by: ru
Reviewed by: ru
MACHINE_ARCH and MACHINE). Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.
_MACHINE == i386 test always succeeds, even on non-i386 (both
sides of expressions become 0). Remove the comment since
_MACHINE and _MACHINE_ARCH are going away.
of the radix lookup tables. Since several rnh_lookup() can run in
parallel on the same table, we can piggyback on the shared locking
provided by ipfw(4).
However, the single entry cache in the ip_fw_table can't be used lockless,
so it is removed. This pessimizes two cases: processing of bursts of similar
packets and matching one packet against the same table several times during
one ipfw_chk() lookup. To optimize the processing of similar packet bursts
administrator should use stateful firewall. To optimize the second problem
a solution will be provided soon.
Details:
o Since we piggyback on the ipfw(4) locking, and the latter is per-chain,
the tables are moved from the global declaration to the
struct ip_fw_chain.
o The struct ip_fw_table is shrunk to one entry and thus vanished.
o All table manipulating functions are extended to accept the struct
ip_fw_chain * argument.
o All table modifing functions use IPFW_WLOCK_ASSERT().
(suggested by alfred@)
o Reuse si_band field in struct __siginfo, add a mqd member which will
be used by mqueue.
o Add code SI_KERNEL to indicate a signal is queued by kernel.
Use the following kernel configuration option to enable:
options BPF_JITTER
If you want to use bpf_filter() instead (e. g., debugging), do:
sysctl net.bpf.jitter.enable=0
to turn it off.
Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).
Obtained from: WinPcap 3.1 (for i386)
time ago appears to be based not on the typical 1.8432MHz clock, or
the other more typical multiple of 8 of this (14.7456MHz), but instead
it appears to be 1/2 the PCI clock rate or 16.50000MHz. I'm not 100%
sure that this is right, but since I did the original entry, I'm going
to go ahead and modify it. With the 14.7456MHz value, I was getting
bits that were ~7.3us instead of ~8.6us like they are supposed to be.
My measuring gear for today is a stupid handheld scope with two
signficant digits. So I don't know if it is 33.000000/2 MHz or some
other value close to 16.5MHz, but 16.5MHz works well enough for me to
use a couple of different devices at 115200 baud, and is a nice even
multiple of a well known clock frequency...
rather than embedding it in the intrframe as if_vec. This reduces diffs
with amd64 somewhat.
- Remove cf_vec from clockframe (it wasn't used anyway) and stop pushing
dummy vector arguments for ipi_bitmap_handler() and lapic_handle_timer()
since clockframe == trapframe now.
- Fix ddb to handle stack traces across interrupt entry points that just
have a trapframe on their stack and not a trapframe + vector.
- Change intr_execute_handlers() to take a trapframe rather than an
intrframe pointer.
- Change lapic_handle_intr() and atpic_handle_intr() to take a vector and
trapframe rather than an intrframe.
- GC struct intrframe now that nothing uses it anymore.
- GC CLOCK_TO_TRAPFRAME() and INTR_TO_TRAPFRAME().
Reviewed by: bde
Requested by: peter
ipi_nmi_handler() and into a new cpustop_handler() function. Change
the Xcpustop IPI_STOP handler to call this function instead of
duplicating all the same logic in assembly.
- EOI the local APIC for the lapic timer interrupt in C rather than
assembly.
- Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than
assembly.
commit to atpic.c) there may not be an IRQ 13. Instead, just keep going.
If the INT16 interface doesn't work then we will eventually panic anyway.
FWIW: We could probably just axe the support for IRQ 13 altogether at this
point. The only thing we'd lose support for are 486sx systems with
external 487 FPUs.
MFC after: 1 week
working IRQ0 with APIC anymore. Previously, it was possible to have
some other ATPIC IRQS "leak" through in a few edge cases. For example, on
my x86 test machine, ACPI re-routes the SCI (IRQ 9) to intpin 13 on the
first I/O APIC. This leaves a hole for IRQ 13 (since the APIC doesn't
provide a source for IRQ 13 in that case) with the result that the ATPIC
IRQ13 source was registered instead. This changes the 8259A drivers to
only register their interrupt sources if none of the 16 ISA IRQs have an
interrupt source already installed.
MFC after: 1 week
- Add a new SET_KERNEL_SREGS macro that sets up %ds and %es to point to
kernel data and %fs to point to per-CPU data and use the new macro
in several kernel entry points including trap and interrupt handlers.
- Convert the IPI_STOP handler Xcpustop to push a standard trap frame
rather than an application frame.
- Make the TRAP() macro private to exception.s since it is only used
there.
- Move the PCPU_*() macros in asmacros.h out of the middle of the
profiling macros.
Reviewed by: bde
Requested by: bde (4, 5)
acquired anywhere in the driver now.
- Axe the spin mutex used for the nve_oslock*() routines. The driver lock
already provides sufficient synchronization.
- Don't mess around with IFF_UP when the link state changes. IFF_UP is
an administrative flag, not a link status indicator.
MFC after: 1 week
lock object (and thus off of each mutex and sx lock):
- Rename the all_locks list to pending_locks and only put locks initialized
before SI_SUB_WITNESS on the list so that the SI_SUB_WITNESS can add them
to witness once it starts up.
- Now that pending_locks is only used during early startup, change it from
a TAILQ to an STAILQ. This removes a pointer from the STAILQ_ENTRY in
struct lock_object.
- Since the pending_locks list is only used during the single-threaded
early boot it no longer needs to be protected by a mutex, so remove
all_mtx.
- Since the lo_list member of struct lock_object is now only used during
early boot before witness is running, collapse lo_list and lo_witness
into a union. This shaves the second pointer off of struct lock_object.
- Axe lock_cur_cnt and lock_max_cnt.
With these changes, struct mtx shrinks from 36 to 28 bytes on 32-bit
platforms and from 72 to 56 bytes on 64-bit platforms. Note that this
commit will completely and utterly destroy the kernel ABI, so no MFC.
Tested on: alpha, amd64, i386, sparc64
immediately from acpi_pci_link_route_interrupt() since we aren't going
to have a valid pci_link device to talk to try to route interrupts. This
fixes a page fault if you disable just pci_link. Note that trying to use
ACPI without pci_link is probably not advised however.
MFC after: 1 week
Tested by: Eugene Grosbein eugen at kuzbass dot ru
since mount_smbfs(8) assumed long name mounting by default unless "-n long"
was explicitly specified.
Rather than supplying a "long" option in mount_smbfs(8), this commit brings
back the original behaviour by associating SMBFS_MOUNT_NO_LONG with the
"nolong" option. This should fix the broken long file names on smbfs people
observed recently.
Reported by: Vladimir Grebenschikov <vova at fbsd dot ru>
Reviewed by: phk
Tested by: Slawa Olhovchenkov <slw at zxy dot spb dot ru>
the eaddr array (introduced in rev. 1.174) prior to writing to it. As
dc_read_eeprom() is told to write only 3 16-bit words to eaddr but eaddr
in fact is somewhat larger removal of the zeroing defeated the check
whether the MAC address is all zero as there can be some random garbage
in eaddr past the 3 words written to it and the check verifys all bits
in eaddr. Solve this by changing the check to verify only the 3 words
(happenning to be ETHER_ADDR_LEN bytes) written to eaddr.
- While here change the notation of "FCode" in a nearby comment to the
official way.
Ok'ed by: marcel, ru
o plug memory leak in adhoc mode: on rx the sender may be the
current master so simply checking against ic_bss is not enough
to identify if the packet comes from an unknown sender; must
also check the mac address
o split neighbor node creation into two routines and fillin state
of nodes faked up on xmit when a beacon or probe response frame
is later received; this ensures important state like the rate set
and advertised capabilities are correct
Obtained from: netbsd
MFC after: 1 week
polarity. Some machines route PCI IRQs to an ISA IRQ but fail to include
an interrupt override entry to set the polarity and trigger of the given
ISA IRQ in their MADT table.
PR: usb/74989
Reported by: Julien Gabel jpeg at thilelli dot net
MFC after: 1 week
from sys/sparc64/include/ofw_upa.h to sys/sparc64/pci/ofw_pci.h and
rename them to struct ofw_pci_ranges and OFW_PCI_RANGE_* respectively.
This ranges struct only applies to host-PCI bridges but no to other
bridges found on UPA. At the same time it applies to all host-PCI
bridges regardless of whether the interconnection bus is Fireplane/
Safari, JBus or UPA.
- While here rename the PCI_CS_* macros in sys/sparc64/pci/ofw_pci.h
to OFW_PCI_CS_* in order to be consistent and change this header to
use uintXX_t instead of u_intXX_t.
the bridge (PCI bus A or B) we are attaching to rather than registering
both handlers at once when attaching to the first half we encounter.
This is a bit cleaner as it corresponds to which PCI bus error interrupt
actually is assigned to the respective half by the OFW and allows to
collapse both PCI bus error interrupt handlers into one function easily.
- Use the actual RID of the respective interrupt resource as index into
sc_irq_res and also use it when allocating the resource. For now this
is a bit cleaner and will be mandatory later on.
- According to OpenSolaris the spare hardware interrupt is used as the
over-temperature interrupt in systems with Psycho bridges. Unlike as
with the SBus-based workstations I didn't manage to trigger it when
covering the fan outlets of an U60 but better be safe than sorry and
register a handler anyway.
MFC after: 1 month
bug by explaining what the problem is and how the workaround works.
- Fix some cosmetics nits, mainly properly terminate sentences in comments,
which I missed when backporting the style changes to psycho(4) in psycho.c
rev. 1.54 due to lack of corresponding code.
- The "USIIe version of the Sabre bridge" actually is termed "Hummingbird";
name it as such in comments and messages.
and some fixes from Motomichi Matsuzaki. Testing involved many people, but the
final, successful testing was from rwatson who endured several rounds of "it
crashes at XYZ stage" "oh, please correct this typo and try again." The Linux
driver, and to a small extent the limited specs, were both used as a reference
for how to program the chipset.
PR: kern/80396
Submitted by: Martin Mersberger
the base rcorder. This is accomplished by running rcorder twice,
first to get all the disks mounted (through mountcritremote),
then again to include the local_startup directories.
This dramatically changes the behavior of rc.d/localpkg, as
all "local" scripts that have the new rc.d semantics are now
run in the base rcorder, so only scripts that have not been
converted yet will run in rc.d/localpkg.
Make a similar change in rc.shutdown, and add some functions in
rc.subr to support these changes.
Bump __FreeBSD_version to reflect this change.
revision 1.179 to correctly set/clear execute permission on the mapping
it creates. Thus, mmap(2)ing a memory resident file will not result in
the file being mapped with execute permission when execute permission was
not requested.
Eliminate an unneeded Instruction Memory Barrier (IMB) in
pmap_enter_quick(). Since there was no previous (instruction) mapping
for the given virtual address prior to pmap_enter_quick(), there can be
no instructions from the given virtual address in the pipeline that need
flushing.
MFC after: 1 week
Update Intel MatrixRAID support to be able to pick up RAID0+1 (RAID10)
and RAID5 arrays without panic'ing.
This has the side effect of now also supporting multiple volumes on
MatrixRAID's now I have the metadata better understood..
HW sponsored by: Mullet Scandinavia AB
commit. Copy the ethernet address into a local buffer, which we know
is sufficiently aligned for the width of the memory accesses that we
do. This also eliminates all suspicious and potentionally harmful
casts.
In collaboration with: ru
plain file bsdlabel(8) always writes label at a fixed offset from
its beginning (512 bytes), regardless of the sector size. At the same
time, bsdlabel geom class expects label to be available at the very
beginning of the second sector.
As a result, images prepared in userland for media with sector size
different from 512 bytes (i.e. 2k for cdroms) are not recognized by
the tasting mechanism.
Solve the problem by always looking for the label at 512-byte offset
if we can't find it at the beginning of the second sector and sector
size is not 512 bytes.
o The only indication of error condition is NULL value returned by
the function;
o value pointed to by error argument is undefined in the case when
operation completes successfully.
Discussed with: phk
only now) symbolic links in the kernel compile directory, rather
than relying on config(8) to do this. (The changes to config(8)
will be committed separately.) This is aimed towards making the
config(8) as lightweight as possible.
Idea by: bde (all bugs are mine)
sosend(). Robert accidentally changed the snderr() macro to jump to the
out label which assumes the lock is already released rather than the
release label which drops the lock in his previous change to sosend().
This should fix the recent panics about returning from write(2) with the
socket lock held and the most recent LOR on current@.
The sys/sys/stddef.h is here for some time now to fulfil the
kernel needs. It also was not reliable due to the exists(@)
check: in an empty module directory, "make depend; mv .depend
.depend~; make depend" ran mkdep(1) with different arguments.
o Do not use ipfw_insn_pipe->pipe_ptr in locate_flowset(). The
_ipfw_insn_pipe isn't touched by this commit to preserve ABI
compatibility.
o To optimize the lookup of the pipe/flowset in locate_flowset()
introduce hashes for pipes and queues:
- To preserve ABI compatibility utilize the place of global list
pointer for SLIST_ENTRY.
- Introduce locate_flowset(queue nr) and locate_pipe(pipe nr).
o Rework all the dummynet code to deal with the hashes, not global
lists. Also did some style(9) changes in the code blocks that were
touched by this sweep:
- Be conservative about flowset and pipe variable names on stack,
use "fs" and "pipe" everywhere.
- Cleanup whitespaces.
- Sort variables.
- Give variables more meaningful names.
- Uppercase and dots in comments.
- ENOMEM when malloc(9) failed.
- S3 Savage driver ported.
- Added support for ATI_fragment_shader registers for r200.
- Improved r300 support, needed for latest r300 DRI driver.
- (possibly) r300 PCIE support, needs X.Org server from CVS.
- Added support for PCI Matrox cards.
- Software fallbacks fixed for Rage 128, which used to render badly or hang.
- Some issues reported by WITNESS are fixed.
- i915 module Makefile added, as the driver may now be working, but is untested.
- Added scripts for copying and preprocessing DRM CVS for inclusion in the
kernel. Thanks to Daniel Stone for getting me started on that.
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.
Submitted by: Mark Tinguely tinguely at casselton dot net
process as over the limit when its time is >= to the limit rather than >
the limit. Technically, if p->p_rux.rux_runtime.sec == p->p_pcpulimit
and p->p_rux.rux_runtime.frac == 0, the process hasn't exceeded the limit
yet. However, having the fraction exactly equal to 0 is rather rare, and
it is not worth the overhead to handle that edge case. With just the >
comparison, the process would have to exceed its limit by almost a second
before it was killed.
PR: kern/83192
Submitted by: Maciej Zawadzinski mzawadzinski at gmail dot com
Reviewed by: bde
MFC after: 1 week
chains and copying in mbufs from the body of the send logic, creating
a new function sosend_copyin(). This changes makes sosend() almost
readable, and will allow the same logic to be used by tailored socket
send routines.
MFC after: 1 month
Reviewed by: andre, glebius
appeared to rely on all kinds of non-guaranteed behaviours: the
transfer abort code assumed that TDs with no interrupt timeout
configured would end up on the done queue within 20ms, the done
queue processing assumed that all TDs from a transfer would appear
at the same time, and there were access-after-free bugs triggered
on failed transfers.
Attempt to fix these problems by the following changes:
- Use a maximum (6-frame) interrupt delay instead of no interrupt
delay to ensure that the 20ms wait in ohci_abort_xfer() is enough
for the TDs to have been taken off the hardware done queue.
- Defer cancellation of timeouts and freeing of TDs until we either
hit an error or reach the final TD.
- Remove TDs from the done queue before freeing them so that it
is safe to continue traversing the done queue.
This appears to fix a hang that was reproducable with revision 1.67
or 1.68 of ulpt.c (earlier revisions had a different transfer
pattern). With certain HP printers, the command "true > /dev/ulpt0"
would cause ohci_add_done() to spin because the done queue had a
loop. The list corruption was caused by a 3-TD transfer where the
first TD completed but remained on the internal host controller
done queue because it had no interrupt timeout. When the transfer
timed out, the TD got freed and reused, so it caused a loop in the
done queue when it was inserted a second time from a different
transfer.
Reported by: Alex Pivovarov
MFC after: 1 week
application wishes to request high precision time stamps be returned:
Alias Existing
CLOCK_REALTIME_PRECISE CLOCK_REALTIME
CLOCK_MONOTONIC_PRECISE CLOCK_MONOTONIC
CLOCK_UPTIME_PRECISE CLOCK_UPTIME
Add experimental low-precision clockid_t names corresponding to these
clocks, but implemented using cached timestamps in kernel rather than
a full time counter query. This offers a minimum update rate of 1/HZ,
but in practice will often be more frequent due to the frequency of
time stamping in the kernel:
New clockid_t name Approximates existing clockid_t
CLOCK_REALTIME_FAST CLOCK_REALTIME
CLOCK_MONOTONIC_FAST CLOCK_MONOTONIC
CLOCK_UPTIME_FAST CLOCK_UPTIME
Add one additional new clockid_t, CLOCK_SECOND, which returns the
current second without performing a full time counter query or cache
lookup overhead to make sure the cached timestamp is stable. This is
intended to support very low granularity consumers, such as time(3).
The names, visibility, and implementation of the above are subject
to change, and will not be MFC'd any time soon. The goal is to
expose lower quality time measurement to applications willing to
sacrifice accuracy in performance critical paths, such as when taking
time stamps for the purpose of rescheduling select() and poll()
timeouts. Future changes might include retrofitting the time counter
infrastructure to allow the "fast" time query mechanisms to use a
different time counter, rather than a cached time counter (i.e.,
TSC).
NOTE: With different underlying time mechanisms exposed, using
different time query mechanisms in the same application may result in
relative non-monoticity or the appearance of clock stalling for a
single clockid_t, as a cached time stamp queried after a precision
time stamp lookup may be "before" the time returned by the earlier
live time counter query.
This shouldn't happen as far as the self-id buffer is vaild but
some people have this problem.
PR: kern/83999
Submitted by: Markus Wild <fbsd-lists@dudes.ch>
MFC after: 3 days
- In ifc_name2unit(), disallow leading zeroes in a unit.
Exploit: ifconfig lo01 create
- In ifc_name2unit(), properly handle overflows. Otherwise,
either of two local panic()'s can occur, either because
no interface with such a name could be found after it was
successfully created, or because the code will bogusly
assume that it's a wildcard (unit < 0 due to overflow).
Exploit: ifconfig lo<overflowed_integer> create
- Previous revision made the following sequence trigger
a KASSERT() failure in queue(3):
Exploit: ifconfig lo0 destroy; ifconfig lo0 destroy
This is because IFC_IFLIST_REMOVE() is always called
before ifc->ifc_destroy() has been run, not accounting
for the fact that the latter can fail and leave the
interface operating (like is the case for "lo0").
So we ended up calling LIST_REMOVE() twice. We cannot
defer IFC_IFLIST_REMOVE() until after a call to
ifc->ifc_destroy() because the ifnet may have been
removed and its memory has been freed, so recover from
this by re-inserting the ifnet in the cloned interfaces
list if ifc->ifc_destroy() indicates a failure.
the geom creation to a seperate init function and ignore the tasting.
The config is now parsed only in the vinumdrive geom, which hopefully
fixes the problem, that the drive class tasted before the vinum class
had a chance, for good.
Also restore the behaviour that the module can be loaded at boot time
and on a running system.
directly. We need to copyin() the strings in the iovec before
we can strcmp() them. Also, when we want to send the errmsg back
to userspace, we need to copyout()/copystr() the string.
Add a small helper function vfs_getopt_pos() which takes in the
name of an option, and returns the array index of the name in the iovec,
or -1 if not found. This allows us to locate an option in
the iovec without actually manipulating the iovec members. directly via
strcmp().
Noticed by: kris on sparc64
- Add locked variants of start, init, and ifmedia_upd.
- Add a mutex to the softc and remove spl calls.
- Use callout(9) rather than timeout(9).
- Setup interrupt handler last in attach.
- Use M_ZERO rather than bzero.
MFC after: 1 week
Tested by: wpaul
- Improve panic message if we fail to read the PCI bus number from a bridge
device.
- Don't try to lookup a BIOS IRQ for a link unless the link is routed via
an ISA IRQ since BIOSen currently only route PCI link devices via ISA
IRQs.
Tested by: Mathieu Prevot bsdhack at club-internet dot fr
MFC after: 1 week
but not provide a panic(9) implementation. Thus, enable the sanity
checks under INVARIANTS only if _KERNEL is also defined.
Submitted by: jmallett
Approved by: rwatson (mentor)
Instead, re-evaluate _BIF only when we get a notify and use the cached
results. We also still evaluate _BIF once on boot. Also, optimize the
init loop a little by only querying for a particular info if it's not valid.
MFC after: 2 days
during SMP startup. We haven't had any issues with starting up the APs
on i386 in quite a while now which is all this code is really useful for.
If someone ever does really need it they can always dig it up out of the
attic.
INO) for incorrect interrupt map entries on E250 machines. These
incorrect entries caused the INO of the on-board HME to be also
assigned to the second on-board NS16550 and to the on-board printer
port controller. Further down the road caused hme(4) to fail to attach
to the on-board HME in FreeBSD 5 and 6 as INTR_FAST and non-INTR_FAST
handlers can't share the same IRQ there (it's unknown what whould
happen in -CURRENT now that INTR_FAST and non-INTR_FAST handlers can
share an IRQ but I'd expect funny problems with uart(4)).
- Make sure there are exactly 4 PCI ranges instead of just checking
that the bridge has a 'ranges' property in the OFW device tree at all.
Besides the fact that currently the 64bit memory range isn't used by
this driver it we can't really work with less than 4 ranges and don't
have memory for more than 4 bus handles for the ranges in the softc.
- Remove sc_range and sc_nrange from softc; for the bridges supported
by this driver we no longer need to know the ranges besides the bus
handles obtained from them once this driver is attached. That way we
also can free the memory allocated for sc_range during attach again.
- Remove sc_dvmabase from the softc and pass it to psycho_iommu_init()
via an additional argument as we no longer need to know the DVMA base
in this driver once the IOMMU is initialized.
- Remove sc_dmatag from the softc, there isn't much sense in keeping
the nexus dma tag around locally.
PR: 88279 [1]
Info from: OpenSolaris [1]
Tested by: kensmith [1]
MFC after: 1 month
between this driver and other Host-PCI bridge drivers based on this one:
- Make the code fit into 80 columns.
- Make the code adhere style(9) (don't use function calls in initializers,
use uintXX_t instead of u_intXX_t, add missing prototypes, ...).
- Remove unused and superfluous struct declaration, softc member, casts,
includes, etc.
- Use FBSDID.
- Sprinkle const.
- Try to make comments and messages consistent in style throughout the
driver.
- Use convenience macros for the number of interrupts and ranges of the
bridge.
- Use __func__ instead of hardcoded function names in panic strings and
error messages. Some of the hardcoded function names actually were
outdated through moving code around. [1]
- Rename softc members related to the PCI side of the bridge to sc_pci_*
in order to make it clear which side of the bridge they refer to (so
stuff like sc_bushandle vs. sc_bh is less confusing while reading the
code).
PR: 76052 [1]
additionally on ebus(4) as the 'SUNW,envctrl' devices (as well as
'SUNW,envctrltwo' and 'SUNW,rasctrl', which we might want to also
support in envctrl.c in the future) are only found on EBus.
on powerpc (more or less...). That way people updating from FreeBSD 5 to
FreeBSD 6 and beyond on sparc64 will get an error from config(8) rather
than a mysterious compile error when they have a stale 'device zs' in
their kernel config file.
MFC after: 2 weeks
ofw_bus_gen_get_*() for providing the ofw_bus KOBJ interface in order
to reduce code duplication.
- While here sync the various sparc64 bus drivers a bit (handle failure
to attach a child gracefully instead of panicing, move the printing
of child resources common to bus_print_child() and bus_probe_nomatch()
implementations of a bus into a <bus>_print_res() function, ...) and
fix some minor bugs and nits (plug memory leaks present when attaching
a bus or child device fails, remove unused struct members, ...).
Additional testing by: kris (central(4) and fhc(4))
a newly introduced struct ofw_bus_devinfo which can hold the OFW info
of a device recallable via the ofw_bus KOBJ interface. Introduce a set
of functions ofw_bus_gen_get_*() which use ofw_bus_default_get_devinfo()
to provide generic subroutines for implementing the rest of the ofw_bus
KOBJ interface in a bus driver.
This is inspired by bus_get_resource_list() and bus_generic_rl_*_resource()
and allows to reduce code duplication in bus drivers as they only have
to provide an ofw_bus_default_get_devinfo() implementation in order to
provide the ofw_bus KOBJ interface via ofw_bus_gen_get_*().
- While here add a comment to ofw_bus_if.m describing the intention of
the ofw_bus KOBJ interface.
Reviewed by: marcel
If the complete reply on the TRANS2_FIND_FIRST2 request fits exactly
into one responce packet, then next call to TRANS2_FIND_NEXT2 will return
zero entries and server will close current transaction. To avoid
subsequent errors we should not perform FIND_CLOSE2 request.
PR: kern/78953
Submitted by: Jim Carroll
connection queue for a new connection. It was removing connections
from the wrong list.
Submitted by: Paul Mikesell
Sponsored by: Isilon Systems
MFC after: 1 week
the tree.
- Add locked variants of nve_start(), nve_init(), and nve_ifmedia_upd().
- Use callout_* to manage callouts rather than timeout(9).
- Mark interrupt handler MPSAFE (IFF_NEEDGIANT was already clear).
- Lock the driver lock in driver entry points such as the interrupt
handler, if_start, and if_init rather than locking the driver mutex
in the various work functions called by the binary blob. The spin lock
used by the binary block can probably be stubbed out now.
- Use IFQ_DRV_IS_EMPTY() macro rather than doing it by hand.
- Fix locking in detach.
- Remove some unused fields from the softc.
Tested by: cognet
MFC after: 2 weeks
the IRQ set by the BIOS in existing devices to actually get the correct
bus number of the child PCI bus. I was not reading the bus number from
the bridge device correctly. The __BUS_ACCESSOR() macros (from which
pcib_get_bus() is built) assume that the passed in argument is a child
device. However, at the time I'm reading the bus there is no child
device yet, so I was passing in the pcib device as the child device.
The parent of the pcib device probably returned an error in the case of
a host bridge, thus resulting in random stack garbage for the bus number.
For PCI-PCI bridges, the bus number being used was actually the subvendor
of the PCI-PCI bridge device itself.
MFC after: 1 week
- Don't call tulip_addr_filter() to reset the RX address filter in
tulip_reset() since that gets called before ether_ifattach(). Just
call it in tulip_init_locked().
- Use be16dec() and le16dec() to parse MAC addresses when programming
the RX filter.
- Let ether_ioctl() handle SIOCSIFMTU since we were doing the exact same
thing with the added bonus that we leaked the driver lock if the MTU
was > ETHERMTU in the homerolled version. This part will be MFC'd.
Clue from: wpaul (1)
Stolen from: marcel (2 via patch for dc(4))
MFC after: 1 week
via the DEFAULTS kernel configs. This allows folks to turn it that option
off in the kernel configs if desired without having to hack the source.
This is especially useful since PUC_FASTINTR hangs the kernel boot on my
ultra60 which has two uart(4) devices hung off of a puc(4) device.
I did not enable PUC_FASTINTR by default on powerpc since powerpc does not
currently allow sharing of INTR_FAST with non-INTR_FAST like the other
archs.
'device mem' over from GENERIC to DEFAULTS to be consistent with i386 and
amd64. Additionally, on ia64 enable ACPI by default since ia64 requires
acpi.
event of an error, does the right thing, in terms of setting
the error flags in the buf header. That fixes a crash from
bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer
to be re-dirtied. ETIMEDOUT can occur on soft mounts, when
the number of retries are exceeded, and we don't want data loss
in that case.
Submitted by: Mohan Srinivasan
during boot up. Now we do a full reset of the 8259As and setup a simple
interrupt handler (we actually borrow the apic one that just does an
immediate iret) to handle any spurious interrupts triggered by either chip.
This should fix some folks that were getting a Trap 30 during bootup of
certain SMP AMD systems. This might get pushed into the 6.0 branch as an
errata. For now a suitable workaround is to add 'device atpic' to your
kernel config.
Tested by: scottl
Helpful info from: dillon
MFC after: 1 week
buildkernel: provide a real but dummy name to ${DEPENDFILE}
so that the relevant exists() check in bsd.prog.mk fails and
ensures that ${GENHDRS} are built before any other objects.
MFC after: 3 days
- don't force busdma to pre-allocate bounce pages for parent tag.
- use system supplied roundup2 macro instead of rolling its own version.
- TX/RX decriptor length should be multiple of 128. There is no
no need to expand the size with the multiple of 4096.
- don't create/destroy DMA maps in TX/RX handlers. Use pre-allocated
DMA maps. Since creating DMA maps on sparc64 is time consuming
operations(resource mananger overhead), this change should boost
performance on sparc64. I could get > 2x speedup on Ultra60.
- TX/RX descriptors could be aligned on 128 boundary. Aligning them
on PAGE_SIZE is waste of resource.
- don't blindly create TX DMA tag with size of MCLBYTES * 8. The size
is only valid under jumbo frame environments. Instead of using the
hardcoded value, re-compute necessary size on the fly.
- RX side bus_dmamap_load_mbuf_sg(9) support.
- remove unused macro EM_ROUNDUP and constant EM_MMBA.
Reviewed by: scottl
Tested by: glebius
that enabling busmastering would result in PCR bit ON after codec
reset.
While I'm here add DELAY(1) to codec access routine to give reasonable
time to codec operation. Without the delay, it would cause problems on
super-fast machines(> 2GHz). Also enable legacy audio for all 6300ESB,
82801[D-G]B chips. Previously, it enabled legacy audio for 82801DB(ICH4)
chip only.
Reported by: Maxim Maximov mcsi AT mcsi DOT pp DOT ru
Andrew Bliznak andriko.b AT gmail DOT com
Tested by: brueffer, Maxim Maximov, Andrew Bliznak
for export structure and pass that to vfs_export().
Currently in userland mount(8), an export structure is unconditionally
passed in, only for UFS. This is an attempt to move that UFS-specific
behavior out of mount(8) and into the UFS filesystem code.
Don't allocate potentially large variables on the stack.
Check strsep() return values when the string comes from userland.
Shorten variable names for lucidity's sake.
most of the stuff:
Pointed out by: njl@
use the base time in case the real-time clock is bogus or behind the
base time. Most importantly, don't sanity-check the base time up front
because it may be zero. This is not a preposterous condition. It just
means that none of the file systems have their mount time updated.
MFC after: 1 week
for a Windows ISR is 'BOOLEAN isrfunc(KINTERRUPT *, void *)' meaning
the ISR get a pointer to the interrupt object and a context pointer,
and returns TRUE if the ISR determines the interrupt was really generated
by the associated device, or FALSE if not.
I had mistakenly used 'void isrfunc(void *)' instead. It happens the
only thing this affects is the internal ndis_intr() ISR in subr_ndis.c,
but it should be fixed just in case we ever need to register a real
Windows ISR vi IoConnectInterrupt().
For NDIS miniports that provide a MiniportISR() method, the 'is_our_intr'
value returned by the method serves as the return value from ndis_isr(),
and 'call_isr' is used to decide whether or not to schedule the interrupt
handler via DPC. For drivers that only supply MiniportEnableInterrupt()
and MiniportDisableInterrupt() methods, call_isr is always TRUE and
is_our_intr is always FALSE.
In the end, there should be no functional changes, except that now
ntoskrnl_intr() can terminate early once it finds the ISR that wants
to service the interrupt.
When all file systems have a time stamp of zero, which is the case
for example when the root file system is on a read-only medium, we
ended up not calling inittodr() at all. A potential uncleanliness
existed as well. If multiple file systems had a non-zero time stamp,
we would call inittodr() multiple times. While this should not be
harmful, it's definitely not ideal.
Fix both issues by iterating over the mounted file systems to find
the largest time stamp and call inittodr() exactly once with that
time stamp. This could of course be a zero time stamp if none of the
mounted file systems have a non-zero time stamp. In that case the
annoying errors mentioned in the commit log for revision 1.186 still
haven't been avoided. The bottom line is that inittodr() should not
complain when it gets a time base of zero. At the time of this
commit only alpha seems to have that problem.
Reported by: Dario Freni (saturnero at freesbie dot org)
MFC after: 1 week
is called. It looks like there are lots of different mount flags checked
in vfs_domount(), so we need to do the parsing for these particular
mount flags earlier on. The new flags parsed are:
async, force, multilabel, noasync, noatime, noclusterr, noclusterw,
noexec, nosuid, nosymfollow, snapshot, suiddir, sync, union.
Existing code which uses mount() to mount UFS filesystems is not
affected, but new code which uses nmount() to mount UFS filesystems
should behave better.
Add functions to rename objects and to move a subdisk from one drive
to another.
Obtained from: Chris Jones <chris.jones@ualberta.ca>
Sponsored by: Google Summer of Code 2005
MFC in: 1 week
have any know to enable it from userland and could only be enabled by
either setting it to 1 at compile time or through the kernel debugger.
In the future it may be brought back as KTR tracing points.
Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
directory by default) without requiring the user to load them by hand using
e.g iwicontrol. Get rid of the old ioctl crud.
Updated iwi-firmware port coming soon.
Obtained from: OpenBSD
synonyms for "shortname" and "longname" mount options. The old
(before nmount()) mount_msdosfs program accepted "shortnames" and "longnames",
but the kernel nmount() checked for "shortname" and "longname".
So, make the kernel accept "shortnames", "longnames", "shortname", "longname"
for forwards and backwarsd compatibility.
Discovered by: Rainer Hurling <rhurlin at gwdg dot de>
include ip_options.h into all files making use of IP Options functions.
From ip_input.c rev 1.306:
ip_dooptions(struct mbuf *m, int pass)
save_rte(m, option, dst)
ip_srcroute(m0)
ip_stripoptions(m, mopt)
From ip_output.c rev 1.249:
ip_insertoptions(m, opt, phlen)
ip_optcopy(ip, jp)
ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)
No functional changes in this commit.
Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
to list corruption, which can be difficult to unravel in a post-mortem
analysis. These checks verify that prev and next pointers are consistent
when inserting or removing elements, thus catching any corruption earlier.
Also use TRASHIT to break LIST and SLIST link pointers on element removal,
from mlaier via -hackers.
Reviewed by: mlaier
Approved by: rwatson (mentor)
mystery traps. If we don't have a message for a given trap, just use
UNKNOWN for the message.
- Add trap messages for T_XMMFLT and T_RESERVED.
MFC after: 1 week
have free space in it. Allocate correct mbuf from the beginning.
This allows icmp_error() to quote the entire TCP header in error
messages.
Sponsored by: TCP/IP Optimization Fundraise 2005
callpath via vfs_getopt(), and set the appropriate MNT_* flag:
-> acls, async, force, multilabel, noasync, noatime,
-> noclusterr, noclusterw, snapshot, update
- Allow errmsg as a valid mount option via vfs_getopt(),
so we can later add a hook to propagate mount errors back
to userspace via vfs_mount_error().
the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.
This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.
Sponsored by: Isilon Systems
Reviewed by: phk
MFC after: 1 week
in, and if so, set MNT_UPDATE filesystem flag.
vfs_nmount() calls vfs_domount(), and there is special logic
inside vfs_domount() if MNT_UPDATE is set. This is very important
when we want to do an update mount of the root filesystem, using nmount().
Prevent backup CARP hosts from replying to arp requests, fixes strangeness
with some layer-3 switches. From Bill Marquette.
Tested by: Kazuaki Oda <kaakun highway.ne.jp>
for a notebook with em(4) adapter.
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
them to be multiple of 128.
[*] According to comments in if_em.h the 82540EM/82541ER chips can handle
more than 256 descriptors. Since we don't have this hardware to test,
we decided to mimic NetBSD wm(4) driver, that limits these chips to
256 descriptors.
In collaboration with: yongari
IPI_STOP handling code use atomic_readandclear() to execute the restart
function on the first CPU to resume and restore the behavior of always
executing the restart function on the BSP since this is in fact what the
non-NMI IPI_STOP handler does. I did add back in a statement to clear
the restart function pointer after it is executed to match the behavior
of the non-NMI IPI_STOP handler.
I/O APIC that doesn't exist, then a read of the version register is going
to return -1 which is 0xffffffff not 0xffffff.
Tested on: i386
Tested by: Nikos Ntarmos ntarmos at ceid dot upatras dot gr
MFC after: 1 week
that can potentially be mapped to negative ieee #'s.
NB: before operation on the latter can be supported we need to cleanup
various code that assumes ieee channel #'s are >= 0
stale when called to reset rate control state causing us to
pickup an invalid index, check for this and skip 'em (things
will eventually get fixed up so this is not harmful)
of volumes that might need administrator attention through device
specific sysctl to simplify device monitoring.
Submitted by: Deomid Ryabkov <myself at rojer dot pp dot ru>
execute a ET_DYN binary (shared object).
This does not make much sense, but some linux scripts expect to be able to
execute /lib/ld-linux.so.2 (ldd comes to mind).
The sysctl defaults to 0.
MFC after: 3 days
currently present is minor and offers no real semantic issues, it also
doesn't make sense since an earlier lockless check has already
occurred. Also hold the mutex longer, over a manipulation of
per-process ktrace state, which requires synchronization.
MFC after: 1 month
Pointed out by: jhb
This one simply tries to simplify the logic to select the
buffer sizes. I am not sure it is necessary but the code
seems a bit more readable to me. And at least i have tried
to document how the buffer sizes are computed.
Thanks to luigi for deciphering one of the most cryptic part of
sound driver.
Submitted by: luigi
Approved by: netchild (mentor)
In SNDCTL_DSP_SETFRAGMENT, if you specify both read and
write channels, the existing code first acts on the
read channel, but as a side effect it updates the
arguments (maxfrags, fragsz) passed by the caller according
to acceptable values for the read channel, and then uses the
modified values to act on the write channel.
The problem with this approach is that, given a
(maxfrags, fragsz) user-specified value, the actual
values computed by the read and write channels may differ:
e.g. the read channel might want to allocate more fragments
than what the user specified because it has no side-effects
on the delay and it helps in case of slow readers,
whereas the write channel needs to use as few fragments
as possible to keep the audio latency low (very important
with telephony apps).
This patch stores the values computed by the read channel
into temproary variables so the write channel will use
the actual arguments of the ioctl.
This patch is very helpful with telephony apps such as asterisk.
Submitted by: luigi
Approved by: netchild (mentor)
- Added new codec id for CX20468-21 and VIA1617A.
Submitted by: Chen Lihong <lihong.chen@gmail.com>
- Re-enable SOUND_MIXER_IGAIN, but set the default level as 0 (mute)
Suggested by: luigi
mixer.c:
- Set default value for SOUND_MIXER_IGAIN as 0 (mute) to avoid
feedback problems on some laptops (was disabled by jhb during
ac97.c revision 1.42).
Approved by: netchild (mentor)
erratic system slowdown (beaten to a pulp) and possible panic. This
issue has bugged me for as long as I could remember, until I
realized that it is possible for register base offset to hold zero
value which is definitely a "FALSE".
Approved by: netchild (mentor)
compatible AC97 codec.
- As the driver supports so many variants, create a table ids for
ease of probing and maintenance.
Submitted by: yongari
Reviewed/Tested by: multimedia@
- From luigi:
The code to compute fragment sizes in the ich driver almost
invariably ends up using the full buffer available, no matter
how the user specifies fragment size and number.
With audio telephony (8khz, 16bit-stereo) and the 16k buffer
size this results in an unbearable 500ms delay.
This patch makes sure that we never use more than 4 fragments,
(i don't think we need more unless there are huge interrupt
servicing latencies), and obey to the requested fragment size,
so that latency is acceptable.
Based on this (and after much regression tests), I can conclude
that this driver works best with 2 fragments, thus solving various
long standing issues of ICH driver not capable to flush or play
short files perfectly.
Suggested by: luigi (the idea of smaller fragments)
- MPSAFE conversion.
Approved by: netchild (mentor)
distinct hardware playback channels. DAC configuration can be
accessed through kernel hint - hint.pcm.<unit>.dac="val" with
following possible values:
0 = Enable both DACs (default)
1 = Enable single DAC (DAC1)
2 = Enable single DAC (DAC2)
3 = Enable both DACs, swap position (DAC2 comes first instead
of DAC1)
Special case for ES1370:
Unlike ES1371,2,3/CT5880, volume for each DAC 1 and 2 can be
controlled indepedently (synth for DAC1, pcm for DAC2). It is
possible that user will confuse by this behaviour, since both
DACs are enabled by default. Thus, provide a knob through sysctl
hw.snd.pcm<unit>.single_pcm_mixer:
0 = each DACs will be controlled separately (synth/pcm).
1 = combine both DACs volume mixer controller into a single
"pcm" (default)
As a side note, fixed rate operation (provided by previous
commit) is not a mandatory if the configuration space does not
involve DAC2 (perhaps disabled by user through the above kernel
hint). Unlike DAC2, DAC1 has its own register / control space,
not affected by the speed settings of ADC.
Tested by: multimedia@
Approved by: netchild (mentor)
mask to recdev_l and recdev_r, since each have its own unique mask.
Submitted by: Watanabe Kazuhiro <CQG00620@nifty.ne.jp>
Approved by: netchild (mentor)
verbs. Only the create verb operates on a provider. All other verbs
operate on a GPT geom. Also, the GPT entry oriented verbs require
a non-downgraded GPT.
o Have all verbs take an optional flags parameter. The flags parameter
is a string of single-letter flags. The typical use of these flags
is to enable certain behaviour in support fo the gpt(8) tool.
o Add dummy implementations for the destroy and recover verbs.
This change causes test 2 of the GPT regression test suite to fail.
The presence of a geom parameter is now required even for unknown
verbs.
If the packet is rejected from pfil(9) then continue the loop rather than
returning, this means that we can still try to send it out the remaining
interfaces but more importantly the mbuf is freed and refcount decremented on
exit.
just discard the received frame and reuse the old mbuf.
This should prevent the connection from stalling after high network traffic.
MFC after: 2 weeks
a new mbuf, just discard the received frame and reuse the old mbuf.
This should fix kernel panics on high network traffic.
Obtained from: NetBSD (joerg@)
MFC after: 2 weeks
It may be the case that you may hear some unwanted noise while
playing back with 24/32 bit. This is a problem in the USB system.
Explanation from Hans Petter Selasky:
---snip---
The current USB sound driver only uses one isochronous
buffer, that is restarted when it is completed. This will lead to a short
period of time, +1ms, where no sound data is sent to the external USB device.
Depending on the load of your computer, this can be as much as 50ms. So the
USB sound driver must use 2 isochronous transfers. At the beginning one will
queue both. Then these are restarted on completion. This will result in a
constant-rate data stream to the external sound device, a minimum sound
buffer equal to the size of the isochronous buffer, and possibly the sound
will reach your ears with less delay. Little delay is a result of constant
data rate. Currently only my USB driver will support that. If one tries that
with the USB driver in *BSD, then it will crash at the first moment one gets
a buffer underrun.
---snip---
Submitted by: Kazuhito HONDA <kazuhito@ph.noda.tus.ac.jp>
Mono-recording still not tested by: julian
reliability when tracing fast-moving processes or writing traces to
slow file systems by avoiding unbounded queueuing and dropped records.
Record loss was previously possible when the global pool of records
become depleted as a result of record generation outstripping record
commit, which occurred quickly in many common situations.
These changes partially restore the 4.x model of committing ktrace
records at the point of trace generation (synchronous), but maintain
the 5.x deferred record commit behavior (asynchronous) for situations
where entering VFS and sleeping is not possible (i.e., in the
scheduler). Records are now queued per-process as opposed to
globally, with processes responsible for committing records from their
own context as required.
- Eliminate the ktrace worker thread and global record queue, as they
are no longer used. Keep the global free record list, as records
are still used.
- Add a per-process record queue, which will hold any asynchronously
generated records, such as from context switches. This replaces the
global queue as the place to submit asynchronous records to.
- When a record is committed asynchronously, simply queue it to the
process.
- When a record is committed synchronously, first drain any pending
per-process records in order to maintain ordering as best we can.
Currently ordering between competing threads is provided via a global
ktrace_sx, but a per-process flag or lock may be desirable in the
future.
- When a process returns to user space following a system call, trap,
signal delivery, etc, flush any pending records.
- When a process exits, flush any pending records.
- Assert on process tear-down that there are no pending records.
- Slightly abstract the notion of being "in ktrace", which is used to
prevent the recursive generation of records, as well as generating
traces for ktrace events.
Future work here might look at changing the set of events marked for
synchronous and asynchronous record generation, re-balancing queue
depth, timeliness of commit to disk, and so on. I.e., performing a
drain every (n) records.
MFC after: 1 month
Discussed with: jhb
Requested by: Marc Olzheim <marcolz at stack dot nl>
- several TerraTec TValue [1]
- PixelView PlayTV Pro REV-4C [2]
In case you have the PixelView card, please tell us the "pciconf -v -l"
output on multimedia@FreeBSD.org if it works. There are revisions out there
which may not work and we need to know which ones work.
PR: 53383 [1], 76002 [2]
Submitted by: Tanja Wittke <tawi@gruft.de> [1], barner [1],
Dan Angelescu <mrhsaacdoh@yahoo.com> [2]
MFC after: 2 months
This is the only file of > 1700 files in a buildkernel here doing that.
It makes reproducible builds (same source => same binary) impossible.
Spotted by: devel/ccache
is that gdb knows SIGLWP and will pass it to program, otherwise gdb
will print out "unknown signal" and discard it, and then thread
cancellation won't work for libthr under gdb.
MFC: 3 days
MD class. Previously only the DISK class was dumped. The only
consumer of this sysctl is libdisk (i.e. sysinstall) and it tests
explicitly for instances of the DISK class. Dumping other classes
is therefore harmless.
By also dumping the MD class regression tests can be written that
use the MD class for operations that would normally be done on the
DISK class. The sysctl can now be used to test if those operations
took an effect. An example is partitioning.
from the printer and discarding the data even if the ulpt device
was opened for reading. This resulted in crashes because two
conconcurrent read transfers were using the same transfer structure.
PR: usb/88886
Reported By: Alex Pivovarov
MFC after: 1 week
happiness, as well as correct other bugs:
- Replace notion of current and saved accounting credential/vnode with a
single credential/vnode and an acct_suspended flag. This simplifies the
accounting logic substantially.
- Replace acct_mtx with acct_sx, a sleepable lock held exclusively during
reconfiguration and space polling, but shared during log entry
generation. This avoids holding a mutex over sleepable VFS operations.
- Hold the sx lock over the duration of the I/O so that the vnode I/O
cannot occur after vnode close, which could occur previously if
accounting was disabled as a process exited.
- Write the accounting log entry with Giant conditionally acquired based
on the file system where the log is stored. Previously, the accounting
code relied on the caller acquiring Giant.
- Acquire Giant conditionally in the accounting callout based on the file
system where the accounting log is stored. Run the callout MPSAFE.
- Expose acct_suspended via a read-only sysctl so it is possibly to
programmatically determine whether accounting is suspended or not without
attempting to parse logs.
- Check both acct_vp and acct_suspended lock-free before entering the
accounting sx lock in acct().
- When accounting is disabled due to a VBAD vnode (i.e., forceable unmount),
generate a log message indicating accounting has been disabled.
- Correct a long-standing bug in how free space is calculated and compared
to the required space: generate and compare signed results, not unsigned
results, or negative free space will cause accounting to not be suspended
when required, or worse, incorrectly resumed once negative free space is
reached.
MFC after: 2 weeks
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
The following repo-copies were made (by Mark Murray):
sys/i386/isa/spkr.c -> sys/dev/speaker/spkr.c
sys/i386/include/speaker.h -> sys/dev/speaker/speaker.h
share/man/man4/man4.i386/spkr.4 -> share/man/man4/spkr.4
copy of Ethernet address.
- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.
socket file descriptor garbage collection code, which is intended to
detect and clear cycles of orphaned file descriptors that are "in-flight"
in a socket when that socket is closed before they are received. The
algorithm present was both run at poor times (resulting in recursion and
reentrance), and also buggy in the presence of parallelism. In order to
fix these problems, make the following changes:
- When there are in-flight sockets and a UNIX domain socket is destroyed,
asynchronously schedule the garbage collector, rather than running it
synchronously in the current context. This avoids lock order issues
when the garbage collection code reenters the UNIX domain socket code,
avoiding lock order reversals, deadlocks, etc. Run the code
asynchronously in a task queue.
- In the garbage collector, when skipping file descriptors that have
entered a closing state (i.e., have f_count == 0), re-test the FDEFER
flag, and decrement unp_defer. As file descriptors can now transition
to a closed state, while the garbage collector is running, it is no
longer the case that unp_defer will remain an accurate count of
deferred sockets in the mark portion of the GC algorithm. Otherwise,
the garbage collector will loop waiting waiting for unp_defer to reach
zero, which it will never do as it is skipping file descriptors that
were marked in an earlier pass, but now closed.
- Acquire the UNIX domain socket subsystem lock in unp_discard() when
modifying the unp_rights counter, or a read/write race is risked with
other threads also manipulating the counter.
While here:
- Remove #if 0'd code regarding acquiring the socket buffer sleep lock in
the garbage collector, this is not required as we are able to use the
socket buffer receive lock to protect scanning the receive buffer for
in-flight file descriptors on the socket buffer.
- Annotate that the description of the garbage collector implementation
is increasingly inaccurate and needs to be updated.
- Add counters of the number of deferred garbage collections and recycled
file descriptors. This will be removed and is here temporarily for
debugging purposes.
With these changes in place, the unp_passfd regression test now appears
to be passed consistently on UP and SMP systems for extended runs,
whereas before it hung quickly or panicked, depending on which bug was
triggered.
Reported by: Philip Kizer <pckizer at nostrum dot com>
MFC after: 2 weeks
state about each open file, and identify the first process in the process
table that references the file. This is helpful in debugging leaks of
file descriptors.
MFC after: 1 week
The PR and patch have the details. The ultimate fix requires architectural
changes and clarifications to the VFS API, but this will prevent the system
from panicking when someone does "ls /dev" while running in a shell under the
linuxulator.
This issue affects HEAD and RELENG_6 only.
PR: 88249
Submitted by: "Devon H. O'Dell" <dodell@ixsystems.com>
MFC after: 3 days
with the file descriptor. When a file descriptor is closed as a result
of garbage collecting a UNIX domain socket, the file descriptor will
not have any associated thread, so the logic to identify advisory locks
held by that thread is not appropriate. Check the thread for NULL to
avoid this scenario. Expand an existing comment to say a bit more about
this.
MFC after: 1 week
thread context. While it doesn't matter too much at the moment, in
the future we could be back in the same boat if/when more restrictions
are placed (or enforced) in a SWI.
Suggested by: njl, bde, jhb, scottl
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
this leads to counter slipping back, when if->if_oerrors
is recalculated in em_update_stats_counters(). Instead
increase watchdog counter in em_watchdog() and take it
into account in em_update_stats_counters().
Submitted by: ade [1]
- disable jumbo frame support on strict alignment architectures due
to the limitation of hardware. The driver needs a fix-up code for
RX side. The fix will show up in near future.
- fix endian issue for 82544 on PCI-X bus. I couldn't test this as
I don't have the NIC/hardware.
- prefer PCIR_BAR to hardcoded EM_MMBA.
- Properly checks for for 64bit BAR [1]
- replace inl/outl with bus_space(9) [1]
- fix endian issue on VLAN handling.
- reorder header files and remove unnecessary one.
Reviewed by: cognet
No response from: pdeuskar, tackerman
Obtained from: OpenBSD [1]
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.
Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.
Reviewed by: tegge
in the hardware interrupt context (even if it is likely just an
ithread). We don't document that suspend/resume routines are run from
such a context and some of the things that happen in those routines
aren't interrupt safe. Since there's no real need to run from that
context, this restores assumptions that suspend routines have made.
This fixes Thierry Herbelot's 'Trying to sleep while sleeping is
prohibited' problem.
nearly identical to wintel/ia32, with a couple of tweaks. Since it is
so similar to ia32, it is optionally added to a i386 kernel. This
port is preliminary, but seems to work well. Further improvements
will improve the interaction with syscons(4), port Linux nforce driver
and future versions of the xbox.
This supports the 64MB and 128MB boxes. You'll need the most recent
CVS version of Cromwell (the Linux BIOS for the XBOX) to boot.
Rink will be maintaining this port, and is interested in feedback.
He's setup a website http://xbox-bsd.nl to report the latest
developments.
Any silly mistakes are my fault.
Submitted by: Rink P.W. Springer rink at stack dot nl and
Ed Schouten ed at fxq dot nl
to user-space if a parameter named "errmsg" is passed into the iovec.
Used in conjunction with vfs_mount_error(), more useful error messages
than errno can be passed back to userspace when mounting a filesystem
fails.
Discussed with: phk, pjd
have started aio, instead, initialize aio management structure
if it hasn't been done, the reason to adjust this behavior is
to make it a bit friendly for threaded program, consider two
threads, one submits aio_write, and another just calls
aio_waitcomplete to wait any I/O to be completed and recycle the
aio requests, before submitter doing any I/O, the recycler wants
to wait in kernel. This also fixes inconsistency with other aio
syscalls.
This hasn't been true on i386 for at least a decade, probably longer, but
I'm too lazy to look up the exact year that PAE support was introduced.
Thus, this driver doesn't work on PAE.
X-MFC After: now
softc lists and associated mutex are now unused so these have been removed.
Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.
Idea by: brooks
Reviewed by: brooks
- Use curthread for calls to knlist_delete() and add a big comment
explaining why as well as appropriate assertions.
- Use TAILQ_FOREACH and TAILQ_FOREACH_SAFE instead of handrolling them.
- Use fget() family of functions to lookup file objects instead of
grovelling around in file descriptor tables.
- Destroy the aio_freeproc mutex if we are unloaded.
Tested on: i386
-Change unconditional aquisition of Giant to only pickup Giant if the vnode
for the controlling tty resides on a non-mpsafe file system.
-Pickup Giant around executable vnode reference counting operations only if
the executable resides on a non-mpsafe file system.
-If this process is being traced, pickup Giant for trace file reference count
operations only if it resides on a non-mpsafe file system.
Discussed with: jhb
Tested by: kris
the RocketPort unit number in the name of the devices. This means that
unit 0 device names will change from ttyR0 .. ttyRf to ttyR00 .. ttyR0f.
Reviewed by: phk
retransmitted without suppression, while there is demand for
such ARP entry. As before, retransmission is rate limited to
one packet per second. Details:
- Remove net.link.ether.inet.host_down_time
- Do not set/clear RTF_REJECT flag on route, to
avoid rt_check() returning error. We will generate error
ourselves.
- Return EWOULDBLOCK on first arp_maxtries failed
requests , and return EHOSTDOWN/EHOSTUNREACH
on further requests.
- Retransmit ARP request always, independently from return
code. Ratelimit to 1 pps.
For each child process whose status has been changed, a SIGCHLD instance
is queued, if the signal is stilling pending, and process changed status
several times, signal information is updated to reflect latest process
status. If wait() returns because the status of a child process is
available, pending SIGCHLD signal associated with the child process is
discarded. Any other pending SIGCHLD signals remain pending.
The signal information is allocated at the same time when proc structure
is allocated, if process signal queue is fully filled or there is a memory
shortage, it can still send the signal to process.
There is a booting time tunable kern.sigqueue.queue_sigchild which
can control the behavior, setting it to zero disables the SIGCHLD queueing
feature, the tunable will be removed if the function is proved that it is
stable enough.
Tested on: i386 (SMP and UP)
the interface. This allows run-time selection of MMU code, based
on CPU-type detection, or tunable-overrides when testing new code.
Pre-requisite for G5 support.
conf/files.powerpc
- remove pmap.c
- add mmu_if.h, mmu_oea.c, pmap_dispatch.c
powerpc/include/mmuvar.h
- definitions for MMU implementations
powerpc/include/pmap.h
- remove pmap_pte_spill declaration
- add pmap_mmu_install declaration
- size the phys_avail array
- pmap_bootstrapped is now global-scope
powerpc/powerpc/machdep.c
- call kobj_machdep_init early in the boot sequence to allow
kobj usage prior to SI_SUB_LOCK
- install the OEA pmap code. This will be moved to CPU-specific
init code in the future.
powerpc/powerpc/mmu_if.m
- Kobj MMU interface definitions
powerpc/powerpc/pmap_dispatch.c
- central dispatch for pmap calls
- contains the global mmu kobj and the routine to locate the
the mmu implementation and init the kobj
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on. Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.
Observed by: tegge
acpi_resource change was a minor nit offered as an early candidate for
the recent ACPICA import problem and the acpi.c change is one I need to
test still that makes the ordered probing of system devices actually work
as advertised (probe devices in order based on the type of device rather
than in the order we encounter them in the device tree).
entry that is not zero, assume that it is really a hard-wired IRQ (commonly
used for APIC routing) and not a source index. In practice, we've only
ever seen source indices of 0 for legitimate non-hard-wired _PRT entries.
Reviewed by: njl
Tested by: Alex Lyashkov shadow at psoft dot net
MFC after: 2 weeks
After a number of tests using nop's to change the alignment, it was
confirmed that the mtibat instructions should be cache-aligned.
FreeScale app note AN2540 indicates that the isync before and after
the mtdbat is the right thing to do, but sync/isync isn't required
before the mtibat so it has been removed.
Fix by using a ".balign 32" to pull the code in question to the correct
alignment.
MFC after: 3 days
Intel's web site requires some minor tweaks to get it to work:
- The driver seems to have been released with full WMI tracing enabled,
and makes references to some WMI APIs, namely IoWMIRegistrationControl(),
WmiQueryTraceInformation() and WmiTraceMessage(). Only the first
one is ever called (during intialization). These have been implemented
as do-nothing stubs for now. Also added a definition for STATUS_NOT_FOUND
to ntoskrnl_var.h, which is used as a return code for one of the WMI
routines.
- The driver references KeRaiseIrqlToDpcLevel() and KeLowerIrql()
(the latter as a function, which is unusual because normally
KeLowerIrql() is a macro in the Windows DDK that calls KfLowewIrql()).
I'm not sure why these are being called since they're not really
part of WDM. Presumeably they're being used for backwards
compatibility with old versions of Windows. These have been
implemented in subr_hal.c. (Note that they're _stdcall routines
instead of _fastcall.)
- When querying the OID_802_11_BSSID_LIST OID to get a BSSID list,
you don't know ahead of time how many networks the NIC has found
during scanning, so you're allowed to pass 0 as the list length.
This should cause the driver to return an 'insufficient resources'
error and set the length to indicate how many bytes are actually
needed. However for some reason, the Intel driver does not honor
this convention: if you give it a length of 0, it returns some
other error and doesn't tell you how much space is really needed.
To get around this, if using a length of 0 yields anything besides
the expected error case, we arbitrarily assume a length of 64K.
This is similar to the hack that wpa_supplicant uses when doing
a BSSID list query.
Move what can be moved (UMA zones creation, pv_entry_* initialization) from
pmap_init2() to pmap_init().
Create a new function, pmap_postinit(), called from cpu_startup(), to do the
L1 tables allocation.
pmap_init2() is now empty for arm as well.
sio(4) will claim it. This change therefore only affects how ports
are handled when they are not claimed by sio(4), and in principle
will improve hardware support.
MFC after: 2 months
from there. All others get broken up and free'd individually to the mbuf
and cluster zones.
The packet zone is a secondary zone to the mbuf zone. There is currently
a limitation in UMA which prevents decreasing the packet zone stock when
the mbuf and cluster zone are drained and all their members are part of
packets. When this is fixed this change may be reverted.
- Fix a typo in rsmisc.c and a style change for consistency.
This patch will also appear in future ACPI-CA release.
Submitted by: Robert Moore <robert dot moore at intel dot com>
Tested by: ru
descriptor. This should fix the "memory modified after free" panics. This
patch will appear in a future acpi-ca distribution.
Submitted by: Robert Moore <robert.moore / intel.com>
Tested by: Peter Holm
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().
Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.
Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.
Introduce the vm.pmap.pv_entries tunable on alpha and ia64.
Eliminate some unnecessary white space.
Discussed with: tegge (item #1)
Tested by: marcel (ia64)
a synchronous reprogramming of hardware MAC filters if the physical
interface are up and running. Previously, MAC filters would be
reconfigured only when the fec interface was brought up.
- Disallow bundle reconfiguration when virtual
interface is running; otherwise, removing a
port from a running configuration will cause
a panic in the start() method on the next packet
on an assumption that a bundle has an even
number of ports (2 or 4).
- Disallow bringing of virtual interface to a
running state when a bundle size is 0; otherwise,
adding and then removing the port will similarly
cause a panic.
- Add missing initialization of fec_ifstat when
adding a new port and fix media status reporting
when virtual interface isn't yet up (check for
fec_status of 1 rather than != 0).
previously, ifp->if_type was set to IFT_ETHER by
ether_ifattach(), now it's done by if_alloc() so
an assignment of if_type to IFT_PROPVIRTUAL after
if_alloc() but before ether_ifattach() broke it.
This makes arp(8) and friends happy about the fec
interfaces, and will allow us to use if_setlladdr()
on the fec interface.
- Set/reset IFF_DRV_RUNNING/IFF_DRV_OACTIVE in init()
and stop() methods rather than in ioctl(), like the
rest of the drivers do. This fixes a bug when an
"ifconfig fec0 ipv4_address" would not have made
the interface running, didn't launch the ticker
function to track media status of bundled ports,
etc.
used in the base system. This has been much discussed in the past
(typically people giving me a hard time for it). Since all that was
added to config was nocpu, and since we don't use it, we don't need to
bump the version.
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.
Tested on: alpha, sparc64
buffers *and* there are no buffers queued up for writing. The bug
was that NMODIFIED was being cleared even while there were buffers
scheduled to be written out, which leads to all sorts of interesting
bugs - one where the file could shrink (because of a post-op getattr
load, say) causing data in buffer(s) queued for write to be tossed,
resulting in data corruption.
Submitted by: Mohan Srinivasan
both proc pointer and thread pointer, if thread pointer is NULL,
tdsignal automatically finds a thread, otherwise it sends signal
to given thread.
Add utility function psignal_event to send a realtime sigevent
to a process according to the delivery requirement specified in
struct sigevent.
source is first enabled similar to how intr_event's now allocate ithreads
on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC
intpins. On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors. Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.
Tested on: amd64, i386
for code to start out on one CPU when thunking into Windows
mode in ctxsw_utow(), and then be pre-empted and migrated to another
CPU before thunking back to UNIX mode in ctxsw_wtou(). This is
bad, because then we can end up looking at the wrong 'thread environment
block' when trying to come back to UNIX mode. To avoid this, we now
pin ourselves to the current CPU when thunking into Windows code.
Few other cleanups, since I'm here:
- Get rid of the ndis_isr(), ndis_enable_interrupt() and
ndis_disable_interrupt() wrappers from kern_ndis.c and just invoke
the miniport's methods directly in the interrupt handling routines
in subr_ndis.c. We may as well lose the function call overhead,
since we don't need to export these things outside of ndis.ko
now anyway.
- Remove call to ndis_enable_interrupt() from ndis_init() in if_ndis.c.
We don't need to do it there anyway (the miniport init routine handles
it, if needed).
- Fix the logic in NdisWriteErrorLogEntry() a little.
- Change some NDIS_STATUS_xxx codes in subr_ntoskrnl.c into STATUS_xxx
codes.
- Handle kthread_create() failure correctly in PsCreateSystemThread().
based jumbo 9k and jumbo 16k cluster support.
All mbuf's with external storage attached are mandatory reference
counted. For clusters and jumbo clusters UMA provides the refcnt
storage directly. It does not have to be separatly allocated. Any
other type of external storage gets its own refcnt allocated from
an UMA mbuf refcnt zone instead of normal kernel malloc.
The refcount API MEXT_ADD_REF() and MEXT_REM_REF() is no longer
publically accessible. The proper m_* functions have to be used.
mb_ctor_clust() and mb_dtor_clust() both handle normal 2K as well
as 9k and 16k clusters.
Clusters and jumbo clusters may be obtained without attaching it
immideatly to an mbuf. This is for high performance cluster
allocation in network drivers where mbufs are attached after the
cluster has been filled.
Tested by: rwatson
Sponsored by: TCP/IP Optimizations Fundraise 2005
destruction:
- Backout 1.62, since it doesn't fix all possible
problems.
- Upon node creation, put an additional reference on node.
- Add a mutex and refcounter to struct ngsock. Netgraph node,
control socket and data socket all count as references.
- Introduce ng_socket_free_priv() which removes one reference
from ngsock, and frees it when all references has gone.
- No direct pointers between pcbs and node, all pointing
is done via struct ngsock and protected with mutex.
- Introduce ng_topo_mtx, a mutex to protect topology changes.
- In ng_destroy_node() protect with ng_topo_mtx the process
of checking and pointing at ng_deadnode. [1]
- In ng_con_part2() check that our peer is not a ng_deadnode,
and protect the check with ng_topo_mtx.
- Add KASSERTs to ng_acquire_read/write, to make more
understandible synopsis in case if called on ng_deadnode.
Reported by: Roselyn Lee [1]
- Introduce a new flags NGQF_QREADER and NGQF_QWRITER,
which tell how the item should be actually applied,
overriding NGQF_READER/NGQF_WRITER flags.
- Do not differ between pending reader or writer. Use only
one flag that is raised, when there are pending items.
- Schedule netgraph ISR in ng_queue_rw(), so that callers
do not need to do this job.
- Fix several comments.
Submitted by: julian
Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
ktr_tid as part of gathering of ktr header data for new ktrace
records. The continued use of intptr_t is required for file layout
reasons, and cannot be changed to lwpid_t at this point.
MFC after: 1 month
Reviewed by: davidxu
intptr_t. The buffer length needs to be written to disk as part
of the trace log, but the kernel pointer for the buffer does not.
Add a new ktr_buffer pointer to the kernel-only ktrace request
structure to hold that pointer. This frees up an integer in the
ktrace record format that can be used to hold the threadid,
although older ktrace files will have a garbage ktr_buffer field
(or more accurately, a kernel pointer value).
MFC after: 2 weeks
Space requested by: davidxu
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.
Submitted by: tegge
MFC after: 1 week
fails, reclaim a pv entry by destroying a mapping to an inactive
page.
Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.
since the link takes a bit to negotiate, the information is pretty
much never available during the probe. As such, the boot output
pretty much always prints N/A for speed and duplex. Since we print
out the output of ifconfig during the user space boot, this early
boot information is also generally redundant, and added to the noise.
MFC after: 2 weeks
before dereferencing it. Certain corrupt kernel modules might not have
a valid hash table, and would cause a kernel panic when they were loaded.
Instead of panic'ing, the kernel now prints out a warning that it is
missing the symbol hash table.
Tested by: Benjamin Close Benjamin dot Close at clearchain dot com
MFC after: 1 week
PCI-ISA bridge. Thus, when viapm0 or viapropm0 attaches, isab0 dosen't
attach so there is no isa0 bus hung off of that bridge. In the non-ACPI
case, legacy0 will add an isa0 anyway as a fail-safe, but ACPI assumes that
any ISA bus will be enumerated via a bridge. To fix this, call
isab_attach() to attach an isa0 ISA child bus device if the pm or propm
device we are probing is a PCI-ISA bridge. Both drivers now have to
implement the bus_if interface via the generic methods for resource
allocation, etc. to work. Also, we now add 2 new ISA bus drivers that
attach to viapm and viapropm devices.
PR: kern/87363
Reported by: Oliver Fromme olli at secnetix dot de
Tested by: glebius
MFC after: 1 week
- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some
memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.
in bytes to start off with. This caused the GPT geom sniffer to attempt
a seek just back from the end of the 'disk', which resulted in a > 4G
seek, causing gdb psim to exit since it only supports 32-bit seeks.
The size of the disk should really be specified in the psim device tree,
but for now do the minimal amount of work to get psim to run again.
aac_alloc_sync_fib(). aac_alloc_sync_fib() will assert that the I/O locks
are held. This fixes a panic on system boot up when the aac(4) device's
bus_generic_attach() routine is called.
Reviewed by: scottl
OpenFirmware. FreeBSD/ppc uses SPRG0 as the per-cpu data area pointer,
and SPRG1-3 as temporary registers during exception handling. There
have been a few instances where OpenFirmware does require these to
be part of it's context, such as cd-booting an eMac.
reported by: many
MFC after: 3 days
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN. This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit. This continues a move towards the socket layer
acting as a library for the protocol.
Bump __FreeBSD_version due to a change in the in-kernel protocol
interface. This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.
that caused a premature exit after calling a fast interrupt handler
and bypassing a much needed critical_exit() and the scheduling of
the interrupt thread for non-fast handlers. In short: unbreak :-)
- Return EINVAL if play_format or rec_format is set but the corresponding
sample rate is 0.
- Don't try to set the playback or recording format to 0. Previously,
issuing an AIOSFMT ioctl with an all-zeroes snd_chan_param would
trigger a KASSERT in chn_fmtchain(); I'm unsure about the effects on
a kernel without INVARIANTS. After this commit, issuing AIOSFMT with
an all-zeroes snd_chan_param is equivalent to issuing AIOGFMT.
MFC after: 2 weeks
trying to access user-space stack addresses when a user fault
is encountered, as occurs when GEOM KTR code is handling a page fault
and is using stack_save() to capture a trace for debug purposes.
It may be possible to walk beyond the trap-frame if it is a kernel fault,
as db_backtrace() does, but I don't think that complexity is needed in
this routine.
MFC after: 3 days
convert to or from timeval frequently.
Introduce function itimer_accept() to ack a timer signal in signal
acceptance code, this allows us to return more fresh overrun counter
than at signal generating time. while POSIX says:
"the value returned by timer_getoverrun() shall apply to the most
recent expiration signal delivery or acceptance for the timer,.."
I prefer returning it at acceptance time.
Introduce SIGEV_THREAD_ID notification mode, it is used by thread
libary to request kernel to deliver signal to a specified thread,
and in turn, the thread library may use the mechanism to implement
SIGEV_THREAD which is required by POSIX.
Timer signal is managed by timer code, so it can not fail even if
signal queue is full filled by sigqueue syscall.
actual resource values we received from the system rather than the range
we requested. Since we request a range starting at 0, we would record
that number. Later, since this == 0, we'd allocate again. However,
we wouldn't write the new resource into the BAR. This resulted in
a resource leak as well as a BAR that couldn't access the resource at
all since rman_get_start, et al, were wrong.
MFC After: 1 week (assuming RELENG_6 is open for business)
the ifp, so you can't call it before doing if_alloc(). Also, there's
really no need to call it here anyway: the code I originally ported from
OpenBSD incorrectly set the station address only once at device attach
time, instead of setting in txp_init(). This meant you couldn't change
the address with ifconfig txp0 ether xx:xx:xx:xx:xx:xx. I added the
call to txp_set_filter() in txp_init() to correct this, but forgot to
remove the call from txp_attach(). Until now, it never mattered.
With this fix, the txp driver tests good:
txp0: <3Com 3cR990-TX-97 Etherlink with 3XP Processor> port 0xb800-0xb87f mem 0xe6800000-0xe683ffff irq 12 at device 10.0 on pci0
txp0: Ethernet address: 00:01:03:d4:91:4f
and channel to ifconfig. Also use the SSID and channel info from
the association info that we already have instead of using ndis_get_info()
to ask the driver for it again.
memory for request.
I was sure graid3 should handle such situations well, but green@ reported
it is not and we want to fix it before 6.0.
Submitted by: green
kern/87959 cracauer ext2fs: no cp(1) possible, mmap returns EINVAL
ext2fs was missing vnode_create_vobject.
(Reisefs probably has the same problem but I want to get this in quick
for 6-release)
drivers I started quite some time before.
Retire the old i386-only pcf driver, and activate the new general
driver that has been sitting in the tree already for quite some
time.
Build the i2c modules for sparc64 architectures as well (where I've
been developing all this on).
o Fix typo in comment
o s/-100/BUS_PROBE_GENERIC/
o s/err/error/ for consistency
o Remove non-applicable comment
o Allow uart_bus_probe() to return the predefined BUS_PROBE_*
contants. In this case: explicitly test for error > 0.
With this change, the driver tests good (at least on i386):
wb0: <Winbond W89C840F 10/100BaseTX> port 0xb800-0xb87f mem 0xe6800000-0xe680007f irq 12 at device 10.0 on pci0
miibus1: <MII bus> on wb0
amphy0: <Am79C873 10/100 media interface> on miibus1
amphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
wb0: Ethernet address: 00:00:e8:18:2a:02
wb0: link state changed to DOWN
wb0: link state changed to UP
- Add locked variants of init() and start().
- Use callout_*() to manage callout.
- Test IFF_DRV_RUNNING rather than IFF_UP in wb_intr() to see if we are
still active when an interrupt comes in.
I couldn't find any of these cards anywhere to test on myself, and google
turns up references to FreeBSD and OpenBSD manpages for this driver when
trying to locate a card that way. I'm not sure anyone actually uses these
cards with FreeBSD.
Tested by: NO ONE (despite repeated requests)
I had to initialize the ifnet a bit earlier in attach so that the
if_printf()'s in vr_reset() didn't explode with a page fault.
- Use M_ZERO with contigmalloc() rather than an explicit bzero.
even initialized it, but it never used it.
- Use callout_*() to manage the callout.
- Use m_devget() to copy data out of the rx buffers rather than doing it
all by hand.
- Use m_getcl() to allocate mbuf clusters rather than doing it all by hand.
- Don't free the software descriptor for a rx ring entry if we can't
allocate an mbuf cluster for it. We left a dangling pointer and never
reallocated the entry anyway. OpenBSD's code (from which this was
derived) has the same bug.
Tested by: NO ONE (despite repeated requests)
Reviewed by: wpaul (5)
gdb(1) command better, though I must admit it's confusing: these
files have not only [debugging] symbols, but much more than that.
Requested by: obrien
'device npx' (both of which aren't really optional right now) and
'device io' and 'device mem' (to preserve POLA for 4.x users upgrading
to 6.0) from GENERIC into DEFAULTS.
Requested by: scottl
Reviewed by: scottl
set. When watchdogd(1) is terminated intentionally it clears the bit,
which should then disable it in the kernel.
PR: kern/74386
Submitted by: Alex Hoff <ahoff at sandvine dot com>
Approved by: phk, rwatson (mentor)
can't acquire an sx lock in ttyinfo() because ttyinfo() can be called
from interrupt handlers (such as atkbd_intr()). Instead, go back to
locking the process group while we pick a thread to display information for
and hold that lock until after we drop sched_lock to make sure the
process doesn't exit out from under us. sched_lock ensures that the
specific thread from that process doesn't go away. To protect against
the process exiting after we drop the proc lock but before we dereference
it to lookup the pid and p_comm in the call to ttyprintf(), we now copy
the pid and p_comm to local variables while holding the proc lock.
This problem was found by the recently added TD_NO_SLEEPING assertions for
interrupt handlers.
Tested by: emaste
MFC after: 1 week
our kernel linker will only load PT_LOAD segments, apparently not.
Instead, produce .dbg objects from .debug objects, and install
them together with non-debug objects, as described in objcopy(1).
Original code by: obrien
debug.kdb.panic and debug.kdb.trap alongside the existing debug.kdb.enter
sysctl. 'panic' causes a panic, and 'trap' causes a page fault. We used
these to ensure that crash dumps succeed from those two common failure
modes. This avoids the need for creating a 'panic' kld module.
* Don't recursively panic if we've already paniced and the local apic is
now stuck.
* Add hw.apic.* tunables/sysctls for extint controls
* Change "lapic%d timer" to "cpu%d timer" intname to match i386
the start of the section headers has to take into account the fact
that the image_nt_header is really variable sized. It happens that
the existing calculation is correct for _most_ production binaries
produced by the Windows DDK, but if we get a binary with oddball
offsets, the PE loader could crash.
Changes from the supplied patch are:
- We don't really need to use the IMAGE_SIZEOF_NT_HEADER() macro when
computing how much of the header to return to callers of
pe_get_optional_header(). While it's important to take the variable
size of the header into account in other calculations, we never
actually look at anything outside the non-variable portion of the
header. This saves callers from having to allocate a variable sized
buffer off the heap (I purposely tried to avoid using malloc()
in subr_pe.c to make it easier to compile in both the -D_KERNEL and
!-D_KERNEL case), and since we're copying into a buffer on the
stack, we always have to copy the same amount of data or else
we'll trash the stack something fierce.
- We need <stddef.h> to get offsetof() in the !-D_KERNEL case.
- ndiscvt.c needs the IMAGE_FIRST_SECTION() macro too, since it does
a little bit of section pre-processing.
PR: kern/83477
threads can wait for a thread to exit, and safely assume that the thread
has left userland and is no longer using its userland stack, this is
necessary for pthread_join when a thread is waiting for another thread
to exit which has user customized stack, after pthread_join returns,
the userland stack can be reused for other purposes, without this change,
the joiner thread has to spin at the address to ensure the thread is really
exited.
and ndis_halt_nic(). It's been disabled for some time anyway, and
it turns out there's a possible deadlock in NdisMInitializeTimer() when
acquiring the miniport block lock to modify the timer list: it's
possible for a driver to call NdisMInitializeTimer() when the miniport
block lock has already been acquired by an earlier piece of code. You
can't acquire the same spinlock twice, so this can deadlock.
Also, implement MmMapIoSpace() and MmUnmapIoSpace(), and make
NdisMMapIoSpace() and NdisMUnmapIoSpace() use them. There are some
drivers that want MmMapIoSpace() and MmUnmapIoSpace() so that they can
map arbitrary register spaces not directly associated with their
device resources. For example, there's an Atheros driver for
a miniPci card (0x168C:0x1014) on the IBM Thinkpad x40 that wants
to map some I/O spaces at 0xF00000 and 0xE00000 which are held by
the acpi0 device. I don't know what it wants these ranges for,
but if it can't map and access them, the MiniportInitialize() method
fails.
o Oxford Semiconductor PCI Dual Port Serial
o Netmos Nm9845 PCI Bridge with Dual UART
Add PCI IDs for single-port cards:
o Various SIIG Cyber Serial
o Oxford Semiconductor OXCB950 UART
Update description as per puc(4).
immediately, back off to the next higher Cx sleep state. Some machines
with a Via chipset report a valid C3 but a register read doesn't actually
halt the CPU. This would cause the machine to appear unresponsive as it
repeatedly called cpu_idle() which immediately returned. Causing interrupts
(i.e. by pressing the power button) would cause the system to make forward
progress, showing that it wasn't actually hung.
Also, enable interrupts a little earlier. We don't need them disabled
to calculate the delta time for the read.
Reported by: silby
MFC after: 2 weeks
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)
Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
to the actual dates when code actually changed. Also add special case
link state change handling for RELENG_5, which doesn't have
if_link_state_change(). No actual operational changes are done.
devices can be opened multiple times simultaneously but we're
expected to be able to do so by the rest of the loader.
This fixes booting from disks attached to the on-board SCSI
controller of Sun Ultra 1 (previously this triggered a trap)
and probably also of AX1115 boards.
- While here, remove unused variables and add empty lines where
style(9) requires such.
Tested on: powerpc (grehan), sparc64
MFC after: 1 month
Try to make everyone happy: David (to have debug kernels installed
by default), Warner (to be able to override that), and myself (for
actually making it all work and to be consistent).
Now, if kernel was configured for debugging (through DEBUG=-g in
the kernel config file or "config -g"), doing "make install" will
install debug versions of kernel and module objects with their
canonical names,
kernel.debug -> /boot/kernel/kernel
if_fxp.ko.debug -> /boot/kernel/if_fxp.ko
Installing a kernel not configured for debugging, or debug kernel
with INSTALL_NODEBUG variable defined, will install non-debug
kernel and module objects.
Also, restore the install.debug and reinstall.debug targets that
are part of the existing API (they cause some additional gdb(1)
scripts to be installed).
of the PCIR_HDRTYPE register. It's the value returned from this
read access that determines whether or not we decide a device is
present at the current slot index. For some reason that I can't
adequately explain, this read fails on my machine when probing the
USB controller on my machine (which happens a multifunction device
at slot index 3 hung off the PCI-PCI bridge on the AMD8111 (bus
index 1)). The read will return 0xFF even though it should return
0x80 to indicate the presence of a multifunction device.
As near as I can tell, there's some timing issue involved with reading
the 'dead' slot indexes 0 through 2 that causes the read of the actual
device at slot 3 to fail. I tried a couple of different tricks to
correct the problem (the patch to amd64/pci/pci_cfgreg.c fixes it
for the amd64 arch), but adding this delay is the only thing that
always allows the USB controllers to be correctly probed 100% of the
time. Whatever the problem is, it's likely confined to the AMD8111
chipset. However, a simple 1us delay is fairly harmless and should
have no side effects for other hardware. I consider this to be
voodoo, but it's fairly benign voodoo and it makes my USB keyboard
and mouse work again.
Note that this is the second time that I've had to resort to a
1us delay to fix a PCI-related problem with this AMD8111/Opteron
system (the first being a fix I made a while back to the NDISulator).
It's possible the delay really belongs in the cfgreg code itself,
or that pci_cfgreg needs some custom hackery for an errata in the
8111. (I checked but couldn't find any documented errata on AMD's
site that could account for these problems.)
other OSes (Solaris, Linux, VxWorks). It's not necessary to write a 0
to the config address register when using config mechanism 1 to turn
off config access. In fact, it can be downright troublesome, since it
seems to confuse the PCI-PCI bridge in the AMD8111 chipset and cause
it to sporadically botch reads from some devices. This is the cause
of the missing USP ports problem I was experiencing with my Sun Opteron
system.
Also correct the case for mechanism 2: it's only necessary to write
a 0 to the ENABLE port.
- Move hardware counter reading/zeroing to hme_tick(). This saves
8 register access per interrupt. [1]
- Use imax macro for getting max. argument between two integers.
- Invoke bus_dmamap_sync(9) first before freeing mbuf.
- Check driver queue first to reduce locking operation in hme_start_locked()
and interrupt handler.
- Simplyfy watchdog timer setup in interrupt handler.
- Don't log normal errors such as RX overrun. If we have DMA stuck
condition, reinitialize the driver and log it.
Reviewed by: marius
Obtained from: OpenBSD [1]
confused with the Credit Card Adapter II and its spawn (which the xe
driver supports). These changes get my card probing and attaching. I
recently won one of these (and a NEC rebadged version) in an lot
auction. The NEC didn't work, so I took it apart and found the
MB86960A chip and then modified if_fe_pccard.c to attach. I can't
test this card further since I have no dongle for this card.
be installed. It should have been optional to install a non-debug
one, just like it was formerly optional to install a debug one. In
order to do that, most of 1.84 had to go.
Instead, make installing the debug kernel the default, but create a
new option INSTALL_NODEBUG for those people that have small /
partitions and good source control habits.
This preserves the behavior of 1.84 while allowing it to be overriden
for people (like me) that do not have the time to upgrade to get a
bigger / and also don't have time for stupid makefile tricks when
upgrading their older system, but still want a kernel.debug around if
things go south.
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.
still works. Also, this is consistent with 'show pcpu' vs
'show allpcpu'. (And 'show allstacks' on OS X for that matter.)
- Add 'bt' as an alias for 'trace'. We already have a 'where' alias as
well, so this makes it easier for gdb-wired hands to work in ddb.
Ok'd by: rwatson (1)
Requested by: scottl (2)
MFC after: 1 day
is on AC power (i.e. not a laptop). This allows power_profile to run once
for desktop systems as well, for instance, to set C3 or CPU frequency.
MFC after: 2 weeks
promisc flag from the member interface, this is a no-op anyway since the
interface is disappearing. The driver may have already released
its resources such as miibus and this is likely to panic the kernel.
Submitted and tested by: Wojciech A. Koszek
MFC after: 2 weeks
from being reclaimed before it was wired. Use pmap_extract_and_hold()
instead of pmap_extract() and retain the hold on the page until it has been
wired.
clock are supported. I have plan to merge XSI timer ITIMER_REAL and other
two CPU timers into the new code, current three slots are available for
the XSI timers.
The SIGEV_THREAD notification type is not supported yet because our
sigevent struct lacks of two member fields:
sigev_notify_function
sigev_notify_attributes
I have found the sigevent is used in AIO, so I won't add the two members
unless the AIO code is adjusted.
2. Introduce flags KSI_EXT and KSI_INS. The flag KSI_EXT allows a ksiginfo
to be managed by outside code, the KSI_INS indicates sigqueue_add should
directly insert passed ksiginfo into queue other than copy it.
OR the flags bits with the driver managed status flags. This fixes an
issue where RUNNING flags would not be reported to processes, which
conflicts with the flags information provided by ifconfig(8).
support the CM-battery interface. Smart batteries can eventually be
supported without ACPI via a separate SMBus interface. The ACPI interface
uses the embedded controller for reading/writing to the SMBus, and normal
ASL definitions for locating the battery controller (since SMBus can't be
enumerated.) Also import definitions for the smart battery interface.
This was written by Hans Petter Selasky with minor cleanups from myself.
Submitted by: Hans Petter Selasky <hselasky / c2i.net>
* Use ACPI_BATT_UNKNOWN instead of constants
* Use maxunit instead of a count of devices since we may have sparse
battery devices in the future. Only userland should be using unit
numbers anyway, so provide a translation function. (Kernel use of
batteries should be restricted to looking up a device_t and calling
methods directly.
* Don't check acpi_BatteryIsPresent() in acpi_battery. Leave it up to
the hardware-specific driver (i.e. cmbat) since smart batteries seem
to not report the "battery present" flag.
* Convert mA to mW if the battery uses those units. CM-batteries only
used mW so this deficiency went unnoticed.
* Clean strings reported in the battery info from any control chars.
* Only dereference the unit from ioctl_arg if the full struct is present.
Unit wouldn't have been used later if it wasn't present but this is
cleaner. Translate the unit if it's not ACPI_BATTERY_ALL_UNITS.
* bzero structs before returning them to usermode for future compat.
Most of this work was submitted by Hans Petter Selasky and then majorly
reworked by myself.
Submitted by: Hans Petter Selasky <hselasky / c2i.net>
vm_object_backing_scan() was not written to handle. Specifically, a wired
page within a backing object that is shadowed by a page within the shadow
object. Handle this state by removing the wired page from the backing
object. The wired page will be freed by socow_iodone().
Stop masking errors: If a page is being freed by vm_object_backing_scan(),
assert that it is no longer mapped rather than quietly destroying any
mappings.
Tested by: Harald Schmalzbauer
too. This fixes problem when connected prefixes overlap.
Obtained from: OpenBSD (rev. 1.40 by claudio);
[ I came to this fix myself, and then found out that
OpenBSD had already fixed it the same way.]
attach routine. Go ahead and ask for it in the probe routine and be
just as wrong as all the other cards that ask for it there...
# this gets the RTL8019 on a SBC at work fully functional. 6.0 still treats
# the 8019 as a generic NE-2000, so these changes aren't relevant there.
This avoids the need for sched_bind() in the default case so that you
can start up the NDIS subsystem at boot time when only CPU 0 is running.
There are potentially ways to fix it so that the DPC threads aren't
started until after the other CPUs are launched, but doing it correctly
is tricky. You need to defer the startup of the ntoskrnl subsystem
(ntoskrnl_libinit()), not just defer ndis_attach().
For now, I don't think it will make much difference having just the
single DPC thread (I started out with just one anyway). Note that this
turns the KeSetTargetProcessorDpc() routine into a no-op, since the
CPU number in struct kdpc is now ignored.
ed_probe_rtl80x9. In the pci case we call ed_probe_rtl80x9 first. In
the PCI case we were using the correct nic_offset by accident because
softc is initialized to zero. In the isa case we were using the wrong
value by accident, since ed_probe_WD80x3 sets the offset value to
0x10. This lead to the identification routines failing. Fix this
problem by always initalizing the nic_offset and asic_offset before
making ed_{asic,nic}_{in,out}* calls.
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.
Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.
Reviewed by: alc, jhb
MFC after: 1 week
in reiserfs_lookup() that was used to fix an actual race in
ufs_lookup.c:1.78. This is not currently a hazard, but the
bug would be activated by marking reiserfs as MPSAFE.
Reviewed by: mux (mentor)
MFC after: 2 weeks
is KeRaiseIrql(newirql, &oldirql), not oldirql = KeRaiseIrql(newirql).
(The macro ultimately translates to KfRaiseIrql() which does use
the latter API, so this has no effect on generated code.)
Also, wait for thread termination the right way: kthread_exit()
will ultimately do a wakeup(td->td_proc). This is the event we
should wait on. Eliminate the previous synchronization machinery
for this since it was never guaranteed to work correctly.
to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from
0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB
to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the
larger files anyway.
Collaboration with: bde
processor, to insure DPC thread 0 runs on CPU0, DPC thread 1 runs on
CPU1, and so on.
Elevate the priority of the workitem threads, though don't use as
high a priority as the DPC threads.
available kernel malloc types. Quite useful for post-mortem debugging of
memory leaks without a dump device configured on a panicked box.
MFC after: 2 weeks
- Destroy mutex in case of attach failure. [1]
- Lock properly em_watchdog(). [1]
- Lock properly em_sysctl_int_delay(). [1]
- Remove unused global adapter linked list.
- Remove unused dma_size field from struct em_dma_alloc.
- Do not touch interface statistics, that must be edited
only by upper layers. [1]
Submitted by: yongari [1]
o Do not mask the RX overrun interrupt.
o Rewrite em_intr():
- Axe EM_MAX_INTR.
- Cycle acknowledging interrupts and processing
packets until zero interrupt cause register is
read.
- If RX overrun comes in log this fact. [ NetBSD also
resets adapter in this case, but my tests showed that
this is not needed and only pessimizes behavior under
heavy load. ]
- Since almost all functions is rewritten, style the
remaining lines.
This fixes em(4) interfaces wedging under high load.
In collaboration with: wpaul, cognet
Obtained from: NetBSD
to unload the usb.ko module after boot if it was originally preloaded
from "/boot/loader.conf". When processing preloaded modules, the
linker erroneously added self-dependencies the each module's reference
count. That prevented usb.ko's reference count from ever going to 0,
so it could not be unloaded.
Sponsored by Isilon Systems.
Reviewed by: pjd, peter
MFC after: 1 week
- disable IPv6 operation if DAD fails for some EUI-64 link-local addresses.
- export get_hw_ifid() (and rename it) as a subroutine for this process.
Obtained from: KAME
Reviewd by: ume, gnn
MFC after: 2 week
Fix a debugging printf to printf after a variable is first assigned,
not before.
These are purely build fixes, and need inspection to make sure they
were what the original author of the previous changes intended.
flag. This fixes panic, when 'ifconfig em0 down' was called and it calls
em_stop() while the em_process_receive_interrupts() has temporarily
dropped the lock.
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessay white space
Obtained from: KAME
MFC after: 1 day
sampling rate:
- Improve vchan chn_setspeed() strategy. Try to avoid FEEDER_RATE
on parent channel if the requested value is not supported
by the hardware.
- Fix vchan default speed calculation. In any case, vchan should
rely on parent bufsoft speed instead of bufhard since it is
possible that the entire feeder chain might involve FEEDER_RATE.
This is possible under extreme, rare condition if the above
chn_setspeed() strategy failed.
Approved by: netchild (mentor)
- Change ndis_return() from a DPC to a workitem so that it doesn't
run at DISPATCH_LEVEL (with the dispatcher lock held).
- In if_ndis.c, submit packets to the stack via (*ifp->if_input)() in
a workitem instead of doing it directly in ndis_rxeof(), because
ndis_rxeof() runs in a DPC, and hence at DISPATCH_LEVEL. This
implies that the 'dispatch level' mutex for the current CPU is
being held, and we don't want to call if_input while holding
any locks.
- Reimplement IoConnectInterrupt()/IoDisconnectInterrupt(). The original
approach I used to track down the interrupt resource (by scanning
the device tree starting at the nexus) is prone to problems when
two devices share an interrupt. (E.g removing ndis1 might disable
interrupts for ndis0.) The new approach is to multiplex all the
NDIS interrupts through a common internal dispatcher (ntoskrnl_intr())
and allow IoConnectInterrupt()/IoDisconnectInterrupt() to add or
remove interrupts from the dispatch list.
- Implement KeAcquireInterruptSpinLock() and KeReleaseInterruptSpinLock().
- Change the DPC and workitem threads to use the KeXXXSpinLock
API instead of mtx_lock_spin()/mtx_unlock_spin().
- Simplify the NdisXXXPacket routines by creating an actual
packet pool structure and using the InterlockedSList routines
to manage the packet queue.
- Only honor the value returned by OID_GEN_MAXIMUM_SEND_PACKETS
for serialized drivers. For deserialized drivers, we now create
a packet array of 64 entries. (The Microsoft DDK documentation
says that for deserialized miniports, OID_GEN_MAXIMUM_SEND_PACKETS
is ignored, and the driver for the Marvell 8335 chip, which is
a deserialized miniport, returns 1 when queried.)
- Clean up timer handling in subr_ntoskrnl.
- Add the following conditional debugging code:
NTOSKRNL_DEBUG_TIMERS - add debugging and stats for timers
NDIS_DEBUG_PACKETS - add extra sanity checking for NdisXXXPacket API
NTOSKRNL_DEBUG_SPINLOCKS - add test for spinning too long
- In kern_ndis.c, always start the HAL first and shut it down last,
since Windows spinlocks depend on it. Ntoskrnl should similarly be
started second and shut down next to last.
called during early init before cninit().
Tested on: i386, alpha, sparc64
Reviewed by: phk, imp
Reported by: Divacky Roman xdivac02 at stud dot fit dot vutbr dot cz
MFC after: 1 week
the descriptors set.
- In em_process_receive_interrupts(), call bus_dmamap_sync() for the
descriptors set each time we modify one descriptor, instead of doing it only
at the function exit, to make sure the adapters know he can re-use the
descriptor.
This helps on arm with write-back data cache (and possibly on other arches
with bounce pages, I don't know) under heavy network load. Without this,
if we attempt to process more than num_rx_desc descriptors, the adapter
would just stop processing rx interrupts.
While here, support up to four sections because it was trivial to do
and cheap. (One pointer per section).
For amd64 with "-fpic -shared" format .ko files, using a single PT_LOAD
section is important to avoid wasting about 1MB of KVM and physical ram
for the 'gap' between the two PT_LOAD sections. amd64 normally uses
.o format kld files and isn't affected normally. But -fpic -shared modules
are actually possible to produce and load... (And with a bugfix to
binutils, we can build and use plain -shared .ko files without -fpic)
i386 only wastes 4K per .ko file, so that isn't such a big deal there.
and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase
when user wants to set them, in context switch routine, we only need to
write them into registers, we never have to read them out from registers
when thread is switched away. Since rdmsr is a serialization instruction,
micro benchmark shows it is worthy to do.
Reviewed by: peter, jhb
the former is the ISA part, not the latter.
MFC After 6.0 is unfrozen (this bug doesn't exist in 6.0 because I didn't
MFC the rtl80x9 changes for ISA due to an error on my part)
cache_lookup() has returned a ref'ed and locked vnode since
vfs_cache.c:1.96, dated Tue Mar 29 12:59:06 2005 UTC. This change
is similar to the change made to smbfs_lookup() in smbfs_vnops.c:1.58.
Tested by: "Antony Mawer" ant AT mawer.org
MFC after: 2 weeks
I benchmarked this by simultaneously extracting 4 large tarballs (basically
world images) on a 4-processor AMD64 system, in a malloc-backed md.
With this patch, system time was reduced by 43%, and wall clock time by 33%.
Submitted by: jeff
MFC after: 1 week
cd9660_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking cd9660 as MPSAFE.
Requested by: bde
ext2_lookup() that was used to fix an actual race in ufs_lookup.c:1.78.
This is not currently a hazard, but the bug would be activated by
marking ext2fs as MPSAFE.
Requested by: bde
MFC after: 2 weeks
to the 100/1000 BCM5400 phy. This fixes the problem with
the GEM port not syncing up on Sawtooth G4's.
Obtained from: NetBSD
Reported by: Ben Rosengart <ben + freebsd org at narcissus net>
so we are ready for mpsafevfs=1 by default on sparc64 too. I have been
running this on all my sparc64 machines for over 6 months, and have not
encountered MD problems.
MFC after: 1 week
the kernel by wrapping all targets for fake opt_*.h files in
.if defined(KERNBUILDDIR). Thus, such fake files won't be
created at all if modules are built with the kernel.
Some modules undergo cleanup like removing unused or unneeded
options or .h files, without which they wouldn't build this way
or the other.
Reviewed by: ru
Tested by: no binary changes in modules built alone
Tested on: i386 sparc64 amd64
provided in the kernel build directory, fix modules that were
failing to build this way due to not quite correct kernel option
usage. In particular:
ng_mppc.c uses two complementary options, both of which are listed
in sys/conf/files. Ideally, there should be a separate option for
including ng_mppc.c in kernel build, but now only
NETGRAPH_MPPC_ENCRYPTION is usable anyway, the other one requires
proprietary files.
nwfs and smbfs were trying to ensure they were built with proper
network components, but the check was rather questionable.
Discussed with: ru
- Add newer CPUID definitions for future use.
Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.
Approved by: anholt (mentor)
case by saving the value of dp->i_ino before unlocking the vnode
for the current directory and passing the saved value to VFS_VGET().
Without this change, another thread can overwrite dp->i_ino after
the current directory is unlocked, causing ufs_lookup() to lock
and return the wrong vnode in place of the vnode for its parent
directory. A deadlock can occur if dp->i_ino was changed to a
subdirectory of the current directory because the root to leaf vnode
lock ordering will be violated. A vnode lock can be leaked if
dp->i_ino was changed to point to the current directory, which
causes the current vnode lock for the current directory to be
recursed, which confuses lookup() into calling vrele() when it
should be calling vput().
The probability of this bug being triggered seems to be quite low
unless the sysctl variable debug.vfscache is set to 0.
Reviewed by: jhb
MFC after: 2 weeks
amd64, and is a factor of 3 less than the value previously auto-sized on
a 12GB machine, which would cause an overflow in calculations involving the
maxbcache int, causing bufinit() to loop forever at boot.
Reviewed by: mlaier, peter
correspond to the commit log. It changed the maxswzone and maxbcache
parameters from int to long, without changing the extern definitions
in <sys/buf.h>.
In fact it's a good thing it did not, because other parts of the system
are not yet ready for this, and on large-memory sparc machines it causes
severe filesystem damage if you try.
The worst effect of the change was that the tunables controlling the
above variables stopped working. These were necessary to allow such
large sparc64 machines (with >12GB RAM) to boot, since sparc64 did not
set a hard-coded upper limit on these parameters and they ended
up overflowing an int, causing an infinite loop at boot in bufinit().
Reviewed by: mlaier
cards and teach the re(4) driver to attach to revision 3 cards.
Submitted by: Fredrik Lindberg fli+freebsd-current at shapeshifter dot se
MFC after: 2 weeks
Reviewed by: imp, mdodd
originally wrote it for 4.x and hasn't really had the time to fully update
it to 5.x and later. Also, the author doesn't use the hardware anymore as
well. If someone does need this driver they can always resurrect it from
the Attic.
Requested by: Frank Mayhar frank at exit dot com
the modified memory rather than using register operands that held a pointer
to the memory. The biggest effect is that we now correctly tell the
compiler that these functions change the memory that these functions
modify.
Reviewed by: cognet
do not support the GETINFO immediate command, unlike just about every other
variant of the hardware. Also document some magic values and fix some minor
nearby whitespace.
MFC After: 3 days
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.
6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64
The receive function em_process_receive_interrupts() unlocks the
adapter while ether_input() processes the packet, and then locks
it back. In the meantime, em_init() may be called, either from
em_watchdog() from softclock interrupt or from the ifconfig(8)
program. The em_init() resets the card, in particular it sets
adapter->next_rx_desc_to_check to 0 and resets hardware RX Head
and Tail descriptor pointers. The loop in
em_process_receive_interrupts() does not expect these things to
change, and a mess may result.
This fixes long wedges of em(4) interfaces receive part under high
load and IP fastforwarding enabled.
PR: kern/87418
Submitted by: Dmitrij Tejblum <tejblum yandex-team.ru>
that the following funtions are not used, wrap in '#ifdef noused' for the
moment.
bstp_enable_change_detection
bstp_disable_change_detection
bstp_set_bridge_priority
bstp_set_port_priority
bstp_set_path_cost
the ExCA spec, and close cousins:
o Write an activate routine that works.
o merge a couple of items from oldcard before they are lost
o write a deactivate routine
I suspect we're still a ways away from having this work, but maybe for
6.1/5.5?
- move the function pointer definitions to if_bridgevar.h
- move most of the logic to the new BRIDGE_INPUT and BRIDGE_OUTPUT macros
- remove unneeded functions from if_bridgevar.h and sort a little.
previous revision only restored the MP optimization.
Describe the optimization strategy for TLB invalidations in a comment.
Reviewed by: ups@
MFC after: 3 days
Use bridge_ifdetach() to notify the bridge that a member has been detached. The
bridge can then remove it from its interface list and not try to send out via a
dead pointer.
as a Novell NE-2000. This is necessary for unpatched qemu working
correctly. qemu claims to be a RTL8029, but doesn't implement the
RTL8029 specific registers at this time. I've created patches for
that, but there's no reason we can't use qemu's emulation w/o these
patches. This should make life easier for those folks that boot
FreeBSD via qemu.
ed_probe_generic8390 where we're calling it. It will be done as part
of ed_probe_Novel_generic after things are setup in a way that
ed_probe_generic8390 will grok.
o Fix operator precedence botch that causes a panic when setting the media
type for 10baseT connections.
o Save the type of device so that it prints with the rest of the probe.
# this should make it work with qemu again, but only if it has my patches
# to actually implement the RTL8029 specific registers.
- Use device_printf() and if_printf() and remove nge_unit.
- Use callout_init_mtx() and remove nge_tick_locked() as nge_tick() is now
always called with the driver lock held.
- Use M_ZERO to contigmalloc() when allocating nge_ldata. It was possible
for the random garbage to be used in certain cases otherwise.
- Cleanup attach error handling including no longer leaking nge_ldata.
- Add locking to the ifmedia callouts.
- Lock accesses to if_hwassist and if_capenable in nge_ioctl().
Submitted by: Yuriy N. Shkandybin jura at networks dot ru (1, 3, 4)
Tested by: Yuriy N. Shkandybin jura at networks dot ru
MFC after: 3 days
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.
Assert that all cloners have been destroyed when freeing the memory.
Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.
Discussed with: brooks, pjd, -current
Reviewed by: brooks
notifications when LIO operations completed. These were the problems
with LIO event complete notification:
- Move all LIO/AIO event notification into one general function
so we don't have bugs in different data paths. This unification
got rid of several notification bugs one of which if kqueue was
used a SIGILL could get sent to the process.
- Change the LIO event accounting to count all AIO request that
could have been split across the fast path and daemon mode.
The prior accounting only kept track of AIO op's in that
mode and not the entire list of operations. This could cause
a bogus LIO event complete notification to occur when all of
the fast path AIO op's completed and not the AIO op's that
ended up queued for the daemon.
Suggestions from: alc
auto-start, set cnp.cn_lkflags to LK_EXCLUSIVE. This flag must now
be set so that lockmgr knows what kind of lock to acquire, and it
will panic if not specified. This resulted in a panic when using
extended attributes on UFS1 as of locking work present in the 6.x
branch.
This is a RELENG_6_0 merge candidate.
Reported by: lofi
MFC after: 3 days
TLB shootdown requirements. Otherwise a CPU may not get the needed
TLB invalidation.
The PTE valid and access flags can not be used here to avoid TLB
shootdowns unless sf->cpumask == all_cpus.
( Otherwise some CPUs may still hold an even older entry in the TLB)
Since sf_buf_alloc mappings are normally always used this is
also not really useful and presetting accessed and modified
allows the CPU to speculatively load the entry into the TLB.
Both bugs can cause random data corruption.
MFC after: 3 days
based on XMAC II chip should be ready for this in their initial
mode of operation, and Yukon-based NICs are configured so by
the driver.
PR: kern/79998
MFC after: 1 month
semantics, and then was reused for next node, it still would be applied
as writer again.
To fix the regression the decision is made never to alter item->el_flags
after the item has been allocated. This requires checking for overrides
both in ng_dequeue() and in ng_snd_item().
Details:
- Caller of the ng_apply_item() knows what is the current access to
node and specifies it to ng_apply_item(). The latter drops the
given access after item has beem applied.
- ng_dequeue() needs to be supplied with int pointer, where it stores
the obtained access on node.
- Check for node/hook access overrides in ng_dequeue().
generic sounding CIS "PCMCIA", "FAST ETHERENT CARD" and a bogus MANFID
code (0xffff and 0x1090). However, since I'm not aware of 'generic'
cards that aren't NE-2000oids, go with that and hope for the best.
First and most importantly, I threw out the thread priority-twiddling
implementation of KeRaiseIrql()/KeLowerIrq()/KeGetCurrentIrql() in
favor of a new scheme that uses sleep mutexes. The old scheme was
really very naughty and sought to provide the same behavior as
Windows spinlocks (i.e. blocking pre-emption) but in a way that
wouldn't raise the ire of WITNESS. The new scheme represents
'DISPATCH_LEVEL' as the acquisition of a per-cpu sleep mutex. If
a thread on cpu0 acquires the 'dispatcher mutex,' it will block
any other thread on the same processor that tries to acquire it,
in effect only allowing one thread on the processor to be at
'DISPATCH_LEVEL' at any given time. It can then do the 'atomic sit
and spin' routine on the spinlock variable itself. If a thread on
cpu1 wants to acquire the same spinlock, it acquires the 'dispatcher
mutex' for cpu1 and then it too does an atomic sit and spin to try
acquiring the spinlock.
Unlike real spinlocks, this does not disable pre-emption of all
threads on the CPU, but it does put any threads involved with
the NDISulator to sleep, which is just as good for our purposes.
This means I can now play nice with WITNESS, and I can safely do
things like call malloc() when I'm at 'DISPATCH_LEVEL,' which
you're allowed to do in Windows.
Next, I completely re-wrote most of the event/timer/mutex handling
and wait code. KeWaitForSingleObject() and KeWaitForMultipleObjects()
have been re-written to use condition variables instead of msleep().
This allows us to use the Windows convention whereby thread A can
tell thread B "wake up with a boosted priority." (With msleep(), you
instead have thread B saying "when I get woken up, I'll use this
priority here," and thread A can't tell it to do otherwise.) The
new KeWaitForMultipleObjects() has been better tested and better
duplicates the semantics of its Windows counterpart.
I also overhauled the IoQueueWorkItem() API and underlying code.
Like KeInsertQueueDpc(), IoQueueWorkItem() must insure that the
same work item isn't put on the queue twice. ExQueueWorkItem(),
which in my implementation is built on top of IoQueueWorkItem(),
was also modified to perform a similar test.
I renamed the doubly-linked list macros to give them the same names
as their Windows counterparts and fixed RemoveListTail() and
RemoveListHead() so they properly return the removed item.
I also corrected the list handling code in ntoskrnl_dpc_thread()
and ntoskrnl_workitem_thread(). I realized that the original logic
did not correctly handle the case where a DPC callout tries to
queue up another DPC. It works correctly now.
I implemented IoConnectInterrupt() and IoDisconnectInterrupt() and
modified NdisMRegisterInterrupt() and NdisMDisconnectInterrupt() to
use them. I also tried to duplicate the interrupt handling scheme
used in Windows. The interrupt handling is now internal to ndis.ko,
and the ndis_intr() function has been removed from if_ndis.c. (In
the USB case, interrupt handling isn't needed in if_ndis.c anyway.)
NdisMSleep() has been rewritten to use a KeWaitForSingleObject()
and a KeTimer, which is how it works in Windows. (This is mainly
to insure that the NDISulator uses the KeTimer API so I can spot
any problems with it that may arise.)
KeCancelTimer() has been changed so that it only cancels timers, and
does not attempt to cancel a DPC if the timer managed to fire and
queue one up before KeCancelTimer() was called. The Windows DDK
documentation seems to imply that KeCantelTimer() will also call
KeRemoveQueueDpc() if necessary, but it really doesn't.
The KeTimer implementation has been rewritten to use the callout API
directly instead of timeout()/untimeout(). I still cheat a little in
that I have to manage my own small callout timer wheel, but the timer
code works more smoothly now. I discovered a race condition using
timeout()/untimeout() with periodic timers where untimeout() fails
to actually cancel a timer. I don't quite understand where the race
is, using callout_init()/callout_reset()/callout_stop() directly
seems to fix it.
I also discovered and fixed a bug in winx32_wrap.S related to
translating _stdcall calls. There are a couple of routines
(i.e. the 64-bit arithmetic intrinsics in subr_ntoskrnl) that
return 64-bit quantities. On the x86 arch, 64-bit values are
returned in the %eax and %edx registers. However, it happens
that the ctxsw_utow() routine uses %edx as a scratch register,
and x86_stdcall_wrap() and x86_stdcall_call() were only preserving
%eax before branching to ctxsw_utow(). This means %edx was getting
clobbered in some cases. Curiously, the most noticeable effect of this
bug is that the driver for the TI AXC110 chipset would constantly drop
and reacquire its link for no apparent reason. Both %eax and %edx
are preserved on the stack now. The _fastcall and _regparm
wrappers already handled everything correctly.
I changed if_ndis to use IoAllocateWorkItem() and IoQueueWorkItem()
instead of the NdisScheduleWorkItem() API. This is to avoid possible
deadlocks with any drivers that use NdisScheduleWorkItem() themselves.
The unicode/ansi conversion handling code has been cleaned up. The
internal routines have been moved to subr_ntoskrnl and the
RtlXXX routines have been exported so that subr_ndis can call them.
This removes the incestuous relationship between the two modules
regarding this code and fixes the implementation so that it honors
the 'maxlen' fields correctly. (Previously it was possible for
NdisUnicodeStringToAnsiString() to possibly clobber memory it didn't
own, which was causing many mysterious crashes in the Marvell 8335
driver.)
The registry handling code (NdisOpen/Close/ReadConfiguration()) has
been fixed to allocate memory for all the parameters it hands out to
callers and delete whem when NdisCloseConfiguration() is called.
(Previously, it would secretly use a single static buffer.)
I also substantially updated if_ndis so that the source can now be
built on FreeBSD 7, 6 and 5 without any changes. On FreeBSD 5, only
WEP support is enabled. On FreeBSD 6 and 7, WPA-PSK support is enabled.
The original WPA code has been updated to fit in more cleanly with
the net80211 API, and to eleminate the use of magic numbers. The
ndis_80211_setstate() routine now sets a default authmode of OPEN
and initializes the RTS threshold and fragmentation threshold.
The WPA routines were changed so that the authentication mode is
always set first, followed by the cipher. Some drivers depend on
the operations being performed in this order.
I also added passthrough ioctls that allow application code to
directly call the MiniportSetInformation()/MiniportQueryInformation()
methods via ndis_set_info() and ndis_get_info(). The ndis_linksts()
routine also caches the last 4 events signalled by the driver via
NdisMIndicateStatus(), and they can be queried by an application via
a separate ioctl. This is done to allow wpa_supplicant to directly
program the various crypto and key management options in the driver,
allowing things like WPA2 support to work.
Whew.
routine, create all the child bio objects before starting the
requests, rather than starting them as created. This closes a race
whereby some number of child operations could complete before the
rest were ever created, and prematurely freeing the parent bio.
This fixes the panics installing in VMWare and qemu
updated by a process holding the snapshot lock. Another process updating a
different inode in the same inodeblock will do copy on write checks and lock in
the opposite direction.
The snapshot code force a copy on write of these blocks manually (cf. start of
expunge_ufs[12]) and these inode blocks are later put on snapblklist.
This partial fix is to 'drain' the relevant ffs_copyonwrite() operation after
installing new snapblklist. This is not a 100% solution since a failed block
allocation can cause implicit fsync() which might deadlock before the new
snapblklist has been installed.
file is flushed by a process not holding snaplk (e.g. bufdaemon). Another
process might hold snaplk and try to access the block due to ffs_copyonwrite
processing.
the cg map buffer being held when writing indirect blocks. The process ends up
in ffs_copyonwrite(), attempting to get snaplk while holding the cg map buffer
lock.
Another process might be in ffs_copyonwrite(), trying to allocate a new block
for a copy. It would hold snaplk while trying to get the cg map buffer lock.
Release the cg map buffer early and use the copy for most of the cgaccount
processing to avoid this deadlock.
skipping the call from ffs_snapremove() if the block number is zero.
Simplify snapshot locking in ffs_copyonwrite() and ffs_snapblkfree() by using
the same locking protocol for low block numbers as for larger block numbers.
This removes a lock leak that could happen if vn_lock() succeeded after
lockmgr() failed in ffs_snapblkfree().
Check if snapshot is gone before retrying a lock in ffs_copyonwrite().
reclamation. If the vnode previously was a fifo then v_op would point to
ffs_fifoops[12] instead of the expected ffs_vnodeops[12], causing a panic at
the end of ffsext_strategy.
the UDF specification specifies a logical sectorsize of 2048.
Instead, get it from GEOM.
- When reading the UDF Anchor Volume Descriptor, use the logical
sectorsize of 2048 when calculating the offset to read from, but
use the actual sectorsize to determine how much to read.
- works with reading a DVD disk and a DVD disk image file via mdconfig
- correctly returns EINVAL if we try to mount_udf an audio CD, instead
of panicking inside GEOM when INVARIANTS is set
the modified interface that they use. Changes include:
- Register a different interrupt handler for the new interface. This one is
INTR_MPSAFE, not INTR_FAST, and directly processes completions and AIFs.
- Add an event registration and callback mechanism for the ioctl and CAM
modules can know when a resource shortage clears. This condition was
previously fatal in CAM due to programming oversights.
- Fix locking to play better with newbus.
- Provide access methods for talking to cards with the NEWCOMM interface.
- Fix up the CAM module to be better suited for dealing with newer firmware
on the PERC Si/Di series that requires talking to plain SCSI via aac.
- Add a whole slew of new PCI Id's.
Thanks to Adaptec for providing an initial version of this work and for
answering countless questions about it. There are still some rough edges in
this, but it works well enough to commit and test for now.
Obtained from: Adaptec, Inc.
o Rather than just try to turn off EXCA_INTR_RESET, set the entire register
to 0. This is slightly faster, and a better hammer.
o Move attempted clearing of the output enable (EXCA_PWRCTL_OE) back to
after we turn off the power. Modify it to write 0 so that we don't get
Bad Vcc messages on TI bridges (untested, but ru@ sent me a similar patch)
while at the same time avoiding interrupt storms on Ricoh bridges (tested
by me on my Sony).
# Many of my observations of 'breakage' for this patch are due to some bug
# in the load/unload of cbb.ko unlreated to this change. I'll be investigating
# and fixing that bug in the fullness of time.
an allocation. This fixes the malloc 'use after free' panic on boot that
many were seeing. It doesn't solve the problem of the allocations being
cached and then written past their bounds later. That will take more work.
Submitted by: kan
http://lists.freebsd.org/pipermail/cvs-src/2004-October/033496.html
The same problem applies to if_bridge(4), too.
- Copy-and-paste the if_bridge(4) related block from
if_ethersubr.c to ng_ether.c
- Add XXXs, so that copy-and-paste would be noticed by
any future editors of this code.
- Also add XXXs near if_bridge(4) declarations.
Silence from: thompsa
the 5GHz band.
o Enable 802.11a channels scanning for 2915ABG adapters.
o Fix a typo (negociated->negotiated).
With hints from NetBSD.
MFC after: 2 days
- Rename vxfoo() functions to vx_foo() to improve readability and
consistency with other drivers.
- Prefix most the softc members with 'vx_' (the other members already had
the prefix).
- Switch to using callout_init_mtx() and callout_*() rather than
timeout() and untimeout().
- Add some missing calls to if_free() in some failure cases in vx_attach().
- Use if_printf() and remove the unit number from the softc.
- Remove uses of the 'register' keyword and spls.
- Add locked variants of vx_init() and vx_start().
- Add a mutex to the softc and lock it in various appropriate places.
- Setup the interrupt handler last during attach.
Tested by: imp
MFC after: 1 week
It allows to specify options for NFS root file system.
Currently supported options are: soft, intr, conn, lockd.
I'm adding this functionality mostly for 'lockd' option, which is only
honored when performing the initial mount and will be silently ignored
if used while updating the mount options.
This will allow to use flock(2) without the need of using varmfs or
rpc.lockd and friends.
Example of use:
boot.nfsroot.options="intr,lockd"
MFC after: 2 weeks
if (foo);
bar();
to:
if (foo)
bar();
Really, really nasty bug and a very nice catch of mine.
Unfortunately, I'll not become a hero of the day, because the code is
commented out.
module name to something that wouldn't conflict with
sys/dev/firewire/firewire.c.
Submitted by: Cai, Quanqing <caiquanqing at gmail dot com>
PR: kern/82727
MFC after: 3 days
- Don't keep the SPDIF state in the driver private struct since it
can be overriden by hand with pciconf(8), query it when needed instead.
Regarding the locking I let Ariff explain it himself:
---snip---
About the locking, that is what I'm intended to do since the beginning.
The reason I'm not putting that along since my first patchset was
because several people especially from amd46 camp reported that it cause
lots of LORs, which is weird considering that I've never encounter such
in a pretty much strict locking environment (i386). However, since our
previous discussion with Pyun YongHyeon about strict locking, I've
decided to bring it back for all the affected drivers, not just for
es137x. It turns out that the root of the problem was within dsp.c
during device open, which has been fixed since dsp.c revision 1.84.
---snip---
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
code which may help.
People with a ich compatible soundcard which want to help out should
change the "#if 1" to a "#if 0" and try if the soundcard still works.
Reports about working or not-working soundcards with this change to
multimedia@ please.
PR: 73987
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
can be compiled as loadable modules.
Reviewed by: bde
modules along with kernel.
After this change it is possible to embrace opt_*.h includes with ifdef
HAVE_KERNEL_OPTION_HEADERS. And thus, avoid editing a lot of Makefiles
in modules directory each time we introduce a new opt_xxx.h.
Requested by: bde
o Add support for Tamarack TC5299J + MII found on SMC 8041TX V.2
and corega PCCCCTXD
o Add support for ISA/PCI RTL80[12]9 chips
o Improve support for the ax88790 based
o minor code movement
Submitted by: (#2) David Madole
the arp code will search all local interfaces for a match. This triggers a
kernel log if the bridge has been assigned an address.
arp: ac🇩🇪48:18:83:3d is using my IP address 192.168.0.142!
bridge0: flags=8041<UP,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.142 netmask 0xffffff00
ether ac🇩🇪48:18:83:3d
Silence this warning for 6.0 to stop unnecessary bug reports, the code will need
to be reworked.
Approved by: mlaier (mentor)
MFC after: 3 days
the MAC result, as well as avoid losing the DAC check result when MAC
is enabled.
MFC after: 3 days
Reported by: Patrick LeBlanc <Patrick dot LeBlanc at sparta dot com>
Before this change a copy operation with cp(1) would not update the
file access times.
According to the POSIX mmap(2) documentation: the st_atime field
of the mapped file may be marked for update at any time between the
mmap() call and the corresponding munmap() call. The initial read
or write reference to a mapped region shall cause the file's st_atime
field to be marked for update if it has not already been marked for
update.
framework. This makes Giant protection around MAC operations which inter-
act with VFS conditional, based on the MPSAFE status of the file system.
Affected the following syscalls:
o __mac_get_fd
o __mac_get_file
o __mac_get_link
o __mac_set_fd
o __mac_set_file
o __mac_set_link
-Drop Giant all together in __mac_set_proc because the
mac_cred_mmapped_drop_perms_recurse routine no longer requires it.
-Move conditional Giant aquisitions to after label allocation routines.
-Move the conditional release of Giant to before label de-allocation
routines.
Discussed with: rwatson
instead of an int. No other FreeBSD architecture does this. Patch over
this problem in the lmc driver. While I'm here, correct a mistake with
DEVICE_POLLING.
stale flag bits left over from before the inode was recycled.
Without this change, a leftover IN_SPACECOUNTED flag could prevent
softdep_freefile() and softdep_releasefile() from incrementing
fs_pendinginodes. Because handle_workitem_freefile() unconditionally
decrements fs_pendinginodes, a negative value could be reported at
file system unmount time with a message like:
unmount pending error: blocks 0 files -3
The pending block count in fs_pendingblocks could also be negative
for similar reasons. These errors can cause the data returned by
statfs() to be slightly incorrect. Some other cleanup code in
softdep_releasefile() could also be incorrectly bypassed.
MFC after: 3 days
- Don't bzero the softc first thing in attach.
- Cleanup error handling in attach() to avoid lots of duplication.
- Don't initialize the callout handle twice.
MFC after: 3 days
The DMA controller driver only knows how to do memory to memory copies, and
the AAU driver how to zero a chunk of memory.
Use them to process big (>=1KB) copying/zeroing.
dedicated sysctl handlers. Protect manipulations with
poll_mtx. The affected sysctls are:
- kern.polling.burst_max
- kern.polling.each_burst
- kern.polling.user_frac
- kern.polling.reg_frac
o Use CTLFLAG_RD on MIBs that supposed to be read-only.
o u_int32t -> uint32_t
o Remove unneeded locking from poll_switch().
- Use the new API for pmap_copy_page() and pmap_zero_page().
- Just write-back the pages in pmap_qenter(), and invalidate it in
pmap_qremove().
- Nuke the cache flushing in pmap_enter_quick(), it's not needed anymore.
possible for do_execve() to call exit1() rather than returning. As a
result, the sequence "allocate memory; call kern_execve; free memory"
can end up leaking memory.
This commit documents this astonishing behaviour and adds a call to
exec_free_args() before the exit1() call in do_execve(). Since all
the users of kern_execve() in the tree use exec_free_args() to free
the command-line arguments after kern_execve() returns, this should
be safe, and it fixes the memory leak which can otherwise occur.
Submitted by: Peter Holm
MFC after: 3 days
Security: Local denial of service
whether the interface being accessed is IFF_NEEDSGIANT or not. This
avoids lock order reversals when calling into the interface ioctl
handler, which could potentially lead to deadlock.
The long term solution is to eliminate non-MPSAFE network drivers.
Discussed with: jhb
MFC after: 1 week
interface polling, compiles on 64-bit platforms, and compiles on NetBSD,
OpenBSD, BSD/OS, and Linux. Woo! Thanks to David Boggs for providing this
driver.
Altq, sppp, netgraph, and bpf are required for this driver to operate.
Userland tools and man pages will be committed next.
Submitted by: David Boggs
to the parent interface, such as IFF_PROMISC and
IFF_ALLMULTI. In addition, vlan(4) gains ability
to migrate from one parent to another w/o losing
its own flags.
PR: kern/81978
MFC after: 2 weeks
as it is done for usual promiscuous mode already. This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.
calling sysctl_out_proc(). -- fix from jhb
Move the code in fill_kinfo_thread() that gathers data from struct proc
into the new function fill_kinfo_proc_only().
Change all callers of fill_kinfo_thread() to call both
fill_kinfo_proc_only() and fill_kinfo() thread. When gathering
data from a multi-threaded process, fill_kinfo_proc_only() only needs
to be called once.
Grab sched_lock before accessing the process thread list or calling
fill_kinfo_thread().
PR: kern/84684
MFC after: 3 days
- Make it so one can't call db_setup_paging() if it has already been called
before. traceall needs this, or else the db_setup_paging() call from
db_trace_thread() will reset the printed line number, and override its
argument.
This is not perfect for traceall, because even if one presses 'q' while in
the middle of printing a backtrace it will finish printing the backtrace
before exiting, as db_trace_thread() won't be notified it should stop, but
it is hard to do better without reworking the pager interface a lot more.
sampling rate between playback and recording. This can be
disabled / enabled via kernel hints
(hint.pcm.<unit>.fixed_rate=0/4000-48000) or sysctl
hw.snd.pcm<unit>.fixed_rate=0/4000-48000). Default to 48khz
fixed rate. [1]
* Basic cleanup. *_es1371x_* -> *_es137x_*.
* Some locking fixes. [2]
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Discussed with: yongari [2]
See also: http://lists.freebsd.org/pipermail/freebsd-multimedia/2005-September/002758.html [1]
Reported by: Jos Backus <jos at catnook.com> [1]
* General spl* cleanup. It doesn't serve any purpose anymore.
* Nuke sndstat_busy(). Addition of sndstat_acquire() /
sndstat_release() for sndstat exclusive access. [1]
sys/dev/sound/pcm/sound.c:
* Remove duplicate SLIST_INIT()
* Use sndstat_acquire() / release() to lock / release the entire
sndstat during pcm_unregister(). This should fix LOR #159 [1]
sys/dev/sound/pcm/sound.h:
* Definition of SD_F_SOFTVOL (part of feeder volume)
* Nuke sndstat_busy(). Addition of sndstat_acquire() /
sndstat_release() for exclusive sndstat access. [1]
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
LOR: 159 [1]
Discussed with: yongari [1]
* Added codec id for CMI9761.
* feeder_volume *whitelist* through ac97_fix_volume()
sys/dev/sound/pcm/ac97.h:
* Added AC97_F_SOFTVOL definition.
sys/dev/sound/pcm/channel.c:
* Slight changes for chn_setvolume() to conform with OSS.
* FEEDER_VOLUME is now part of feeder building process.
sys/dev/sound/pcm/mixer.c:
* General spl* cleanup. It doesn't serve any purpose anymore.
* Main hook for feeder_volume.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested by: multimedia@
threads. This is quite useful if generating a debug log for post-mortem
by another developer, in which case the person at the console may not
know which threads are of interest. The output of this can be quite
long.
Discussed with: kris
MFC after: 3 days
This is a special case because tcp_twstart() destroys a tcp control
block via tcp_discardcb() so we cannot call tcp_drop(struct *tcpcb) on
such connections. Use tcp_twclose() instead.
MFC after: 5 days
- WEP TX fix:
The original code called software crypto, ieee80211_crypto_encap(),
which never worked since IEEE80211_KEY_SWCRYPT was never flagged due to
ieee80211_crypto_newkey() assumes that wi always supports hardware based
crypto regardless of operational mode(by virtue of IEEE80211_C_WEP).
This fix works around that issue by adding wi_key_alloc() to force
the use of s/w crypto. Also if anyone ever decides to cleanup ioctl
handling where key changes wouldn't cause a call to wi_init() every time,
we'll need wi_key_alloc() to DTRT.
In addition to that, this fix also adds code to wi_write_wep() to force
existing keys to be switched between h/w and s/w crypto such that an
operation mode change(sta <-> hostap) will flag IEEE80211_KEY_SWCRYPT
properly.
- WEP RX fix:
Clear IEEE80211_F_DROPUNENC even in hostap mode. Quote from Sam:
"This is really gross but I don't see an easy way around it.
By doing it we lose the ability to independently drop unencode
frames (and support mixed wep/!wep use). We should really be
setting the EXCLUDE_UNENCRYPTED flag written in wi_write_wep
based on IEEE80211_F_DROPUNENC but with our clearing it we can't
depend on it being set properly."
Reported by: Holm Tiffe <holm at freibergnet dot de>
Submitted by: sam
MFC after: 3 days
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
to avoid touching pageable memory while holding a mutex.
Simplify argument list replacement logic.
PR: kern/84935
Submitted by: "Antoine Pelisse" apelisse AT gmail.com (in a different form)
MFC after: 3 days
sys/fs/nwfs/nwfs_vfsop= s.c, introduced with the conversion to
nmount with revision 1.38. This causes mount_nwfs to fail with
the error message:
mount_nwfs: mount error: /mnt/netware: syserr = No such file or directo=
ry
This is caused by a typo on line 178, which specifies "nwfw_args"
rather than "nwfs_args".
Submitted by: Antony Mawer <gnats@mawer.org>
Fat fingers: phk
PR: 86757
MFC: 3 days
up. This make iostat report operations passed down to the device driver
instead of operations passed down to GEOM disk. The transfer size limit
imposed by the device driver is no longer hidden, improving the correlation
between iostat output and device driver workload.
> Cause all flags passed by boot2 to set the respective loader(8)
> boot_* variable. The end effect is that all flags from boot2
> are now passed to the kernel.
Add a new private thread flag to indicate that the thread should
not sleep if runningbufspace is too large.
Set this flag on the bufdaemon and syncer threads so that they skip
the waitrunningbufspace() call in bufwrite() rather than than
checking the proc pointer vs. the known proc pointers for these two
threads. A way of preventing these threads from being starved for
I/O but still placing limits on their outstanding I/O would be
desirable.
Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
blocking on the runningbufspace check while holding snaplk. This
prevents snaplk from being held for an arbitrarily long period of
time if runningbufspace is high and greatly reduces the contention
for snaplk. The disadvantage is that ffs_copyonwrite() can start
a large amount of I/O if there are a large number of snapshots,
which could cause a deadlock in other parts of the code.
Call runningbufwakeup() in ffs_copyonwrite() to decrement runningbufspace
before attempting to grab snaplk so that I/O requests waiting on
snaplk are not counted in runningbufspace as being in-progress.
Increment runningbufspace again before actually launching the
original I/O request.
Prior to the above two changes, the system could deadlock if enough
I/O requests were blocked by snaplk to prevent runningbufspace from
falling below lorunningspace and one of the bawrite() calls in
ffs_copyonwrite() blocked in waitrunningbufspace() while holding
snaplk.
See <http://www.holm.cc/stress/log/cons143.html>
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count. This will force the update of
the parent directory's actual link to actually be scheduled. Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely. ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
If a background fsck is run before the update of the parent directory's
actual link count has been performed, or at least scheduled by
putting the dirrem on the leaf directory's inodedep id_bufwait list,
fsck will corrupt the file system by decrementing the parent
directory's effective link count, which was previously correct
because it already took the removal of the leaf directory into
account, and setting the actual link count to the same value as the
effective link count after the dangling, removed, leaf directory
has been removed. This happens because fsck acts based on the
actual link count, which will be too high when fsck creates the
file system snapshot that it references.
This change has the fortunate side effect of more quickly cleaning
up the large number dirrem structures that linger for an extended
time after the removal of a large directory tree. It also fixes a
potential problem with the shutdown of the syncer thread timing out
if the system is rebooted immediately after removing a large directory
tree.
Submitted by: tegge
MFC after: 3 days
interrupt handler from Alpha. Instead, expand the scheduler pinning
in the interrupt handling code so that curthread is pinned while executing
fast interrupt handlers.
MFC after: 1 week
the softc.
- Use callout_init_mtx() and rather than timeout/untimeout in both rl(4)
and re(4).
- Fix locking for ifmedia by locking the driver in the ifmedia handlers
rather than in the miibus functions. (re(4) didn't lock the mii stuff
at all!)
- Fix some locking in re_ioctl().
Note: the two drivers share the same softc declared in if_rlreg.h, so they
had to be change simultaneously.
MFC after: 1 week
Tested by: several on rl(4), none on re(4)
routing, etc. in a static pci_assign_interrupt() function.
- Add a sledgehammer that allows the user to override the interrupt
assignment of any PCI device via a tunable (e.g. "hw.pci0.7.INTB=5" would
force any functions on the pci device in slot 7 of bus 0 that use B# to
use IRQ 5). This should be used with great caution! Generally, if the
interrupt routing in use provides specific tunables (such as hard-wiring
the IRQ for a given $PIR or ACPI PCI link device), then those should be
used instead. One instance where this tunable might be useful is if a
box has an MPTable with duplicate entries for the same PCI device with
different IRQs.
MFC after: 1 week
the Intel 82371AB PCI-ISA bridge. We now do this all the time for the
!APIC case in the atpic driver. This cuts the raw line count for this
driver by about 40%.
MFC after: 1 week
There seems to be very little documentary evidence outside this
implementation to suggest a these checks are neccessary, and more
than one camera-formatted flash disk fails the check, but mounts
successfully on most other systems.
Reviewed By: bde@
make function reenterable. In the runtime the race is masked by serializing
of em_process_receive_interrupts() either by interrupt thread, or by
polling. The race can be triggered when polling is switched on or off.
bio may have been freed and reassigned by the wakeup before being
tested after releasing the bdonelock.
There's a non-zero chance this is the cause of a few of the crashes
knocking around with biodone() sitting in the stack backtrace.
Reviewed By: phk@
I'm able to suspend/resume my laptop without this change, but then I need
to wait for the watchdog to reset the card.
With this change, it is ready immediately.
Glanced at by: glebius
subdrivers to hook up.
It should probably be rewritten to implement a simple bus to which
the sub drivers attach using some kind of hint.
Until then, provide a couple of crutch functions with big warning
signs so it can survive the recent changes to struct resource.
can be enabled by enabling COUNT_IPIS in smptests.h. When enabled, each
CPU provides an interrupt counter for nearly all of the IPIs it receives
(IPI_STOP currently doesn't have a counter) that can be examined using
vmstat -i, etc.
MFC after: 3 days
Requested by: rwatson
on big-endian archs like sparc64, e.g.:
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 pnpid @HEd041 on isa0
is now correctly printed as:
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 43 pnpid PNP0501 on isa0
There are probably other endianness issues lurking in the PnP code which
however aren't exhibited on sparc64 as the PnP devices there are sort of
PnP BIOS devices rather than ISA PnP devices.
Tested on: i386, sparc64
MFC after: 1 week
and do some preparations for handling 12x22 fonts (currently lots of code
implies and/or hardcodes a font width of 8 pixels). This will be required
on sparc64 which uses a default font size of 12x22 in order to add font
loading and saving support as well as to use a syscons(4)-supplied mouse
pointer image.
This API breakage is committed now so it can be MFC'ed in time for 6.0
and later on upcoming framebuffer drivers destined for use on sparc64
and which are expected to rely on using font loading internally and on
a syscons(4)-supplied mouse pointer image can be easily MFC'ed to
RELENG_6 rather than requiring a backport.
Tested on: i386, sparc64, make universe
MFC after: 1 week
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:
Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.
This aligns this code more with the eventual locking of ttys.
Suggested by: bde
of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT
macros around vrele as it's possible that this can call in the VOP_INACTIVE
filesystem specific code. Also while we are here, remove the Giant assertion.
from the sysctl handler, we do not actually require Giant here so we
shouldn't assert it. Doing so will just complicate things when Giant is removed
from the sysctl framework.
sleep lock status while kdb_active, or we risk contending with the
mutex on another CPU, resulting in a panic when using "show
lockedvnods" while in DDB.
MFC after: 3 days
Reviewed by: jhb
Reported by: kris
control register and AGP bridge seems to be inconsistent with some BIOS.
Instead of relying on BIOS settings, we just take the initial aperture size
and encode them for both miscellaneous control register and AGP bridge.
Some idea was borrowed from agp_nvidia.c.
- Add preliminary ULi M1689 chipset support. The idea was taken from Linux
because hardware and documentation are unavailable. Not tested.
- Add more VIA chipset PCI IDs taken from Linux driver.
Approved by: anholt (mentor)
Tested by: Adam Gregoire <ebola at psychoholics dot org>
Ganael Laplanche <ganael.laplanche at martymac dot com>
K Wieland <kwieland at wustl dot edu>
kernel modules. We actually need to include any addends and the symbol
offset value, but for gcc/binutils didn't set it anywhere I've found on
'cc -fpic -shared' kernel modules.
available and can give the wrong impression when there are memory holes.
Report the total amount of usable memory that we detected instead of the
highest address.
replacement and has additional features which make it superior.
Discussed on: -arch
Reviewed by: thompsa
X-MFC-after: never (RELENG_6 as transition period)
variable and returns the previous value of the variable.
Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week
----------------------------
revision 1.27
date: 2005/09/19 03:10:16; author: imp; state: Exp; lines: +3 -2
Make sure that we call if_free(ifp) after bus_teardown_intr. Since we
could get an interrupt after we free the ifp, and the interrupt
handler depended on the ifp being still alive, this could, in theory,
cause a crash. Eliminate this possibility by moving the if_free to
after the bus_teardown_intr() call.
In fact, this change do nothing for this driver. It is protected from
this by cp_destroy variable. This variable also protects driver from initiation
of any activity from network stack with disabled intr handler with this change
applied.
a fifo. While this did indeed close the race, confirming suspicions
about the nature of the problem, it causes difficulties with blocking
I/O on fifos.
Discussed with: ups
Also spotted by: Peter Holm <peter at holm dot cc>
----------------------------
revision 1.26
date: 2005/09/07 09:53:35; author: obrien; state: Exp; lines: +1452 -1453
Reorder code to not depend on an ISO-C illegal forward extern declaration.
----------------------------
Reason: do not move large functions location without serious reason. The same
could be done by forward function declaration. Please do not enlarge diff
without a reason any more.
Backout if_cp 1.27
----------------------------
revision 1.27
date: 2005/09/19 03:10:16; author: imp; state: Exp; lines: +3 -2
Make sure that we call if_free(ifp) after bus_teardown_intr. Since we
could get an interrupt after we free the ifp, and the interrupt
handler depended on the ifp being still alive, this could, in theory,
cause a crash. Eliminate this possibility by moving the if_free to
after the bus_teardown_intr() call.
Reason: bad previous commit. Would be restored by next commit.
flag on IP packets. Currently this option is only repected on udp
and raw ip sockets. On tcp sockets the DF flag is controlled by the
path MTU discovery option.
Sending a packet larger than the MTU size of the egress interface
returns an EMSGSIZE error.
Discussed with: rwatson
Sponsored by: TCP/IP Optimization Fundraise 2005
- Remove sis_unit and use device_printf() and if_printf() instead.
- Use callout_init_mtx() for the callout.
- Remove spls.
- Fix locking for ifmedia to happen in the ifmedia handlers rather than in
sis_ioctl().
- Log an error message if we fail to allocate any resources. Perform
cleanup if we fail to allocate any resources so that we don't leave
a mutex hanging around.
Tested by: Jason Tsai jason dot tsai at newcyberian dot com (1-4)
MFC after: 3 days
number of cards have been discovered to be matching on the strings of
the cis rather than manufacturer/product id for cards we already had a
prod id for. This is a result of getting the list from the NetBSD
driver which also includes the OID for the cards where such a
distinction mattered (since it was tested against the MAC address we
got from the card). Since we do not try to match OIDs, we do not need
the extra entries and they just waste space.
I'm guessing that some of the dlink entires (DE-660, DE-660+) and many
of the corega cards may fall into this boat and can safely be removed.
the per-cpu data for all CPUs. This is easier to ask users to do than
"figure out how many CPUs you have, now run show pcpu, then run it
once for each CPU you have".
MFC after: 3 days
the vast majority of cases, these functions are called without mutexes
held, meaning that in all but two cases, there will be no ordering
issues with doing this, and it will eliminate the need for changes in
the caller. In two cases, mutexes are held, so Giant must be acquired
before those mutexes such that uprintf() and tprintf() recurse Giant
rather than generating a lock order reversal.
Suggested by: bde
that socket during open, not the write socket receive buffer. This
might explain clearing of the sb_state SB_LOCK flag seen occasionally
in soreceive() on fifos.
MFC after: 3 days
Spotted by: ups
driver-induced errors, instead be better about propagating error status
upwards. Add more error definitions, courtesy of the linux driver. Fix
a command leak in the ioctl handler. Re-arrange some of the command handlers
to localize error handling.
MFC After: 3 days
remove the unconditional acquisition of Giant for extended attribute related
operations. If the file system is set as being MP safe and debug.mpsafevfs is
1, do not pickup Giant.
Mark the following system calls as being MP safe so we no longer pickup Giant
in the system call handler:
o extattrctl
o extattr_set_file
o extattr_get_file
o extattr_delete_file
o extattr_set_fd
o extattr_get_fd
o extattr_delete_fd
o extattr_set_link
o extattr_get_link
o extattr_delete_link
o extattr_list_file
o extattr_list_link
o extattr_list_fd
-Pass MPSAFE flags to namei(9) lookup and introduce vfslocked variable which
will keep track of any Giant acquisitions.
-Wrap any fd operations which manipulate vnodes in VFS_{UN}LOCK_GIANT
-Drop VFS_ASSERT_GIANT into function which operate on vnodes to ensure that
we are sufficiently protected.
I've tested these changes with various TrustedBSD MAC policies which use
extended attribute a lot on SMP and UP systems (thanks to Scott Long for
making some SMP hardware available to me for testing).
Discussed with: jeff
Requested by: jhb, rwatson
of bus tag+handle.
Instead of
bus_space_write_1(sc->tag, sc->handle, ...)
this macros offer
bus_write_1(sc->resource, ...)
The name+argument transformation is constant and the the macros are
generated (by hand) by the script in tools/bus_macro.sh.
The external part is still called 'struct resource' but the contents
is now visible to drivers etc. This makes it part of the device
driver ABI so it not be changed lightly. A comment to this effect
is in place.
The internal part is called 'struct resource_i' and contain its external
counterpart as one field.
Move the bus_space tag+handle into the external struct resource, this
removes the need for device drivers to even know about these fields
in order to use bus_space to access hardware. (More in following commit).
and bus_free_resources(). These functions take a list of resources
and handle them all in one go. A flag makes it possible to mark
a resource as optional.
A typical device driver can save 10-30 lines of code by using these.
Usage examples will follow RSN.
MFC: A good idea, eventually.
attribute memory at 0xff0 to find its MAC address. This is another
instance of the IBM ethercard II from all apperances (short of popping
the lid). Update the entry to document which cards we support
actually need this functionality.
slots to probe. Problems have been reported in this area, lets hope this
bandaid helps.
!! Owners of EISA-equipped Alpha machines are requested to at least
!! boot-test a 6-BETA build and report back to the Alpha list. Thanks!
Approved by: re (scottl)
Suggested by: ticso
---snip---
FYI this bit isn't needed for FreeBSD - I think it came from either
OpenBSD or NetBSD where arc4random() wasn't available during cold
boot.
---snip---
Explained by: iedowse
when we mount and get zero cost if no rules are used in a mountpoint.
Add code to deref rules on unmount.
Switch from SLIST to TAILQ.
Drop SYSINIT, use SX_SYSINIT and static initializer of TAILQ instead.
Drop goto, a break will do.
Reduce double pointers to single pointers.
Combine reaping and destroying rulesets.
Avoid memory leaks in a some error cases.
in rev. 1.40 of ufs_inode.c, which allows an inode being truncated
even when the filesystem itself is marked RDONLY. A subsequent
call of UFS_TRUNCATE (ffs_truncate) would panic the system as it
asserts that it can only be called when the filesystem is mounted
read-write (same changeset, rev. 1.74 of sys/ufs/ffs/ffs_inode.c).
Because ffs_mount() already takes care of sync'ing the filesystem
to disk before being downgraded to readonly, it appears to be more
desirable that we should not permit this sort of writes to disk.
This change would fix a panic that occours when read-only mounted
a corrupted filesystem and doing some file operations.
MT6/5/4 candidate
Reviewed by: mckusick
It confuses the lock manager since in some places thread0 is
then used for vnode locking while curthread is used for vnode unlocking.
Found by: Yahoo!
Reviewed by: ps@,jhb@
MFC after: 3 days
underlying the POSIX fifo implementation. In 6.x/7.x, fifo access is
moved from the VFS layer, where it was serialized using the vnode
lock, to the file descriptor layer, where access is protected by a
reference count but not serialized. This exposed socket buffer
locking to high levels of parallelism in specific fifo workloads, such
as make -j 32, which expose as yet unresolved socket buffer bugs.
fi_sx re-adds serialization about the read and write routines,
although not paths that simply test socket buffer mbuf queue state,
such as the poll and kqueue methods. This restores the extra locking
cost previously present in some cases, but is an effective workaround
for the instability that has been experienced. This workaround should
be removed once the bug in socket buffer handling has been fixed.
Reported by: kris, jhb, Julien Gabel <jpeg at thilelli dot net>,
Peter Holm <peter at holm dot cc>, others
MFC after: 3 days
chips where setting the FAILDIS bit is not effective. While here,
try again to make it clear that reported parity errors indicate
a failure of some PCI device *other than* the aic7xxx controller.
timer reset rather than the timer of an SCB still pending on the
controller after recovery completed. This should correct timeout
loops seen in the field.
copied mbuf, which keeps the IP header 32-bit aligned. This copied mbuf is
reinjected back into ether_input and off to the IP routines.
Reported and tested by: Peter van Dijk
Approved by: mlaier (mentor)
MFC after: 3 days
a thread holding critical resource, e.g mutex or other implicit
synchronous flags. Give thread which exceeds nice threshold a minimum
time slice.
PR: kern/86087
has been removed. It has been replaced by hw.pci.do_power_nodriver
and hw.pci.do_power_resume. The former defaults to 0 while the latter
defaults to 1.
When do_powerstate was set to 0, it broke suspend/resume for a lot of
people as an unintended consequence. This change will only affect the
areas that were intended to affect. This change will have no effect on
servers, but will help laptops quite a bit.
MFC After: 3 days.
NULL. The NFS client expects that a thread will always be present for a
VOP so that it can check for signal conditions, and will dereference a
NULL pointer if one isn't present.
MFC after: 3 days
I'm not sure this is the right thing to do, but at least I don't panic
anymore when swapping on a NFS file without using md(4).
X-MFC after: proper review
- Rearrange code so that in a case of failure the affected
route is not changed. Otherwise, a bogus rtentry will be
left and later rt_check() can recurse on its lock. [1]
- Remove comment about protocol cloning.
- Fix two places where rtentry mutex was recursed on, because
accessed via two different pointers, that were actually pointing
to the same rtentry in some cases. [1]
- Return EADDRINUSE instead of bogus EDQUOT, in case when gateway
uses the same route. [2]
Reported & tested by: ps, Andrej Zverev <az inec.ru> [1]
PR: kern/64090 [2]
The FXP_SCR_FLOWCONTROL registers is at offset 0x19, but 2 bytes wide.
It cannot be read as a word without causing a panic on architectures
that enforce strict alignment.
MFC after: 3 days
nor uprintf() is believed to perform tsleep() or msleep() as written,
as ttycheckoutq() is called with '0' as its sleep argument.
Remove recently added WITNESS warnings for sleep as the comment was
incorrect. This should silence a warning from the nfs_timer() code.
Discussed with: bde
Give DEVFS a proper inode called struct cdev_priv. It is important
to keep in mind that this "inode" is shared between all DEVFS
mountpoints, therefore it is protected by the global device mutex.
Link the cdev_priv's into a list, protected by the global device
mutex. Keep track of each cdev_priv's state with a flag bit and
of references from mountpoints with a dedicated usecount.
Reap the benefits of much improved kernel memory allocator and the
generally better defined device driver APIs to get rid of the tables
of pointers + serial numbers, their overflow tables, the atomics
to muck about in them and all the trouble that resulted in.
This makes RAM the only limit on how many devices we can have.
The cdev_priv is actually a super struct containing the normal cdev
as the "public" part, and therefore allocation and freeing has moved
to devfs_devs.c from kern_conf.c.
The overall responsibility is (to be) split such that kern/kern_conf.c
is the stuff that deals with drivers and struct cdev and fs/devfs
handles filesystems and struct cdev_priv and their private liason
exposed only in devfs_int.h.
Move the inode number from cdev to cdev_priv and allocate inode
numbers properly with unr. Local dirents in the mountpoints
(directories, symlinks) allocate inodes from the same pool to
guarantee against overlaps.
Various other fields are going to migrate from cdev to cdev_priv
in the future in order to hide them. A few fields may migrate
from devfs_dirent to cdev_priv as well.
Protect the DEVFS mountpoint with an sx lock instead of lockmgr,
this lock also protects the directory tree of the mountpoint.
Give each mountpoint a unique integer index, allocated with unr.
Use it into an array of devfs_dirent pointers in each cdev_priv.
Initially the array points to a single element also inside cdev_priv,
but as more devfs instances are mounted, the array is extended with
malloc(9) as necessary when the filesystem populates its directory
tree.
Retire the cdev alias lists, the cdev_priv now know about all the
relevant devfs_dirents (and their vnodes) and devfs_revoke() will
pick them up from there. We still spelunk into other mountpoints
and fondle their data without 100% good locking. It may make better
sense to vector the revoke event into the tty code and there do a
destroy_dev/make_dev on the tty's devices, but that's for further
study.
Lots of shuffling of stuff and churn of bits for no good reason[2].
XXX: There is still nothing preventing the dev_clone EVENTHANDLER
from being invoked at the same time in two devfs mountpoints. It
is not obvious what the best course of action is here.
XXX: comment out an if statement that lost its body, until I can
find out what should go there so it doesn't do damage in the meantime.
XXX: Leave in a few extra malloc types and KASSERTS to help track
down any remaining issues.
Much testing provided by: Kris
Much confusion caused by (races in): md(4)
[1] You are not supposed to understand anything past this point.
[2] This line should simplify life for the peanut gallery.
in an IBSS. Store ids directly into ieee80211_node's instead of managing
our own private association table. Idea and code by Sam Leffler.
Submitted by: sam
MFC after: 5 days
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).
Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.
With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.
NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.
NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.
NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.
MFC after: 1 week
provided access to the root file system before the start of the
init process. This was used briefly by SEBSD before it knew about
preloading data in the loader, and using that method to gain
access to data earlier results in fewer inconsistencies in the
approach. Policy modules still have access to the root file system
creation event through the mac_create_mount() entry point.
Removed now, and will be removed from RELENG_6, in order to gain
third party policy dependencies on the entry point for the lifetime
of the 6.x branch.
MFC after: 3 days
Submitted by: Chris Vance <Christopher dot Vance at SPARTA dot com>
Sponsored by: SPARTA
Remove md_mtx.
Remove GIANT from the mdctl device driver and avoid DROP_GIANT,
PICKUP_GIANT and geom events since we can call into GEOM directly
now.
Pick up Giant around vn_close().
Apply an exclusive sx around mdctls ioctl and preloading to protect
lists etc..
Don't initialize our lock (md_mtx or md_sx) from a
SYSINIT when there is a perfectly good pair of _fini/_init
functions to do it from.
Prune any final fractional sector from the mediasize to
keep GEOM happy.
Cleanups:
Unify MDIOVERSION check in (x)mdctlioctl()
Add pointer to start() routine to softc to eliminate a switch{}
Inline guts of mddetach().
Always pass error pointer to mdnew(), simplify implementation.
could get an interrupt after we free the ifp, and the interrupt
handler depended on the ifp being still alive, this could, in theory,
cause a crash. Eliminate this possibility by moving the if_free to
after the bus_teardown_intr() call.
so that UUIDs can be generated from within the kernel. The uuidgen(2)
syscall now allocates kernel memory, calls the generator, and does a
copyout() for the whole UUID store. This change is in support of GPT.
and other applications to query the state of the stack regarding the
accept queue on a listen socket:
SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)
Minor white space tweaks to existing socket options to make them
consistent.
Discussed with: andre
MFC after: 1 week
o eliminate the ED_NO_MIIBUS option. Now, you need miibus to use ed with
pccard. If you have an old ISA or PCI card w/o a miibus, then you'll still
be able to use the ed driver w/o miibus in the kernel. If you have pccard
you'll need mii now. Most pccards these days have miibus, and many
cards have ISSUES if you don't attach miibus. issues I don't want to
constantly rediagnose.
- Add new media_ioctl, mediachg and tick function pointers. The core
driver will call these if they aren't NULL, or return an error if they
are.
- migrate remaining mii code into if_ed_pccard.
o include some notes from my datasheet fishing. this may allow us to
get media status from some pccards.
o Fix one bug that's common to many drivers. call if_free(ifp) after
we tear down the interrupt. ed_intr() depends on ifp being there and
freeing it while interrupts can still happen is, ummm, bad.
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:
- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().
- Annotate that if_detach() needs careful consideration.
- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.
- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.
It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.
MFC after: 3 days
the switch statement in order to make this driver more like other
Ethernet NIC drivers.
- In gem_attach() call gem_stop() in addition to gem_reset() to make
sure the chip actually is stopped and not just reset.
- In gem_stop() also stop the gem_rint_timeout() callout in case the
driver is compiled with GEM_RINT_TIMEOUT defined.
Merge some locking improvements from hme(4):
- Use callout_init_mtx() to close races between gem_stop() and gem_tick()
as weel as gem_stop() and gem_rint() in case the driver is compiled
with GEM_RINT_TIMEOUT defined.
- Use the driver lock instead of Giant in a bus dma callback.
- Lock the driver lock around mii operations.
- Cleanup locking in gem_ioctl().
- Remove redundant assertions that the driver lock is not held in
gem_attach() and gem_detach() since mtx_lock() will assert that
already since the driver lock is not recursive.
- Add callout_drain()'s to gem_detach() after calling gem_stop() to make
sure that if softclock is running on another CPU and is blocked on our
driver lock, we will wait until it has acquired the lock, seen that it
was cancelled, dropped the lock, and awakened us so that we can safely
destroy the mutex.
Synchronise with NetBSD upto rev 1.19:
- Allow 32 chars in the saved vendor string.
- Some NetBSD-only changes.
- Some missing parts (define, variable).
ehci_pci.c:
Add vendor ids for ATI and Philips.
Add identification strings for the following:
o ALi's M5239
o AMD 8111
o ATI SB200, SB400
o Intel 6300ESB, ICH4, ICH5, ICH7
o NVIDIA nForce 2, nForce 3, nForce 4
o Philips ISP156x
ehcireg.h:
We're at the same level as rev 1.18 from NetBSD.
usb_port.h:
NetBSD/OpenBSD specific things
Obtained from: NetBSD via DragonFly
No comment from: usb@
quite a bit of reading to figure it out, and I want to avoid figuring
it out again.
Convert an if (foo) else printf("this is almost a panic") into a
KASSERT.
MFC after: 3 days
This kernel config briefly describes some of the major MAC policies
available on FreeBSD. The hope is that this will raise the awareness
about MAC and get more people interested.
Discussed with: scottl
unconditional acquisition of Giant for ACL related operations. If the file
system is set as being MP safe and debug.mpsafevfs is 1, do not pickup
giant.
For any operations which require namei(9) lookups:
__acl_get_file
__acl_get_link
__acl_set_file
__acl_set_link
__acl_delete_file
__acl_delete_link
__acl_aclcheck_file
__acl_aclcheck_link
-Set the MPSAFE flag in NDINIT
-Initialize vfslocked variable using the NDHASGIANT macro
For functions which operate on fds, make sure the operations are locked:
__acl_get_fd
__acl_set_fd
__acl_delete_fd
__acl_aclcheck_fd
-Initialize vfslocked using VFS_LOCK_GIANT before we manipulate the vnode
Discussed with: jeff
o Allow association with APs that do not broadcast SSID (with hints from
Nick Hudson and Hajimu Umemoto).
o IFQ_DRV_PREPEND mbuf when h/w ring is full so it can be sent later.
o Increment if_oerrors when appropriate.
o Did some cleanup while I'm here.
MFC after: 1 day
requests. The following features have been added:
1. Extensive checking and validation of both the primary and
secondary headers to protect against corrupted data and to
take advantage of the redundancy to allow the GPT to be
used in the face of recoverable corruption.
2. Dynamic data-structures to avoid hardcoding gratuitous
table limits so as to support the creation of GPT tables
of (as of yet) unspecified size.
3. Only allow kernel dumps to swap partitions to provide the
necessary anti-footshooting measures. Linux swap partitions
are allowed.
4. Complete dump of the GPT configuration, including labels.
5. Supports Byte Order Mark (U+FEFF) handling for big-endian,
little-endian and mixed-endian partition names.
the Linux driver, since specs are unavailable. Many thanks to Adam Kirchhoff
for multiple useful testing cycles, and Ralf Wostrack for the final fix to get
it working.
PR: i386/75251
Submitted by: anholt
9200 according to one responder. The primary issue was not setting some bits
to say that the entries were active, but also fix one place where some memory
wasn't being used as volatile as it should. While here, change some use of ffs
to a relatively short case statement, to make it more obvious what's going on.
PR: kern/71638, kern/72372, kern/71547?
Submitted by: Andrew J. Caines <A.J.Caines@halplant.com>,
Robin Schoonover <end@endif.cjb.net>,
Jason Henson <jason@ec.rr.com>
- Use __func__ consistently instead of copying function name
to message strings. Code tends to migrate around source files.
- DIAGNOSTIC is for information, INVARIANTS is for panics.
m_tag_locate(). This adds little overhead of a simple
bitwise operation in case hardware VLAN acceleration
is on, yet saves the more expensive function call if
the acceleration is off.
Reviewed by: ru, glebius
X-MFC-after: 6.0
with some Dell servers that booted w/o a problem[*] on 5.4, but failed
with 6.0-BETA.
On the PCI bus, when we do lazy resource allocation, we narrow the
range requested as we pass through bridges to reflect how the bridges
are programmed and what addresses they pass. However, when we're
doing an allocation on a bus that's directly connected to a host
bridge, no such translation can take place. We already had a fallback
range for memory requests, but none for ioports. As such, provide a
fallback for I/O ports so we don't allocate location 0, which will
have undesired side effects when the resources are actually used.
This fixes a problem with booting a Dell server with usb in the
kernel. However, it is an unsatisfying solution. I don't like the
hard coded value, and I think we should start narrowing the resources
returned to not be in the so-called isa alias area (where the ranage &
0x0300 must be 0 iirc). Doing such filtering will have to wait for
another day.
This may be a good 6 candidate, maybe after its had a chance to be
refined.
Tested by: glebius@
tunable (until we get REPORT LUNS in place).
If we're probing luns, and each probe succeeds, we keep going past
lun 7 if we're a SCSI3 or better device (until we fail to probe).
If we're probing luns, and a probe fails, we only keep going if
we're quirked *for* it (CAM_QUIRK_HILUNS), and if we're not quirked
*against* it (CAM_QUIRK_NOHILUNS), or we're a SCSI3 or better device
and the tunable (kern.cam.cam_srch_hi) is set non-zero.
Reviewed by: nate@rootlabs.org, gibbs@scsiguy.com, ken@kdm.com, scottl@samsco.org
MFC after: 1 week
constraint is actually only allowed for register operands. Instead, use
separate input and output memory constraints.
Education from: alc
Reviewed by: alc
Tested on: i386, alpha
MFC after: 1 week
any other non-sleepable lock. In plain English: Giant comes before all
other mutexes.
- Add some extra description to the lock order reversal printf's to indicate
when a reversal is triggered by a hard-coded implicit rule.
Requested by: truckman (2)
MFC after: 1 week
possible method to prevent panicing in interrupt handler
after re_shutdown(), sometimes seen on SMP systems.
This would work here only because re_detach() clears
IFF_UP (to prevent another race) and it was demonstrated
that it's not enough to call vr_detach() in vr_shutdown()
to prevent a panic.
state where sleeping on a sleep queue is not allowed. The facility
doesn't support recursion but uses a simple private per-thread flag
(TDP_NOSLEEPING). The sleepq_add() function will panic if the flag is
set and INVARIANTS is enabled.
- Use this new facility to replace the g_xup and g_xdown mutexes that were
(ab)used to achieve similar behavior.
- Disallow sleeping in interrupt threads when invoking interrupt handlers.
MFC after: 1 week
Reviewed by: phk
- Fixed if_free() logic screw-up that can either result
in freeing a NULL pointer or leaking "struct ifnet".
- Move if_free() after re_stop(); the latter accesses
"struct ifnet". This bug was masked by a previous bug.
- Restore the fix for a panic on detach caused by racing
with BPF detach code by Bill by moving ether_ifdetach()
after re_stop() and resetting IFF_UP; this got screwed
up in revs. 1.30 and 1.36.
attempts to deallocate busdma tags and resources that haven't been
allocated yet, causing a panic every time a dc interface fails to
attach. Fix by checking that we really have something to dealloc
before calling bus_dma*() functions.
Approved by: jhb
MFC after: 1 week
and reloading it in i386_extend_pcb() rather than trying to force a context
switch to reload the TSS via the TDF_NEEDRESCHED flag. Optimizations to
avoid calling cpu_switch() when the new thread was identical to the old
thread defeated the attempt to force a TSS reload. Explicitly loading the
new TSS is what we really want to do anyway.
PR: i386/84842
Reported by: Alexander Best arundel at h3c dot de
MFC after: 1 week
Reviewed by: bde (mostly)
- Use callout_init_mtx() and static callouts rather than timeout().
- m_getcl() in one place to simplify the code.
Tested by: Gavin Atkinson gavin dot atkinson at ury dot york dot ac dot uk
MFC after: 1 week
links and the execution of ELF binaries. Two problems were found:
1) The link path wasn't tagged as being MP safe and thus was not properly
protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
insufficiently protected.
This commit makes the following changes:
-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
flag is NOT set, that we have picked up giant. If not panic (if WITNESS
compiled into the kernel). This should help us find conditions where vnode
operations are in-sufficiently protected.
This is a RELENG_6 candidate.
Discussed with: jeff
MFC after: 4 days
shutdown procedures (which have a duration of more than 120 seconds).
We have two user-space affecting shutdown timeouts: a "soft" one in
/etc/rc.shutdown and a "hard" one in init(8). The first one can be
configured via /etc/rc.conf variable "rcshutdown_timeout" and defaults
to 30 seconds. The second one was originally (in 1998) intended to be
configured via sysctl(8) variable "kern.shutdown_timeout" and defaults
to 120 seconds.
Unfortunately, the "kern.shutdown_timeout" was declared "unused" in 1999
(as it obviously is actually not used within the kernel itself) and
hence was intentionally but misleadingly removed in revision 1.107 from
init_main.c. Kernel sysctl(8) variables are certainly a wrong way to
control user-space processes in general, but in this particular case the
sysctl(8) variable should have remained as it supports init(8), which
isn't passed command line flags (which in turn could have been set via
/etc/rc.conf), etc.
As there is already a similar "kern.init_path" sysctl(8) variable which
directly affects init(8), resurrect the init(8) shutdown timeout under
sysctl(8) variable "kern.init_shutdown_timeout". But this time document
it as being intentionally unused within the kernel and used by init(8).
Also document it in the manpages init(8) and rc.conf(5).
Reviewed by: phk
MFC after: 2 weeks
running" panics.
Previously, recursion through the "include" feature was prevented by
marking each ruleset as "running" when applied. This doesn't work for
the case where two DEVFS instances try to apply the same ruleset at
the same time.
Instead introduce the sysctl vfs.devfs.rule_depth (default == 1) which
limits how many levels of "include" we will traverse.
Be aware that traversal of "include" is recursive and kernel stack
size is limited.
MFC: after 3 days
that with the NIC set of registers rather than the ASIC registers. I
believe this was a harmless oversight, since we set ED_P0_CR to the
same value 5ms later, but just to be safe...
it is destroyed in GEOM, in addition to being removed from /dev.
Before this patch, if you applied a new MBR which deleted a slice,
the deleted slice would not be in /dev, but it would still appear
in kern.geom.conftxt and kern.geom.confxml, which would confused
the diskPartitionEditor in sysinstall.
Submitted by: pjd
Tested by: pjd, rodrigc
MFC after: 1 week
risky because the "current time" is supposed to be fed to the card during
initialization, and the current time is supposed to be put into each command
that is sent to the card. Hopefully either the card doesn't actually care
about the timestamps, or it doesn't care about the absolute values so long
and the relative values are consistent. Not an MFC candidate until more
thorough testing can be done.
While I'm here add KASSERT(9) to notify failure of SYSUNINIT handler.
Reported by: Ben Kaduk < minimarmot AT gmail DOT com >
Tested by: Ben Kaduk < minimarmot AT gmail DOT com >
o Attach AX88x90's MII bus to system, and require its presence.
o Reorg the mii code a little, and move more of it into pccard attachment.
o Eliminate ed_pccard_{read,write}_attrmem in favor of a more appropriate
function in the pccard layer.
o Update comments to reflect knowledge gained.
o Update how re recognize a NE-2000 ROM. I found a couple of different
datasheets that define the structure of the PROM data, so the code's
old heuristics have been removed, and comments updated to reflect the
structure.
o Eliminate work around for EC2T. It is no longer needed, and was wrong
headed since the EC2T has a Winbound 82C926C in it, not a AX88x90.
o Add copyright to if_ed_pccard.c, since I believe I've re-written more than
3/4 of it.
# With these changes, all of my 20-odd ed based cards work, except for the
# NetGear FA-410, and I'm pretty sure that's a MII/PHY problem.
attach to. These cards are combo cards (in that they have a modem
inside of them), but not true MFC cards. Full support of these cards
will have to wait until we can pick the config to use and for the PFC
support that I have brewing.
fifo_kqfilter() VOP implementations, since they in theory are used
only on open file descriptors, in which case the ioctls are via
fifo_ioctl_f() and kqueue requests are via fifo_kqfilter_f().
Generate warnings if they are entered for now. These printf()
calls should become panic() calls.
Annotate and re-implement fifo_ioctl_f(): don't arbitrarily
forward ioctls to the socket layer, only forward the ones we
explicitly support for fifos. In the case of FIONREAD, don't
forward the request to the write socket on a read-write fifo, or
the read result is overwritten. Annotate a nasty case for the
undefined POSIX O_RDWR on fifos, in which failure of the second
ioctl will result in the socket pair being in an inconsistent
state.
Assert copyright as I find myself rewriting non-trivial parts of
fifofs.
MFC after: 3 days
be held when entering a kqueue filter for fifos via a socket buffer
event: as such, assert the lock unconditionally rather than acquiring
it conditionall.
MFC after: 3 days
1) fifo_kqfilter() is not actually ever used, it likely should be GC'd.
2) fifo_kqfilter_f() doesn't implement EVFILT_VNODE, so detecting events
on the underlying vnode for a fifo no longer works (it did in 4.x).
Likely, fifo_kqfilter_f() should forward the request to the VFS using
fp->f_vnode, which would work once fifo_kqfilter() was detached from
the vnode operation vector (removing the fifo override).
Discussed with: phk
used when a read filter is requested on a write-only fifo descriptor, or
a write filter is requested on a read-only fifo descriptor. This
permits the filters to be registered, but never raises the event, which
causes kqueue behavior for fifos to more closely match similar semantics
for poll and select, which permit testing for the condition even though
the condition will never be raised, and is consistent with POSIX's notion
that a fifo has identical semantics to a one-way IPC channel created
using pipe() on most operating systems.
The fifo regression test suite can now run to completion on HEAD without
errors.
MFC after: 3 days
- Remove an assertion in sound.c, it's not needed (and causes a panic now).
From the conversation via mail between glebius and Ariff:
---snip---
> Well, but which mutex protects now? Do we own anything else
> in pcm_chnalloc()? I see some queue(4) macros in pcm_chnalloc(),
> they should be protected, shouldn't they?
Queue insertion/removal occur during
1) driver loading (which is pretty much single thread /
sequential) or unloading (mutex protected, bail out if there is
any channel with refcount > 0 or busy).
2) vchan_create()/destroy(), (which is *sigh* quite complicated), but
somehow protected by 'master'/parent channel mutex. Other
thread cannot add/remove vchan (or even continue traversing
that queue) unless it can acquire parent channel mutex.
---snip---
Fix the locking in dsp.c to prevent a LOR (AFAIK not on the LOR page).
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested with: INVARIANTS[1] and DIAGNOSTICS[2]
Tested by: netchild [1,2], David Reid <david@jetnet.co.uk> [1]
'buffers' pending NMIs from multiple interrupting PMCs and delivers
them serially.
Reported by: Olivier Crameri <olivier.crameri@epfl.ch>
MFC after: 3 days
according to POSIX, not to mention the fact that it doesn't make sense
(and hence isn't really implemented). This causes the fifo_misc
regression test to succeed.
file descriptor. Otherwise, the read end of a fifo might return that it
is writable (which it isn't).
Only poll the fifo for write events if the fifo attached to a writable
file descriptor. Otherwise, the write end of a fifo might return that
it is readable (which it isn't).
In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
has undefined behavior), we poll for both.
MFC after: 3 days
to poll the write socket for, the fifo polling code proceeded to poll
for the complete set of events. Use 'levents' instead of 'events' as
the argument to poll, and only poll the write socket if there is
interest in write events.
MFC after: 3 days
while sleeping to allocate fifo state: due to using the vnode lock to
serialize access to a fifo during open, it shouldn't happen (tm).
MFC after: 3 days
In case this causes trouble for some other chipsets add a comment how to
proceed. If we don't get bugreports, this should be removed after a while
(some releases?).
PR: 56617 [1], 29465, 39260, 40574, 68225
Submitted by: Matthew E. Gove <mgove@comcast.net> [1]
believe that there are PC98 systems with an OPTi chip.
I don't know enough about this special PC architecture to be sure about
this, so let's find out by letting people with such a system complain in
case this commit breaks the sound system for them. It's easy to revert
then.
PR: 45673
Submitted by: Watanabe Kazuhiro <CQG00620@nifty.ne.jp>
same as today: do no power management. 1 means be conservative about
what you power down (any device class that has caused problems gets
added here). 2 means be agressive about what gets powered down (any
device class that's fundamental to the system is here). 3 means power
them all down, reguardless. The default is 1.
The effect in the default system is to add mass storage devices to the
list that we don't power down. From all the pciconf -l lists that
I've seen for the aac and amr issue, the bad device has been a mass
storage device class.
This is an attempt at a compromise between the very small number of
systems that have extreme issues with powerdown, and the very large
number of systems that gain real benefits from powerdown (I get about
20% more battery life when I attach a minimal set of drivers on my
Sony). Hopefully it will strike the proper balance.
MFC After: 3 days (before next beta)
is aligned with the sectorsize value returned by GEOM, before
doing a bread() of the superblock.
This eliminates a panic when trying the following on an empty CD-ROM drive:
mount_ext2fs /dev/acd0 /mnt
Reviewed by: phk
pmap_bootstrap by using the sync;isync big hammer to make sure
all prior operations have completed.
Reported by: Nathan Whitehorn <nathan at uchicago edu>
MFC after: 2 days
* New definition CHN_F_HAS_VCHAN.
- channel.c
* Use CHN_F_HAS_VCHAN to mark channel with vchan capability instead
of relying on SLIST_EMPTY(&channel->children) == true for better
clarification and future possible usages of children (like
'slave' channel).
* Various fixes, including blocksize / format bps allignment,
better 24bit seeking (mplayer, others).
* Improve format chain building, it's now possible to record something
to a format non-native to the soundcard through various feeder format
converters or to higher sampling rate. This also gains another feature,
like doing vchan mixing on non s16le soundcard such as sb8.
- sound.c
* Increase robustness within various function that handle vchan
creation / termination (these function need a total rewrite, but
that would cause other major rewrite within various places too!).
As far as its robustness can be guaranteed, leave it as is.
* Optimize channel ordering, prefer *real* hardware playback
channels over virtual channels. cat /dev/sndstat should look
better.
* Increase sndstat verbosity to include bufsoft/bufhard allocation.
- vchan.c
* Fix LOR 119.
- http://sources.zabbadoz.net/freebsd/lor.html#119
* Reorder / increase robustness of vchan_create() / destroy().
Enforce destroy_dev() during destroy operation, fix possible
panic / dangling character device.
- http://lists.freebsd.org/pipermail/freebsd-current/2005-May/050308.html
* Tolerate a little bit more during mixing process, this should help
non s16le soundcards.
Note: Recoring in a non-native rate/format may result in overruns. A friendly
application is wavrec from audio/wavplay. The problem is under
investigation.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
From the PR:
---snip---
The vibra16X supports full duplex. I traced the Windows driver, and what is
does is that it programs one DMA channel 8-bit, and the other 16-bit. There
might be some kind of auto detection logic here, because it always uses 8-bit
for playback, even if I play 16-bit sound ...
---snip---
PR: 80977
Submitted by: Hans Petter Selasky <hselasky@c2i.net>
locks were not aquired because the user buffers were not wired, thus it was
possible that that SYSCTL_OUT could sleep, causing a number of different
problems such as lock ordering issues and dead locks.
-Wire user supplied buffer to ensure SYSCTL_OUT will not sleep.
-Pickup ifnet locks to protect the list.
-Where applicable pickup address locks.
-Pickup radix node head locks.
-Remove splnet stubs
-Remove various comments about locking here, because they are no
longer needed.
It is the hope that these changes will make sysctl_rtsock MP safe.
MFC after: 3 weeks
has been done in icmp_input() already.
This fixes the ICMP_UNREACH_NEEDFRAG case where no MTU was
proposed in the ICMP reply.
PR: kern/81813
Submitted by: Vitezslav Novy <vita at fio.cz>
MFC after: 3 days
Reduce the size of ed a little by removing some CIS based entries (others
likely can be removed too):
o The D-Link DFE-670TXD doesn't need its own entry based on strings.
o The Xircom CompactCard appears to be a TDK design, so list it there by ID
and remove the strings.
Increase the size of ed a little:
o Add support for the Addtron AE-660CT and Addtron AE-660. This is a very
generic NE-2000 clone (so generic that its CIS tags say NE-2000 generic
card!).
take the lock from interrupt context, which causes an implicit
lock order reversal. We've been using the lock carefully enough
that making it a spin lock should not be harmful.
first interface is detached from parent and then bpfdetach() is called.
If the interface was the last carp(4) interface attached to parent, then
the mutex on parent is destroyed. When bpfdetach() calls if_setflags()
we panic on destroyed mutex.
To prevent the above scenario, clear pointer to parent, when we detach
ourselves from parent.
UMA boot pages.
Disable recursion on the general UMA lock now that startup_alloc() no
longer uses it.
Eliminate the variable uma_boot_free. It serves no purpose.
Note: This change eliminates a lock-order reversal between a system
map mutex and the UMA lock. See
http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
MFC after: 3 days
pf_ioctl.c Revision 1.153 Sun Aug 7 11:37:33 2005 UTC by dhartmei
| verify ticket in DIOCADDADDR, from Boris Polevoy, ok deraadt@
pf_ioctl.c Revision 1.158 Mon Sep 5 14:51:08 2005 UTC by dhartmei
| in DIOCCHANGERULE, properly initialize table, if used in NAT rule.
| from Boris Polevoy <vapcom at mail dot ru>, ok mcbride@
pf.c Revision 1.502 Mon Aug 22 11:54:25 2005 UTC by dhartmei
| when nat'ing icmp 'connections', replace icmp id with proxy values
| (similar to proxy ports for tcp/udp). not all clients use
| per-invokation random ids, this allows multiple concurrent
| connections from such clients.
| thanks for testing to Rod Whitworth, "looks ok" markus@
pf.c Revision 1.501 Mon Aug 22 09:48:05 2005 UTC by dhartmei
| fix rdr to bitmask replacement address pool. patch from Max Laier,
| reported by Boris Polevoy, tested by Jean Debogue, ok henning@
Obtained from: OpenBSD
MFC after: 3 days
mailing list onto the vendor branch:
pf_ioctl.c Revision 1.153 Sun Aug 7 11:37:33 2005 UTC by dhartmei
| verify ticket in DIOCADDADDR, from Boris Polevoy, ok deraadt@
pf_ioctl.c Revision 1.158 Mon Sep 5 14:51:08 2005 UTC by dhartmei
| in DIOCCHANGERULE, properly initialize table, if used in NAT rule.
| from Boris Polevoy <vapcom at mail dot ru>, ok mcbride@
pf.c Revision 1.502 Mon Aug 22 11:54:25 2005 UTC by dhartmei
| when nat'ing icmp 'connections', replace icmp id with proxy values
| (similar to proxy ports for tcp/udp). not all clients use
| per-invokation random ids, this allows multiple concurrent
| connections from such clients.
| thanks for testing to Rod Whitworth, "looks ok" markus@
pf.c Revision 1.501 Mon Aug 22 09:48:05 2005 UTC by dhartmei
| fix rdr to bitmask replacement address pool. patch from Max Laier,
| reported by Boris Polevoy, tested by Jean Debogue, ok henning@
times consequently, without checking whether callout has been serviced
or not. (ng_pptpgre and ng_ppp were catched in this behavior).
- In ng_callout() save old item before calling callout_reset(). If the
latter has returned 1, then free this item.
- In ng_uncallout() clear c->c_arg.
Problem reported by: Alexandre Kardanev
First, mutexed callouts are incompatible with netgraph nodes, because
netgraph(4) can guarantee that the function will be called with mutex
held.
Second, nodes should not send data to their neighbor holding their
mutex. A node does not know what stack can it enter sending data in
some direction. May be executing will encounter a place to sleep.
New locking:
- ng_pptpgre_recv() and ng_pptpgre_xmit() must be entered with mutex held.
- ng_pptpgre_recv() and ng_pptpgre_xmit() unlock mutex before
sending data and then return unlocked.
- callout routines acquire mutex themselves.
struct bufs that are persistently held by ext2fs. Ignore any buffers
with this flag in the code in boot() that counts "busy" and dirty
buffers and attempts to sync the dirty buffers, which is done before
attempting to unmount all the file systems during shutdown.
This fixes the problem caused by any ext2fs file systems that are
mounted at system shutdown time, which caused boot() to give up on
a non-zero number of buffers and skip the call to vfs_unmountall().
This left all the mounted file systems in a dirty state and caused
them to all require cleanup by fsck on reboot.
Move the two separate copies of the "busy" buffer test in boot()
to a separate function.
Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.
Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with
this and previous flag changes.
PR: kern/56675, kern/85163
Tested by: "Matthias Andree" matthias.andree at gmx.de
Reviewed by: bde
MFC after: 3 days
here is the support for amd64, as well as possible support for PAE. Many
thanks to Highpoint for continuing to support FreeBSD.
Obtained from: Steve Chang @ Highpoint
MFC After: 3 days.
o Note that the first 255 locations are reserved for JEDEC Ids from
publication 106 (current revision Q, each one verified with
JEDEC and the PMCICA).
o Move ADAPTEC2 to the right section.
o Sort TOSHIBA2 numerically.
dl100xx case.
o We no longer acquire and release resources during attach many times. We now
do it once at the beginning.
o Move setting the resource offsets to just after acquiring the ports in
attach.
o Move ax88x90 code to the end of the file, just after the dl100xx specific
code.
o Rename ed_pccard_Linksys to ed_pccard_dl100xx to reflect the underlying
chipset.
o Pass the ed_product structure into ed_pccard_{dl100xx,ax88x90} and have
those routines test the flags to see if this card should be probed in that
way.
o transition from ed_probe_Novell to ed_probe_Novell_generic since we already
have the resources setup.
o Move use of ed_probe_Novell_generic into ed_pccard_dl100xx to be more
consistant with ax88x90 case.
o simplify the code where we probe for the chipsets
the probe code that this used to be part of, but as part of the
attach, we shouldn't be dropping the resources here.
Also, allocate the proper rid in the ax88x90 setup.
as yet unknown, those cards report their MAC address a byte at a time.
However, other AX88x90 cards report the MAC address a word at a time.
Add a heuristic which looks at the high order bytes of the first 6
words. If they are all '0', assume the card is behaving like the
Linksys EC2T card. Since the default prefix for these cards appears
to be 00:e0:98, this appears to be a safe heuristic. While some cards
have been observed with different prefixes, they all work with this
heuristic.
I'm unsure if this is a bug in the EC2T card, or if it is a bug in the
initialization of the card. No other OS has this heuristic (although
w/o it, the MAC address that is used works).
listed in different orders. Since it is easy to identify the Modem
resources vs the Ethernet resources by looking at the size, use that
rather than hard coded rids. For such parts, go ahead and guess which
rid we should use based on the size. This guess appears reliable for
the two example cards that I have with different CIS info.
assigned to the interface.
IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.
An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...
Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week
does not clear m_nextpkt for us. The mbufs are sent into netgraph and
then, if they contain a TCP packet delivered locally, they will enter
socket code again. They can pass the first assert in sbappendstream()
because m_nextpkt may be set not in the first mbuf, but deeper in the
chain. So the problem will trigger much later, when local program
reads the data from socket, and an mbuf with m_nextpkt becomes a
first one.
This bug was demasked by revision 1.54, when I made upcall queueable.
Before revision 1.54 there was a very small probability to have 2
mbufs in GRE socket buffer, because ng_ksocket_incoming2() dequeued
the first one immediately.
- in ng_ksocket_incoming2() clear m_nextpkt on all mbufs
read from socket.
- restore rev. 1.54 change in ng_ksocket_incoming().
PR: kern/84952
PR: kern/82413
In collaboration with: rwatson
Also introduce an aclinit function which will be used to create the UMA zone
for use by file systems at system start up.
MFC after: 1 month
Discussed with: rwatson
refresh the PID which has the descriptor open. The PID is refreshed in various
operations like ioctl(2), kevent(2) or poll(2). This produces more accurate
information about current bpf consumers. While we are here remove the bd_pcomm
member of the bpf stats structure because now that we have an accurate PID we
can lookup the via the kern.proc.pid sysctl variable. This is the trick that
NetBSD decided to use to deal with this issue.
Special care needs to be taken when MFC'ing this change, as we have made a
change to the bpf stats structure. What will end up happening is we will leave
the pcomm structure but just mark it as being un-used. This way we keep the ABI
in tact.
MFC after: 1 month
Discussed with: Rui Paulo < rpaulo at NetBSD dot org >
Don't free a struct inodedep if another process is allocating saved inode
memory for the same struct inodedep in initiate_write_inodeblock_ufs[12]().
Handle disappearing dependencies in softdep_disk_io_initiation().
Reviewed by: mckusick
was not invalidated if the PTE was not actually being removed. In
an UP kernel this didn't cause problems, because the new mapping
would preempt the old one. In an SMP kernel this could lead to the
use of stale translations when processes move between CPUs at the
"right" moment. This fixes the last of the obvious SMP problems
and it should be safe to enable SMP by default now.
o In pmap_remove_pte: minor code refactoring to avoid duplication.
o Test all PTE pointers against NULL. Don't use implicit boolean
tests.
It seems that this issue only become obvious when compiled with -O2
on sparc64.
Since each struct iconv_converter_class has been initialized by
DEFINE_CLASS macro, not all members of struct iconv_converter_class
has been allocated on memory and cc_link member has not been
initialized, while iconv_register_converter() wanted to access it
with TAILQ.
Now we modify KICONV_CONVERTER macro and fix this bug.
Problem reported on: freebsd-sparc64
Pointed out by: yongari
Discussed with: yongari
Tested by: yongari
MFC after: 3 days
instead. Detailed changelist:
o Add flags field to struct pollrec, to indicate that
are particular entry is being worked on.
o Define a macro PR_VALID() to check that a pollrec
is valid and pollable.
o Mark ISRs as mpsafe.
o ether_poll()
- Acquire poll_mtx while traversing pollrec array.
- Skip pollrecs, that are being worked on.
- Conditionally acquire Giant when entering handler.
o netisr_pollmore()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics.
o netisr_poll()
- Conditionally assert Giant.
- Acquire poll_mtx while working with statistics
and traversing pollrec array.
o ether_poll_register(), ether_poll_deregister()
- Conditionally assert Giant.
- Acquire poll_mtx while working with pollrec array.
o poll_idle()
- Remove all strange manipulations with Giant.
In collaboration with: ru, pjd
In collaboration with: Oleg Bulyzhin <oleg rinet.ru>
In collaboration with: dima <_pppp mail.ru>
waiting for geom events to happen:
Instead of maintaining a count of outstanding events, simply look if
the queue is empty. Make sure to not remove events from the queue
until they are executed in order to not open a new race.
Much work by: pjd
Tested by: kris
MT6: yes, should be.
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.
Reviewed by: tegge
o s/vhpt_base/pmap_vhpt_base/g
o s/vhpt_bucket/pmap_vhpt_bucket/g
o Declare the above in <machine/pmap.h>
o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.*
o Create a tunable machdep.vhpt.log2size, with corresponding sysctl.
The tunable allows the user to specify the VHPT size from the loader.
o Don't keep track of the number of PTEs in the VHPT. Calculate the
population when necessary by iterating the buckets and summing up
the length of the buckets.
o Don't perform the tpa instruction with a bucket lock held. The
instruction can (theoretically) fault and locking is not needed.
return the correct bar size if we encountered a 64-bit BAR that had
its resources already assigned. If the resources weren't yet
assigned, we'd bogusly assume it was a 32-bit bar and return 1.
is never 0, so one cannot test for a NULL pointer after a physical
address is translated into a virtual pointer with said macro. Instead,
keep the physical address around and test it against 0. Note that
this obviously implies that a PTE can never be allocated at physical
address 0. This isn't exactly guaranteed, but hasn't been a problem
so far. We test the physical address against 0 for as long as the ia64
port exists...
remaining % arguments because the varargs are now out of sync and
there is a risk that we might for instance dereference an integer
in a %s argument.
Sponsored by: Napatech.com
link proctree and allproc to Giant since that order is already implicitly
enforced.
- Use a goto to handle the case where we want to enforce a reversal before
calling isitmydescendant() in witness_checkorder() so that the logic is
easier to follow and so that it is easier to add more forced-reversal
cases in the future.
MFC after: 3 days
mutex.
- Don't panic if a spin lock is held too long inside _mtx_lock_spin() if
panicstr is set (meaning that we are already in a panic). Just keep
spinning forever instead.
PROM by bytes. Adjust the extraction of the MAC address from this data
to reflect this change.
This gets the AX88x90 based PC Cards MAC address working again (my
UMAX Ethernet and Linksys EC2T cards now work).
MFC After: 3 days
- Always check mdnew() return value, as even in !autounit case
kthread_create() can fail.
Those two changes fix serval panics provked by simple stress test.
Tested by: Kris The BugMagnet
MFC after: 3 days
panic. The panic happens when outgoing L2CAP connection descriptor is
deleted with the L2CAP command(s) pending in the queue. In this case when
the last L2CAP command is deleted (due to cleanup) and reference counter
for the L2CAP connection goes down to zero the auto disconnect timeout
is incorrectly set. pjd gets credit for tracking this down and committing
bandaid.
Reported by: Jonatan B <onatan at gmail dot com>
MFC after: 3 days
- Set errno to ENXIO instead of 0 in several attach failure cases.
- Setup the interrupt handler at the very end of txp_attach() after
ether_ifattach().
- Various whitespace fixes in function prototypes.
value from an m_tag, but also to set it. This reduces
complex code duplication and improves its readability.
Alas, we shouldn't rename the macro to VLAN_TAG_LVALUE()
globally because that would cause pain for kernel module
port maintainers and vendors using FreeBSD as their codebase.
Added a clarifying comment instead.
Discussed with: ru, glebius
X-MFC-After: 6.0-RELEASE (MFC is good just to reduce the diff)
o for() instead of while() looping over mbuf chain
o paren's around all flag checks
o more verbose function and purpose description
o some more style changes
Based on feedback from: sam
- Add locked variants of start(), init(), ifmedia_upd(), and poll() and stop
recursing on the driver lock.
- Add locking to ifmedia_upd() and ifmedia_sts().
- Use callout_*() instead of timeout/untimeout.
- Fix locking in ioctl().
Tested by: Bob Bishop rb at gid dot co dot uk
MFC after: 3 days
m_demote(m->m_next) if they wish to start at the second mbuf in chain.
o Test m_type with == instead of &.
o Check m_nextpkt against NULL instead of implicit 0.
Based on feedback from: sam
by explicitly setting sc->font_width, in the same
places where sc->font_size is set, instead of
relying on the default initialized value of 0 for sc->font_width.
PR: kern/84836
Reported by: Andrey V. Elsukov <bu7cher at yandex dot ru>
MFC after: 2 days
switching(ifconfig devX channel Y). This fix should make channel changing
works again in monitor mode.
Submitted by: sam
X-MFC-With: other ic_curchan changes
parallel ng_pptp_rcvdata():
- Add a per-node mutex.
- Acquire mutex during all ng_pptp_rcvdata() method.
- Make callouts protected by mutex. Now callouts count as
netgraph writers, but there are plans to allow reader callouts
for nodes, that have internal locking.
- Acquire mutex in ng_pptp_reset(), which can be triggered
by a message or node shutdown.
PR: kern/80035
Tested by: Deomid Ryabkov <myself rojer.pp.ru>
Reviewed by: Deomid Ryabkov <myself rojer.pp.ru>
1. Walk the absolute list in reverse to prefer duplicated levels that have
a lower absolute setting, i.e. 800 Mhz/50% is better than 1600 Mhz/25% even
though both have the same actual frequency. This also removes the need to
check for already-modified levels since by definition, those will be added
later in the sorted list.
2. Compare the absolute settings for derived levels and don't use the new
level if it's higher. For example, a level of 800 Mhz/75% is preferable to
1600 Mhz/25% even though the latter has a lower total frequency.
This work is based on a patch from the submitter but reworked by myself.
Submitted by: Tijl Coosemans (tijl/ulyssis.org)
int prep, int how).
Copies the data portion of mbuf (chain) n starting from offset off
for length len to mbuf (chain) m. Depending on prep the copied
data will be appended or prepended. The function ensures that the
mbuf (chain) m will be fully writeable by making real (not refcnt)
copies of mbuf clusters. For the prepending the function returns
a pointer to the new start of mbuf chain m and leaves as much
leading space as possible in the new first mbuf.
Reviewed by: glebius
checking on mbuf's and mbuf chains. Set sanitize to 1 to garble
illegal things and have them blow up later when used/accessed.
m_sanity()'s main purpose is for KASSERT()'s and debugging of non-
kosher mbuf manipulation (of which we have a number of).
Reviewed by: glebius
any tags and packet headers. If "all" is set then the first mbuf
in the chain will be cleaned too.
This function is used before an mbuf, that arrived as packet with
m->flags & M_PKTHDR, is appended to an mbuf chain using m->m_next
(not m->m_nextpkt).
Reviewed by: glebius
for the spl-era locking, but now that we can have multiple, concurrent
interrupts for multiple wi devices, having a global check to make sure
at most one of them was in wi_cmd no longer makes sense.
MFC After: 2 decifortnight
o Lock ed
o Fix extra newline in probe messages
o Eliminate gone.
o Make detach less-racy.
o Eliminate spl*
o Switch from timeout/untimeout to callout interface.
o Read/write card memory using bus_space calls.
o generalize readmem so that we don't need ifs in the code.
o Fix memory stuff to be consistant.
o Remove OLDCARD compat stuff.
o Mark interrupt as MPSAFE.
# sic, hpp not tested at all
# ISA and PCI attachments lightly tested
- On resume all registers have to be initialized again like after
power-on so reset sc_inited in gem_suspend() in order get all of
the registers set next time gem_init_regs() is called.
- On at least some ERI and GEM revisions GEM_MAC_RX_OVERFLOW happen
often due to a silicon bug and re-initializing is all we can do
about these errors so make handling them non-verbose.
- Remove a superfluous memset(3) call in gem_meminit(), all elements
are initialized to 0 anyway.
MFC after: 1 week
but vm_map_wire() fails, then a vm object, vm map entries, and kernel_map
free space is leaked and (2) unwiring is handled automatically by
vm_map_remove().
Suggested by: tegge
On entry or exit from the kernel the 'alltraps' and 'doreti' code
used taken by normal traps disables interrupts to protect the
critical sections where it is setting up %gs.
This protection is insufficient in the presence of NMIs since NMIs
can be taken even when the processor has disabled normal interrupts.
Thus the NMI handler needs to actually read MSR_GBASE on entry to
the kernel to determine whether a swap of %gs using 'swapgs' is
needed. However, reads of MSRs are expensive and integrating this
check into the 'alltraps'/'doreti' path would penalize normal
interrupts.
- Teach DDB about the 'nmi_calltrap' symbol.
Reviewed by: bde, peter (older versions of this change)
so that we do not call uiomove() while IFNET_RLOCK() is held.
This eliminates the witness warning:
Calling uiomove() with the following non-sleepable locks held:
exclusive sleep mutex ifnet r = 0 (0xc096dd60) locked @
/usr/src/sys/modules/linux/../../compat/linux/linux_ioctl.c:2170
MFC after: 2 days
attached.
This is caused by bpf_detachd clearing IFF_PROMISC on the interface which does
a SIOCSIFFLAGS ioctl. The problem here is that while the interface has been
stopped, IFF_UP has not been cleared so IFF_UP != IFF_DRV_RUNNING, this causes
the ioctl function to init() the interface which resets the callouts.
The destroy then completes and frees the softc but softclock will panic on a
dead callout pointer.
Ensure ifp->if_flags matches reality by clearing IFF_UP when we destroy.
Silence from: rwatson
Approved by: mlaier (mentor)
MFC after: 3 days
either reader or writer flag on item in the function, that
allocates the item. Do not modify these flags when item is
applied or queued.
The only exceptions are node and hook overrides - they can
change item flags to writer.
This way, the VINUMDRIVE class is loaded before the VINUM class,
but since geom does the tasting for newly arrived classes
last-in-first-out, the VINUM class tastes first.
This removes the need to call gv_parse_config() in the drive
taste path.
tulip_mbuf_compress(). If we fail to allocate a new mbuf to copy the
data into, put the mbuf back in the driver's send queue so that we can
retry it later rather than throwing the packet away.
- Use m_devget() instead of doing it inline ourselves in the
TULIP_COPY_RXDATA case. If we fail to allocate an mbuf to copy the data
into, don't forget about the original mbuf cluster. The old code would
lose the pointer and leak the cluster in that case. Now it doesn't lose
it but always sticks the original rx buffer back into the receive ring
after trying to copy the data out and send it up the stack. Also, if we
fail to allocate a new mbuf to copy the data into, log an input error.
Also, don't combine the priming case with the received-a-packet case to
make the code flow a bit clearer and easier to follow.
earlier as no one has stepped up to test recent changes to the driver.
Oddly, the module was actually turned on on ia64 though I'm fairly certain
that no ia64 machine has ever had or will ever have an ISA slot.
Axe borrowed from: phk
- if minfd < fd_freefile (as is most often the case, since minfd is
usually 0), set it to fd_freefile.
- remove a call to fd_first_free() which duplicates work already done
by fdused().
This change results in a small but measurable speedup for processes
with large numbers (several thousands) of open files.
PR: kern/85176
Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
MFC after: 3 weeks
1. The amd64 pmap, unlike the i386 pmap, maintains a reference count
for each page directory (PD) page. However, in the transformation
of the i386 pmap into the amd64 pmap, operations, such as
pmap_copy() and pmap_object_init_pt(), that create 2MB "superpage"
mappings by setting the PG_PS bit in a PD entry were not modified
to adjust the underlying PD page's reference count. Consequently,
superpage mappings could disappear prematurely.
2. pmap_object_init_pt() could crash or corrupt memory if either the
virtual address range being mapped crosses a 1GB boundary in the
virtual address space or nothing is mapped in the 1GB area.
3. When pmap_allocpte() destroys a 2MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should. (This
bug is inherited from i386.)
Discussed with: peter
Reviewed by: tegge
not exsist, do not have ioctl return an error, but instead set -1
in the data returned to the user. This allows the HP bios flash
utilities to work without requiring changes to their code.
Reviewed by: jhb
- Remove form feed characters.
- Fixup style of function declarations.
- Assume that an mbuf cluster is big enough to hold an ethernet frame.
(This should really be using m_defrag(), but this diff is just simple
changes for now.)
- Allocate arrays of metadata for the descriptors in the rx and tx rings
and change the ring pointers to walk the metadata array rather than the
actual descriptor rings. Each metadata object contains a pointer to its
descriptor, a pointer to any associated mbuf, and a pointer to the
associated bus_dmamap_t in the bus_dma case. The mbuf pointers replace
the tulip_txq and tulip_rxq local ifqueue's in the softc.
- Add lots of KTR trace entries using a local KTR_TULIP level which
defaults to 0, but can be changed to KTR_DEV at the top of the file
when debugging.
- Rename tulip_init(), tulip_start(), tulip_ifinit(), and tulip_ifstart()
to tulip_init_locked(), tulip_start_locked(), tulip_init(), and
tulip_start(), respectively, to match the convention in other drivers.
- Add a TULIP_SP_MAC() macro to encode two bytes of the MAC address into
the setup buffer and use that in place of lots of BYTE_ORDER #ifdef's.
Also, remove an incorrect XXX comment I added earlier, the driver was
correct (at least it does the same thing dc(4) does). TULIP_SP_MAC
was shamelessly copied from DC_SP_MAC() in dc(4).
- Remove the #ifdef'd NetBSD bus-dma code and replace it with FreeBSD
bus-dma code that not only compiles but even works at runtime.
- Use callout_init_mtx() instead of just callout_init().
- Correct the various wrapper macros for bus_dmamap_sync() for the rx
and tx buffers to only ask for the sync ops that they actually need.
- Tidy the #ifdef TULIP_COPY_RXDATA code by expanding an #ifdef a bit
so it becomes easier to read at the expense of a couple of duplicated
lines of code. Also, use m_getcl() to get an mbuf cluster rather than
MGETHDR() followed by MCLGET().
- Maintain the ring free (ri_free) count for the rx ring metadata since
we no longer have tulip_rxq.ifq_len around to indicate how many mbuf's
are currently in the rx ring.
- Add code to teardown bus_dma resources when attach fails and generally
fixup attach to do a better job of cleaning up when it fails. This
gets us a good bit closer to possibly having a detach method someday
and making this driver an unloadable module.
- Add some functions that can be called from ddb to dump the state of
a descriptor ring and to dump the state of an individual descriptor.
- Various comment grammer and spelling fixes.
I have bus-dma turned on by default, but I've left the non-bus-dma code
around so that it can be turned off to aid in debugging should any problems
turn up later on. I'll be removing the non-bus-dma code in a subsequent
commit.
the code, i.e. ng_fec_init() is called with the ifp->if_softc pointer and
NOT with the ifp pointer.
PR: kern/85239
Reviewed by: brooks
MFC after: 1 day
through locking; leave some spl references around code where there
are open questions about global variable references. Also, add
an XXX regarding locking in sysctl.
MFC after: 3 days
ARP requests only on the network where this IP address belong, to.
Before this change we did replied on all interfaces. This could
lead to an IP address conflict with host we are doing ARP proxy
for.
PR: kern/75634
Reviewed by: andre
cooling thread which refers psv, tc1, tc2 and tsp. The previous
code made the period where sc->tz_zone.tsp was zero, and it caused
panic at msleep().
Reported by: keramida
Tested by: keramida
it fixes. I believe the problem lives somewhere outside ng_ksocket,
but until it is found, let the node be working.
PR: kern/84952
PR: kern/82413
MFC after: 3 days
if an indirect relationship exists (keep both A->B->C and A->C).
This allows witness_checkorder() to use isitmychild() instead of
the much more expensive isitmydescendant() to check for valid lock
ordering.
Don't do an expensive tree walk to update the w_level values when
the tree is updated. Only update the w_level values when using the
debugger to display the tree.
Nuke the experimental "witness_watch > 1" mode that only compared
w_level for the two locks. This information is no longer maintained
at run time, and the use of isitmychild() in witness_checkorder
should bring performance close enough to the acceptable level that
this hack is not needed.
Report witness data structure allocation statistics under the
debug.witness sysctl.
Reviewed by: jhb
MFC after: 30 days
and detach() since mtx_lock() will assert that already since the driver
lock is not recursive.
- Move the call to callout_init_mtx() before hme_stop() so that the
callout_stop() in hme_stop() doesn't operate on an uninitialized callout
structure during attach.
Reported by: yongari (2)
MFC after: 3 days
between sack and a bug in the "bad retransmit recovery" logic. This is
a workaround, the underlying bug will be fixed later.
Submitted by: Mohan Srinivasan, Noritoshi Demizu
so that devd can match on it. This field was already available to
usbd and is used by a number of usbd.conf entries, so now it is
possible to transfer those entries to devd.conf.
Submitted by: Anish Mistry
actually 1514, so comparing the mbuf length which includes the Ethernet header
to the interface MTU is wrong.
The check was a little over the top so just remove it.
Approved by: mlaier (mentor)
MFC after: 3 days
vlrureclaim() in vfs_subr.c 1.636 because waiting for the vnode
lock aggravates an existing race condition. It is also undesirable
according to the commit log for 1.631.
Fix the tiny race condition that remains by rechecking the vnode
state after grabbing the vnode lock and grabbing the vnode interlock.
Fix the problem of other threads being starved (which 1.636 attempted
to fix by removing LK_NOWAIT) by calling uio_yield() periodically
in vlrureclaim(). This should be more deterministic than hoping
that VOP_LOCK() without LK_NOWAIT will block, which may not happen
in this loop.
Reviewed by: kan
MFC after: 5 days
enhance the security of bpf(4) by further relinquishing the privilege of
the bpf(4) consumer (assuming the ioctl commands are being implemented).
Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underly parameters of the
bpf(4) device. An example might be the setting of bpf(4) filter programs or
attaching to different network interfaces.
BIOCSETWF can be used to set write filters for outgoing packets. Currently if
a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used
as a raw socket, regardless of consumer's UID. Write filters give users the
ability to constrain which packets can be sent through the bpf(4) descriptor.
These features are currently implemented by a couple programs which came from
OpenBSD, such as the new dhclient and pflogd.
-Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify
whether a read or write filter is to be set.
-Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the
filter program on the mbuf data once we move the packet in from user-space.
-Rather than execute two uiomove operations, (one for the link header and the
other for the packet data), execute one and manually copy the linker header
into the sockaddr structure via bcopy.
-Restructure bpf_setf to compensate for write filters, as well as read.
-Adjust bpf(4) stats structures to include a bd_locked member.
It should be noted that the FreeBSD and OpenBSD implementations differ a bit in
the sense that we unconditionally enforce the lock, where OpenBSD enforces it
only if the calling credential is not root.
Idea from: OpenBSD
Reviewed by: mlaier
specifies a PMC capability (e.g., sampling) that is not supported
by hardware. Return EINVAL early if the PMC class passed in is
not recognized.
MFC after: 3 days
TTL a packet must have when received on a socket. All packets with a
lower TTL are silently dropped. Works on already connected/connecting
and listening sockets for RAW/UDP/TCP.
This option is only really useful when set to 255 preventing packets
from outside the directly connected networks reaching local listeners
on sockets.
Allows userland implementation of 'The Generalized TTL Security Mechanism
(GTSM)' according to RFC3682. Examples of such use include the Cisco IOS
BGP implementation command "neighbor ttl-security".
MFC after: 2 weeks
Sponsored by: TCP/IP Optimization Fundraise 2005
cluster if needed.
Fixes the TCP issues raised in I-D draft-gont-icmp-payload-00.txt.
This aids in-the-wild debugging a lot and allows the receiver to do
more elaborate checks on the validity of the response.
MFC after: 2 weeks
Sponsored by: TCP/IP Optimization Fundraise 2005
than ~PDRMASK to extract the physical address of a superpage from a PDE.
The use of ~PDRMASK is problematic if the PDE has PG_NX set. Specifically,
the PG_NX bit will be included in the physical address if ~PDRMASK is used.
Reviewed by: peter
depends, like all other pccard drivers, indirectly through kobj on
pccard. Therefore, it is not appropriate to force pccard to be loaded
when if_ral.ko is loaded. This makes it possible to load if_ral w/o
loading pccard.ko on, eg, pci only systems.
packet in an ICMP reply. The minimum of 8 bytes is internally
enforced. The maximum quotation is the remaining space in the
reply mbuf.
This option is added in response to the issues raised in I-D
draft-gont-icmp-payload-00.txt.
MFC after: 2 weeks
Spnsored by: TCP/IP Optimizations Fundraise 2005
the IP address the packet came through in. This is useful for routers
to show in traceroutes the actual path a packet has taken instead of
the possibly different return path.
The new sysctl is named net.inet.icmp.reply_from_interface and defaults
to off.
MFC after: 2 weeks
is a workaround for non-symetric teardown of the file systems at
shutdown with respect to the mount order at boot. The proper long term
fix is to properly detach devfs from the root mount before unmounting
each, and should be implemented, but since the problem is non-harmful,
this temporary band-aid will prevent false positive bug reports and
unnecessary error output for 6.0-RELEASE.
MFC after: 3 days
Tested by: pav, pjd
o management of multiple tx rings (up to 4)
o setting of WME IE in association requests
Some features are still missing though, like the possibility to override
the default cwmin/cwmax/asfn values of each tx queues.
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.
This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.
Discussed with: bde
sizeof(struct g_eli_metadata) will return the exact number of bytes needed
for storing it on the disk.
Without this change GELI was unusable on amd64 (and probably other 64-bit
archs), because sizeof(struct g_eli_metadata) was greater than 512 bytes
and geli(8) was failing on assertion.
Reported by: Michael Reifenberger <mike@Reifenberger.com>
MFC after: 3 days
128 bytes, 256 bytes, and 32 bytes respectively. This makes it much
easier to identify when two kernels are identical apart from a version
number bump (as often happens on security branches).
Discussed on: freebsd-arch, in May 2005
- Remove a lot of superfluous locking during attach. There is no need
to lock access to the driver until some other thread has a way of getting
to it. For ethernet drivers the other ways include registering an
interrupt handler via bus_setup_intr(), calling ether_ifattach() to hook
into the network stack, and kicking off a callout-driven timer via
callout_reset().
- Use callout_* rather than timeout/untimeout.
- Break out of xl_rxeof() if IFF_DRV_RUNNING is clear after ifp->if_input
returns to handle the case where the interface was stopped while we were
passing a packet up the stack. Don't call xl_rxeof() in xl_rxeof_task()
unless IFF_DRV_RUNNING is set. With these fixes in place, any
outstanding task will gracefully terminate as soon as it gets a chance to
run after the interface has been stopped via xl_stop(). As a result,
taskqueue_drain() is no longer required in xl_stop(). The task is still
drained in detach() however to make sure that detach() can safely destroy
the driver mutex at the end of the function.
- Lock the driver lock in the ifmedia callouts and don't lock across
ifmedia_ioctl() in xl_ioctl().
Note: glebius came up with most of (3) as well independently. I took a
rather roundabout way of arriving at the same conclusion.
MFC after: 3 days
- Add locked versions of start and init. The SRM_MEDIA code in dc_init()
stayed in dc_init() instead of moving to dc_init_locked() to make the
locking saner.
- Use callout_init_mtx().
- Fixup locking in detach and ioctl.
- Lock the driver in the ifmedia callouts.
- Don't recurse on the driver lock.
- De-spl.
MFC after: 3 days
struct ifnet most of if_findindex() become a complex no-op. Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling. This should probably be replaced
with subr_unit at some point.
Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes. Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct. This fixes panics when
loading the ef(4) module. The panics were caused by the fact that
if_alloc was called four time before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again. This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).
Reported by: "Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR: kern/84987
MFC After: 1 day
- Add locked variants of start, init, and ifmedia_upd.
- Use callout_* instead of timeout/untimeout.
- Don't recurse on the driver lock.
- Fixup locking in ioctl.
- Lock the driver lock in the ifmedia handlers rather than across
ifmedia_ioctl().
Tested by: brueffer
MFC after: 3 days
is not defined, so that the module will get the
compatibility options from the current kernel configuration
if built with the latter, not with the world.
[Some other modules seem in need of fixing WRT this, too.]
Add more compatibility options found in GENERIC to the default
opt_compat.h. While not all of them are used in the procfs code,
we can't tell for sure if the system .h files don't need them either,
so let's stay on the safe side.
Submitted by: kensmith
Reviewed by: ru
interrupt comes in later on, which can happen in some uncommon cases.
Another possible fix is to call re_detach() instead of re_stop(), like
ve(4) does, but I am not sure if the latter is really RTTD, so that stick
with this one-liner for now.
PR: kern/80005
Approved by: silence on -arch, no reply from selected network gurus
This is actually a local DoS, as every user can use /dev/crypto if there
is crypto hardware in the system and cryptodev.ko is loaded (or compiled
into the kernel).
Reported by: Mike Tancsa <mike@sentex.net>
MFC after: 1 day
than one interface in one subnet. However, some userland apps rely on
the believe that this configuration is impossible.
Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch
is on, then kernel will refuse to add an additional interface to
already connected subnet unless the interface is CARP. Default
value is off.
PR: bin/82306
In collaboration with: mlaier
the serial console speed (i386 and amd64 only). If the previous
stage boot loader requested a serial console (RB_SERIAL or RB_MULTIPLE)
then the default speed is determined from the current serial port
speed. Otherwise it is set to 9600 or the value of BOOT_COMCONSOLE_SPEED
at compile time.
This makes it possible to set the serial port speed once in
/boot.config and the setting will propagate to boot2, loader and
the kernel serial console.
/boot.config or on the "boot:" prompt line via a "-S<speed>" flag,
e.g. "-h -S19200". This adds about 50 bytes to the size of boot2
and required a few other small changes to limit the size impact.
This changes only affects boot2; there are further loader changes
to follow.
r_gdt -> saved_gdt
r_idt -> saved_idt
r_ldt -> saved_ldt
in order to prevent clashes with variables with same names
defined in <machine/segments.h>. This fixes compilation of this
file with GCC 4.0.
Reviewed by: njl
- Don't set IFF_ALLMULTI in our ifnet's if_flags if we end up allowing
all multicast due to limits in the MAC receive filters in hardware.
Requested by: rwatson (2)
that if softclock is running on another CPU and is blocked on our driver
lock, we will wait until it has acquired the lock, seen that it was
cancelled, dropped the lock, and awakened us so that we can safely destroy
the mutex.
MFC after: 3 days
- Add locked variants of el_init and el_start.
- Don't initialize the mutex and lock it during el_probe().
- Do initialize the mutex during attach. (el_probe() did destroy the mutex
to cleanup, so this meant the driver was always using a destroyed mutex
when it was running.)
- Setup the interrupt handler after ether_ifattach().
- Fix locking in el_detach() and el_ioctl().
Note: Since I couldn't actually find anyone with this hardware, I'm going
ahead and committing these changes so they won't be lost. I'll remove the
driver in a week (real purpose of the MFC after below) unless someone pipes
up to test this.
MFC after: 1 week
Tested by: gcc(1)
effect. since CPU speed is restored by degrees, we cannot use
the facility of saving cpu speed by CPUFREQ_set() effectively.
so, we need to save the value when passive cooling is in effect.
Repoeted by: Kevin Oberman <oberman__at__es.net>
or unreadable blocks, make sure to destroy the mutex we created.
Also fix an unrelated typo in a comment.
Found by: Peter Holm's stress tests
Reviewed by: dwmalone
MFC after: 3 days
by md(4). Before this change, it was possible to by-pass these flags
by creating memory disks which used a file as a backing store and
writing to the device.
This was discussed by the security team, and although this is problematic,
it was decided that it was not critical as we never guarantee that root will
be restricted.
This change implements the following behavior changes:
-If the user specifies the readonly flag, unset write operations before
opening the file. If the FWRITE mask is unset, the device will be
created with the MD_READONLY mask set. (readonly)
-Add a check in g_md_access which checks to see if the MD_READONLY mask
is set, if so return EROFS
-Do not gracefully downgrade access modes without telling the user. Instead
make the user specify their intentions for the device (assuming the file is
read only). This seems like the more correct way to handle things.
This is a RELENG_6 candidate.
PR: kern/84635
Reviewed by: phk
- Add locked variants of my_start() and my_init().
- Assert that the lock is held in several places rather than recursing.
- Overhaul failure case handling in my_attach() so that it will actually
clean up completely in each of the failure cases.
- Setup the interrupt after ether_ifattach() in my_attach().
- Remove unused callout handle from softc.
- Free the metadata for the descriptors my_in detach() (we leaked it
before).
- Fix locking in my_ioctl().
- Remove spls.
Tested by: brueffer
MFC after: 3 days
It checked other algorithms against this bug and it seems they aren't
affected.
Reported by: Mike Tancsa <mike@sentex.net>
PR: i386/84860
Reviewed by: phk, cperciva(x2)
- Add a note that additions should be made to if_free_type and not
if_free to help avoid this in the future.
This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.
Reported by: thompsa
archs and enable splash(4) by default (the non-working screen savers
either don't compile or just have no effect when loaded, i.e. don't
cause harm).
MFC after: 1 week
which serial device to use in that case respectively to not rely on
the OFW names of the input/output and stdin/stdout devices. Instead
check whether input and output refers to the same device and is of
type serial (uart(4) was already doing this) and for the fallback
to a serial console in case a keyboard is the selected input device
but unplugged do the same for stdin and stdout in case the input
device is nonexistent (PS/2 and USB keyboards) or has a 'keyboard'
property (RS232 keyboards). Additionally also check whether the OFW
did a fallback to a serial console in the same way in case the
output device is nonexistent. While at it save on some variables
and for sys/boot/sparc64/loader/metadata.c move the code in question
to a new function md_bootserial() so it can be kept in sync with
uart_cpu_getdev_console() more easily.
This fixes selecting a serial console and the appropriate device
when using a device path for the 'input-device' and 'output-device'
OFW environment variables instead of an alias for the serial device
to use or when using a screen alias that additionally denotes a
video mode (like e.g. 'screen:r1024x768x60') but no keyboard is
plugged in (amongst others). It also makes the code select a serial
console in case the OFW did the same due to a misconfiguration like
both 'input-device' and 'output-device' set to 'keyboard' or to a
nonexisting device (whether the OFW does a fallback to a serial
console in case of a misconfiguration or one ends up with just no
console at all highly depends on the OBP version however).
- Reduce the size of buffers that only ever need to hold the string
'serial' accordingly. Double the size of buffers that may need to
hold a device path as e.g. '/pci@8,700000/ebus@5/serial@1,400000:a'
exceeds 32 chars.
- Remove the package handle of the '/options' node from the argument
list of uart_cpu_getdev_dbgport() as it's unused there and future
use is also unlikely.
MFC after: 1 week
When a drive is newly created, it's state is initially set to 'down',
so it won't allow saving the config to it (thus it will never know of
itself being created). Work around this by adding a new flag, that's
also checked when saving the config to a drive.
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.
Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().
Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.
Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().
Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days
points in lookup(). The lock can be dropped safely around VFS_ROOT because
LOCKPARENT semantics with child and perent vnodes coming from different FSes
does not really have any meaningful use. On the other hard, this prevents
easily triggered deadlock on systems using automounter daemon.
some of the options test, specifically the joliet and rockridge tests.
Since the root mount callchain doesn't go through cd9660_cmount, the
default mount options aren't set. Rather than having the main codepath
assume the options are there, test for the absence of the inverted
optioin
e.g. instead of vfs_flagopt(.. "joliet" ..), test for
!vfs_flagopt(.. "nojoliet" ..)
This works for root mount, non-root mount and future nmount cases.
- in cd9660_cmount, remove inadvertent setting of "gens" when "extatt"
was set.
Reported by: grehan, Dario Freni <saturnero at freesbie org>
Tested by: Dario Freni
Not objected to by: phk
MFC after: 3 days
high FP registers. It was not that the IPI got lost due to the
perceived unreliability of the IPI delivery, but rather that the
IPI was not assigned a vector (ugh). Sending a 0 vector to a CPU
results in a stray external interrupt.
Add a KASSERT to ipi_send() to catch this. The initialization of
the IPIs could be better, but it's not at all sure what the future
of the code is. Avoid wasting a lot of time on something that is
going to be rewritten anyway.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variable in the same way.
Reported by: ade
Obtained from: alc
MFC after: 2 days
add support for getting the current policy setting and collecting
the list of mac addresses in the acl table.
Submitted by: Michal Mertl (original version)
MFC after: 2 weeks
* Correct handling of IPv6 Extension Headers.
* Add unreach6 code.
* Add logging for IPv6.
Submitted by: sysctl handling derived from patch from ume needed for ip6fw
Obtained from: is_icmp6_query and send_reject6 derived from similar
functions of netinet6,ip6fw
Reviewed by: ume, gnn; silence on ipfw@
Test setup provided by: CK Software GmbH
MFC after: 6 days
the procfs module by creating opt_compat.h with
appropriate compatibility options: COMPAT_43 on all
arch's and COMPAT_IA32 in addition on amd64.
Pointy hat to: peter
MFC after: 3 days
incoming ARP packet and route request adding/removing
ARP entries. The root of the problem is that
struct llinfo_arp was accessed without any locks.
To close race we will use locking provided by
rtentry, that references this llinfo_arp:
- Make arplookup() return a locked rtentry.
- In arpresolve() hold the lock provided by
rt_check()/arplookup() until the end of function,
covering all accesses to the rtentry itself and
llinfo_arp it refers to.
- In in_arpinput() do not drop lock provided by
arplookup() during first part of the function.
- Simplify logic in the first part of in_arpinput(),
removing one level of indentation.
- In the second part of in_arpinput() hold rtentry
lock while copying address.
o Fix a condition when route entry is destroyed, while
another thread is contested on its lock:
- When storing a pointer to rtentry in llinfo_arp list,
always add a reference to this rtentry, to prevent
rtentry being destroyed via RTM_DELETE request.
- Remove this reference when removing entry from
llinfo_arp list.
o Further cleanup of arptimer():
- Inline arptfree() into arptimer().
- Use official queue(3) way to pass LIST.
- Hold rtentry lock while reading its structure.
- Do not check that sdl_family is AF_LINK, but
assert this.
Reviewed by: sam
Stress test: http://www.holm.cc/stress/log/cons141.html
Stress test: http://people.freebsd.org/~pho/stress/log/cons144.html
- rt0 passed to rt_check() must not be NULL, assert this.
- rt returned by rt_check() must be valid locked rtentry,
if no error occured.
o Modify callers, so that they never pass NULL rt0
to rt_check().
Reviewed by: sam, ume (nd6.c)
#if 0.
- Use pci_enable_busmaster() to enable busmastering instead of frobbing
the command register directly.
- Don't check to see if memory or I/O can be enabled by writing to the
command register. The PCI bus driver's bus_alloc_resource() method
already checks this and will fail if it can't enable the bit.
everywhere. This means that my_unit is no longer used as well. The
watchdog routine now also prints 'my0: ...' rather than 'm0x0: ...'.
- Don't bzero the softc and don't try to free it in detach or if attach
fails.
- A whitespace fix.
- Use the driver lock instead of Giant in a bus dma callback.
- Clear IFF_DRV_(RUNNING|OACTIVE) in hme_stop() instead of just clearing
RUNNING in hme_ioctl() to be more like other ethernet drivers.
- Lock the driver lock around mii operations.
- Remove spls.
- Cleanup locking in hme_ioctl().
MFC after: 1 week
use the station channel properties. Fixes assert failure/bogus
operation when an ap is operating in 11a and has associated stations
then switches to 11g.
Noticed by: Michal Mertl
Reviewed by: avatar
MFC after: 2 weeks
o add ic_curchan and use it uniformly for specifying the current
channel instead of overloading ic->ic_bss->ni_chan (or in some
drivers ic_ibss_chan)
o add ieee80211_scanparams structure to encapsulate scanning-related
state captured for rx frames
o move rx beacon+probe response frame handling into separate routines
o change beacon+probe response handling to treat the scan table
more like a scan cache--look for an existing entry before adding
a new one; this combined with ic_curchan use corrects handling of
stations that were previously found at a different channel
o move adhoc neighbor discovery by beacon+probe response frames to
a new ieee80211_add_neighbor routine
Reviewed by: avatar
Tested by: avatar, Michal Mertl
MFC after: 2 weeks
due to the vm object being locked.
When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine. If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.
Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation. Detect and handle relevant page queue changes.
This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.
Reviewed by: alc
to atomically return either an existing set of IP multicast options for the
PCB, or a newlly allocated set with default values. The inpcb is returned
locked. This function may sleep.
Call ip_moptions() to acquire a reference to a PCB's socket options, and
perform the update of the options while holding the PCB lock. Release the
lock before returning.
Remove garbage collection of multicast options when values return to the
default, as this complicates locking substantially. Most applications
allocate a socket either to be multicast, or not, and don't tend to keep
around sockets that have previously been used for multicast, then used for
unicast.
This closes a number of race conditions involving multiple threads or
processes modifying the IP multicast state of a socket simultaenously.
MFC after: 7 days
list lock, as there has been a report that an alternative lock order
is getting introduced. This should help ferret it out.
Reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
device driver, owned by the network stack, or initialized by the device
driver before attach and read-only from then on.
Not all device drivers and network stack components currently follow
these rules, especially with respect to IFF_UP, and a few exceptions
with IFF_ALLMULTI.
MFC after: 7 days
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.
Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.
When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.
Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.
With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.
Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.
Reviewed by: pjd, bz
MFC after: 7 days
- Push 'i' into the only block where it is used.
- Remove redundant check for rt being NULL. If rt_check() hasn't
returned an error, then rt is valid.
Reviewed by: gnn
m_copym(m, 0, M_COPYALL, how).
This is required for strict alignment architectures where we align the IP
header in the input path but m_copym() will create an unaligned copy in
bridge_broadcast(). m_copypacket() preserves alignment of the first mbuf.
Noticed by: Petri Simolin
Approved by: mlaier (mentor)
MFC after: 3 days
not holding any non-sleep-able-locks locks when copyin is called.
This gets executed un-conditionally since we have no function
to wire the buffer in this direction.
Pointed out by: truckman
MFC after: 1 week
the timeout routine.
- Fix locking in detach.
- Add locking in shutdown.
- Don't mess with the PCI command register in resume, the PCI bus driver
already does this for us.
- Add locking to the non-serial ifmedia routines.
- Fix locking in ioctl.
- Remove spls and support for 4.x.
MFC after: 1 week
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.
This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.
Requested by: phk
MFC after: 3 days
I copied strcasecmp() from userland to the kernel and it didn't worked!
I started to debug the problem and I find out that this line:
while (tolower(*us1) == tolower(*us2++)) {
was adding _3_ bytes to 'us2' pointer. Am I loosing my minds here?!...
No, in-kernel tolower() is a macro which uses its argument three times.
Bad tolower(9), no cookie.
the buffer has not been wired and we are holding any non-sleep-able locks,
drop a witness warning. If the buffer has not been wired, it is possible
that the writing of the data can sleep, especially if the page is not in
memory. This can result in a number of different locking issues, including
dead locks.
MFC after: 1 week
Discussed with: rwatson
Reviewed by: jhb
Crypto changes:
o change driver/net80211 key_alloc api to return tx+rx key indices; a
driver can leave the rx key index set to IEEE80211_KEYIX_NONE or set
it to be the same as the tx key index (the former disables use of
the key index in building the keyix->node mapping table and is the
default setup for naive drivers by null_key_alloc)
o add cs_max_keyid to crypto state to specify the max h/w key index a
driver will return; this is used to allocate the key index mapping
table and to bounds check table loookups
o while here introduce ieee80211_keyix (finally) for the type of a h/w
key index
o change crypto notifiers for rx failures to pass the rx key index up
as appropriate (michael failure, replay, etc.)
Node table changes:
o optionally allocate a h/w key index to node mapping table for the
station table using the max key index setting supplied by drivers
(note the scan table does not get a map)
o defer node table allocation to lateattach so the driver has a chance
to set the max key id to size the key index map
o while here also defer the aid bitmap allocation
o add new ieee80211_find_rxnode_withkey api to find a sta/node entry
on frame receive with an optional h/w key index to use in checking
mapping table; also updates the map if it does a hash lookup and the
found node has a rx key index set in the unicast key; note this work
is separated from the old ieee80211_find_rxnode call so drivers do
not need to be aware of the new mechanism
o move some node table manipulation under the node table lock to close
a race on node delete
o add ieee80211_node_delucastkey to do the dirty work of deleting
unicast key state for a node (deletes any key and handles key map
references)
Ath driver:
o nuke private sc_keyixmap mechansim in favor of net80211 support
o update key alloc api
These changes close several race conditions for the ath driver operating
in ap mode. Other drivers should see no change. Station mode operation
for ath no longer uses the key index map but performance tests show no
noticeable change and this will be fixed when the scan table is eliminated
with the new scanning support.
Tested by: Michal Mertl, avatar, others
Reviewed by: avatar, others
MFC after: 2 weeks
entry points that will be inserted over the life-time of the 6.x branch,
including for:
- New struct file labeling (void * already added to struct file), events,
access control checks.
- Additional struct mount access control checks, internalization/
externalization.
- mac_check_cap()
- System call enter/exit check and event.
- Socket and vnode ioctl entry points.
MFC after: 3 days
o separate configured beacon interval from listen interval; this
avoids potential use of one value for the other (e.g. setting
powersavesleep to 0 clobbers the beacon interval used in hostap
or ibss mode)
o bounds check the beacon interval received in probe response and
beacon frames and drop frames with bogus settings; not clear
if we should instead clamp the value as any alteration would
result in mismatched sta+ap configuration and probably be more
confusing (don't want to log to the console but perhaps ok with
rate limiting)
o while here up max beacon interval to reflect WiFi standard
Noticed by: Martin <nakal@nurfuerspam.de>
MFC after: 1 week
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.
The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.
With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.
The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.
allocating a VHPT per CPU. Since we don't yet know how many CPUs
are actually in the system at the time we need to allocate the
VHPTs, we allocate for MAXCPU processors. This can result in a
lot of wasted space for 2-way machines. So, for now, limit MAXCPU
to something smaller until we have something more dynamic.
integer to an unsigned long. This lifts variables like the maximum
number of pages available for shared memory from 2^31 to 2^32 on 32
bit architectures, and from 2^31 to 2^64 on 64 bit architectures.
It should be noted that this changes breaks ABI on 64 bit architectures
because the size of the shmmax, shmmin, shmmni, shmseg and shmall members
of the shminfo structure has changed.
Silence on: current@
when operating in ap mode. Previously we allocated a node from the
station table, sent the frame (using the node), then released the
reference that "held the frame in the table". But while the frame
was in flight the node might be reclaimed which could lead to
problems. The solution is to add an ieee80211_tmp_node routine
that crafts a node that does exist in a table and so isn't ever
reclaimed; it exists only so long as the associated frame is in flight.
MFC after: 5 days
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.
Reviewed by: kris@
MFC after: 3 days
Previously, we used all info (including -1 or "not present") which would
keep the system from reaching 100% when charging.
Reported by: Eric Anderson
MFC after: 2 days
- Add locked versions of the init() and start() methods.
- Use callout_*() rather than timeout().
- Make the driver lock non-recursive.
- Push down locking in detach() and ioctl().
- Fix the tick routine to bail if the interface has been stopped and use
callout_drain() in detach() after the call to stop().
- Lock the driver lock in the ifmedia handlers.
Tested by: Ketrien I. Saihr-Kesenchedra ketrien at error404.nls.net
MFC after: 1 week
0. This means that we 'succeed' the attach, even after we've freed
the internal data bits. This leads to a panic when you eject the card
with this problem.
Set error = ENXIO in the mac read zeros case.
if_attach(). This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver). Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.
Reported by: Holm Tiffe holm at freibergnet dot de
Discussed with: rwatson
as opt_vmpage.h will not be available to user space library builds. A
similar existing check is present for KLD_MODULE for similar reasons.
MFC after: 3 days
when using mice containing a tilt movement: there was a missing
usb_callout_init() for the UMS_SPUR_BUT_UP quirk code, and UMS_T
was defined to the same flag value as UMS_SPUR_BUT_UP.
Reported by: flz
MFC after: 3 days
lists, as well as accessor macros. For now, this is a recursive mutex
due code sequences where IPv4 multicast calls into IGMP calls into
ip_output(), which then tests for a multicast forwarding case.
For support macros in in_var.h to check multicast address lists, assert
that in_multi_mtx is held.
Acquire in_multi_mtx around iteration over the IPv4 multicast address
lists, such as in ip_input() and ip_output().
Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses,
as well as over the manipulation of ifnet multicast address lists in order
to keep the two layers in sync.
Lock down accesses to IPv4 multicast addresses in IGMP, or assert the
lock when performing IGMP join/leave events.
Eliminate spl's associated with IPv4 multicast addresses, portions of
IGMP that weren't previously expunged by IGMP locking.
Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded
lock order in WITNESS, in that order.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 10 days
holders. The license that was approved for my changes to this driver
originally came from LSI, but the changes to the driver core are not
owned by LSI.
MFC: 1 day
caller by saving the stack of the last locker/unlocker in lockmgr. We
also put the stack in KTR at the moment.
Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
over iteration of their multicast address lists when synchronizing the
hardware address filter with the network stack-maintained list.
Problem reported by: Ed Maste (emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
using ifp->if_addr_mtx:
- Initialize if_addr_mtx when ifnet is initialized.
- Destroy if_addr_mtx when ifnet is torn down.
- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.
- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.
- Extract ifmultiaddr tear-down and deallocation in if_freemulti().
- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.
- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.
- De-spl all and sundry.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
ifp->if_resolvemulti(), do so with M_NOWAIT rather than M_WAITOK, so
that a mutex can be held over the call. In the FDDI code, add a
missing M_ZERO. Consumers are already aware that if_resolvemulti()
can fail.
MFC after: 1 week
lists. Add accessor macros.
This changes the size of struct ifnet, but ideally, all ifnet consumers
are now using if_alloc() to allocate these structures rather than
embedding them into device driver softc's, so this won't modify the
network device driver ABI.
MFC after: 1 week
- Conforming to the latest ether_ifattach() change;
- Moving PCCARD_API_LEVEL to the right place.
Reported and Tested by: Vladimir Grebenschikov <vova at fbsd dot ru>
MFC after: 3 days
FreeBSD specific ip_newid() changes NetBSD does not have.
Correct handling of non AF_INET packets passed to bpf [2].
PR: kern/80340[1], NetBSD PRs 29150[1], 30844[2]
Obtained from: NetBSD ip_gre.c rev. 1.34,1.35, if_gre.c rev. 1.56
Submitted by: Gert Doering <gert at greenie.muc.de>[2]
MFC after: 4 days
primary vendor id for this device. The location is empty because ISA
doesn't give one a way to generally locate a card. PNP BIOS entries
do provide a way to locate cards, as do isa pnp cards. These
locations will be added as soon as the code to remember them is
written.
1. Support wide range sampling rate, as low as 1hz up to int32 max
(which is, insane) through new feeder_rate, multiple precisions
choice (32/64 bit converter). This is indeed, quite insane, but it
does give us more room and flexibility. Plenty sysctl options to
adjust resampling characteristics.
2. Support 24/32 bit pcm format conversion through new, much improved,
simplified and optimized feeder_fmt.
Changes:
1. buffer.c / dsp.c / sound.h
* Support for 24/32 AFMT.
2. feeder_rate.c
* New implementation of sampling rate conversion with 32/64 bit
precision, 1 - int32max hz (which is, ridiculous, yet very
addictive). Much improved / smarter buffer management to not
cause any missing samples at the end of conversion process
* Tunable sysctls for various aspect:
hw.snd.feeder_rate_ratemin - minimum allowable sampling rate
(default to 4000)
hw.snd.feeder_rate_ratemax - maximum allowable sampling rate
(default to 1102500)
hw.snd.feeder_rate_buffersize - conversion buffer size
(default to 8192)
hw.snd.feeder_rate_scaling - scaling / conversion method
(please refer to the source for explaination). Default to
previous implementation type.
3. feeder_fmt.c / sound.h
* New implementation, support for 24/32bit conversion, optimized,
and simplified. Few routines has been removed (8 to xlaw, 16 to
8). It just doesn't make sense.
4. channel.c
* Support for 24/32 AFMT
* Fix wrong xruns increment, causing incorrect underruns statistic
while using vchans.
5. vchan.c
* Support for 24/32 AFMT
* Proper speed / rate detection especially for fixed rate ac97.
User can override it using kernel hint:
hint.pcm.<unit>.vchanrate="xxxx".
Notes / Issues:
* Virtual Channels (vchans)
Enabling vchans can really, really help to solve overrun
issues. This is quite understandable, because it operates
entirely within its own buffering system without relying on
hardware interrupt / state. Even if you don't need vchan,
just enable single channel can help much. Few soundcards
(notably via8233x, sblive, possibly others) have their own
hardware multi channel, and this is unfortunately beyond
vchan reachability.
* The arrival of 24/32 also come with a price. Applications
that can do 24/32bit playback need to be recompiled (notably
mplayer). Use (recompiled) mplayer to experiment / test /
debug this various format using -af format=fmt. Note that
24bit seeking in mplayer is a little bit broken, sometimes
can cause silence or loud static noise. Pausing / seeking
few times can solve this problem.
You don't have to rebuild world entirely for this. Simply
copy /usr/src/sys/sys/soundcard.h to
/usr/include/sys/soundcard.h would suffice. Few drivers also
need recompilation, and this can be done via
/usr/src/sys/modules/sound/.
Support for 24bit hardware playback is beyond the scope of
this changes. That would require spessific hardware driver
changes.
* Don't expect playing 9999999999hz is a wise decision. Be
reasonable. The new feeder_rate implemention provide
flexibility, not insanity. You can easily chew up your CPU
with this kind of mind instability. Please use proper
mosquito repellent device for this obvious cracked brain
attempt. As for testing purposes, you can use (again)
mplayer to generate / play with different sampling rate. Use
something like "mplayer -af resample=192000:0:0 <files>".
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested by: multimedia@
1) 32bit data, packed within 32bit (4bytes) boundary.
2) 24bit data, packed within 32bit (4bytes) boundary where the data
is stored in the 24 most significant bits and least significant 8
bits are not used and should be set to 0.
While this might hold true in few cases, lots of applications (notably
mplayer, sweep) really deal / produce 24bit as what they should meant
to be: 24bit data / 3bytes per sample.
To handle this "true" 24bit pcm format add AFMT_x24_xE, so the in-kernel
conversion space did not confuse itself with 32bit variant.
You need to rebuild mplayer after installing this change (this header and
the upcomming kernel changes), if you want to use this new feature.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested by: multimedia@
This mostly to help CT4730, but apparently it does help other
cards too (especially via8233x). This probably need further test
and confirmation from other people with ac97 cards other than via
/ es137x.
* Aggresive dac power wake up call, again, to help CT4730 (and
probably others).
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested by: multimedia@
- Don't mark MPSAFE (yet).
- DSP_CMD_DMAEXIT_8 doesn't work on old cards, use sb_reset_dsp() instead.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
* Add kernel hint option to disable DXS channels entirely. Report
from several skype users / Pav Lucistnik indicate that disabling
DXS may fix lots of pop / crackling noise. To disable DXS add
hint.pcm.<unit>.via_dxs_disabled="1" to /boot/device.hints.
Further investigation of the issues regarding DXS showed, that
the problem is in another (more generic) place, but until the
right fix is tested/reviewed this may help a little bit.
Added sysctl's to aid testing/debugging:
hint.pcm.<unit>.via_dxs_disabled=X - Disable / Enable DXS channels entirely
hint.pcm.<unit>.via_dxs_channels=X - Limit DXS channels up to X
hint.pcm.<unit>.via_sgd_channels=X - Limit SGD channels up to X
hint.pcm.<unit>.via_dxs_src=X - Enable / Disable DXS sample rate
converter.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Tested by: multimedia@
especially for CT4730 / EV1938 chip, causing misconfigured mixer
(David Xu), crippled after power cycle (Kevin Oberman). Fixed.
* Incorporate locking/spdif patches from Jon Noack / matk. Not all
es137x can really do spdif, clean it up a bit to only let few capable
chip. This adds a "hw.snd.pcm<unit>.spdif_enabled" sysctl until
a more generic way of handling this from userland (by an ordinary
user) is designed/implemented.
* Convert all bus_space_(read|write) to use es_rd/es_wr, simmilar
with other drivers.
* Add tunable hw.snd.pcm<unit>.latency_timer sysctl to toggle pci
latency timer value on the fly. Much noise / pop / crackling
issues can be solved by increasing its value. Other people have
pointed out to use pciconf instead, but this is just an added
value specific for CT4730/EV1938.
* Remove es137x specific debug sysctl/code.
Several PRs can now be closed.
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my>
Submitted by: Jon Noack <noackjr@alumni.rice.edu> (implicit)
Submitted by: matk (implicit)
PR: 59349, 68594, 73498
Tested by: multimedia@
kenv environment in kern_environment.c switches to dynamic kenv. The prior
call sets the static variable hintp to the static hints in subr_hints.c
(hintmode==0).
However, changes to the environment are not detected by the resource_xxx
lookups after the change to dynamic kernel environment, so the lookup
routines only report the old stuff of hintmode==0, even after the change to
the dynamic kenv. This causes kenv users to see a different environment than
the kernel routines.
This is a problem in the mixer.c code that looks up initial mixer volume
settings from the hints: If the hints are dynamic and not from the
device.hints file, mixer.c doesn't see them, but kenv does.
The patch from the PR (modified to comply to the style of the function)
solves this.
PR: 83686
Submitted by: Harry Coin <harrycoin@qconline.com>
This has no security implications since only root is allowed to use
kenv(1) (and corrupt the kernel memory after adding too much variables
previous to this commit).
This is based upon the PR [1] mentioned below, but extended to check both
bounds (in case of an overflow of the counting variable) and to comply
to the style of the function. An overflow of the counting variable
shouldn't happen after adding the check for the upper bound, but better
safe than sorry (in case some other function in the kernel overwrites
random memory).
An interested soul may want to add a printf to notify root in case the
bounds are hit.
Also allocate KENV_SIZE+1 entries (the array is NULL-terminated), since
the comment for KENV_SIZE says it's the maximum number of environment
strings. [2]
PR: 83687 [1]
Submitted by: Harry Coin <harrycoin@qconline.com> [1]
Submitted by: Ariff Abdullah <skywizard@MyBSD.org.my> [2]
in the resource error in ep_alloc case. This results in a panic.
Zero resources to make it safe to call twice pending resolution of
layering questions.
MFC After: 3 days
mlx devices. This fixes an issue where mlx device drives fail to be
detected at system boot.
This is a RELENG_6 candidate.
Submitted by: oliver
PR: kern/84163
trap_subr.S: declare a stub for the a-unavailable trap
that does an absolute jump to the vector-assist trap.
This is due to the fact that the vec-unavail trap
doesn't start at a 256-byte boundary, so the trick of
masking the bottom 8 bits of the link register to identify
the interrupt doesn't work, so let the vec-assist
case handle Altivec-disabled for the time being.
Note that this will be fixed in the future with a much
smaller vector code-stub (< 16 bytes) that will allow
use of strange vector offsets that are also present in
4xx processors, and also allow smaller differences in
vector codepaths on the G5.
trap.c: Treat altivec-unavailable/assist process traps as SIGILL.
Not quite correct, since altivec-assist should really be a panic,
but it is fine for the moment due to the above measure.
machdep.c Install the stub code for the altivec-unavailable trap, and
the standard trap code at the altivec-assist.
Reported by: Andreas Tobler <toa at pop agri ch>
MFC after: 3 days
was not compiled with 'options HWPMC_HOOKS' or if the compiled-in
version numbers of the kernel and module are out of sync.
Reported by: cracauer
MFC after: 3 days
the vm_page_t associated with a pte using only the lower 32-bits of the pte
instead of the full 64-bits.
Submitted by: Greg Taleck greg at isilon dot com
Reviewed by: jeffr, alc
MFC after: 3 days
not mask the ExtINT pin on the first I/O APIC as at least one PIII chipset
seems to need this even though all of the pins in the 8259A's are masked.
The default is still to mask the ExtINT pin.
Reported by: Mike Tancsa mike at sentex.net
MFC after: 3 days
later sum capacities for all batteries, even those that weren't actually
present. We only need to do this for _BST but do it for all of them.
Reported by: Eric Anderson
MFC after: 1 day
too much even though we actually validate the parameters. This code
also is more compatible with other *BSDs, which do copyin within
setsockopt().
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Reviewed by: security-officer (nectar)
Obtained from: KAME
Make sure that there actually is a next packet before setting
nextrecord to that field.
PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from: Patch suggested in the PR
MFC after: 1 week
- increase number of allocations count only on successfull malloc(9),
so it doesn't confuse people;
- because we need to check if 'size > 0', hide 'mtsp->mts_memalloced += size;'
under the check as well, as for size=0 it is of course a no-op;
- avoid critical_enter()/critical_exit() in case of failure in
malloc_type_allocated() as there will be nothing to do.
OK'ed by: rwatson
MFC after: 2 days
are called outside of AN_LOCK()/AN_UNLOCK. This fixes the following
WITNESS warning (produced when an(4) PCMCIA card is detached).
taskqueue_drain with the following non-sleepable locks held:
exclusive sleep mutex an0 (network driver) r = 0 (0xc59af168) locked @ /usr/src/sys/dev/an/if_an.c:2836
MFC after: 3 days
Silence from: current@
in the arm __swp() and sparc64 casa() and casax() functions is actually
being used as an input and output and not just the value of the register
that points to the memory location. This was the underlying source of
the mbuf refcount problems on sparc64 a while back. For arm this should be
a nop because __swp() has a constraint to clobber all memory which can
probably be removed now.
Reviewed by: alc, cognet
MFC after: 1 week
- Add locking to protect the softc and mark this driver as MP safe. There
are still some edge cases with multiport cards that need more locking
work.
MFC after: 1 week
Tested on: alpha
Actually, one cannot setup root file system on RAID3 device, but when
other file system exist in /etc/fstab which are placed on RAID3 device,
boot process will be interrupted when these devices are missing.
MFC after: 3 days
X-MFC-note: MFC only to RELENG_6, as RELENG_5 doesn't have root_mount KPI.
temporary buffer then pass the array to user-space once we have
dropped the lock.
While we are here, drop an assertion which could result in a
kernel panic under certain race conditions.
Pointed out by: rwatson
the assumption that performance was more important that beancounter
quality statistics.
As it transpires the microoptimization is not measurable in the
real world and the inconsistent statistics confuse users, so revert
the decision.
MT6 candidate: possibly
MT5 candidate: possibly
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME
It creates very huge provider (41PB) /dev/gzero.
On BIO_READ request it zero-fills bio_data and on BIO_WRITE it does nothing.
You can also set kern.geom.zero.clear sysctl to 0 to do nothing even for
BIO_READ.
I'm using it for performance testing where it is very helpful.
MFC after: 3 days
bridge but the interface can still be changed afterwards.
This falls under the 'dont do that' category but log an warning when INVARIANTS
is defined.
Approved by: mlaier (mentor)
MFC after: 3 days
monitoring API, which might or might not be the same as the internal
maximum (currently none).
Export flag information on UMA zones -- in particular, whether or
not this is a secondary zone, and so the keg free count should be
considered in that light.
MFC after: 1 day
descriptors that are still marked owned in #ifdef GEM_RINT_TIMEOUT
instead of #if 0 for convenience.
- Remove stale code and comment about relying on the preset XIF config.
- In case of a watchdog timeout call the init function instead of just
the start function so the chip is properly reset.
Merge from hme(4):
- Convert to use bus_dmamap_load_mbuf_sg() for loading RX buffers.
- Protect from a duplicate mbuf free panic in case the DMA engine hangs.
Reviewed by: yongari
Tested on: powerpc(grehan), sparc64
MFC after: 1 week
chips are commonly found, it makes sense to have it in GENERIC. This
is a candidate for a RELENG_6 MFC.
Approved by; peter
Requested by: pav
Tested by: pav
be used to pass statistics regarding dropped, matched and received
packet counts from the kernel to user-space. While we are here
introduce a new counter for filtered or matched packets. We currently
keep track of packets received or dropped by the bpf device, but not
how many packets actually matched the bpf filter.
-Introduce net.bpf.stats sysctl OID
-Move sysctl variables after the function prototypes so we can
reference bpf_stats_sysctl(9) without build errors.
-Introduce bpf descriptor counter which is used mainly for sizing
of the xbpf_d array.
-Introduce a xbpf_d structure which will act as an external
representation of the bpf_d structure.
-Add a the following members to the bpfd structure:
bd_fcount - Number of packets which matched bpf filter
bd_pid - PID which opened the bpf device
bd_pcomm - Process name which opened the device.
It should be noted that it's possible that the process which opened
the device could be long gone at the time of stats collection. An
example might be a process that opens the bpf device forks then exits
leaving the child process with the bpf fd.
Reviewed by: mdodd
status after attach, only after a reset
o when setting diversity via the sysctl don't update sc_diversity
until we know the hal requested worked
o while here eliminate sc_hasdiversity and sc_hastpc; just query
the hal each time since these are the only places we need to know
MFC after: 3 days
(i.e., smart battery) and fix various bugs found during the cleanup.
API changes:
* kernel access:
Access to individual batteries is now via devclass_find("battery").
Introduce new methods ACPI_BATT_GET_STATUS (for _BST-formatted data) and
ACPI_BATT_GET_INFO (for _BIF-formatted data). The helper function
acpi_battery_get_battinfo() now takes a device_t instead of a unit #
argument. If dev is NULL, this signifies all batteries.
* ioctl access:
The ACPIIO_BATT_GET_TYPE and ACPIIO_BATT_GET_BATTDESC ioctls have been
removed. Since there is now no need for a mapping between "virtual" unit
and physical unit, usermode programs can just specify the unit directly and
skip the old translation steps. In fact, acpiconf(8) was actually already
doing this and virtual unit was the same as physical unit in all cases
since there was previously only one battery type (acpi_cmbat). Additionally,
we now map the ACPIIO_BATT_GET_BIF and ACPIIO_BATT_GET_BST ioctls for all
batteries, if they provide the associated methods.
* apm compatibility device/ioctls: no change
* sysctl: no change
Since most third-party applications use the apm(4) compat interface, there
should be very few affected applications (if any).
Reviewed by: bruno
MFC after: 5 days
We must not increase a capability of buffer size here,
because codes which call these functions expect that dst and src
are the same size.
This will cause problem when someone convert a character whose
length is different between charsets on smbfs which was changed
to use xlat16 converter.
from OpenFirmware be 16 pages to avoid fragmentation in the list
of mappings returned when the kernel requests it in pmap_bootstrap.
This allows a static buffer to be used when obtaining the existing
mappings - very useful on the G5 when random physical pages can't
be grabbed because they can't be BAT-mapped.
MFC after: 3 days
o add ic_flags_ext for eventual extention of ic_flags
o define/reserve flag+capabilities bits for superg,
bg scan, and roaming support
o refactor debug msg macros
MFC after: 3 days
writers that want to extend the file. It was also used to serialize
readers that might want to read the last block of the file (with a
writer extending the file). Now that we support vnode locking for
NFS, the rslock is unnecessary. Writers grab the exclusive vnode
lock before writing and readers grab the shared (or in some cases
the exclusive) lock.
Submitted by: Mohan Srinivasan
which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have
its access synchronized by the device driver rather than the
protocol stack. This will avoid potential races in the management
of flags in if_flags.
Discussed with: various (scottl, jhb, ...)
MFC after: 1 week
set in tulip_attach() and its value is never changed, so all the extra sets
are redundant. I'm guessing that at some point in time de(4) had an
alternate start routine, but that hasn't been true in recent history.
default:
- TULIP_NEED_FASTTIMEOUT - tulip_fasttimeout() wasn't called anywhere
- BIG_PACKET - only worked on i386 anyway
- TULIP_USE_SOFTINTR - doesn't compile and was never updated to handle
new netisr registration
- non-FreeBSD code
in6p_outputopts at the entrance of the functions. this trick was
necessary when we passed an in6 pcb to in6_embedscope(), within which
the in6p_outputopts member was used, but we do not use this kind of
interface any more.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net>
Obtained from: KAME
data link type of the hook. It will be used to ease autoconfiguration
of netgraph and also to print warning messages, when incompatoble nodes
are connected together.
At the end of ng_snd_item(), node queue is processed. In certain
netgraph setups deep recursive calls can occur.
For example this happens, when two nodes are connected and can send
items to each other in both directions. If, for some reason, both nodes
have a lot of items in their queues, then the processing thread will
recurse between these two nodes, delivering items left and right, going
deeper in the stack. Other setups can suffer from deep recursion, too.
The following factors can influence risk of deep netgraph call:
- periodical write-access events on node
- combination of slow link and fast one in one graph
- net.inet.ip.fastforwarding
Changes made:
- In ng_acquire_{read,write}() do not dequeue another item. Instead,
call ng_setisr() for this node.
- At the end of ng_snd_item(), do not process queue. Call ng_setisr(),
if there are any dequeueable items on node queue.
- In ng_setisr() narrow worklist mutex holding.
- In ng_setisr() assert queue mutex.
Theoretically, the first two changes should negatively affect performance.
To check this, some profiling was made:
1) In general real tasks, no noticable performance difference was found.
2) The following test was made: two multithreaded nodes and one
single-threaded were connected into a ring. A large queues of packets
were sent around this ring. Time to pass the ring N times was measured.
This is a very vacuous test: no items/mbufs are allocated, no upcalls or
downcalls outside of netgraph. It doesn't represent a real load, it is
a stress test for ng_acquire_{read,write}() and item queueing functions.
Surprisingly, the performance impact was positive! New code is 13% faster
on UP and 17% faster on SMP, in this particular test.
The problem was originally found, described, analyzed and original patch
was written by Roselyn Lee from Vernier Networks. Thanks!
Submitted by: Roselyn Lee <rosel verniernetworks com>
and return a printable representation.
This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).
Also, add watchdog support for PCM-582x platforms.
Submitted by: Adrian Steinmann <ast@marabu.ch>
Slightly changed by: phk
PR: 81360
by Vladimir Dergachev for inclusion in DRM CVS, with minor modifications for
FreeBSD CVS and the appropriate license from Nicolai Haehnle on r300_reg.h.
Fixes hangs when using r300.sf.net userland, tested on a Radeon 9600 on amd64.
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function. Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().
Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.
Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
states - has to drop the lock when calling back to ip_output(), the state
purge timeout might run and gc the state. This results in a rb-tree
inconsistency. With this change we flag expiring states while holding the
lock and back off if the flag is already set.
Reported by: glebius
MFC after: 2 weeks
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to
to update the zone's uz_frees statistic. Previously, the statistic was
updated unconditionally.
- Use the flag in situations where a "real" free occurs: i.e., one where
the caller is freeing an allocated item, to be differentiated from
situations where uma_zfree_internal() is used to tear down the item
during slab teardown in order to invoke its fini() method. Also use
the flag when UMA is freeing its internal objects.
- When exchanging a bucket with the zone from the per-CPU cache when
freeing an item, flush cache statistics back to the zone (since the
zone lock and critical section are both held) to match the allocation
case.
MFC after: 3 days
vnlru proc is extremely inefficient, potentially iteration over tens of
thousands of vnodes without blocking. Droping Giant allows other threads
to preempt us although we should revisit the algorithm to fix the runtime
problems especially since this may hold up all vnode allocations.
- Remove the LK_NOWAIT from the VOP_LOCK in vlrureclaim. This provides
a natural blocking point to help alleviate the situation described above
although it may not technically be desirable.
- yield after we make a pass on all mount points to prevent us from
blocking other threads which require Giant.
MFC after: 2 weeks
Compare pointers with NULL rather than treating them as booleans.
Compare pointers with NULL rather than 0 to make it more clear
they are pointers.
Assign pointers value of NULL rather than 0 to make it more clear
they are pointers.
MFC after: 3 days
through umass(4), in order to make cdcontrol(1) to issue commands to
a USB CD driver.
The command IDs were obtained from the CAM subsystem. This was tested
on half dozen of USB CD drivers from different vendors.
Suggested by: "intron" <intron at intron dot ac>
PR: usb/83439
Reviewed by: sanpei
MFC After: 1 week
It does not work with ng_ubt(4) and require special driver and firmware.
Obtained from: Marcel Holtmann < marcel at holtmann dot org >
Submitted by: Rainer Goellner < rainer at jabbe dot de >
MFC after: 3 days
o Add Agere Hermes II and II.5 PC Cards (from zipit web page), TDK
GlobalNetworker 3410 (from dmesg for my card) and another alternate
PANASONIC KXLC0005_2 (from pcmcia-cs id lists).
hokie and much more readable and expand the comment to explain why it is
the way that it is.
- Close a race where one CPU could free the process belonging to a thread
on another CPU that hasn't quite finished exiting yet but is beyond the
point of setting the process state as PRS_ZOMBIE.
Reported and tested by: ps (2)
MFC after: 3 days
- Introduce a subsystem mutex, natm_mtx, manipulated with accessor macros
NATM_LOCK_INIT(), NATM_LOCK(), NATM_UNLOCK(), NATM_LOCK_ASSERT(). It
protects the consistency of pcb-related data structures. Finer grained
locking is possible, but should be done in the context of specific
measurements (as very little work is done in netnatm -- most is in the
ATM device driver or socket layer, so there's probably not much
contention).
- Remove GIANT_REQUIRED, mark as NETISR_MPSAFE, remove
NET_NEEDS_GIANT("netnatm").
- Conditionally acquire Giant when entering network interfaces for
ifp->if_ioctl() using IFF_LOCKGIANT(ifp)/IFF_UNLOCKGIANT(ifp) in order
to coexist with non-MPSAFE atm ifnet drivers..
- De-spl.
MFC after: 2 weeks
Reviewed by: harti, bms (various versions)
there are at least two versions of the adapter. Version 1 (product ID 0x2200)
of the adapter does not work with ng_ubt(4) and require special driver and
firmware. Version 2 (product ID 0x3800) seems to work just fine, except it
does not have bDeviceClass, bDeviceSubClass and bDeviceProtocol set to required
(by specification) values. This change forces ng_ubt(4) to attach to the
version 2 adapter.
Obtained from: Marcel Holtmann <marcel at holtmann dot org>
Submitted by: Rainer Goellner <rainer at jabbe dot de>
problems we were having properly mapping the CIS attr space on some
cards. Those problems have been solved other ways, so this kludge is
no longer necessary. Remove it and have pccards come up a whole
second faster.
implement the OHCI programming interface. Thus it probes, but fails
to attach because of an invalid OHCI version. Rather than count on
the downstream tests properly failing, print a message that this
chipset isn't supported and fail the probe.
16-bit cards when we're powering them up. Other bridges may have
similar issues, so we do this for all of them by setting the
interrupt in the PCIC register 3 to be 0 (done always anyway)
and turning on the bit in the bridge control register to route these
interrupts via the ISA bus (or via the interrupt configured in the
PCIC register 3). '0' means disable completely. There's a small
chance this may interfere with the o2micro power hacks, but I'll
wait for reports to come in from o2micro users.
o Expand some of the comments about why we do certain things.
# this gets rid of the interrupt storm warnings on my 505TS. I think
# that we may need to do something similar on suspend, but I'm unsure
# since I don't have a laptop that supports suspened/resume with a
# ricoh chipset in it.
are string names for their respective UMA zones and malloc types, and
are passed into uma_zcreate() and MALLOC_DEFINE(). Export them
outside of _KERNEL in mbuf.h so that netstat can reference them.
Change the names to improve consistency, with each zone/type
associated with the mbuf allocator being prefixed mbuf_.
MFC after: 1 week
card. Mask it while we're doing power things, as the PC Card standard
suggests. Also, poll the POWER_CYCLE bit 10x a second as well as
providing a timeout for power cycle interrupt to happen.
The Ricoh '475 that I have doesn't seem to generate an interrupt for
power at the present time, so the polling is necessary for reasons as
yet unknown. This results in an interrupt storm warning that I'm
still trying to quantify (the o2micro trick doesn't work to mitigate
this storm). At the very least, this should help those users that
lost pccards on boot with the prior rev of this code. My VAIO
PCG-505TS is now happier, but more investigation is necessary.
If a character cannot be converted to DOS code page,
unix2doschr() returned `0'. As a result, unix2dosfn()
was forced to return `0', so we saw a file which was
composed of these characters as `Invalid argument'.
To correct this, if a character can be converted to
Unicode, unix2doschr() now returns `1' which is a magic
number to make unix2dosfn() know that the character
must be converted to `_'.
[2] unix2dosfn()
The above-mentioned solution only works if a file
has both of Unicode name and DOS code page name.
Unicode name would not be recorded if file name
can be settled within 11 bytes (DOS short name)
and if no conversion from Unix charset to DOS code
page has occurred. Thus, FreeBSD can create a file
which has only short name, but there is no guarantee
that the short name contains allways valid characters
because we leave it to people by using mount_msdosfs(8)
to select which conversion is used between DOS code
page and unix charset.
To avoid this, Unicode file name should be recorded
unless a character is an ascii character. This is
the way Windows XP do.
PR: 77074 [1]
MFC after: 1 week
per-CPU cache statistics. UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).
Reported by: yongari
MFC after: 1 week
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation. So a
race in gathering cache free space statistics still can occur, but not
one that follows an invalid bucket pointer, if the mutex is held.
Submitted by: yongari
MFC after: 1 week
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.
MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm
the case of an RTM_CHANGE was specific, i.e. that it matched completely. This
led to a route change of a non-existent route changing the default route
as the radix code would simply back track to that point and hand that
route back to the routing socket code.
PR: 82974
Reviewed by: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
Ben Kaduk <minimarmot@gmail.com>
Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
Obtained from: OpenBSD with modifications.
MFC after: 2 weeks
succeed. There are many printers that return status over the read
channel, and if we wait for the status to become ready, then we can't
find the status automatically. Linux doesn't wait, nor does it ever
seem to really check the status in any meaningful way... If there
really is a problem, the writes to the bulk out endpoint will still
fail (like they would if the printer was ready and then ran out of
paper or became unready).
In addition, there are a number of printers being made that emulate
the 'status' byte by returning '0' always rather than '0x18'. This
fixes the EBUSY on open timeouts on those printer as well.
Reviewed by: the defining silence on usb@
gets the ethernet part of the card working, while putting appropriate
hooks in place for the modem code. Other ed based lan/modem combo
cards should be easy to add. Please send me info on any you'd like to
see support added.
Note: The 650 isn't a strictly conforming multi-function card, so
special support is needed. :-(
o Add minimal kbdmux(4) man page to the source tree (more details to follow);
o Hook up kbdmux(4) to the build.
This concludes the first part of the kbdmux(4) keyboard multiplexer
integration. It now should be possible to use kbdmux(4), however one
must configure kbdmux(4) by hand (i.e. load kbdmux(4) module and use
kbdcontrol(1) to add/remove slave keyboards to/from kbdmux(4)).
MFC after: 1 week
o Slightly change KBADDKBD and KBRELKBD ioctl() interface. Instead of passing
keyboard index pass keyboard_info_t structure with populated 'kb_unit' and
'kb_name' fields. Keyboard index is not very user-friendly and is not very
easy to obtain. Keyboard driver name and unit, on the other hand, is much
more user friendly and known almost all the time;
o Move definition of keyboard_info_t structure up;
o Teach kbdcontrol(1) how to attach/detach keyboards to/from the keyboard
multiplexor;
o Update kbdcontrol(1) man page and document new functionality.
To attach/detach keyboard to/from keyboard multiplexor one needs to use
keyboard device name (i.e. ukbd0).
MFC after: 1 week
fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier
revision. This fixes mutex assertion failures in the debug.mpsafevm=0
case.
Reported by: ps
MFC after: 3 days
o Don't busy wait on powerup. Instead, use the power up interrupt to wait
for the card to power up. Don't wait when we're turning the card off,
since no interrupt happens in that case.
o Convert many of the long DELAYs to tsleeps. We do not run before
the timer have stared, so DELAY isn't necessary. More DELAYs can likely
be eliminated in the future.
o When powering up the card, don't do anything if the card is already
powered up (before we'd power cycle it). This means that for most
cards we power them up once and then never change the power.
o On card eject, mask (by clearing) the CD bit. Before we set it, which
was wrong. We don't want to see any CD events past the first one since
they need to be debounced.
With these changes, I can insert/eject 16bit cards without glitching xmms'
sound output. Something very important to the development of better pccard
drivers :-)
Some of the (IPv6) cleanup functions send packets to inform peers of the
departure. These packets confused users of ifnet_departure_event (pf at the
moment).
PR: kern/80627
Tested by: Divacky Roman
MFC after: 1 week
- Fix nfsm_disct() so that after pulling up data, the remaining data
is aligned if necessary.
- Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the
mbuf (instead of casting m_data to a uint32).
Submitted by: Pyun YongHyeon
Reviewed by: Mohan Srinivasan
variables to loader:
hint.smbios.0.enabled "YES" when SMBIOS is detected
hint.smbios.0.bios.vendor BIOS vendor
hint.smbios.0.bios.version BIOS version
hint.smbios.0.bios.reldate BIOS release date
hint.smbios.0.system.maker System manufacturer
hint.smbios.0.system.product System product name
hint.smbios.0.system.version System version number
hint.smbios.0.planar.maker Base board manufacturer
hint.smbios.0.planar.product Base board product name
hint.smbios.0.planar.version Base board version number
hint.smbios.0.chassis.maker Enclosure manufacturer
hint.smbios.0.chassis.version Enclosure version
These strings can be used to detect hardware quirks and to set appropriate
flags. For example, Compaq R3000 series and some HP laptops require
hint.atkbd.0.flags="0x9"
to boot. See amd64/67745 for more detail.
Note: Please do not abuse this feature to resolve general problem when it
can be fixed programmatically. This must be used as a last resort.
PR: kern/81449
Approved by: anholt (mentor)
o Add sys/dev/kbdmux/kbdmux.c to the source tree
o Add sys/modules/kbdmux/Makefile to the source tree
These are not yet connected to the build. Man page and other changes to follow.
MFC after: 1 week
channel devices. This should fix Dell 2450/2550/2650 systems that have RAID
enabled. This will likely not fix 2400 systems though as I don't have the
appropriate PCI Id info for them.
MFC After: 3 day
statistics via a binary structure stream:
- Add structure 'uma_stream_header', which defines a stream version,
definition of MAXCPUs used in the stream, and the number of zone
records in the stream.
- Add structure 'uma_type_header', which defines the name, alignment,
size, resource allocation limits, current pages allocated, preferred
bucket size, and central zone + keg statistics.
- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
includes the number of allocations and frees, as well as the number
of free items in the cache.
- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUs uma_percpu_stat structures holding
per-CPU allocation information. Typical values of MAXCPU will be
1 (UP compiled kernel) and 16 (SMP compiled kernel).
This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.
While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user swpace buffers to
receive the stream.
A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.
MFC after: 1 week
zone whenever it was moving buckets between the zone and the cache,
or when coalescing statistics across the CPU. Remove flushing of
statistics to the zone when coalescing statistics as part of sysctl,
as we won't be running on the right CPU to write to the cache
statistics.
Add a missed gathering of statistics: when uma_zalloc_internal()
does a special case allocation of a single item, make sure to update
the zone statistics to represent this. Previously this case wasn't
accounted for in user-visible statistics.
MFC after: 1 week
- Introduce a helper function if_setflag() containing the code common
to ifpromisc() and if_allmulti() instead of duplicating the code poorly,
with different bugs.
- Call ifp->if_ioctl() in a consistent way: always use more compatible C
syntax and check whether ifp->if_ioctl is not NULL prior to the call.
MFC after: 1 month
statistics via a binary structure stream:
- Add structure 'malloc_type_stream_header', which defines a stream
version, definition of MAXCPUS used in the stream, and a number of
malloc_type records in the stream.
- Add structure 'malloc_type_header', which defines the name of the
malloc type being reported on.
- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUS malloc_type_stats structures holding
per-CPU allocation information. Typical values of MAXCPUS will be 1
(UP compiled kernel) and 16 (SMP compiled kernel).
This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.
While here:
- Bump statistics width to uint64_t, and hard code using fixed-width
type in order to be more sure about structure layout in the stream.
We allocate and free a lot of memory.
- Add kmemcount, a counter of the number of registered malloc types,
in order to avoid excessive manual counting of types. Export via a
new sysctl to allow user-space code to better size buffers.
- De-XXX comment on no longer maintaining the high watermark in old
sysctl monitoring code.
A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. Likewise, similar changes
to UMA.
process that caused the clone event to take place for the device driver
creating the device. This allows cloned device drivers to adapt the
device node based on security aspects of the process, such as the uid,
gid, and MAC label.
- Add a cred reference to struct cdev, so that when a device node is
instantiated as a vnode, the cloning credential can be exposed to
MAC.
- Add make_dev_cred(), a version of make_dev() that additionally
accepts the credential to stick in the struct cdev. Implement it and
make_dev() in terms of a back-end make_dev_credv().
- Add a new event handler, dev_clone_cred, which can be registered to
receive the credential instead of dev_clone, if desired.
- Modify the MAC entry point mac_create_devfs_device() to accept an
optional credential pointer (may be NULL), so that MAC policies can
inspect and act on the label or other elements of the credential
when initializing the skeleton device protections.
- Modify tty_pty.c to register clone_dev_cred and invoke make_dev_cred(),
so that the pty clone credential is exposed to the MAC Framework.
While currently primarily focussed on MAC policies, this change is also
a prerequisite for changes to allow ptys to be instantiated with the UID
of the process looking up the pty. This requires further changes to the
pty driver -- in particular, to immediately recycle pty nodes on last
close so that the credential-related state can be recreated on next
lookup.
Submitted by: Andrew Reisse <andrew.reisse@sparta.com>
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
MFC after: 1 week
MFC note: Merge to 6.x, but not 5.x for ABI reasons
o Add two new ioctl's KBADDKBD and KBRELKBD. These are used to add and remove
keyboard to (and from) kbdmux(4) keyboard multiplexer;
o Introduce new kbd_find_keyboard2() function. It does exactly the same job
as kbd_find_keyboard() function except it allows to specify starting index.
This function can be used to iterate over keyboards array;
o Re-implement kbd_find_keyboard() as call to kbd_find_keyboard2() with starting
index of zero;
o Make sure syscons(4) passed KBADDKBD and KBRELKBD ioctl's onto currently
active keyboard.
These changes should not have any visible effect.
MFC after: 1 week
syscalls.master for the master list and the Alpha/OSF1 compat ABI to be
consistent with all the other compat ABIs where 'make sysent' already
works.
MFC after: 3 days
we can only bridge interfaces with the same value it meant that all members had
to be set at ETHERMTU as well.
Allow the first member to be added to define the MTU for the bridge, the check
still applies to all additional members.
Print an informative message if the MTU is incorrect [1]
Requested by: Niki Denev [1]
Approved by: mlaier (mentor)
MFC after: 3 days
- Make sure timer0_max_count is set to a correct value in the lapic case.
- Revert i8254_restore() to explicitly reprogram timer 0 rather than
calling set_timer_freq() to do it. set_timer_freq() only reprograms
the counter if the max count changes which it never does on resume. This
unbreaks suspend/resume for several people.
Tested by: marks, others
Reviewed by: bde
MFC after: 3 days
in the PCI config registers) that are > 15 as $PIR can only route PCI
interrupts to ISA IRQs which are limited to the 0 to 15 range.
- Remove an extra word from a printf.
Reported by: othermark atkin901 at yahoo dot com
MFC after: 3 days
since it calls into VFS and VM. This makes the freebsd32_mmap() routine
MP safe and the extra Giants here can be revisited later.
Glanced at by: marcel
MFC after: 3 days
o Use pf more consistantly for pccard_function.
o Make sure we quote the strings properly (maybe this function belongs in
subr_bus.c)
o Tweak a comment to be more accurate after code changed.
and combine the old xe_pccard_{probe,attach} into one routine _attach.
Create a lookup function to lookup items in the table. Eliminate the
check for network cards, since many modems were eliminated by it.
Tweak a few printfs as well.
This gets many of my older cards working again CEM2, CEM28, CEM36,
etc.
support for them can really be added. Eliminate the check for network
card, because many of the cards in the commented out section are combo
cards and report themselves as either multifunction or modem. They
will be added back as I obtain hardware and test them more fully.
few other cards need. This firmware was obtained from the Linux
pcmica-cs project, but Ositech Communications, Inc has been kind
enough to grant permission to change the license to a pure BSDL type.
enabling of interrupts inside of trap(). Fix a typo in a comment.
Revert rev 1.113 of "sys/i386/i386/exception.s" as it is no longer
needed.
Reviewed by: bde
MFC after: 3 days
we bailed if we couldn't collect the 16-bytes of data required
for an aes block cipher in 2 mbufs; now we deal with it. While
here make space accounting signed so a sanity check does the
right thing for malformed mbuf chains.
Approved by: re (scottl)
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.
Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)
o Add timeout error recovery (from a thread context to avoid
the deferral of other critical interrupts).
o Properly recover commands across controller reset events.
o Update the driver to handle events and status codes that
have been added to the MPI spec since the driver was
originally written.
o Make the driver more modular to improve maintainability and
support dynamic "personality" registration (e.g. SCSI Initiator,
RAID, SAS, FC, etc).
o Shorten and simplify the common I/O path to improve driver
performance.
o Add RAID volume and RAID member state/settings reporting.
o Add periodic volume resynchronization status reporting.
o Add support for sysctl tunable resync rate, member write cache
enable, and volume transaction queue depth.
Sponsored by
----------------
Avid Technologies Inc:
SCSI error recovery, driver re-organization, update of MPI library
headers, portions of dynamic personality registration, and misc bug
fixes.
Wheel Open Technologies:
RAID event notification, RAID member pass-thru support, firmware
upload/download support, enhanced RAID resync speed, portions
of dynamic personality registration, and misc bug fixes.
Detailed Changes
================
mpt.c mpt_cam.c mpt_raid.c mpt_pci.c:
o Add support for personality modules. Each module exports
load, and unload module scope methods as well as probe, attach,
event, reset, shutdown, and detach per-device instance
methods
mpt.c mpt.h mpt_pci.c:
o The driver now associates a callback function (via an
index) with every transaction submitted to the controller.
This allows the main interrupt handler to absolve itself
of any knowledge of individual transaction/response types
by simply calling the callback function "registered" for
the transaction. We use a callback index instead of a
callback function pointer in each requests so we can
properly handle responses (e.g. event notifications)
that are not associated with a transaction. Personality
modules dynamically register their callbacks with the
driver core to receive the callback index to use for their
handlers.
o Move the interrupt handler into mpt.c. The ISR algorithm
is bus transport and OS independent and thus had no reason
to be in mpt_pci.c.
o Simplify configuration message reply handling by copying
reply frame data for the requester and storing completion
status in the original request structure.
o Add the mpt_complete_request_chain() helper method and use
it to implement reset handlers that must abort transactions.
o Keep track of all pending requests on the new
requests_pending_list in the softc.
o Add default handlers to mpt.c to handle generic event
notifications and controller reset activities. The event
handler code is largely the same as in the original driver.
The reset handler is new and terminates any pending transactions
with a status code indicating the controller needs to be
re-initialized.
o Add some endian support to the driver. A complete audit is
still required for this driver to have any hope of operating
in a big-endian environment.
o Use inttypes.h and __inline. Come closer to being style(9)
compliant.
o Remove extraneous use of typedefs.
o Convert request state from a strict enumeration to a series
of flags. This allows us to, for example, tag transactions
that have timed-out while retaining the state that the
transaction is still in-flight on the controller.
o Add mpt_wait_req() which allows a caller to poll or sleep
for the completion of a request. Use this to simplify
and factor code out from many initialization routines.
We also use this to sleep for task management request
completions in our CAM timeout handler.
mpt.c:
o Correct a bug in the event handler where request structures were
freed even if the request reply was marked as a continuation
reply. Continuation replies indicate that the controller still owns
the request and freeing these replies prematurely corrupted
controller state.
o Implement firmware upload and download. On controllers that do
not have dedicated NVRAM (as in the Sun v20/v40z), the firmware
image is downloaded to the controller by the system BIOS. This
image occupies precious controller RAM space until the host driver
fetches the image, reducing the number of concurrent I/Os the
controller can processes. The uploaded image is used to
re-program the controller during hard reset events since the
controller cannot fetch the firmware on its own. Implementing this
feature allows much higher queue depths when RAID volumes
are configured.
o Changed configuration page accessors to allow threads to sleep
rather than busy wait for completion.
o Removed hard coded data transfer sizes from configuration page
routines so that RAID configuration page processing is possible.
mpt_reg.h:
o Move controller register definitions into a separate file.
mpt.h:
o Re-arrange includes to allow inlined functions to be
defined in mpt.h.
o Add reply, event, and reset handler definitions.
o Add softc fields for handling timeout and controller
reset recovery.
mpt_cam.c:
o Move mpt_freebsd.c to mpt_cam.c. Move all core functionality,
such as event handling, into mpt.c leaving only CAM SCSI
support here.
o Revamp completion handler to provide correct CAM status for
all currently defined SCSI MPI message result codes.
o Register event and reset handlers with the MPT core. Modify
the event handler to notify CAM of bus reset events. The
controller reset handler will abort any transactions that
have timed out. All other pending CAM transactions are
correctly aborted by the core driver's reset handler.
o Allocate a single request up front to perform task management
operations. This guarantees that we can always perform a
TMF operation even when the controller is saturated with other
operations. The single request also serves as a perfect
mechanism of guaranteeing that only a single TMF is in flight
at a time - something that is required according to the MPT
Fusion documentation.
o Add a helper function for issuing task management requests
to the controller. This is used to abort individual requests
or perform a bus reset.
o Modify the CAM XPT_BUS_RESET ccb handler to wait for and
properly handle the status of the bus reset task management
frame used to reset the bus. The previous code assumed that
the reset request would always succeed.
o Add timeout recovery support. When a timeout occurs, the
timed-out request is added to a queue to be processed by
our recovery thread and the thread is woken up. The recovery
thread processes timed-out command serially, attempting first
to abort them and then falling back to a bus reset if an
abort fails.
o Add calls to mpt_reset() to reset the controller if any
handshake command, bus reset attempt or abort attempt
fails due to a timeout.
o Export a secondary "bus" to CAM that exposes all volume drive
members as pass-thru devices, allowing CAM to perform proper
speed negotiation to hidden devices.
o Add a CAM async event handler tracking the AC_FOUND_DEVICE event.
Use this to trigger calls to set the per-volume queue depth once
the volume is fully registered with CAM. This is required to avoid
hitting firmware limits on volume queue depth. Exceeding the
limit causes the firmware to hang.
mpt_cam.h:
o Add several helper functions for interfacing to CAM and
performing timeout recovery.
mpt_pci.c:
o Disable interrupts on the controller before registering and
enabling interrupt delivery to the OS. Otherwise we risk
receiving interrupts before the driver is ready to receive
them.
o Make use of compatibility macros that allow the driver to
be compiled under 4.x and 5.x.
mpt_raid.c:
o Add a per-controller instance RAID thread to perform settings
changes and query status (minimizes CPU busy wait loops).
o Use a shutdown handler to disable "Member Write Cache Enable"
(MWCE) setting for RAID arrays set to enable MWCE During Rebuild.
o Change reply handler function signature to allow handlers to defer
the deletion of reply frames. Use this to allow the event reply
handler to queue up events that need to be acked if no resources
are available to immediately ack an event. Queued events are
processed in mpt_free_request() where resources are freed. This
avoids a panic on resource shortage.
o Parse and print out RAID controller capabilities during driver probe.
o Define, allocate, and maintain RAID data structures for volumes,
hidden member physical disks and spare disks.
o Add dynamic sysctls for per-instance setting of the log level, array
resync rate, array member cache enable, and volume queue depth.
mpt_debug.c:
o Add mpt_lprt and mpt_lprtc for printing diagnostics conditioned on
a particular log level to aid in tracking down driver issues.
o Add mpt_decode_value() which parses the bits in an integer
value based on a parsing table (mask, value, name string, tuples).
mpilib/*:
o Update mpi library header files to latest distribution from LSI.
Submitted by: gibbs
Approved by: re
- Add a missing "ATI" in one of the device descriptions.
- In machfb_init_engine() adjust a wait_for_fifo() call to the
actual number of operations.
- As a speed optimization cache setting the foreground and back-
ground colors.
- I got the meaning of V_DISPLAY_BLANK wrong, it's blank like turn
off and not blank like turn on and clear the screen. So move
clearing the screen to machfb_clear() were it hopefully belongs.
- Properly implement V_DISPLAY_BLANK, V_DISPLAY_STAND_BY and
V_DISPLAY_SUSPEND. This makes blank_saver.ko and green_saver.ko
work. [1]
- Implement machfb_load_palette() and machfb_save_palette() and
set the V_ADP_PALETTE flag. This makes fade_saver.ko work. [2]
- Install our 16-color color map only once and with an offset of
16 as the OBP driver expects white to be at index 0 and black at
255. This fixes the inversion of the colors back at the boot
prompt after shutting down FreeBSD. This will also be handy if
we ever want to implement breaking into OFW. Unfortunately there
doesn't seem to be a better way to achieve this as e.g. bypassing
the color map isn't supported by all Mach64 chips.
- Move invalidating the cache variables to machfb_set_mode() and
set the V_ADP_MODECHANGE flag. This causes machfb_set_mode() to
be called when the X server shuts down. This hopefully will fix
the screen corruption happening occasionally when shutting down
the X server and which is present until switching to another VTY.
Inspired by: NetBSD [1]
Based on: Xorg [2]
Approved by: re (scottl)
- Let creator_bitblt() return ENODEV as it's not implemented (missed
in sys/dev/fb/creator.c rev. 1.6).
- As a speed optimization inline the creator_ras_wait() etc. helper
functions and also cache setting the font increment, font width
and plane mask. [1]
- I got the meaning of V_DISPLAY_BLANK wrong, it's blank like turn
off and not blank like turn on and clear the screen. So move
clearing the screen to creator_clear() were it hopefully belongs.
- Properly implement V_DISPLAY_BLANK, V_DISPLAY_STAND_BY and
V_DISPLAY_SUSPEND. This makes blank_saver.ko and green_saver.ko
work. [1]
- Change the order of operations in creator_fill_rect(), i.e. write
y before x and cy before cx. This fixes drawing the top part of
the border with Elite3D cards when switching from Xorg to a VTY.
- Move setting the chip configuration we use and invalidating the
cache variables to creator_set_mode() and set the V_ADP_MODECHANGE
flag. This causes creator_set_mode() to be called when the X server
shuts down which fixes the screen corruption caused most of the
time by Xorg not restoring the original configuration present at
startup.
Inspired by/based on: Xorg [1]
Approved by: re (scottl)
after sys/dev/sound/pcm/channel.c rev. 1.99, i.e. when there's no
existing KERNBUILDDIR with an opt_isa.h defined.
- Sync with sys/dev/sound/pcm/channel.c rev. 1.99 (sort of), i.e.
never compile in isadma support on sparc64 as we just never need
it there. This allows to use the "generic" module with a custom
kernel that is built without isa(4).
Reviewed by: ru
Approved by: re (scottl)
variant to allocating a fixed set of 5 banks that the EBus variant
is documented to have (and also has in reality). Trying to allocate
up to 8 banks is a remnant from experiments during the development
of this driver.
Discussed with: joerg, yongari
Reviewed by: yongari
Approved by: re (scottl)
legacy parts (i.e. those that have 4 global key slots). We
blindly assign unicast keys to key slot 0. Devices that need
alternate allocation logic must override this method.
Reviewed by: avatar
Approved by: re (scottl)
ext2fs fails to set the device in the stat(2) system call.
Subsequently, that makes fts(3) fail, which goes as far as make ls(1)
fail (which uses fts) on ext2fs.
Approved by: re (Robert Watson <rwatson@FreeBSD.org>)
- Update driver interrupt statistics correctly.
sys/sys/pmc.h, sys/dev/hwpmc/hwpmc_mod.c:
- Fix a bug affecting debug printfs.
- Move the 'stalled' flag from being in a bit in the
'pm_flags' field of a 'struct pmc' to a field of its own in the
same structure. This flag is updated from the NMI handler and
keeping it separate makes it easier to avoid races with other
parts of the code.
sys/dev/hwpmc/hwpmc_logging.c:
- Do arithmetic with 'uintptr_t' types rather that casting
to and from 'char *'.
Approved by: re (scottl)
exit via 'doreti_exit'.
Since the NMI interrupt may be taken at any time, including when
the processor has masked external interrupts, it is not safe to
call ast() as is done for normal interrupts.
Approved by: re (scottl)
further changes and fixes in the future:
- Use aliases via macros rather than duplicated inlines wherever possible.
- Move all the aliases to the bottom of these files and the inline
functions to the top.
- Add various comments.
- On alpha, drop atomic_{load_acq,store_rel}_{8,char,16,short}().
- On i386 and amd64, don't duplicate the extern declarations for functions
in the two non-inline cases (KLD_MODULE and compiler doesn't do inlines),
instead, consolidate those two cases.
- Some whitespace fixes.
Approved by: re (scottl)
- Conditionally grab Giant around the EISCONN hack at the end based on
debug.mpsafenet.
- Protect access to so_emuldata via SOCK_LOCK.
Reviewed by: rwatson
Approved by: re (scottl)
be set to 0 on input. This caused a panic in an an MP test version
of the GEM driver from Marius, and from inspection of other PCI
drivers, the same problem would happen there.
Fix by explicitly setting to 0.
Approved by: re
that since I can't test it directly. The driver also lists the EM1144
as being supported, but in reality it isn't. The EM1144-T,
{XJ,CC}{3288,3336} have the SMC chips in them, but aren't conformant MFC
cards, so they need their own driver.
Also, it does little harm to list the 8020BT, so remove #if 0.
Approved by: re (scottl)
as a part of the GENERIC kernel with INVARIANT* and WITNESS*
turned off.
(For non GENERIC kernel KTR and MUTEX_PROFILING should be also
off).
Submitted by: Eygene A. Ryabinkin <rea at rea dot mbslab dot kiae dot ru>
Approved by: re (scottl)
PR: 81767
in the build still due to some #undef's in svr4.h, but if you hack around
that and add some missing entries to syscalls.master, then this file will
now compile. The changes involved proc -> thread, using FreeBSD syscall
names instead of NetBSD, and axeing syscallarg() and retval arguments.
Approved by: re (scottl)
and writev() except that they take an additional offset argument and do
not change the current file position. In SAT speak:
preadv:readv::pread:read and pwritev:writev::pwrite:write.
- Try to reduce code duplication some by merging most of the old
kern_foov() and dofilefoo() functions into new dofilefoo() functions
that are called by kern_foov() and kern_pfoov(). The non-v functions
now all generate a simple uio on the stack from the passed in arguments
and then call kern_foov(). For example, read() now just builds a uio and
calls kern_readv() and pwrite() just builds a uio and calls kern_pwritev().
PR: kern/80362
Submitted by: Marc Olzheim marcolz at stack dot nl (1)
Approved by: re (scottl)
MFC after: 1 week
frame being sent is to be ack'd and hasn't been filtered by the h/w;
this insures we don't pass in tx descriptors that have no meaningful
state (e.g. mcast/bcast frames are not acked and so have no tx retry
counts)
Approved by: re (scottl)
Obtained from: Atheros
Move ethernet MAC address setting into pccard attachment
Fix panic from IFP2ENADDR() use prior to if_alloc
Remove OLDCARD compat support. This should work still on oldcard, however.
sn_attach now requires that the resources be activated now, so adjust.
Approved by: re (scottl)
don't mark the MORE_DATA bit when taking it off the ps queue, there's
no 802.11 header then; we must wait to do this at encap time so
mark the mbuf instead.
Reviewed by: avatar
Approved by: re (scottl)
Obtained from: Atheros
stations in ap mode. Track when a node's first auth frame is
received and use this to decide whether or not to bump the refcnt.
This insures we only ever bump the refcnt once.
Reviewed by: avatar
Approved by: re (scottl)
This fixes duplicative BSS entries(memory leaks as well) listed in
"ifconfig dev list scan" when a station fails to associate with an AP.
Reviewed by: sam
Approved by: re (scottl)
hooks for each outgoing interface but also run pfil hooks _N times_ on the
bridge interface. This is changed so pfil hooks are run once for the bridge
interface (bridge0) and then only on the outgoing interfaces in the broadcast
loop.
- Simplify bridge_enqueue() by moving bridge_pfil() to the callers.
- Check (inet6_pfil_hook.ph_busy_count >= 0), it may be possible to have a
packet filter hooked for only ipv6 but we were only checking if ipv4 hooks
were busy.
- Minor optimisation for null mbuf check after bridge_pfil(), move it into the
if-block as it couldnt possibly be null outside.
Prodded by: mlaier
Approved by: re (scottl), mlaier (mentor)
redundant with respect to existing mbuf copy label routines. Expose
a new mac_copy_mbuf() routine at the top end of the Framework and
use that; use the existing mpo_copy_mbuf_label() routine on the
bottom end.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, SPAWAR
Approved by: re (scottl)
which is invoked from socket() and socketpair(), permitting MAC
policy modules to control the creation of sockets by domain, type, and
protocol.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, SPAWAR
Approved by: re (scottl)
Requested by: SCC
copied and pasted. I had actually tested without this change in my
trees as had the other testers.
Reported by: bde, Rostislav Krasny rosti dot bsd at gmail dot com
Approved by: re (scottl)
Pointy hat to: jhb
PPPoE modes. The interface was declared obsoleted before 5.3-RELEASE.
When running as access concentrator ng_pppoe(4) supports both modes
simultanously. When running as client mode can be swicthed in ppp(8)
configuration.
Approved by: re (scottl)
fixes a ref cnt leak
o make unicast key handling on delete identical to set
o change legacy wep key api to reset the 802.11 state
machine for backwards compatibility
Reviewed by: avatar
Approved by: re (scottl)
an item may be queued and processed later. While this is OK for mbufs,
this is a problem for control messages.
In the framework:
- Add optional callback function pointer to an item. When item gets
applied the callback is executed from ng_apply_item().
- Add new flag NG_PROGRESS. If this flag is supplied, then return
EINPROGRESS instead of 0 in case if item failed to deliver
synchronously and was queued.
- Honor NG_PROGRESS in ng_snd_item().
In ng_socket:
- When userland sends control message add callback to the item.
- If ng_snd_item() returns EINPROGRESS, then sleep.
This change fixes possible races in ngctl(8) scripts.
Reviewed by: julian
Approved by: re (scottl)
This case is triggered with ptrace(2) and the PT_SETREGS function.
Change the return type of the function to int so that errors can be
passed on to the caller.
Approved by: re (scottl)
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64, ia64 and ppc.
This uses the code block from if_bridge and the newly added macro
IP_HDR_ALIGNED_P().
This /might/ be a temporary messure before all NIC drivers are educated
to align the header themself.
PR: ia64/81284
Obtained from: NetBSD (if_bridge)
Approved by: re (dwhite), mlaier (mentor)
- Allow libpmc(3) to support P4/EMT64 PMCs on the amd64 architecture
and AMD K8 PMCs on the i386. [2]
Submitted by: ps [1]
Pointy hat: myself [2]
Approved by: re (scottl)
o Indent usb ids properly
o Check the return value of if_alloc()
o Call if_free() in ural_detach()
Reviewed by: silby (mentor)
Approved by: re (scottl)
Using ISO-10646-UCS-2 will cause a problem when we use our own
iconv functions in the future, or port iconv other than GNU
libiconv.
Each vendors treat "UCS-2" as follows, and endian issue is
vendor specific:
- Solaris 8 iconv
Little Endian with BOM
- HP-UX iconv
Big Endian
- NetBSD/i386 1.6 iconv
Little Endian
- GNU libiconv
Big Endian
- glibc(RedHat AS 2.1 x86) iconv
Little Endian
- IANA
Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
this needs to specify network byte order: the standard
does not specify (it is a 16-bit integer space)
Alias: csUnicode
- MSDN
Little Endian
http://msdn.microsoft.com/library/en-us/cpref/html/frlrfsystemtextencodingclassgetencodingtopic2.asp
Now using UTF-16BE is harmless, because
- same as UCS-2 with 2 byte range (U+0000 - U+FFFF)
- kernel code of each file systems(cd9660, msdosfs, ntfs) believes
Unicode is a 2 byte character at this time.
- UDF has only 2 byte range of Unicode filenames.
- It's defined at RFC2781.
So I believe it's time to change before starting new RELENG_6. :)
Approved by: re (scottl)
pointer doesn't point to the first instruction of that function, but
rather to a descriptor. The descriptor has the address of the first
instruction, as well as the value of the global pointer. The symbol
table doesn't know anything about descriptors, so if you lookup the
name of a function you get the address of the first instruction. The
cast from the address, which is the result of the symbol lookup, to a
function pointer as is done in db_fncall is therefore invalid.
Abstract this detail behind the DB_CALL macro. By default DB_CALL is
defined as db_fncall_generic, which yields the old behaviour. On ia64
the macro is defined as db_fncall_ia64, in which a descriptor is
constructed to yield a valid function pointer.
While here, introduce DB_MAXARGS. DB_MAXARGS replaces the existing
(local) MAXARGS. The DB_MAXARGS macro can be defined by platforms to
create a convenient maximum. By default this will be the legacy 10.
On ia64 we define this macro to be 8, for 8 is the maximum number of
arguments that can be passed in registers. This avoids having to
implement spilling of arguments on the memory stack.
Approved by: re (dwhite)
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).
This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.
IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()
Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.
PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)
as they are already default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it. On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.
This commit has:
- Removed the option from conf/options.*
- Removed the option and comments from MD NOTES files
- Simplified the CPU_ENABLE_SSE ifdef's so they don't
deal with CPU_ENABLE_SSE from kernel configuration. (*)
For most users, this commit should be largely no-op. If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.
(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
not just !defined(CPU_DISABLE_SSE), if we really want to do so.
Discussed on: -arch
Approved by: re (scottl)
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by callling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid code file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.
Approved by: re (scottl)
after PAWS checks. The symptom of this is an inconsistency in the cached
sack state, caused by the fact that the sack scoreboard was not being
updated for an ACK handled in the header prediction path.
Found by: Andrey Chernov.
Submitted by: Noritoshi Demizu, Raja Mukerji.
Approved by: re
does not clear tlen and frees the mbuf (leaving th pointing at
freed memory), if the data segment is a complete duplicate.
This change works around that bug. A fix for the tcp_reass() bug
will appear later (that bug is benign for now, as neither th nor
tlen is referenced in tcp_input() after the call to tcp_reass()).
Found by: Pawel Jakub Dawidek.
Submitted by: Raja Mukerji, Noritoshi Demizu.
Approved by: re
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.
- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.
Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone
timer since irq0 isn't being driven at hz in that case and we don't need to
try to handle edge cases with rollover, etc. that require irq0 to be firing
for the timecounter to actually work.
Submitted by: phk
Tested by: schweikh
Approved by: re (scottl)
and stop trying to play cute games so that sccs[] shares space with
version[].
Reported by: Jilles Tjoelker jilles at stack dot nl
Discussed with: bde, "R. Imura" imura at ryu16 dot org
Idea from: NetBSD (via bde)
Approved by: re (scottl)
MFC after: 1 week
a cosmetic change. m_uiotombuf() produces a packet header mbuf, while
original implementation did not. When kernel is compiled with MAC
support, headerless mbuf will cause panic.
Reported by: Alexander Nikiforenko <asn rambler-co.ru>
Approved by: re (scottl)
MFC After: 2 weeks
module-specific malloc types. These should help us to pinpoint the
possible memory leakage in the future.
- Implementing xpt_alloc_ccb_nowait() and replacing all malloc/free based
CCB memory management with xpt_alloc_ccb[_nowait]/xpt_free_ccb. Hopefully
this would be helpful if someday we move the CCB allocator to use UMA
instead of malloc().
Encouraged by: jeffr, rwatson
Reviewed by: gibbs, scottl
Approved by: re (scottl)
starts with an ifatm which in turns has an ifnet. Remove also a couple
of unneccessary casts that could hide such things in the future.
Approved by: re
so residue of division for all hosts on net is the same, and thus only
one VHID answers. Change source IP in host byte order.
Reviewed by: mlaier
Approved by: re (scottl)
o Grab the MAC address out of the CIS if the card has the special
3Com 0x88 tuple. Most 3Com cards don't have this tuple, but we
prefer it to the eeprom since it only appears to be present when
the eeprom doesn't have the info. So far, I've only observed this
on my 3C362 and 3C362B cards, but the NetBSD driver implies that
the 3C362C also has this tuple, and that some 3C574 cards do too (none
of mine do). ep_pccard_mac was written after looking at the NetBSD
code.
o Store the enet addr in the softc for this device, so we can use the
overridden MAC to set the station address.
o Create a routine to set the station address and use it where we need it.
o setup the cmd shitfs and such before we call ep_alloc(), and remove
setting up the cmd shift value there. It initializes to 0, and those
attachments that need to frob it do so before calling ep_alloc.
o Remove some obsolete comments
o No longer a need to export ep_get_macaddr, so make it static
o ep_alloc already grabs the EEPROM id, so we don't need to grab it again
in ep_pccard_attach.
o eliminate unit, it isn't needed, fix some printfs to be device_printf
instead.
# All my pccards except the 3C1 work now. Didn't test ISA or cbus cards
# that I have: 3C509B-TP or 3C569B-J-TPO
Tested on: 3C589B, 3C589C, 3C589D, 3C589D-TP, 3C562, 3C562B/3C563B,
3C562D/3C563D, 3CCFE574BT, 3CXEM556, 3CCSH572BT, 3C574-TX,
3CCE589EC, 3CXE589EC, 3CCFEM556, 3C1
Approved by: re (scottl)
scan the CIS for interesting tuples. 95% of what can be obtained from
the CIS is harvested by the pccard layer and presented to the user in
standard function calls. However, there are special needs at times
where the standard stuff doesn't suffice. This is for those special
cases.
CARD_SCAN_CIS(device_get_parent(dev), function, argp)
scans the CIS of the card, passing each tuple to function with
the tuple and argp as its arguments. Returning 0 continues the scan,
while returning 1 terminates the scan. The value of the last
invocation of function is returned from this function.
int (*pccard_scan_t)(struct pccard_tuple *tuple, void *argp)
function called for each tuple. Elements of the CIS tuple can be
read with pccard_tuple_read_{1,2,3,4,n}(). You are reading
the actual tuple memory each time, in case your card has
registers in the CIS.
# I suppose these things should be documented in pccard(4) or something like
# that.
# I plan on unifying cardbus CIS support in a similar way.
Approved by: re (scottl)
- pmcstat(8) gprof output mode fixes:
lib/libpmc/pmclog.{c,h}, sys/sys/pmclog.h:
+ Add a 'is_usermode' field to the PMCLOG_PCSAMPLE event
+ Add an 'entryaddr' field to the PMCLOG_PROCEXEC event,
so that pmcstat(8) can determine where the runtime loader
/libexec/ld-elf.so.1 is getting loaded.
sys/kern/kern_exec.c:
+ Use a local struct to group the entry address of the image being
exec()'ed and the process credential changed flag to the exec
handling hook inside hwpmc(4).
usr.sbin/pmcstat/*:
+ Support "-k kernelpath", "-D sampledir".
+ Implement the ELF bits of 'gmon.out' profile generation in a new
file "pmcstat_log.c". Move all log related functions to this
file.
+ Move local definitions and prototypes to "pmcstat.h"
- Other bug fixes:
+ lib/libpmc/pmclog.c: correctly handle EOF in pmclog_read().
+ sys/dev/hwpmc_mod.c: unconditionally log a PROCEXIT event to all
attached PMCs when a process exits.
+ sys/sys/pmc.h: correct a function prototype.
+ Improve usage checks in pmcstat(8).
Approved by: re (blanket hwpmc)
This is good enough to be able to run a RELENG_4 gdb binary against
a RELENG_4 application, along with various other tools (eg: 4.x gcore).
We use this at work.
ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace,
procfs and core dumps.
procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client
and target application.
procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their
sscanf fails. They expect an unsigned long.
imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps.
sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note
that 64 bit consumers can still debug 32 bit targets.
IA64 has got stubs for ia32_reg.c.
Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't
implemented in the 32/64 wrapper yet. We also make a tiny patch to
gdb pacify it over conflicting formats of ld-elf.so.1.
Approved by: re
that newer Intel cpu hardware implements them too. This includes things
like the NX (pte no-execute) flag for execute protection. We'll need to
reference this for implementing no-exec in pmap.c at some point.
Some feature flags are duplicated in both the Intel-orignated bits and
the AMD bits. Suppress the the duplicates correctly - the old code
assumed they were a 1:1 mapping which is not correct. We can't just mask
off the bits present in cpu_feature.
Converge with amd64 where this originated from.
Intel cpu's that implement any AMD features will report them in dmesg now.
Approved by: re
* Add ichwd (The Intel EM64T folks have an ICH)
* Cosmetic comment syncs
* Merge cpufreq change over to NOTES
* add pbio (it compiles, but isn't useful since no boxes have ISA slots)
* copy ath settings (note: wlan disabled here since its in global NOTES)
* copy profiling, including fixing a previous i386->amd64 merge typo.
Approved by: re (blanket i386 <-> amd64 sync/convergence)
be possible to get the swapgs state reversed if doreti traps during
the iretq. Attempt to handle this. load_gs() might need special
handling too. Running the kernel with the user's TLS and the
kernel's PCPU space interchanged would be bad(TM).
Discovered as a result of a conversation with: bde
Approved by: re
ioctl numbers in backwards compatability mode. eg: an IOC_IN ioctl with
a size of zero. Traditionally this was what you did before IOC_VOID
existed, and we had some established users of this in the tree, namely
procfs. Certain 3rd party drivers with binary userland components also
have this too.
This is necessary to have 4.x and 5.x binaries use these ioctl's. We
found this at work when trying to run 4.x binaries.
Approved by: re
dump format. The key reason to do this is so that we can dump sparse
address space. For example, we need to be able to skip the PCI hole
just below the 4GB boundary. Trying to destructively dump MMIO device
registers is Really Bad(TM). The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).
This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.
Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped. We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump. eg: the vm page array
and the dmesg buffer. dump_avail[] is pretty much an unmolested version
of phys_avail[]. It does have Maxmem correction.
Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this. amd64 was actually using the i386 dump version number.
libkvm support to follow.
Approved by: re
The ipfw tables lookup code caches the result of the last query. The
kernel may process multiple packets concurrently, performing several
concurrent table lookups. Due to an insufficient locking, a cached
result can become corrupted that could cause some addresses to be
incorrectly matched against a lookup table.
Submitted by: ru
Reviewed by: csjp, mlaier
Security: CAN-2005-2019
Security: FreeBSD-SA-05:13.ipfw
Correct bzip2 permission race condition vulnerability.
Obtained from: Steve Grubb via RedHat
Security: CAN-2005-0953
Security: FreeBSD-SA-05:14.bzip2
Approved by: obrien
Correct TCP connection stall denial of service vulnerability.
A TCP packets with the SYN flag set is accepted for established
connections, allowing an attacker to overwrite certain TCP options.
Submitted by: Noritoshi Demizu
Reviewed by: andre, Mohan Srinivasan
Security: CAN-2005-2068
Security: FreeBSD-SA-05:15.tcp
Approved by: re (security blanket), cperciva
was written in the old fragmented mbuf chain instead of the defragmented
one. Thus, the duration field of outgoing frames was incorrect.
o Only call m_defrag() if the mbuf fragmentation threshold is greater
than what is currently supported by the driver.
Reviewed by: silby (mentor)
Approved by: re (scottl)
fields for each system call, I missed two system call files because
they weren't named syscalls.master. Catch up with this last two,
mapping the system calls to the NULL event for now.
Spotted by: jhb
Approved by: re (scottl)
with a single copyin() + translate and translate + copyout() rather than
using the stackgap.
- Remove implementation of the stackgap for freebsd32 since it is no longer
used for that compat ABI.
Approved by: re (scottl)
reporting - in my previous change, I missed the case where a mbuf
from the packet zone was freed back to the mbuf/packet keg, where
it was subsequently put into the mbuf zone and found not to contain
the expected trash. This change adds the necessary trash_dtor call inside
mb_fini_pack so that everything is correct.
Thanks for Bosko for finding the bug and showing me how secondary zones
work.
Approved by: re (dwhite)
route itself.
It fixes a bug where an IPv4 route for example has an IPv6 gateway
specified:
route add 10.1.1.1 -inet6 fe80::1%fxp0
Destination Gateway Flags Refs Use Netif Expire
10.1.1.1 fe80::1%fxp0 UGHS 0 0 fxp0
The fix rejects these illegal combinations:
route: writing to routing socket: Invalid argument
add host 10.1.1.1: gateway fe80::1%fxp0: Invalid argument
Reviewed by: KAME jinmei@isl.rdc.toshiba.co.jp
Reviewed by: andre (mentor)
Approved by: re
MFC after: 5
which command to use to read the eeprom and which devices have an MII.
Simplify code by no longer using the OLDCARD compat rouintes (I don't
know if this breaks OLDCARD on pc98 or not, but OLDCARD on pc98 days
are numbered, I hope). This also removes a number of kludges that we
had before because they are OBE. Add a convenience routine to lookup
the device to avoid many casts in many places.
Tested with: 3C589D-TP, 3CCSH572BT
Approved by: re (scottl, blanket ep)
handling of pci resources, and mapping framebuffer leading to panics on X
startup. The proper solution involves use of bus_alloc_resource without
RF_ACTIVE, but this code is being rewritten in DRM CVS currently, and disabling
for now doesn't remove any features, so take the easy route.
PR: kern/80718
Approved by: re (scottl)
immediate is not saved by the architecture. Any of the break.{mifx}
instructions have their immediate saved in cr.iim on interruption.
Consequently, when we handle the break interrupt, we end up with a
break value of 0 when it was a break.b. The immediate is important
because it distinguishes between different uses of the break and
which are defined by the runtime specification.
The bottomline is that when the GNU debugger replaces a B-unit
instruction with a break instruction in the inferior, we would not
send the process a SIGTRAP when we encounter it, because the value
is not one we recognize as a debugger breakpoint.
This change adds logic to decode the bundle in which the break
instruction lives whenever the break value is 0. The assumption
being that it's a break.b and we fetch the immediate directly out
of the instruction. If the break instruction was not a break.b,
but any of break.{mifx} with an immediate of 0, we would be doing
unnecessary work. But since a break 0 is invalid, this is not a
problem and it will still result in a SIGILL being sent to the
process.
Approved by: re (scottl)
processing is now done in the ACK processing case.
- Merge tcp_sack_option() and tcp_del_sackholes() into a new function
called tcp_sack_doack().
- Test (SEG.ACK < SND.MAX) before processing the ACK.
Submitted by: Noritoshi Demizu
Reveiewed by: Mohan Srinivasan, Raja Mukerji
Approved by: re
pointer to a softc which is no longer valid since the ifnet struct was split
out from the softc.
Approved by: mlaier (mentor)
Approved by: re (blanket)
Dont try to enable read/write caching on devices that doesn't support it,
this reduces the noise from ATA on flash devices and the like.
Approved by: re@ (scottl)
kernel module. LibAlias is not aware about checksum offloading,
so the caller should provide checksum calculation. (The only
current consumer is ng_nat(4)). When TCP packet internals has
been changed and it requires checksum recalculation, a cookie
is set in th_x2 field of TCP packet, to inform caller that it
needs to recalculate checksum. This ugly hack would be removed
when LibAlias is made more kernel friendly.
Incremental checksum updates are left as is, since they don't
conflict with offloading.
Approved by: re (scottl)
actually work. Also use the right semantics for IF_HANDOFF to get correct
stats.
Reported and tested by: Sascha Luck <sascha at c4inet dot net>
Approved by: re (blanket)
a DLT_NULL interface. In particular:
1) Consistently use type u_int32_t for the header of a
DLT_NULL device - it continues to represent the address
family as always.
2) In the DLT_NULL case get bpf_movein to store the u_int32_t
in a sockaddr rather than in the mbuf, to be consistent
with all the DLT types.
3) Consequently fix a bug in bpf_movein/bpfwrite which
only permitted packets up to 4 bytes less than the MTU
to be written.
4) Fix all DLT_NULL devices to have the code required to
allow writing to their bpf devices.
5) Move the code to allow writing to if_lo from if_simloop
to looutput, because it only applies to DLT_NULL devices
but was being applied to other devices that use if_simloop
possibly incorrectly.
PR: 82157
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (scottl)
that says why we do this (or rather, explains that it is some voodoo magic
that's poorly understood). The local buffer fixes the crash on attach.
o Rename get_e() to ep_get_e() to avoid namespace pollution.
Submitted by: mux
Approved by: re (scottl)
opening a device, devfs_open needs the file descriptor to install its
own fileops. Failing to pass the file descriptor causes the vnode to
be returned with the regular vnops, which will cause a panic on the
first read or write because devfs_specops is not meant to support
those operations.
This bug caused a panic after exec'ing any set[ug]id program with
fds 0..2 closed (i.e., if any action had to be taken by fdcheckstd, we
would panic if the exec'd program ever tried to use any of those
descriptors).
Reviewed by: phk
Approved by: re (scottl)
allowed writing to the registers by any user that can open the DRI device, and
therefore ability to initiate DMA. This came in with the merge from DRI CVS on
2005-04-15.
Approved by: re (scottl)
Obtained from: DRM CVS
a problem with one particular switch module. Create a kernel option
BGE_FAKE_AUTONEG that restores the 5.4 behavior, which should make the DNLK
switch module work. IBM/Intel blades with Intel or AD switch modules should
work without patching or kernel options with this commit.
Hardware for testing provided by several folks, including
Danny Braniss <danny@cs.huji.ac.il>, Achim Patzner <ap@bnc.net>,
and OffMyServer.
Approved by: re
exec_copyin_strings() to catch up to rev 1.266 of kern_exec.c. This fixes
panics on amd64 with compat binaries since exec_free_args() was freeing
more memory than these functions were allocating and the mismatch could
cause memory to be freed out from under other concurrent execs.
Approved by: re (scottl)
Provide a backwards compatible way to have the extra macro by defining
PCCARD_API_LEVEL 5 before including pccarddevs for driver writers that
want/need to have the same driver on 5 and 6 with pccard attachments.
Approved by: re (dwhite)
(depends on how many memory you have) observed through "tar -tvf /dev/sa0."
Without this patch, RELENG_5 and HEAD panics with something like:
kmem_malloc(4096): kmem_map too small: 42258432 total allocated
RELENG_4 doesn't panic but spews following errors:
camq_init: - cannot malloc array!
Reviewed by: gibbs, scottl
Approved by: re (scottl)
MFC after: 3 days
to the statement in ip_mroute.h, as well as being the same as what
OpenBSD has done with this file. It matches the copyright in NetBSD's
1.1 through 1.14 versions of the file as well, which they subsequently
added back.
It appears to have been lost in the 4.4-lite1 import for FreeBSD 2.0,
but where and why I've not investigated further. OpenBSD had the same
problem. NetBSD had a copyright notice until Multicast 3.5 was
integrated verbatim back in 1995. This appears to be the version that
made it into 4.4-lite1.
Approved by: re (scottl)
MFC after: 3 days
(1) "ipf -T" is broken for fetching single entries and
(2) loading rules with numbered collections does not order insertion right.
(3) stats aren't accumulated for hash table memory failures
Approved by: re (dwhite)
the UMA "trash" allocator is used - this ensures that any writes to a freed
mbuf should provoke a panic.
Only enabled under INVARIANTS, of course.
Approved by: re (scottl)
portability issues. Also note that for amd64, a hack is used to work
around gcc optimization (thanks to cognet@).
Reviewed by: mux (mentor)
Approved by: re (dougb)
#!-line had multiple whitespace characters after the interpreter name, and
it did not have any options, then the code would do nasty things trying to
process a (non-existent) option-string which "ended before it began"...
Submitted by: Morten Johansen
Approved by: re (dwhite)
are actually caused by a buf with both VNCLEAN and VNDIRTY set. In
the traces it is clear that the buf is removed from the dirty queue while
it is actually on the clean queue which leaves the tail pointer set.
Assert that both flags are not set in buf_vlist_add and buf_vlist_remove.
Sponsored by: Isilon Systems, Inc.
Approved by: re (blanket vfs)
cache_zap() to clear the v_dd pointers when a directory vnode is forcibly
discarded. For this to work, all vnodes with v_dd pointers to a directory
must also have name cache entries linked via v_cache_dst to that dvp
otherwise we could not find them at cache_purge() time. The following
code snipit could break this guarantee by unlinking a directory before
fetching it's dotdot. The dotdot lookup would initialize the v_dd field
of the unlinked directory which could never be cleared. To fix this
we don't initialize v_dd for orphaned vnodes.
printf("rmdir: %d\n", rmdir("../foo")); /* foo is cwd */
printf("chdir: %d\n", chdir(".."));
printf("%s\n", getwd(NULL));
Sponsored by: Isilon Systems, Inc.
Discovered by: kkenn
Approved by: re (blanket vfs)
not only means that it's possible (though unlikely) that we hand out
differing tags for the same bus space, it also means that the tags
we handed out are not used during bus enumeration. Both affect our
ability to compare tags. Fix the first by initializing our tags only
once. Fix the second by testing if one of the tags to compare is our
tag and the other is a busspace_isa_{io|mem} tag and declare them
equal if so.
This fixes using uart(4) as the serial console on a ds10. That is,
the low-level console worked, but we could not match the resources
to one of the UARTs found during bus enumeration, which prevented
uart(4) from becoming the console in single- or multi-user mode.
Approved by: re (kensmith)
MFC after: 2 days
Thanks to: all involved in getting a ds10 to me; directly or indirectly.
Special thanks to: Dave Knight, ISC (for not scratching my Porsche :-)
pending discussion of how implementation would proceed. Applications
like -lc_r expect select(3) to match the EAGAIN-status of IO
functions.
Approved by: re
- do not use static memory as we are under a shared lock only
- properly rtfree routes allocated with rtalloc
- rename to verify_path6()
- implement the full functionality of the IPv4 version
Also make O_ANTISPOOF work with IPv6.
Reviewed by: gnn
Approved by: re (blanket)
kernel mode, always use the curthread pmap instead. There are valid cases
were we can fault on a user address from the kernel without pcb_onfault
being set.
Approved by: re (blanket)
contaminated with the GPL code. While this information was present in
the COPYRIGHT.INFO file, it is FreeBSD's standard practice to, where
possible, include explicit license information in files.
Approved by: release engineer (scottl)
ref while we're calling vgone(). This prevents transient refs from
re-adding us to the free list. Previously, a vfree() triggered via
vinvalbuf() getting rid of all of a vnode's pages could place a partially
destructed vnode on the free list where vtryrecycle() could find it. The
first call to vtryrecycle would hang up on the vnode lock, but when it
failed it would place a now dead vnode onto the free list, and another
call to vtryrecycle() would free an already free vnode. There were many
complications of having a zero ref count while freeing which can now go
away.
- Change vdropl() to release the interlock before returning. All callers
now respect this, so vdropl() directly frees VI_DOOMED vnodes once the
last ref is dropped. This means that we'll never have VI_DOOMED vnodes
on the free list.
- Seperate v_incr_usecount() into v_incr_usecount(), v_decr_usecount() and
v_decr_useonly(). The incr/decr split is so that incr usecount can
return with the interlock still held while decr drops the interlock so
it can call vdropl() which will potentially free the vnode. The calling
function can't drop the lock of an already free'd node. v_decr_useonly()
drops a usecount without droping the hold count. This is done so the
usecount reaches zero in vput() before we recycle, however the holdcount
is still 1 which prevents any new references from placing the vnode
back on the free list.
- Fix vnlrureclaim() to vhold the vnode since it doesn't do a vget(). We
wouldn't want vnlrureclaim() to bump the usecount since this has
different semantics. Also change vnlrureclaim() to do a NOWAIT on the
vn_lock. When this function runs we're usually in a desperate situation
and we wouldn't want to wait for any specific vnode to be released.
- Fix a bunch of misc comments to reflect the new behavior.
- Add vhold() and vdrop() to vflush() for the same reasons that we do in
vlrureclaim(). Previously we held no reference and a vnode could have
been freed while we were waiting on the lock.
- Get rid of vlruvp() and vfreehead(). Neither are used. vlruvp() should
really be rethought before it's reintroduced.
- vgonel() always returns with the vnode locked now and never puts the
vnode back on a free list. The vnode will be freed as soon as the last
reference is released.
Sponsored by: Isilon Systems, Inc.
Debugging help from: Kris Kennaway, Peter Holm
Approved by: re (blanket vfs)
Hopefully this fixes ed(4) under qemu. I'm shocked that real hardware
is apparently working with these bugs.
Approved by: re (ifnet blanket)
Pointy hat: brooks
of the clean and dirty lists. This is in an attempt to catch the wrong
bufobj problem sooner.
- In vgonel() don't acquire an extra reference in the active case, the
vnode lock and VI_DOOMED protect us from recursively cleaning.
- Also in vgonel() clean up some stale comments.
Sponsored by: Isilon Systems, Inc.
Approved by: re (blanket vfs)
early. I've moved it all the way to the top rather than part way up as
the submitter did.
Submitted by: Jung-uk Kim <jkim at niksun dot com>
Reported by: submitter, le, dougb
Approved by: re (ifnet blanket)
function pointer to the vga render dispatch table and initialized it with
vga_nop. The problem is that vga_nop() is a varargs function, and the
table declares a non-varargs function pointer. On amd64 (and I think ppc),
mixing varargs and non-varargs function pointers is fatal.
Change vga_nop() and gfb_nop() from varargs to non-varargs do-nothing
functions. This stops the stack corruption that only happened on amd64.
Approved by: re (scottl)
used to ensure that we weren't exiting the syscall with a lock still
held. This wasn't safe, however, because we'd already executed a vput()
and on a loaded system the vnode may have been free'd by the time we
assert. This functionality is also handled by the td_locks assert in
userret, which doesn't tell you what the syscall was, but will at least
panic before you deadlock.
Sponsored by: Isilon Systems, Inc.
Discovred by: Peter Holm
Approved by: re (blanket vfs)
anyway and it's not used outside of vfs_subr.c.
- Change vgonel() to accept a parameter which determines whether or not
we'll put the vnode on the free list when we're done.
- Use the new vgonel() parameter rather than VI_DOOMED to signal our
intentions in vtryrecycle().
- In vgonel() return if VI_DOOMED is already set, this vnode has already
been reclaimed.
Sponsored by: Isilon Systems, Inc.
this is happening at the moment and sometimes causing panics later on the
package cluster when we bremfree() a buf whose delayed bremfree() did not
previously happen.
Sponsored by: Isilon Systems, Inc.
most of the code to deal with them has been dead for sometime. Simplify
the code by doing an insert sort hinted by the current head position.
Met with apathy by: arch@
on an IPv4 packet as these variables are uninitialized if not. This used to
allow arbitrary IPv6 packets depending on the value in the uninitialized
variables.
Some opcodes (most noteably O_REJECT) do not support IPv6 at all right now.
Reviewed by: brooks, glebius
Security: IPFW might pass IPv6 packets depending on stack contents.
Approved by: re (blanket)
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>
fails.
Move detaching the ifnet from the ifindex_table into if_free so we can
both keep the sanity checks and actually delete the ifnets. [0]
Reported by: gallatin [0]
Approved by: re (blanket)
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.
Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>
o getsockopt(SO_ACCEPTFILTER) always returns success on listen socket
even we didn't install accept filter on the socket.
o Fix these bugs and add regression tests for them.
Submitted by: Igor Sysoev [1]
Reviewed by: alfred
MFC after: 2 weeks
to initialize the buffer array in ata_raid_attach() by removing the
initializer. There's no memset(?) in the kernel. Instead, assign
'\0' to the first element. The buffer array holds strings only, so
this is functionally equivalent.
Applies to: ia64
Tripped over by: tinderbox
events could be added to cover other interesting details.
- Add some VNASSERTs to discover places where we access vnodes after
they have been uma_zfree'd before we try to free them again.
- Add a few more VNASSERTs to vdestroy() to be certain that the vnode is
really unused.
Sponsored by: Isilon Systems, Inc.
in that KTR_VFS will be hand placed, while KTR_VOP traces the individual
vnode operations and is generated by vnode_if.awk.
- Add a comment describing KTR_VOP.
I missed. Since I did no rearrange any softcs, casting the result of
device_get_softc() to (struct ifnet **) and derefrencing it yeilds a
pointer to the ifp. This makes at least vr(4) nics work.
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions. While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.
This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead). For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.
Also, fix a bug that in practice most likely couldn't have been triggered,
int the new contigmalloc(9): it walked backwards from the end of memory
without accounting for how many pages it needed. Potentially, nonexistant
pages could have been mapped. This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.
Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
atomic write request, it can fill the buffer cache with the entirety
of that write in order to handle retries. However, it never drops
the vnode lock, or else it wouldn't be atomic, so it ends up waiting
indefinitely for more buf memory that cannot be gotten as it has it
all, and it waits in an uncancellable state.
To fix this, hibufspace is exported and scaled to a reasonable
fraction. This is used as the limit of how much of an atomic write
request by the NFS client will be handled asynchronously. If the
request is larger than this, it will be turned into a synchronous
request which won't deadlock the system. It's possible this value is
far off from what is required by some, so it shall be tunable as soon
as mount_nfs(8) learns of the new field.
The slowdown between an asynchronous and a synchronous write on NFS
appears to be on the order of 2x-4x.
General nod by: gad
MFC after: 2 weeks
More testing: wes
PR: kern/79208
well worth the bloat.
- Change the formatting of 'show ktr' slightly to accommodate the
additional field. Remove a tab from the verbose output and place the
actual trace data after a : so it is more easy to understand which
part is the event and which is part of the record.
psm(4), ukbd(4), ums(4) and usb(4) on by default. Modulo some nits with
the most annoying one probably being USB keyboards no longer working at
the OFW boot prompt after halting FreeBSD these drivers work fine on
sparc64 including X and there's nothing left that I'd consider a show-
stopper. I.e. graphical consoles on sun4u machines should either work
out of the box or by plugging in a card that is supported by either
creator(4) or machfb(4). The exception obviously are SBus-only machines
without UPA slots like some Ultra 1 (but which also still lack support
in other areas) and certain Exx0 (but which probably are mainly used
with serial consoles anyway). I'll try to add a cgsix(4) for these later
as Sun CG6 cards are probably the most common SBus framebuffer cards in
sun4u machines. I however don't see much sense in adding drivers for the
dozen of SBus framebuffers that were destined for sparc v8 machines.
The rest of the USB drivers aren't enabled as I'm only aware of ukbd(4)
and ums(4) as well as ohci(4) working with the on-board ALI M5237 and
Sun PCIO-2 controllers. Aue(4) definitely doesn't work on sparc64, yet.
Thanks to:
- Jake for the initial work on syscons(4) on sparc64 and creator(4).
- Marcel for uart(4) and especially for its support for the SCCs which
are only used on sparc64 so far. In various regards it wouldn't have
been possible to enable syscons(4) by default on sparc64, yet, without
uart(4).
- All that tested patches.
Ok'ed by: scottl (RE hat), tmm
files after they were repo-copied to sys/dev/atkbdc. The sources of
atkbdc(4) and its children were moved to the new location in preparation
for adding an EBus front-end to atkbdc(4) for use on sparc64; i.e. in
order to not further scatter them over the whole tree which would have
been the result of adding atkbdc_ebus.c in e.g. sys/sparc64/ebus. Another
reason for the repo-copies was that some of the sources were misfiled,
e.g. sys/isa/atkbd_isa.c wasn't ISA-specific at all but for hanging
atkbd(4) off of atkbdc(4) and was renamed to atkbd_atkbdc.c accordingly.
Most of sys/isa/psm.c, i.e. expect for its PSMC PNP part, also isn't
ISA-specific.
- Separate the parts of atkbdc_isa.c which aren't actually ISA-specific
but are shareable between different atkbdc(4) bus front-ends into
atkbdc_subr.c (repo-copied from atkbdc_isa.c). While here use
bus_generic_rl_alloc_resource() and bus_generic_rl_release_resource()
respectively in atkbdc_isa.c instead of rolling own versions.
- Add sparc64 MD bits to atkbdc(4) and atkbd(4) and an EBus front-end for
atkbdc(4). PS/2 controllers and input devices are used on a couple of
Sun OEM boards and occur on either the EBus or the ISA bus. Depending on
the board it's either the only on-board mean to connect a keyboard and
mouse or an alternative to either RS232 or USB devices.
- Wrap the PSMC PNP part of psm.c in #ifdef DEV_ISA so it can be compiled
without isa(4) (e.g. for EBus-only machines). This ISA-specific part
isn't separated into its own source file, yet, as it requires more work
than was feasible for 6.0 in order to do it in a clean way. Actually
philip@ is working on a rewrite of psm(4) so a more comprehensive
clean-up and separation of hardware dependent and independent parts is
expected to happen after 6.0.
Tested on: i386, sparc64 (AX1105, AXe and AXi boards)
Reviewed by: philip
of just dropping the lock around the ip_output call. This used to cause
corrupted state tree walks for some call-paths.
In a second stage all callouts will be marked MPSAFE according to the
setting of mpsafenet.
Reported and tested by: Matthew Grooms <mgrooms at seton dot org>
MFC after: 3 days
X-MFC after: Marking callouts MPSAFE + 1 week
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
it; instead pass the space occupied by the header down into the
crypto modules (except in the demic case which needs it only when
doing int in s/w)
o while here fix defrag to strip the header from 2nd and later frames
o teach decap code how to handle 4-address frames
- Do not edit pullup_len outside M_CHECK macro.
- Do not reimplement NG_FWD_NEW_DATA().
- Remove redundant check for item being not NULL.
Submitted by: ru
do the subsequent ip_output() in IPFW. In ipfw_tick(), the keep-alive
packets must be generated from the data that resides under the
stateful lock, but they must not be sent at that time, as this would
cause a lock order reversal with the normal ordering (interface's
lock, then locks belonging to the pfil hooks).
In practice, this caused deadlocks when using IPFW and if_bridge(4)
together to do stateful transparent filtering.
MFC after: 1 week
- Initialize val_ec with the content of the volume EC register
for ACPI_IBM_METHOD_VOLUME and ACPI_IBM_METHOD_MUTE in acpi_ibm_sysctl_set()
if there is no CMOS handle present. This fixes setting volume and mute on
such models.
Submitted by: ru
Approved by: philip
hack MSS of packets outgoing via interface with small MTU, to workaround
path MTU discovery problems.
Written by Alexey Popov, with some cleanups from me. There are also plans
to improve mpd port, so that it uses this node, instead of doing MSS
hacking in userland, when 'enable tcpmssfix' option is on.
Submitted by: Alexey Popov <lollypop@flexuser.ru>
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.
Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.
Remove stale comments from pmap_init().
Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.
Tested by: cognet, kensmith, marcel
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()
My benchmarks have not revealed any performance degradation.
Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
- Restructured for easier extensibility and maintainability
- To be more uniform with the other ACPI extras drivers and to better reflect
their actual meaning, some sysctls were moved:
o brightness -> lcd_brightness
o keylight -> thinklight
o enable -> events
o misckey -> hotkey
o avail_mask -> availmask
o key_mask -> eventmask
- New "initialmask" sysctl, which holds the initial eventmask
- The "wlan" sysctl is now read-only, since writing to it didn't have
any effect
- The "version" sysctl was removed, since it seems to be the same (0x100)
on all models I have seen
- Support for more hotkeys by the "hotkey" sysctl
- Improved support of ACPI events. Disabled by default, since it unexpectedly
changes the behaviour of some keys. (on my T41p there are now 24 different
keypress events that get reported)
- write support for: volume, mute, lcd_brightness and thinklight
- led(4) interface for the thinklight [1]
- New sysctls "fan" and "fan_speed" to support reading of fan status and speed
- New sysctl "thermal" to support reading of up to 8 thermal sensors
Reviewed by: philip
Approved by: philip
Submitted by: simon [1]
Inspired by: The Linux ibm_acpi driver by Borislav Deianov
http://ibm-acpi.sourceforge.net/
The ThinkPad Button program (tpb) by Markus Braun
http://www.nongnu.org/tpb/
Thanks to: brueffer, dvl, njl, philip, simon, takawata and the many
testers from freebsd-acpi@ and freebsd-mobile@
- Implement sampling modes and logging support in hwpmc(4).
- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.
- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).
- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.
- bug fixes & documentation.
makes the amount of spare room consistent across architectures, and gets
rid of some cases where space was wasted-and-unusable because types have
different sizes & alignment on different hardware platforms. This change
causes more variables to move around on i386, and is much less disruptive
on other platforms. This has been tested on i386, ppc, and sparc, and has
at least been compiled on everything but arm.
Reviewed by: no objections from freebsd-arch, bde
and extend its functionality:
value policy
0 show all mount-points without any restrictions
1 show only mount-points below jail's chroot and show only part of the
mount-point's path (if jail's chroot directory is /jails/foo and
mount-point is /jails/foo/usr/home only /usr/home will be shown)
2 show only mount-point where jail's chroot directory is placed.
Default value is 2.
Discussed with: rwatson
security.bsd.see_other_uids is set to 0, etc.
One can check if invisible process is active, by doing:
# ktrace -p <pid>
If ktrace returns 'Operation not permitted' the process is alive and
if returns 'No such process' there is no such process.
MFC after: 1 week
milliseconds due to what is essentially n^2 algorithmic complexity. This
change makes the algorithm N*2 instead. This heavy processing manifested
itself as skipping in audio and video playback due to the long scheduling
latencies and contention on giant by pcm.
- flushbufqueues() is now responsible for flushing multiple buffers
rather than one at a time. This allows us to save our progress in the
list by using a sentinal. We must do the numdirtywakeup() and
waitrunningbufspace() here now rather than in buf_daemon().
- Also add a uio_yield() after we have processed the list once for bufs
without deps and again for bufs with deps. This is to release Giant
and allow any other giant locked code to proceed.
Tested by: Many users on current@
Revealed by: schedgraph traces sent by Emil Mikulic & Anthony Ginepro
list on fork() if the process doesn't actually have references to any
semaphores. This avoids extra work, as well as potentially asking to
allocate storage for 0 references.
Found by: avatar
MFC after: 1 week
or a bssid+ssid. This is needed for later versions of wpa_supplicant
and for forthcoming addons to wpa_supplicant.
Note this is an api change and applications must be rebuilt.
using the layer2, mac and mac-type keywords.
This is one of the last features that bridge.c has over if_bridge and gets us
very close to a full functional replacement.
Approved by: mlaier (mentor)
points to convert _sema() to _sem() for consistency purposes with
respect to the other semaphore-related entry points:
mac_init_sysv_sema() -> mac_init_sysv_sem()
mac_destroy_sysv_sem() -> mac_destroy_sysv_sem()
mac_create_sysv_sema() -> mac_create_sysv_sem()
mac_cleanup_sysv_sema() -> mac_cleanup_sysv_sem()
Congruent changes are made to the policy interface to support this.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
as this happens via thread_switchout(). I don't particularly like the
structure of the code here. We twice call out to thread code when
a thread is voluntarily switching. Once to thread_switchout() and once
to slot_fill(), while sched_4BSD does even more work which is redundant
to select another thread to use our remaining slice. This should be
simplified in the future, but for now I'm only going to fix the bug not
the bad design.
coded at 512 (BPF_MAXINSNS) to being tunable. This is useful for users
who wish to use complex or large bpf programs when filtering traffic.
For now we will default it to BPF_MAXINSNS. I have tested bpf programs
with well over 21,000 instructions without any problems.
Discussed with: phk
the tail (in tcp_sack_option()). The bug was caused by incorrect
accounting of the retransmitted bytes in the sackhint.
Reported by: Kris Kennaway.
Submitted by: Noritoshi Demizu.
o purge ath_initkeytable; it's not needed
o add multicast key search support for supporting multiple group keys
(disabled for now; requires updated hal)
o create keycache entry for stations using open auth so they get h/w
antenna management support
o add keycache -> node mapping table; eliminates mac-based lookup in
the net80211 layer
spanning tree support.
Based on Jason Wright's bridge driver from OpenBSD, and modified by Jason R.
Thorpe in NetBSD.
Reviewed by: mlaier, bms, green
Silence from: -net
Approved by: mlaier (mentor)
Obtained from: NetBSD
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.
and visible effect of the bug what that autoboot would boot a kernel
after only a couple of seconds had passed instead of waiting the
full 10 seconds it's supposed to wait by default.
Add my copyright notice, since one was missing and I reimplemented
the one and only function in this file.
MFC after: 1 week
times which was added in the last revision with what should be a proper
solution as long as keyboards that were pluggged in after the kernel
has fully booted aren't supported. I.e. when sunkbd_configure() is
called for the high-level console probe make sure that the keyboard is
both successfully configured (i.e. also probed) and attached. The band-
aid left the possibility to attach the keyboard device to the high-level
console without attaching the keyboard device itself when the keyboard
is plugged in after uart(4) attached but before syscons(4) does.
share their IRQ lines with the i8042. Any IRQ activity (typically during
attach) on the NS16550 used to connect the keyboard when actually the
PS/2 keyboard is selected in OFW causes interaction with the OBP i8042
driver resulting in a hang (and vice versa). As RS232 keyboards and mice
obviously aren't meant to be used in parallel with PS/2 ones on these
boards don't attach to these NS16550 in case the RS232 keyboard isn't
selected in order to prevent such hangs.
Ok'ed by: marcel
UARTs used to connect keyboards and not also PS/2 keyboards and only
return their package handle in case the keyboard is the preferred one
according to the OFW but otherwise still regardless of whether the
keyboard is used for stdin or not. This is simply achieved by looking
at the 'keyboard' alias and returning the corresponding package handle
in case it refers to a SCC/UART. This is change is done in order to
give the keyboard which the OFW or the user selected in OFW on boards
that support additional types of keyboards besides the RS232 ones also
preference in FreeBSD. It will be also used to determine on Sun AXi and
Sun AXmp boards whether a PS/2 or a RS232 is to be used as these are
sort of mutual exclusive there (see upcoming commit to uart_bus_ebus.c).
Note that Tatung AXi boards have the same issue but the former code
happened to already give the PS/2 keyboard preference by not identifying
the respective UART as keyboard system device there because the PS/2
keyboard node precedes the keyboard UART one in the OFW device tree of
these boards (which isn't the case for the Sun AXi).
Ok'ed by: marcel
the number of registered adapters instead of determining again whether
stdout is a supported card (and which might have failed to attach and
register).
- Fix a bug in the handling of the FBIOSCURSOR IOCTL; the code was meant
to return ENODEV for all invocations expect when used to disable the
cursor and not just when used for enabling the cursor.
- In case the adapter is the OFW stdout move its OFW cursor to the start
of the last line on halt so OFW output doesn't get intermixed with what
FreeBSD left on the screen.
- Drop variable names in the prototypes of some functions in order to
match the style of majority of the prototypes in this file.
the number of registered adapters instead of determining again whether
stdout is a supported card (and which might have failed to attach and
register).
- Drop creator_set_mode() and move the relevant parts to creator_fill_rect()
and creator_putc() respectively. This is a bit cleaner than having to
make sure that creator_set_mode() was called before creator_fill_rect()
or creator_putc() are used and matches better what Xorg does.
- Fix a bug in the handling of the FBIOSCURSOR IOCTL; the code was meant
to return ENODEV for all invocations expect when used to disable the
cursor and not just when used for enabling the cursor.
- In case the adapter is the OFW stdout move its OFW cursor to the start
of the last line on halt so OFW output doesn't get intermixed with what
FreeBSD left on the screen. With hindsight this is what the faking of a
hardware cursor which was removed in the last revision really was about,
i.e. to keep the OFW updated about the current cursor position. The new
approach however is simpler while producing the same result and doesn't
cause the first letter of the OFW output to be turned into a blank and
a newline.
- Add variable names to the prototypes of creator_cursor_*() which were
added in the last revision and list them alphabetically in order to match
the style of this file.
for the SYS_RES_IOPORT -> SYS_RES_MEMORY transition again. While it
was helpful to not need to change all of the affected drivers in a
single pass together with ebus(4) we probably shouldn't start into
6.0 with such a hack.
This requires some of the modules of affected drivers to be rebuilt,
namely: auxio(4), snd_audiocs(4) and puc(4).
resources in ebus.c rev. 1.22 and collapse the resource allocation for
both the EBus and SBus variants into auxio_attach_common().
- For the EBus variant make sure that the resource for controlling the
LED is actually available; (in theory) we could have ended up using
the resource without allocating it.
aio_write(2) completion through kevent(2). This method does not work on
64-bit architectures. It was deprecated in FreeBSD 4.4. See revisions
1.87 and 1.70.2.7.
Change aio_physwakeup() to call psignal(9) directly rather than indirectly
through a timeout(9). Discussed with: bde
Correct a bug introduced in revision 1.65 that could result in premature
delivery of a signal if an lio_listio(2) consisted of a mixture of
direct/raw and queued I/O operations. Observed by: tegge
Eliminate a field from struct kaioinfo that is now unused.
Reviewed by: tegge
policy. It may be used to provide more detailed classification of
traffic without actually having to decide its fate at the time of
classification.
MFC after: 1 week
slot for us. Previously, we would take two slots on every preempt, and
setrunqueue() would fix it up for us in the non threaded case. The
threaded case was simply broken.
- Clean up flags, prototypes, comments.
- Walks the scoreboard backwards from the tail to reduce the number of
comparisons for each sack option received.
- Introduce functions to add/remove sack scoreboard elements, making
the code more readable.
Submitted by: Noritoshi Demizu
Reviewed by: Raja Mukerji, Mohan Srinivasan
the driver has unholy private knowledge of its great-*cgrandchildren.
The ACPI allocation routine lacked such knowledge when it tried to do
a default allocation for all descendants, rather than just its
immeidate children, so would access grandchild's ivar in an unsafe
way. This could lead to a panic when devices were present which had
no addresses setup by the BIOS, but which were later allocated in a
lazy manner via pci_alloc_map. As such, only do the default
allocation adjustments for immediate children. The manner that
acpi_sysres_find accesses the resource list, used later in
acpi_alloc_resource, is safe and proper so no additional test is
needed there.
This fixes a panic when probing an disabled ata controller on some
newer intel blades.
Reported by: dwhite
against 0 in pci_alloc_map, just like we do in pci_add_map. Also,
make sure that we restore the value to the BAR that was there before
if the bar is 0. Chances are that it was 0 before the write too and
that the restoration is a nop, but better safe than sorry.
Notice by: dwhite
This is the last requirement before we can retire ip6fw.
Reviewed by: dwhite, brooks(earlier version)
Submitted by: dwhite (manpage)
Silence from: -ipfw
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.
Assert tcbinfo lock in tcp_close() due to its call to in{,6}_detach()
Assert tcbinfo lock in tcp_drop_syn_sent() due to its call to tcp_drop()
MFC after: 7 days
we are processing has a base address of zero. Note that this will only
change behavior for devices where all the BARs of a given type have a base
address of 0 since we will enable the appropriate access when we encounter
the first BAR with a base that is not 0. Specifically, this allows certain
Toshiba laptops to no longer require 'hw.pci.enable_io_modes=0' to avoid
hangs during boot.
PR: kern/20040
PR: i386/63776 (possibly)
PR: i386/68900 (possibly)
PR: i386/74532 (possibly)
MFC after: 1 week
file's access time should be updated when it gets executed. A while
ago the mechanism used to exec was changed to use a more mmap based
mechanism and this behavior was broken as a side-effect of that.
A new vnode flag is added that gets set when the file gets executed,
and the VOP_SETATTR() vnode operation gets called. The underlying
filesystem is expected to handle it based on its own semantics, some
filesystems don't support access time at all. Those that do should
handle it in a way that does not block, does not generate I/O if possible,
etc. In particular vn_start_write() has not been called. The UFS code
handles it the same way as it would normally handle the access time if
a file was read - the IN_ACCESS flag gets set in the inode but no other
action happens at this point. The actual time update will happen later
during a sync (which handles all the necessary locking).
Got me into this: cperciva
Discussed with: a lot with bde, a little with kan
Showed patches to: phk, jeffr, standards@, arch@
Minor discussion on: arch@
audit event identifier associated with each system call, which will
be stored by makesyscalls.sh in the sy_auevent field of struct sysent.
For now, default the audit identifier on all system calls to AUE_NULL,
but in the near future, other BSM event identifiers will be used. The
mapping of system calls to event identifiers is many:one due to
multiple system calls that map to the same end functionality across
compatibility wrappers, ABI wrappers, etc.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
are subtle differences in the read and write completion path. Instead,
grab an extra write ref so the write path can drop it when we recursively
call bufdone(). I believe this may be the source of the wrong bufobj
panics.
Reported by: pho, kkenn
structure, sysent. This field will hold the default audit event
to generate when the system call is entered. Currently, it will
default to 0 due to allocation in bss.
Submitted by: wsalamon
Obtained from: TrustedBSD Project
a nested include of param.h is required so that MAXCPU is visible to all
consumers of sys/malloc.h. In an earlier version of the patch, the
malloc_type_internal structure was only conditionally visible.
Pointed out by: delphij
in order to modify the system call table to include event identifiers.
The full audit.h will be merged at a later date.
Obtained from: TrustedBSD Project
allocators: a set of power-of-two UMA zones for small allocations, and the
VM page allocator for large allocations. In order to maintain unified
statistics for specific malloc types, kernel malloc maintains a separate
per-type statistics pool, which can be monitored using vmstat -m. Prior
to this commit, each pool of per-type statistics was protected using a
per-type mutex associated with the malloc type.
This change modifies kernel malloc to maintain per-CPU statistics pools
for each malloc type, and protects writing those statistics using critical
sections. It also moves to unsynchronized reads of per-CPU statistics
when generating coalesced statistics. To do this, several changes are
implemented:
- In the previous world order, the statistics memory was allocated by
the owner of the malloc type structure, allocated statically using
MALLOC_DEFINE(). This embedded the definition of the malloc_type
structure into all kernel modules. Move to a model in which a pointer
within struct malloc_type points at a UMA-allocated
malloc_type_internal data structure owned and maintained by
kern_malloc.c, and not part of the exported ABI/API to the rest of
the kernel. For the purposes of easing a possible MFC, re-use an
existing pointer in 'struct malloc_type', and maintain the current
malloc_type structure size, as well as layout with respect to the
fields reused outside of the malloc subsystem (such as ks_shortdesc).
There are several unused fields as a result of no longer requiring
the mutex in malloc_type.
- Struct malloc_type_internal contains an array of malloc_type_stats,
of size MAXCPU. The structure defined above avoids hard-coding a
kernel compile-time value of MAXCPU into kernel modules that interact
with malloc.
- When accessing per-cpu statistics for a malloc type, surround read -
modify - update requests with critical_enter()/critical_exit() in
order to avoid races during write. The per-CPU fields are written
only from the CPU that owns them.
- Per-CPU stats now maintained "allocated" and "freed" counters for
number of allocations/frees and bytes allocated/freed, since there is
no longer a coherent global notion of the totals. When coalescing
malloc stats, accept a slight race between reading stats across CPUs,
and avoid showing the user a negative allocation count for the type
in the event of a race. The global high watermark is no longer
maintained for a malloc type, as there is no global notion of the
number of allocations.
- While tearing up the sysctl() path, also switch to using sbufs. The
current "export as text" sysctl format is retained with the same
syntax. We may want to change this in the future to export more
per-CPU information, such as how allocations and frees are balanced
across CPUs.
This change results in a substantial speedup of kernel malloc and free
paths on SMP, as critical sections (where usable) out-perform mutexes
due to avoiding atomic/bus-locked operations. There is also a minor
improvement on UP due to the slightly lower cost of critical sections
there. The cost of the change to this approach is the loss of a
continuous notion of total allocations that can be exploited to track
per-type high watermarks, as well as increased complexity when
monitoring statistics.
Due to carefully avoiding changing the ABI, as well as hardening the ABI
against future changes, it is not necessary to recompile kernel modules
for this change. However, MFC'ing this change to RELENG_5 will require
also MFC'ing optimizations for soft critical sections, which may modify
exposed kernel ABIs. The internal malloc API is changed, and
modifications to vmstat in order to restore "vmstat -m" on core dumps will
follow shortly.
Several improvements from: bde
Statistics approach discussed with: ups
Tested by: scottl, others
spaces were 1 too large. This resulted in the rman list not being
sorted correctly, and USB ports not being discovered on older
TiBooks.
Detective work by: Andreas Tobler <toa at pop dot agri dot ch>
24, and 32 bit modes. To use that, syscons(4) must be built with
the compile time option 'options SC_PIXEL_MODE', and VESA support (a.k.a.
vesa.ko) must be either loaded, or be compiled into the kernel.
Do not return EINVAL when the mouse state is changed to what it already is,
which seems to cause problems when you have two mice attached, and
applications are not likely obtain useful information through the EINVAL
caused by showing the mouse pointer twice.
Teach vidcontrol(8) about mode names like MODE_<NUMBER>, where <NUMBER> is
the video mode number from the vidcontrol -i mode output. Also, revert the
video mode if something fails.
Obtained from: DragonFlyBSD
Discussed at: current@ with patch attached [1]
PR: kern/71142 [2]
Submitted by: Xuefeng DENG <dsnofe at msn com> [1],
Cyrille Lefevre <cyrille dot lefevre at laposte dot net> [2]
Only panic is fixed, module will be still listed in kldstat(8) output.
Not sure what is correct fix, because adding unloading code in case of
failure to linker_init_kernel_modules() doesn't work.
of having the kernel parse that line and add an entry to the argument list for
each 'separate word' it finds, have it add only one entry which holds all
the words found on that line. The old behavior is useful in some situations,
but it does not match the way any other operating system will parse that line.
This has been discussed in the thread "Bug in #! processing - One More Time"
on the freebsd-arch mailing list (starting back on Feb 24, 2005). The first
few messages in that thread provide the background in much detail.
PR: 16393
Reviewed by: freebsd-arch
sysctl path. While this code is close to MPSAFE, it may require some
additional locking. Mark ntp_gettime1() as GIANT_REQUIRED for now.
Suggested by: phk
on the their simply wrapping MPSAFE implementations of existing MPSAFE
system calls:
getfsstat()
lseek()
stat()
lstat()
truncate()
ftruncate()
statfs()
fstatfs()
Note that ogetdirentries() is not marked MPSAFE because it does not share
the MPSAFE implementation used for getdirentries(), and requires separate
locking to be implemented.
Note: len gets intialized to 0 for sap == NULL case only to
make compiler on amd64 happy. This has nothing todo with the
former uninitialized use of len in sap != NULL case.
Reviewed by: glebius
Approved by: pjd (mentor)
- Teach the i386 and pc98 loaders to honor multiple console requests from
their respective boot2 binaries so that the same console(s) are used in
both boot2 and the loader.
- Since the kernel doesn't support multiple consoles, whichever console is
listed first is treated as the "primary" console and is passed to the
kernel in the boot_howto flags.
PR: kern/66425
Submitted by: Gavin Atkinson gavin at ury dot york dot ac dot uk
MFC after: 1 week
that protects socket and receive socket buffer state, and a second
mutex to protect send socket buffer state. In some places, the
mutex shared between the socket and receive socket buffer will be
acquired twice, once by each layer, resulting in some
inconsistency, but providing the abstraction benefit of being able
to more easily separate the two mutexes in the future if desired.
When transitioning a socket to the SS_ISDISCONNECTING or
SS_ISDISCONNECTED states, grab the socket/receive socket buffer lock
once rather than grabbing it as the socket lock, modifying socket
state, then grabbing a second time as the receive lock in order to
modify the socket buffer state to indicate no further data can be
read. This change is believed to close a race between the change in
socket state and the change in socket buffer state, which for a
remotely initiated close on a UNIX domain socket, resulted in
soreceive() returning ENOTCONN rather than an EOF condition.
A similar race still exists in the case of send, however, and is
harder to fix as the socket and send socket buffer mutexes are not
the same, and we would like to avoid holding combinations of socket
mutexes over sb_upcall until we've finished clarifying the locking
protocol for upcalls.
This change has the side affect of reducing the number of mutex
operations to initiate disconnect or perform disconnect on a
socket by two.
PR: 78824
Rerported by: Marc Olzheim <marcolz@stack.nl>
MFC after: 2 weeks
the ipx_net data structure. Doing so introduced a stronger alignment
requirement for the address structure, which in turn propagated into
other dependent data structures, which turns out not to be suported by
the available IPX source code. As a result, a number of user space
applications, such as IPX routing components, failed to operate
correctly.
RELENG_5_3 candidate?
PRs: 74059, 80266
Pointy hat to: bms
Fix by: bde
Tested by: Keith White <Keith dot White at site dot uottawa dot ca>
MFC after: 1 week
Suffering: great
- Changed from using explicit devices id to using descriptive labels.
- Added support for 82573 and 82546 Quad adapters.
- Corrected support for 82547EI and 82541ER (mac_type was not assigned)
- Removed #ifdef DBG_STATS and extraneous code.
if_em_hw.c/if_em_hw.h
- Added support for 82573 and 82546 Quad adapters.
- Brought forward Intel's most current mac and phy changes.
so if_tap doesn't need to rely on locally-rolled code to do same.
The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.
Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)
such, the segments pointer in the DMA tag will always be NULL. In
bus_dmamap_load(), temporarily point the segments pointer in the
DMA tag to a local variable so that we don't dereference a NULL
pointer. Reset the segments pointer to NULL after calling the
callback function with it.
PR: alpha/30486
MFC after: 1 week
for the dmamap by using static dmamaps.
- Don't do anything for BUS_DMASYNC_PREREAD and BUS_DMASYNC_POSTWRITE in
bus_dmamap_sync(), it's not needed anymore.
to change the DACR when switching to a kernel thread, thus making
userland thread => kernel thread => same userland thread switch cheaper by
totally avoiding data cache and TLB invalidation.
be assumed that modules are contiguous in memory (they're not)
so don't blindly __syncicache start/end. In fact, don't bother
syncing the icache for modules since the kernel will do it after
fixing up relocations.
This fixes the trap when loading modules at boot time.
Reported by: orlando at break dot net
exist on other architectures yet.
- While I'm here, fix the formatting of the options line. The keyword
"options" should be followed by a space and then a tab, not 2 tabs.
- Correct idxp pointer to point the properly address of the
each array of the kiconv character conversion tables,
so that character conversion work properly when file
systems are mounted with kiconv options.
- The definition of ICONV_CSMAXDATALEN was also bogus
because it was defined as if all machines were 32bit
computers.
Tested on: amd64
MFC after: 1 month
that if we sort the incoming SACK blocks, we can update the scoreboard
in one pass of the scoreboard. The added overhead of sorting upto 4
sack blocks is much lower than traversing (potentially) large
scoreboards multiple times. The code was updating the scoreboard with
multiple passes over it (once for each sack option). The rewrite fixes
that, reducing the complexity of the main loop from O(n^2) to O(n).
Submitted by: Mohan Srinivasan, Noritoshi Demizu.
Reviewed by: Raja Mukerji.
program RXMAC to discard frames with SA field matching the stations's
MAC address. Experimentation shows that HME receives its own frames
when it operates at 10Mbps half-duplex. With this change HME runs at
10Mbps half-duplx should work with IPv6.
(No more "DAD detected duplicate IPv6 address".)
Reported by: jacques brierre <jbrierre AT bellsouth DOT net>
Reviewed by: marius
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().
This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.
and on resume (reported to fix issues with ACPI)
o Add monitor mode support
o Add WPA (802.11i) support (not tested extensively though!)
o Add a device specific sysctl to control the tx antenna (default to
antenna diversity)
o Fix sensitivity setting
o Fix setting of the capinfo field when associating
o Temporarly disable 802.11a channels scanning that was causing firmware
panics with 2915ABG adapters until I find a better fix. This breaks
802.11a support.
o Temporarly switch back to software WEP until I implement hardware
encryption for AES and TKIP too.
Approved by: silby (mentor)
by default, yet.
- Replace "graphics cards" with "framebuffers" in the description
of creator(4) in order to make it uniform with the description of
machfb(4) and the latter occur both on-board and as add-on cards.
use with syscons(4) on sparc64. It's based on the respective NetBSD
driver with some additional info (initialisation/hardware cursor)
obtained from the Xorg 'ati' driver and some ideas taken from
creator(4). ATI Mach64 chips ("ATI Rage") are quite common as low-
end graphics chips in PCI-based sun4u machines and are used on-board
in e.g. Blade 100 and a couple of OEM products. Most if not all of
the Sun PGX add-on cards family (descriptions of the PGX32 are
conflicting but most say it's a Rage Pro) are also based on these
chips. Depending on the version of the OBP Mach64 cards destined for
use in i386 machines also work in sun4u machines.
The driver uses pixel mode with hardware acceleration as far as
syscons(4) currently permits on sparc64 so text mode is already
quite fast. The hardware cursor is used for the mouse pointer;
for one because this is a "restriction" induced in syscons(4) on
sparc64 by creator(4) and also because of issues with mapping
the aperture when used as a low-level early during boot. Due to
insufficiencies in the available documentation I didn't manage to
get mode switch work properly (sync problems), yet. So for now
this driver relies on the OBP having initialised a mode (as does
creator(4)). On all of the tested machines is even true when using
a serial console (and also not only when the OBP switched to a
serial console because no keyboard is present). In general however
the states the Mach64 chips are left in by the OBP vary a lot
depending on the version of the OBP. This e.g. includes the aperture
not being mapped in even when used as the console and the OBP just
barfing when asked to map it. The latter is also the reason for the
existence of this native driver in FreeBSD rather than taking an
OFW frambuffer approach.
Xorg is also happy to talk to these chips by mmap'ing them through
this driver. For some hardware configs like on the Blade 100 a fix
for the Xorg sparc64 MD bus code is however needed (added in version
6.8.2_2 of the xorg-server port).
The video driver font loading and saving methods are not implemented,
yet, as syscons(4) needs more work in that area to work viable on
sparc64.
With minor modifications machfb(4) would most likely also work on
powerpc, when #ifdef'ing the OFW and possibly implementing mode
setting probably also on the other archs. The latter is however
not very practible at the moment as it would conflict with vga(4).
Tested/developed with: Rage II+ add-on card on AX1105 and AXi board,
AXe board (on-board Rage Pro)
Additional testing by: marcel (Ultra 5 w/ on-board Rage Pro),
scottl (Naturetech GENIALstation 777S w/ on-board
Rage Mobility M1),
Michiel Boland and Ilmar S. Habibulin (Blade 100
w/ on-board Rage XL)
- Use register macros instead of magic values in the code. [1]
- Check the return values of OF_getprop() and other stuff that actually
can fail.
- Let the unimplemented video driver methods return ENODEV rather
than 0 so other code isn't tricked into thinking a certain operation
was successfull. In case of e.g. the video driver creator_ioctl()
this caused vidcontrol(1) to return random garbage information.
Remove the TODO macros in the unimplemented video driver methods
which did a printf("%s: unimplemented\n", __func__). Under certain
circumstances these managed to invoke a printf() when a low-level
console device wasn't attached, yet, causing a Fast Data Access MMU
Miss. These macros were only really usefull for development anyway.
- Set the struct video_adapter and struct video_info va_flags and
vi_flags etc. as appropriate.
- In creator_configure() don't rely on hitting the node which is the
chosen console device first when searching the OFW tree for adapters
compatible with this driver. Instead just check whether the chosen
console device is a viable target for this driver. Targets that are
not the console (including additional cards in multi-head configs)
will be attached through creator_upa_attach(). I think this how the
code in creator_configure() was actually meant to work.
Honour the VIO_PROBE_ONLY flag and don't initialise and register the
console device twice when creator_configure() is called a second time
during sc_probe_unit().
Let creator_configure() return the number of the found adapters,
i.e. 1 in case probing succeeds, as it's expected. The return values
of video adapter configure functions however currently aren't checked
so this doesn't make a difference at the moment.
- In creator_upa_attach() don't rely on probing and attaching the
adapter which is the console first, in case there are multiple
adpaters and one of them is the console this could lead into using
the video adapter unit 0 twice.
- Make the check for DACs with inverted cursor control a bit more
precise and actually honour that information when turning the cursor
on or off. Add a helper function creator_cursor_enable() for this
in order to keep code duplication low. [1]
- Don't bother with faking a hardware cursor in case a device is the
console. Apparently this was meant to start kernel output right after
where the firmware left. In general this isn't worth the fuzz and
also had no real effect as creator_set_mode() did clear the screen
in any case, not just in case a device was not the console.
- Implement creator_fill_rect() and use it to actually blank the
display in creator_blank_display() when the mode is V_DISPLAY_BLANK,
moving blanking the display out of creator_set_mode(). Use it also
to implement creator_set_border() so the border can be re-drawn
when switching to a VTY from X, exiting X, etc. (which leaves us
with a black border most of the time).
- Implement the video driver creator_ioctl(), moving the implementation
of the IOCTL interface from the fbN CDEV version of creator_ioctl()
into the video driver version and use the latter to implement the
former. Use fb_commonioctl() to handle most of the FBIO IOCTLs.
This gives programs like vidcontrol(1) which use the video driver
creator_ioctl() a chance of working.
Implement turning off the cursor via the FBIOSCURSOR IOCTL, which
Xorg uses to in order to inform the OS that it's taking over the
cursor. In creator_putm() check whether the cursor is enabled and
(re-)install it if necessary, moving installing the cursor out of
creator_init() and into a helper function creator_cursor_install().
This fixes the missing mouse pointer when switching to a VTY from X,
exiting X, etc.
- Some clean-up (remove unused/useless code, etc.).
o sparc64/creator/creator_upa.c / sparc64/sparc64/sc_machdep.c:
- Attach syscons(4) as an own pseudo-device on the nexus rather than
directly in creator_upa_attach(), similiar to attaching syscons(4)
as a pseudo-device on isa(4) on other archs. This makes it a whole
lot easier to do the right thing in multi-head configs, especially
with different types of graphics adapters. [2]
- Set SC_AUTODETECT_KBD by default so USB keyboards work out of the
box. [2]
Based on/obtained from: Xorg 'ffb' driver [1]
Based on/obtained from: FreeBSD/powerpc [2]
uses white) so base the color of the border on SC_NORM_ATTR rather
than hardcoding BG_BLACK.
- Use SC_DRIVER_NAME rather than hardcoding 'sc' in message strings
(see also sys/dev/syscons/syscons.h rev. 1.82).
syscons(4) and its pseudo-devices don't get confused (including by
other device drivers) with the system controller devices which are
also termed 'sc' in the OFW tree (and which we probably want to
interface with hwpmc(4) one day).
VTB_FRAMEBUFFER specific code. On sparc64 we don't use a buffer of
type VTB_FRAMEBUFFER (see syscons.c) and excluding the respective
code here allows to compile syscons(4) without isa(4).
Requested by: joerg, marcel, yongari
a band-aid allowing to call this function savely multiple times, e.g.
during sckbdprobe() and sc_probe_unit(). Otherwise calling it a second
time results in a non-working keyboard. This needs a lot of more work
to actually do the right thing and work like expected.
- Let sunkbd_configure() return the number of the found keyboards, i.e.
1 in case probing succeeds, as it's expected. The return values of the
keyboard configure functions however currently aren't checked so this
doesn't make a difference at the moment.
- Use FBSDID.
Use bus_generic_probe() and add a bus_add_child() interface method to
allow device drivers to use the identify method to add themselves if
need be (e.g. syscons(4)).
- Use FBSDID.
consist of the expected number of address and size cells (we can't use
dynamic arrays here because at the point in the boot process when this
code is used malloc() doesn't work, yet). This fixes a Fast Data Access
MMU Miss when uart(4) (erroneously) calls OF_decode_addr() to decode
the address of PS/2 keyboards. PS/2 keyboards use a different and also
undocumented scheme at the first parent node than mapping at 'ranges'
properties. It's however not worth implementing that other scheme and
actually also fits atkbdc(4) better to just start at the first parent
node of PS/2 keyboards which is the 8042 controller (I have atkbdc(4)
working that way).
- Use FBSDID.
MFC after: 1 month
which doesn't assume a hardware cursor on __sparc64__ rather than on
DEV_CREATOR. If we want to include more than one framebuffer driver in
e.g. the GENERIC kernel all drivers have to work the same way. Now that
DEV_CREATOR is no longer used remove it from options.sparc64.
address. One was alltraps_with_regs_pushed, the other was calltrap.
When the stack tracer walks up, it looks for magic symbol names to
determine how to parse non-standard stack frames, such as a trapframe.
It was looking for "calltrap". Which of the two symbols you got depended
on things like Phase of moon, etc. If you were unlucky, you got a
garbage stack trace for things like 'debug.trace_on_panic', which would
completely hide the actual source of the problem.
of MCOUNT to have a C version and an asm version with the same name and
not have LOCORE ifdefs to distinguish them. <machine/profile.h> provides
a C version and <machine/asmacros.h> provides an assembler version.
Discussed with: bde
controllers of Sun PCIO-2 chips which are used onboard in most of
the newer PCI-based sun4u machines (cosmetic change as they were also
already probed as generic FWOHCI without this). As with gem(4), hme(4)
and ohci(4) detect whether their intpin register is valid and correct
it if necessary, i.e. set the respective IVAR to the right value for
allocating the IRQ resource, as some of them come up having it set
to 0 (in fact in all machines I'm currently aware of the FireWire
part being enabled). This fixes attaching affected controllers.
Apporved by: simokawa
Tested by: Michiel Boland <michiel@boland.org>
MFC after: 1 month
in case of IP fast forwarding. Enqueue a taskqueue(9) task instead of
calling xl_rxeof() directly.
Reported & tested by: Slava Alpatov
Reviewed by: wpaul
MFC after: 1 week
pointer. If kernel malloc(0) returns a valid pointer, it needs to be
freed. If it returns NULL, it's ok to free this also.
Submitted by: pjd
Reviewed by: imp, dfr
Obtained from: Coverity Prevent
We can't call KeFlushQueuedDpcs() during bootstrap (cold == 1), since
the flush operation sleeps to wait for completion, and we can't sleep
here (clowns will eat us).
On an i386 SMP system, if we're loaded/probed/attached during bootstrap,
smp_rendezvous() won't run us anywhere except CPU 0 (since the other CPUs
aren't launched until later), which means we won't be able to set up
the GDTs anywhere except CPU 0. To deal with this case, ctxsw_utow()
now checks to see if the TID for the current processor has been properly
initialized and sets up the GTD for the current CPU if not.
Lastly, in if_ndis.c:ndis_shutdown(), do an ndis_stop() to insure we
really halt the NIC and stop interrupts from happening.
Note that loading a driver during bootstrap is, unfortunately, kind of
a hit or miss sort of proposition. In Windows, the expectation is that
by the time a given driver's MiniportInitialize() method is called,
the system is already in 'multiuser' state, i.e. it's up and running
enough to support all the stuff specified in the NDIS API, which includes
the underlying OS-supplied facilities it implicitly depends on, such as
having all CPUs running, having the DPC queues initialized, WorkItem
threads running, etc. But in UNIX, a lot of that stuff won't work during
bootstrap. This causes a problem since we need to call MiniportInitialize()
at least once during ndis_attach() in order to find out what kind of NIC
we have and learn its station address.
What this means is that some cards just plain won't work right if
you try to pre-load the driver along with the kernel: they'll only be
probed/attach correctly if the driver is kldloaded _after_ the system
has reached multiuser. I can't really think of a way around this that
would still preserve the ability to use an NDIS device for diskless
booting.
prevent anything from making calls to the NIC while it's being shut down.
This is yet another attempt to stop things like mdnsd from trying to
poke at the card while it's not properly initialized and panicking
the system.
Also, remove unneeded debug message from if_ndis.c.
user to interrupt autoboot process at all. Currently, even when
`autoboot_delay' is set to 0, loader(8) still allows autoboot process to be
interrupted by pressing any key on the console when the loader reads kernel
and modules from the disk. In some cases (i.e. untrusted environment) such
behaviour is highly indesirable and user should not be allowed to interfere
with the autoboot process at all.
Sponsored by: PBXpress Inc.
MFC after: 3 days
are used onboard in most of the newer PCI-based sun4u machines
(cosmetic change as they were also already probed as generic OHCI
without this). Detect whether their intpin register is valid and
correct it if necessary, i.e. set the respective IVAR to the right
value for allocating the IRQ resource, as some of them come up
having it set to 0 (mainly those used in Blade 100 and the first
one on AX1105 boards). This fixes attaching affected controllers.
Correcting the intpin value might be better off in the PCI code
via a quirk table but on the other hand gem(4) and hem(4) also
correct it themselves and at least for the USB controller part
the intpin register is truely hardwired to 0 and can't be changed.
This means that we would have to hook up the quirk information
in a lot of places in the PCI code (i.e. whenever the value of the
intpin register is read from or written to the pci_devinfo of the
respective device) in order to do it the right way.
MFC after: 1 month
- Add locking.
- Account for if the MC146818_NO_CENT_ADJUST flag is set we don't need
to check wheter year < POSIX_BASE_YEAR.
- Add some comments about mapping the day of week from the range the
generic clock code uses to the range the chip uses and which I meant
to add in the initial version.
- Minor clean-up, use __func__ instead of hardcoded function names in
error strings.
o in the rtc(4) front-end additionally:
- Don't leak resources in case mc146818_attach() fails.
- Account for ebus(4) defaulting to SYS_RES_MEMORY for the memory
resources since ebus.c rev. 1.22.
- Add support for storing the century in MK48TXX_WDAY_CB on MK48Txx with
extended registers when the MK48TXX_NO_CENT_ADJUST flag is set (and which
is termed somewhat confusing as it actually means don't manually adjust
the century in the driver).
- Add the MI part of interfacing the watchdog functionality of MK48Txx with
extended registers with watchdog(9). This is inspired by the SunOS/Solaris
drivers for the 'eeprom' devices also having watchdog support. I actually
expected this to work out of the box on Sun Exx00 machines with 'eeprom'
devices which have a 'watchdog-enable' property. On terminal count of the
the watchdog timer however only the MK48TXX_FLAGS_WDF bit rises but the
reset signal and the interrupt respectively (depending on whether the
MK48TXX_WDOG_WDS bit of the chip and the MK48TXX_WDOG_ENABLE_WDS flag
of the driver respectively is set) goes nowhere. Apparently passing the
reset signal on to the WDR line of the CPUs has to be enabled somewhere
else but we don't have documentation for the Exx00 specific controllers.
I decided to commit this nevertheless so it can be enabled in the eeprom(4)
front-end later in e.g. 6.0-STABLE without breaking the API. Besides the
Exx00 the watchdog part of the MK48Txx should also work on E250 and E450.
Possibly also without extra fiddling on these machines but I haven't
found someone willing to give it a try on such a machine so far.
- Use uintXX_t instead of u_intXX_t, use __func__ instead of hardcoded
function names in error strings.
eeprom_ebus_attach() and eeprom_sbus_attach() into eeprom_attach()
respectively. Since the introduction of the ofw_bus interface some
time ago and now that ebus(4) also uses SYS_RES_MEMORY for the
memory resources since ebus.c rev. 1.22 there is no longer a
need to have separate front-ends for ebus(4), fhc(4) and sbus(4).
- Fail gracefully instead of panicing when the model can't be
determined.
- Don't leak resources when mk48txx_attach() fails.
- Use FBSDID.
compatibility with ISA devices while in fact all known EBus devices
actually use memory space turned out to be not a good idea as so far
there is only the 'rtc' device known to show up either on an EBus or
ISA bus but not on any of the other busses used on sparc64. However
there are quite a couple of them that show up on either EBus, FireHose
or SBus. In order to save extra code in the respective drivers switch
ebus(4) to actually use SYS_RES_MEMORY for the memory resources of
its children. At least for transition still accept SYS_RES_IOPORT
and silently change it to SYS_RES_MEMORY. [1]
- In ebus_probe() use ofw_bus_get_name() instead of re-implementing it
via ofw_bus_get_node() and OF_getprop().
- Remove some unused variables.
- Use FBSDID.
Discussed with: tmm (some time ago)
the iteration variable as the RID when adding the respective resource
to the child via bus_set_resource(). In case a device has both I/O
and memory resources this generates gaps in the newbus resources of
the child, e.g. its first memory resource might end up as RID 1.
To solve this mimic resource_list_add_next() via resource_list_find()
and bus_set_resource(); we can't just use resource_list_add_next()
here as this would circumvent the limit checks in isa_set_resource()
of the common ISA code.
This however is more or less a theoretical problem so far as all known
ISA devices on sparc64 soley use I/O space.
- Just use bus_generic_rl_release_resource() for isa_release_resource()
instead of re-implementing the former.
- Improve some comments to better reflect reality, minor clean-up and
simplifications, return NULL instead of 0 were appropriate.
front-end and the LSI64854 and NCR53C9x code in case one of these
functions fails. Add detach functions to these parts and make esp(4)
detachable.
- Revert rev. 1.7 of esp_sbus.c, since rev. 1.34 of sbus.c the clockfreq
IVAR defaults to the per-child values.
- Merge ncr53c9x.c rev. 1.111 from NetBSD (partial):
On reset, clear state flags and the msgout queue.
In NetBSD code to notify the upper layer (i.e. CAM in FreeBSD) on reset
was also added with this revision. This is believed to be not necessary
in FreeBSD and was not merged.
This makes ncr53c9x.c to be in sync with NetBSD up to rev. 1.114.
- Conditionalize the LSI64854 support on sbus(4) only instead of sbus(4)
and esp(4) as it's also required for the 'dma', 'espdma' and 'ledma'
busses/devices as well as the 'SUNW,bpp' device (printer port) which
all hang off of sbus(4).
- Add a driver for the 'dma', 'espdma' and 'ledma' (pseudo-)busses/
devices. These busses and devices actually represent the LSI64854 DMA
engines for the ESP SCSI and LANCE Ethernet controllers found on the
SBus of Ultra 1 and SBus add-on cards. With 'espdma' and 'ledma' the
'esp' and 'le' devices hang off of the respective DMA bus instead of
directly from the SBus. The 'dma' devices are either also used in this
manner or on some add-on cards also as a companion device to an 'esp'
device which also hangs off directly from the SBus. With the latter
variant it's a bit tricky to glue the DMA engine to the core logic of
the respective 'esp' device. With rev. 1.35 of sbus.c we are however
guaranteed that such a 'dma' device is probed before the respective
'esp' device which simplifies things a lot. [1]
- In the esp(4) SBus front-end read the part-unique ID code of Fast-SCSI
capable chips the right way. This fixes erroneously detecting some
chips as FAS366 when in fact they are not. Add explicit checks for the
FAS100A, FAS216 and FAS236 variants instead treating all of these as
ESP200. That way we can correctly set the respective Fast-SCSI config
bits instead of driving them out of specs. This includes adding the
FAS100A and FAS236 variants to the NCR53C9x core code. We probably
still subsume some chip variants as ESP200 while in fact they are
another variant which however shouldn't really matter as this will
only happen when these chips are driven at 25MHz or less which implies
not being able to run Fast-SCSI. [3]
- Add a workaround to the NCR53C9x interrupt handler which ignores the
stray interrupt generated by FAS100A when doing path inquiry during
boot and which otherwiese would trigger a panic.
- Add support for the 'esp' devices hanging off of a 'dma' or 'espdma'
busses or which are companions of 'dma' devices to esp(4). In case of
the variants that hang off of a DMA device this is a bit hackish as
esp(4) then directly uses the softc of the respective parent to talk
to the DMA engine. It might make sense to add an interface for this
in order to implement this in a cleaner way however it's not yet clear
how the requirements for the LANCE Ethernet controllers are and the
hack works for now. [2]
This effectively adds support for the onboard SCSI controller in
Ultra 1 as well as most of the ESP-based SBus add-on cards to esp(4).
With this the code for supporting the Performance Technologies SBS430
SBus SCSI add-on cards is also largely in place the remaining bits
were however omitted as it's unclear from the NetBSD how to couple
the DMA engine and the core logic together for these cards.
Obtained from: OpenBSD [1]
Obtained from: NetBSD [2]
Clue from: BSD/OS [3]
Reviewed by: scottl (earlier version)
Tested with: FSBE/S add-on card (FAS236), SSHA add-on card (ESP100A),
Ultra 1 (onboard FAS100A), Ultra 2 (onboard FAS366)
device and which also applies to the children. This is very usefull for
drivers for the various subordinate busses so they don't need to fiddle
with the OFW node of their parent themselves. As SBus busses hang of the
nexus and we don't use the ofw_bus interface for nexus devices, yet, this
would also require special knowledge about this in the drivers for the
SBus children which these shouldn't need to have.
This includes switching to use an unshifted IGN in the sc_ign member of
the sbus(4) softc internally.
- For SBus child devices where there are variants that are actually split
split into two SBus devices (as opposed to the first half of the device
being a SBus device and the second half hanging off of the first one)
like 'auxio' and 'SUNW,fdtwo' or 'dma' and 'esp' probe the SBus device
which is a prerequisite to the driver attaching to the second one with
a lower order. This saves us from dealing with different probe orders
in the respective device drivers which generally is more hackish.
- Remove a stale comment about the 'specials' array above the attaching
of the child devices. This is a remnant of the NetBSD/sparc origin of
this code. There the 'specials' array is also used to probe certain
devices which are prerequisites to others first. Why NetBSD soley
relies on the devices having the expected order in the OFW tree on
sparc64 isn't clear to me, as far as I can tell OFW doesn't guaranteed
such things.
copying, rather than a page at a time. This was creating far
too many single-page mappings, and eventually OFW overflowed
some internal data structure and refused to map any more.
The new algorithm creates far less mappings and fixed a bug
where multiple mappings for the same page would be created.
'Twas known this was a problem, but only became urgent when the
install CD's mfs_root grew large enough to cause the overflow.
works again.
This driver uses NdisScheduleWorkItem(), and we have to take special steps
to insure that its workitems don't collide with any of the other workitems
used by the NDISulator. In particular, if one of the driver's work jobs
blocks, it can prevent NdisMAllocateSharedMemoryAsync() from completing
when expected.
The original hack to fix this was to have NdisMAllocateSharedMemoryAsync()
defer its work to the DPC queue instead of the general task queue. To
fix it now, I decided to add some additional workitem threads. (There's
supposed to be a pool of worker threads in Windows anyway.) Currently,
there are 4. There should be at least 2. One is reserved for the legacy
ExQueueWorkItem() API, while the others are used in round-robin by the
IoQueueWorkItem() API. NdisMAllocateSharedMemoryAsync() uses the latter
API while NdisScheduleWorkItem() uses the former, so the deadlock is
avoided.
Fixed NdisMRegisterDevice()/NdisMDeregisterDevice() to work a little
more sensibly with the new driver_object/device_object framework. It
doesn't really register a working user-mode interface, but the existing
code was completely wrong for the new framework.
Fixed a couple of bugs dealing with the cancellation of events and
DPCs. When cancelling an event that's still on the timer queue (i.e.
hasn't expired yet), reset dh_inserted in its dispatch header to FALSE.
Previously, it was left set to TRUE, which would make a cancelled
timer appear to have not been cancelled. Also, when removing a DPC
from a queue, reset its list pointers, otherwise a cancelled DPC
might mistakenly be treated as still pending.
Lastly, fix the behavior of ntoskrnl_wakeup() when dealing with
objects that have nobody waiting on them: sync event objects get
their signalled state reset to FALSE, but notification objects
should still be set to TRUE.
occur on a filesystem running with soft updates after a crash and
before a background fsck has been run. To prevent discrepancies
from arising in a background fsck that may already be running,
the directory is removed but its inode is not freed and is left
with the residual reference count. When encountered by the
background fsck it will be reclaimed.
post an event to the geom event queue that will take care of it,
letting outstanding bios finish, and closing the consumers.
Plus some cosmetic clean ups.
specified by caller.
- Change ng_send_item() interface - use 'flags' argument instead of
boolean 'queue'.
- Extend ng_send_fn(), ng_package_data() and ng_package_msg()
interface - add possibility to pass flags. Rename ng_send_fn() to
ng_send_fn1(). Create macro for ng_send_fn().
- Update all macros, that use ng_package_data() and ng_package_msg().
Reviewed by: julian
The Ralink RT2500 driver uses this API instead of NdisMIndicateReceivePacket().
Drivers use NdisMEthIndicateReceive() when they know they support
802.3 media and expect to hand their packets only protocols that want
to deal with that particular media type. With this API, the driver does
not manage its own NDIS_PACKET/NDIS_BUFFER structures. Instead, it
lets bound protocols have a peek at the data, and then they supply
an NDIS_PACKET/NDIS_BUFFER combo to the miniport driver, into which
it copies the packet data.
Drivers use NdisMIndicateReceivePacket() to allow their packets to
be read by any protocol, not just those bound to 802.3 media devices.
To make this work, we need an internal pool of NDIS_PACKETS for
receives. Currently, we check to see if the driver exports a
MiniportTransferData() method in its characteristics structure,
and only allocate the pool for drivers that have this method.
This should allow the RT2500 driver to work correctly, though I
still have to fix ndiscvt(8) to parse its .inf file properly.
Also, change kern_ndis.c:ndis_halt_nic() to reap timers before
acquiring NDIS_LOCK(), since the reaping process might entail sleeping
briefly (and we can't sleep with a lock held).
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.
Tested on: i386, pc98
real next hole to retransmit from the scoreboard, caused by a bug
which did not update the "nexthole" hint in one case in
tcp_sack_option().
Reported by: Daniel Eriksson
Submitted by: Mohan Srinivasan
or to compute the total retransmitted bytes in this sack recovery
episode, the scoreboard is traversed. While in sack recovery, this
traversal occurs on every call to tcp_output(), every dupack and
every partial ack. The scoreboard could potentially get quite large,
making this traversal expensive.
This change optimizes this by storing hints (for the next hole to
retransmit and the total retransmitted bytes in this sack recovery
episode) reducing the complexity to find these values from O(n) to
constant time.
The debug code that sanity checks the hints against the computed
value will be removed eventually.
Submitted by: Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
The most significant changes are:
- Use UMA zone instead of own chunk of memory.
- Lock each hash entry separately.
- Expire items "actively" - interrupt method can expire flows
from hash slot, when it searches through it.
- Remove global tailqueue. Make callout thread search through
every hash slot.
- Export datagram is detached from private data and filled. If
it is incomplete, it is attached back. Another thread will
continue working with it.
Lesser, but also important speedups:
- Flows in hash slot are stored in tailqueue. Whenever a flow is
hit, it is moved to the begging, so it can be located quicker.
- When callout thread works with hash slot it bails out if
slot mutex is contested.
re-sent instead of timing out.
don't log an error message on reconnection, which is not an error.
remove unused nfs_mrep_before_tsleep.
Reviewed by: Mohan Srinivasan
Approved by: alfred
- Move MD files into <arch>/<arch>.
- Move bus dependent files into <arch>/<bus>.
Rename some files to more suitable names.
Repo-copied by: peter
Discussed with: imp
of swi. This allows us to use the taskqueue_thread_* functions instead of
rolling our own. It also avoids a double trip through the queue.
Submitted by: njl
Reviewed by: sam
the same time.
Fix if_ndis_pccard.c so that it sets sc->ndis_dobj and sc->ndis_regvals.
Correct IMPORT_SFUNC() macros for the READ_PORT_BUFFER_xxx() routines,
which take 3 arguments, not 2.
This fixes it so that the Windows driver for my Cisco Aironet 340 PCMCIA
card works again. (Yes, I know the an(4) driver supports this card natively,
but it's the only PCMCIA device I have with a Windows XP driver.)
The core console code checks this field when a console is added and
emits a warning if it's empty. In practice the warning is harmless for
uart(4), because the cn_name is filled in as soon as the device name is
known; which is when the device is enumerated.
To avoid the warning, to avoid possible complications caused by emitting
the warning without there (possibly) being a console selected yet and to
avoid complications when the UART isn't found during bus enumeration, we
just preset the cn_name field here to the name of the driver.
routines (_alldiv(), _allmul(), _alludiv(), _aullmul(), etc...)
that use the _stdcall calling convention.
These routines all take two arguments, but the arguments are 64 bits wide.
On the i386 this means they each consume two 32-bit slots on the stack.
Consequently, when we specify the argument count in the IMPORT_SFUNC()
macro, we have to lie and claim there are 4 arguments instead of two.
This will cause the resulting i386 assembly wrapper to push the right
number of longwords onto the stack.
This fixes a crash I discovered with the RealTek 8180 driver, which
uses these routines a lot during initialization.
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.
Security: FreeBSD-SA-05:08.kmem
To check a directory's in-use bitmap bit by bit, we use
a pointer to an 8 bit wide unsigned value.
The index used to dereference this pointer is calculated
by shifting the bit index right 3 bits. Then we do a
logical AND with the bit# represented by the lower 3
bits of the bit index.
This is an idiomatic way of iterating through a bit map
with simple bitwise operations.
This commit fixes the bug that we only checked bits
3:0 of each 8 bit chunk, because we only used bits 1:0
of the bit index for the bit# in the current 8 bit value.
This resulted in files not being returned by getdirentries(2).
Change the type of the bit map pointer from `char *' to
`u_int8_t *'.
technically a no-op since uintmax_t is uint64_t on all currently
supported architectures, but we should use an explicit cast instead
of depending on this obscure coincidence.
- kernel module declarations and handler.
- macros to map malloc(3) calls to malloc(9) ones.
- malloc(9) declarations.
- call finishoff() from module handler MOD_UNLOAD case
instead of atexit(3).
- use panic(9) instead of abort(3)
- take time from time_second instead of gettimeofday(2)
- define INADDR_NONE
in as-yet uncommitted code for 32 bit binary compatability on 64 bit
kernels, some of the 32 bit registers come from the pcb. Moving the
initialization here means fill_regs32() etc are laid out the same.
Remove unused fields from ndis_miniport_block.
Fix a bug in KeFlushQueuedDpcs() (we weren't calculating the kq pointer
correctly).
In if_ndis.c, clear the IFF_RUNNING flag before calling ndis_halt_nic().
Add some guards in kern_ndis.c to avoid letting anyone invoke ndis_get_info()
or ndis_set_info() if the NIC isn't fully initialized. Apparently, mdnsd
will sometimes try to invoke the ndis_ioctl() routine at exactly the
wrong moment (to futz with its multicast filters) when the interface
comes up, and can trigger a crash unless we guard against it.
Oh, one additional change I forgot to mention in the last commit:
NdisOpenFile() was broken in the case for firmware files that were
pre-loaded as modules. When searching for the module in NdisOpenFile(),
we would match against a symbol name, which would contain the string
we were looking for, then save a pointer to the linker file handle.
Later, in NdisMapFile(), we would refer to the filename hung off
this handle when trying to find the starting address symbol. Only
problem is, this filename is different from the embedded symbol
name we're searching for, so the mapping would fail. I found this
problem while testing the AirGo driver, which requires a small
firmware file.
- Remove the old task threads from kern_ndis.c and reimplement them in
subr_ntoskrnl.c, in order to more properly emulate the Windows DPC
API. Each CPU gets its own DPC queue/thread, and each queue can
have low, medium and high importance DPCs. New APIs implemented:
KeSetTargetProcessorDpc(), KeSetImportanceDpc() and KeFlushQueuedDpcs().
(This is the biggest change.)
- Fix a bug in NdisMInitializeTimer(): the k_dpc pointer in the
nmt_timer embedded in the ndis_miniport_timer struct must be set
to point to the DPC, also embedded in the struct. Failing to do
this breaks dequeueing of DPCs submitted via timers, and in turn
breaks cancelling timers.
- Fix a bug in KeCancelTimer(): if the timer is interted in the timer
queue (i.e. the timeout callback is still pending), we have to both
untimeout() the timer _and_ call KeRemoveQueueDpc() to nuke the DPC
that might be pending. Failing to do this breaks cancellation of
periodic timers, which always appear to be inserted in the timer queue.
- Make use of the nmt_nexttimer field in ndis_miniport_timer: keep a
queue of pending timers and cancel them all in ndis_halt_nic(), prior
to calling MiniportHalt(). Also call KeFlushQueuedDpcs() to make sure
any DPCs queued by the timers have expired.
- Modify NdisMAllocateSharedMemory() and NdisMFreeSharedMemory() to keep
track of both the virtual and physical addresses of the shared memory
buffers that get handed out. The AirGo MIMO driver appears to have a bug
in it: for one of the segments is allocates, it returns the wrong
virtual address. This would confuse NdisMFreeSharedMemory() and cause
a crash. Why it doesn't crash Windows too I have no idea (from reading
the documentation for NdisMFreeSharedMemory(), it appears to be a violation
of the API).
- Implement strstr(), strchr() and MmIsAddressValid().
- Implement IoAllocateWorkItem(), IoFreeWorkItem(), IoQueueWorkItem() and
ExQueueWorkItem(). (This is the second biggest change.)
- Make NdisScheduleWorkItem() call ExQueueWorkItem(). (Note that the
ExQueueWorkItem() API is deprecated by Microsoft, but NDIS still uses
it, since NdisScheduleWorkItem() is incompatible with the IoXXXWorkItem()
API.)
- Change if_ndis.c to use the NdisScheduleWorkItem() interface for scheduling
tasks.
With all these changes and fixes, the AirGo MIMO driver for the Belkin
F5D8010 Pre-N card now works. Special thanks to Paul Robinson
(paul dawt robinson at pwermedia dawt net) for the loan of a card
for testing.
to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to
fix Ethernet header alignment problem on alpha and sparc64. Also change all
users of m_uiotombuf to pass proper offset.
Reviewed by: jmg, sam
Tested by: Sten Spans "sten AT blinkenlights DOT nl"
MFC after: 1 week
look up the packet size of the packet that generated the
response, step down the MTU by one step through ip_next_mtu()
and try again.
Suggested by: dwmalone
access to POSIX Semaphores:
mac_init_posix_sem() Initialize label for POSIX semaphore
mac_create_posix_sem() Create POSIX semaphore
mac_destroy_posix_sem() Destroy POSIX semaphore
mac_check_posix_sem_destroy() Check whether semaphore may be destroyed
mac_check_posix_sem_getvalue() Check whether semaphore may be queried
mac_check_possix_sem_open() Check whether semaphore may be opened
mac_check_posix_sem_post() Check whether semaphore may be posted to
mac_check_posix_sem_unlink() Check whether semaphore may be unlinked
mac_check_posix_sem_wait() Check whether may wait on semaphore
Update Biba, MLS, Stub, and Test policies to implement these entry points.
For information flow policies, most semaphore operations are effectively
read/write.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by: DARPA, McAfee, SPARTA
Obtained from: TrustedBSD Project
debug.bpf_bufsize is now net.bpf.bufsize
debug.bpf_maxbufsize is now net.bpf.maxbufsize
-move function prototypes for bpf_drvinit and bpf_clone up to the
top of the file with the others
-assert bpfd lock in catchpacket() and bpf_wakeup()
MFC after: 2 weeks
to ksem.h so that they are accessible from the MAC Framework for the
purposes of labeling and enforcing additional protections. #error
if these are included without _KERNEL, since they are not intended
(nor installed) for user application use.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Sponsored by: DARPA, SPARTA
Obtained from: TrustedBSD Project
missing and will be implemented in a second step. This is functional as is.
Tested by: freebsd-pf, pfsense.org
Obtained from: OpenBSD
X-MFC after: never (breaks API/ABI)
underlying vnode requires Giant.
- In vm_fault only acquire Giant if the underlying object has NEEDSGIANT
set.
- In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
perfect solution as the lower vm object can change at unpredictable times
if our lower vp happens to be on another unionfs, etc.
Submitted by: Oleg Sharoiko <os@rsu.ru>
export. This was happening anyway since this file manually sets DEBUG.
- Add a sysctl for the number of items on the worklist.
- Use a more canonical loop restart in softdep_fsync_mountdev, it saves
some code at the expense of a goto and makes me worry less about
modifying a variable that should be private to the TAILQ_FOREACH_SAFE
macro.
devstat_end_transaction is called from a fast interrupt. Presently
there is no way for mtx_assert to determine that we're not executing
in a real thread context.
Submitted by: jhusted@isilon.com
as they have no connection with the expected MNT_* flags. This bug
was exposed 18 months ago when the assignments to f_flags in
vfs_syscalls.c were moved to before the VFS_STATFS() call. It was
fixed in the CSRG source 10 years ago, but we never picked up that
change.
PR: kern/80390
MFC after: 1 week
drop the check+initialization for a straight initialization. Also
assert that curthread will never be NULL just to be sure.
Discussed with: rwatson, peter
MFC after: 1 week
Have pmcstat(8) and pmccontrol(8) use these APIs.
Return PMC class-related constants (PMC widths and capabilities)
with the OP GETCPUINFO call leaving OP PMCINFO to return only the
dynamic information associated with a PMC (i.e., whether enabled,
owner pid, reload count etc.).
Allow pmc_read() (i.e., OPS PMCRW) on active self-attached PMCs to
get upto-date values from hardware since we can guarantee that the
hardware is running the correct PMC at the time of the call.
Bug fixes:
- (x86 class processors) Fix a bug that prevented an RDPMC
instruction from being recognized as permitted till after the
attached process had context switched out and back in again after
a pmc_start() call.
Tighten the rules for using RDPMC class instructions: a GETMSR
OP is now allowed only after an OP ATTACH has been done by the
PMC's owner to itself. OP GETMSR is not allowed for PMCs that
track descendants, for PMCs attached to processes other than
their owner processes.
- (P4/HTT processors only) Fix a bug that caused the MI and MD
layers to get out of sync. Add a new MD operation 'get_config()'
as part of this fix.
- Allow multiple system-mode PMCs at the same row-index but on
different CPUs to be allocated.
- Reject allocation of an administratively disabled PMC.
Misc. code cleanups and refactoring. Improve a few comments.
are set when we attempt to remove a buffer from a queue we should panic.
Hopefully this will catch the source of the wrong bufobj panics.
Sponsored by: Isilon Systems, Inc.
needed only for implicit connect cases. Under load, especially on SMP,
this can greatly reduce contention on the tcbinfo lock.
NB: Ambiguities about the state of so_pcb need to be resolved so that
all use of the tcbinfo lock in non-implicit connection cases can be
eliminated.
Submited by: Kazuaki Oda <kaakun at highway dot ne dot jp>
a new entry in the taskqueue struct each time it wakes up to see if it
should terminate
o adjust TASKQUEUE_DEFINE_THREAD & co. to record the thread/proc identity for
the shutdown rendezvous
o replace wakeup after adding a task to a queue with wakeup_one; this helps
queues where multiple threads are used to service tasks (e.g. acpi)
o remove NULL check of tq_enqueue method; it should never be NULL
Reviewed by: dfr, njl
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.
KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.
Submitted by: ups
vtryrecycle(). We could sometimes get into situations where two threads
could try to recycle the same vnode before this.
- vtryrecycle() is now responsible for returning the vnode to the free list
if it fails and someone else hasn't done it.
- Make a new function vfreehead() which moves a vnode to the head of the
free list and use it in vgone() to clean up that code a bit.
Sponsored by: Isilon Systems, Inc.
Reported by: pho, kkenn
due to a change made in revision 1.284 of sys/kern/kern_sig.c in August
2004 which made ptracestop() return with the process still locked.
Submitted by: Mauritz Sundell
MFC After: 3 days
mutexes, which offers lower overhead on both UP and SMP. When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks. We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access. In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occured, or circumstances have changed in the current cache.
Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.
Reviewed by: bmilekic, jeff (somewhat)
Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others
theoretically unload pci bridges or pci drivers. It will also allow
detach to work if one needed to detach a subtree.
This is inspired by looking at the p4 commits from bms to his 5.4
tree, but I didn't look at the final results.
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c: In function `fr_ipid_newfrag':
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c:397: warning: cast to pointer from integer of different size
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c: In function `fr_ipid_knownfrag':
/usr/src/sbin/ipf/ipftest/../../../sys/contrib/ipfilter/netinet/ip_frag.c:582: warning: cast from pointer to integer of different size
for the VGA I/O or memory ranges, when it's not within the default
ranges decoded by the bridge. When allocation for VGA addresses is
attempted, check that the bridge has the VGA Enable bit set before
allowing it.
As such, newbusified VGA drivers can allocate their resources when
the VGA adapter is behind a PCI-to-PCI bridge.
Reviewed by: imp@, jhb@
Only allow a process to use the x86 RDPMC instruction if it has
allocated and attached a PMC to itself.
Inform the MD layer of the "pseudo context switch out" that needs
to be done when the last thread of a process is exiting.
Move crc32() and crc32_raw() from the latter to the former. Move
the declaration of crc32_tab[] to <sys/libkern.h> as well.
Pointed out by: bde@
Tested on: ia64, sparc64
CRC logic to a new function: crc32_raw() that obtains the initial
CRC value as well as leaves any post-processing to the caller. As
such, it can be used when the initial CRC value is not ~0U or when
the final CRC value does need to be inverted (bitwise). It also
means that crc32_raw() can be called repeatedly when the data is
not available as a single block, such as for scatter/gather lists
and the likes.
Avoid the additional call overhead incured by the refactoring by
moving the implementation off crc32() to sys/systm.h and making it
inlinable. Since crc32_raw() is itself trivial and since it may
be used in loops that iterate over fragments, having it available
for inlining can be beneficial. Hence, move its implementation
to sys/systm.h as well.
Keep the original implementation of crc32() in libkern/crc32.c for
documentation purposes (as a comment of course).
Triggered by: Jose M Rodriguez (josemi at freebsd dot jazztel dot es)
Discussed on: current@
Tested on: amd64, ia64 (BVO having GPT partitions)
Jargon file candidate: BVO = By Virtue Of :-)
fact that access to RR0 does not need a prior write to the register
index because the index always reverts to 0 after the indexed register
has been accessed.
Typically when a RR or WR is to accessed, one programs the index (which
is a write to the control register), followed by a read or write to the
actual indexed register (a read pr write to the same control register).
When this non-atomic sequence is interrupted after having written the
index and low-level console I/O is done in that situation, the write to
program the index will actually write to the indexed register and nuke
state. This almost always yields a wedge.
By not programming the index register and instead just reading from RR0,
the worst case scenario is non-fatal. For if we don't actually read from
RR0 but some other register we get an invalid status, which may lead us
to conclude that the transit data register is empty when it's not or that
the receive data register contains data when it doesn't. Hence, we may
lose an output character or get a sporadic input character, but given
the situation this is a non-issue.
Full serialization is not possible due to the fact that this code needs
to work from DDB and before mutex initialization has happened.
In collaboration with: kris@, marius@
Tested by: kris@
MFC after: 1 day
X-MFC: 5.4-RELEASE candidate
the MNT_RDONLY flag if the "ro" option was passed in from userland, and
clears it otherwise. In the diskless case, the MNT_RDONLY flag is already
set when this code is reached, but there are no mount options, so it was
incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the
"ro" option was specified, and left alone otherwise.
Note that the NFS code will still happily let you mount a filesystem RW
even if the server exports it RO. I'm not sure how to fix that.
caller will be interested in the actual data contents of a vnode after
a successful lookup. This intended to help deal with lifetime issues
for device cloning and to alert autofs when filesystems need to be
mounted.
fields of an ICMP packet.
Use this to allow ipfw to pullup only these values since it does not use
the rest of the packet and it was failed on ICMP packets because they
were not long enough.
struct icmp should probably be modified to use these at some point, but
that will break a fair bit of code so it can wait for another day.
On the off chance that adding this struct breaks something in ports,
bump __FreeBSD_version.
Reported by: Randy Bush <randy at psg dot com>
Tested by: Randy Bush <randy at psg dot com>
during a data phase. Before, we would try to recover the autosense, but
the DMA engine would still be active with interrupted transfer, and we'd
quickly spiral out of control and cause massive data corruption. For now,
just reset the chip and cancel everything. The better solution is to
cancel the DMA operation, but there is no clear way to do that right now.
The data corruption problem is severe enough to warrant this fix in the
interim. Thanks to Kris Kenneway to sacrificing countless filesystems to
this bug.
MFC After: 3 days
the number of entries in exec_map (maximum number of simultaneous execs
that can be handled by the kernel). The default value of 16 is
insufficient on heavily loaded machines (particularly SMP machines), and
if it is exceeded then executing further processes will generate a SIGABRT.
This is a workaround until a better solution can be implemented.
Reviewed by: alc
MFC after: 3 days
on boards with VIA gigE controllers that are embedded in VIA chipsets.
Presumably, they don't have an external EEPROM and store the MAC
address somewhere else. To get around this, force an autoload and
read the station address from the RX filter registers instead.
This has been tested to work on both embedded and standalone
controllers.
While there also check for failed device_add_child calls.
Found by: Coventry Analysis tool[1].
Submitted by: sam[1]
Approved by: pjd (mentor)
MFC after: 1 week
here on in, if_ndis.ko will be pre-built as a module, and can be built
into a static kernel (though it's not part of GENERIC). Drivers are
created using the new ndisgen(8) script, which uses ndiscvt(8) under
the covers, along with a few other tools. The result is a driver module
that can be kldloaded into the kernel.
A driver with foo.inf and foo.sys files will be converted into
foo_sys.ko (and foo_sys.o, for those who want/need to make static
kernels). This module contains all of the necessary info from the
.INF file and the driver binary image, converted into an ELF module.
You can kldload this module (or add it to /boot/loader.conf) to have
it loaded automatically. Any required firmware files can be bundled
into the module as well (or converted/loaded separately).
Also, add a workaround for a problem in NdisMSleep(). During system
bootstrap (cold == 1), msleep() always returns 0 without actually
sleeping. The Intel 2200BG driver uses NdisMSleep() to wait for
the NIC's firmware to come to life, and fails to load if NdisMSleep()
doesn't actually delay. As a workaround, if msleep() (and hence
ndis_thsuspend()) returns 0, use a hard DELAY() to sleep instead).
This is not really the right thing to do, but we can't really do much
else. At the very least, this makes the Intel driver happy.
There are probably other drivers that fail in this way during bootstrap.
Unfortunately, the only workaround for those is to avoid pre-loading
them and kldload them once the system is running instead.
modify-after-free races when the task structure is malloc'd
o shrink task structure by removing ta_flags (no longer needed with
avoid fix) and combining ta_pending and ta_priority
Reviewed by: dwhite, dfr
MFC after: 4 days
inherit signal mask from parent thread, setup TLS and stack, and
user entry address.
Also support POSIX thread's PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM,
sysctl is also provided to control the scheduler scope.
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.
- Introduce a global mutex, mac_bsdextended_mtx, to protect the rule
array and hold this mutex over use and modification of the rule array
and rules.
- Re-order and clean up sysctl_rule so that copyin/copyout/update happen
in the right order (suggested by: jhb done by rwatson).
destination windows were confused, one instead of other.
This error was masked, because first segment of just
established connection is usually smaller than initially
announced window, and it was successfully passed. First
window reannouncement corrected erroneous 'seqhi' value.
The error showed up when client connected to synproxy
with zero initial window, and reannounced it after
session establishment.
In collaboration with: dhartmei [we came to same patch independtly]
Reviewed by: mlaier
Sponsored by: Rambler
MFC after: 3 days
instead of assuming fixed offsets within the GDT. The hard-coded
values here have been incorrect since Peter's GDT rearranging around
10 days ago, causing ACPI resume problems.
Reviewed by: peter
16C950. Adding it here doesn't unlock any of the cool 16C950 features
(like the 128 byte fifo, the different prescalor, etc), but it does
seem to get it working for me in light testing.
Card Provided by: Ihsan Dogan
o Remove the clock interface. Not only does it conflict with the MI
version when device genclock is added to the kernel, it was also
not possible to have more than 1 clock device. This of course would
have been a problem if we actually had more than 1 clock device.
In short: we don't need a clock interface and if we do eventually,
we should be using the MI one.
o Rewrite inittodr() and resettodr() to take into account that:
1) We use the EFI interface directly.
2) time_t is 64-bit and we do need to make sure we can determine
leap years from year 2100 and on. Add a nice explanation of
where leap years come from and why.
3) This rewrite happened in 2005 so any date prior to 1/1/2005
(either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with
1/1/2005 in that case.
4) The EFI clock has a high probability of being correct, so
only (further) correct the EFI clock when the file system time
is larger. That should never happen in a time-synchronised world.
Complain when EFI lost 2 days or more.
Replace the copyright notice now that I (pretty much) rewrote all of
this file.
pumping data despite our scsi data counters being at 0, something has
gone massively wrong. The consequence of happily ignoring this is more
DMA phase errors and a disk full of spammed sectors. Instead, panic on
the first occurance to hopefully limit the damage.
MFC After: 3 days
do not correctly deal with failures. This presently risks deadlock
problems if dependency processing is held up by failures to allocate
a vnode, however, this is better than the situation with the failures.
Sponsored by: Isilon Systems, Inc.
If TCP Signatures are enabled, the maximum allowed sack blocks aren't
going to fit. The fix is to compute how many sack blocks fit and tack
these on last. Also on SYNs, defer padding until after the SACK
PERMITTED option has been added.
Found by: Mohan Srinivasan.
Submitted by: Mohan Srinivasan, Noritoshi Demizu.
Reviewed by: Raja Mukerji.
code readability and facilitates some anticipated optimizations in
tcp_sack_option().
- Remove tcp_print_holes() and TCP_SACK_DEBUG.
Submitted by: Raja Mukerji.
Reviewed by: Mohan Srinivasan, Noritoshi Demizu.
- If the peer sends the Signature option in the SYN, use of Timestamps
and Window Scaling were disabled (even if the peer supports them).
- The sender must not disable signatures if the option is absent in
the received SYN. (See comment in syncache_add()).
Found, Submitted by: Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>.
Reviewed by: Mohan Srinivasan <mohans at yahoo-inc dot com>.
tcp_ctlinput() and subject it to active tcpcb and sequence
number checking. Previously any ICMP unreachable/needfrag
message would cause an update to the TCP hostcache. Now only
ICMP PMTU messages belonging to an active TCP session with
the correct src/dst/port and sequence number will update the
hostcache and complete the path MTU discovery process.
Note that we don't entirely implement the recommended counter
measures of Section 7.2 of the paper. However we close down
the possible degradation vector from trivially easy to really
complex and resource intensive. In addition we have limited
the smallest acceptable MTU with net.inet.tcp.minmss sysctl
for some time already, further reducing the effect of any
degradation due to an attack.
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.2
MFC after: 3 days
latest 82550 and 82551 chipsets (revision IDs 0x0e, 0x0f and 0x10).
We were only enabling it for revisions 0x0c and 0x0d, now it's
enabled for any 8255x NIC with a revision ID bigger than 0x0c. It
should be safe, and this is what Intel does in their open source
driver.
MFC after: 2 weeks
Tested by: Pavel Lobach lobach_pavel at mail dot ru
ineffective, depreciated and can be abused to degrade the performance
of active TCP sessions if spoofed.
Replace a bogus call to tcp_quench() in tcp_output() with the direct
equivalent tcpcb variable assignment.
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1
MFC after: 3 days
This also removes the warning timeout on the taskqueues stalling as
I'm tired of getting ATA error reports for problems in other parts ;)
Misc cosmetic and comment cleanups now we are here.
number of task threads to start on boot. Go back to a default of 3
threads to work around lost battery state problems. Users that need
a setting of 1 can set this via the tunable. I am investigating the
underlying issues and this tunable can be removed once they are solved.
MFC after: 2 days
HWPMC_HOOKS is defined. The pmc_cpu_is_*() functions in this file
are referenced unconditionally by hwpmc(4).
This is mostly a stop-gap. The pmc_cpu_is*() function should
probably be declared inline in <sys/pmc.h> or <sys/pmckern.h> and
the function pointers with corresponding SX lock should probably
be moved to another file and compiled conditionally upon HWPMC_HOOKS.
Ok'd by: jkoshy@
includes the MD header for us. Do not include <machine/specialreg.h>
as it is not a header file that can be included from MI files. It
is included from <machine/pmc_mdep.h> if so needed and possible.
Ok'd: jkoshy@
inclusion of <sys/pmc.h> and depending on being included from
that header file.
o Include any MD specific header files that otherwise need to be
included from MI files.
Ok'd: jkoshy@
<machine/pmc_mdep.h> here.
o Remove the #error directive. There's no union md_pm referenced
on (as of yet) unsupported platforms and will not be if there
are no MD extensions for a particular platform.
Further cleanups can be expected.
Ok'd: jkoshy@
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.
Sponsored by: Rambler
Reviewed by: sam, brooks, mux
ioctls are now handled explicitly, but we can't really do anything
with them unless the NIC is up (trying to get/set a parameter when
the NDIS driver isn't running always yields an error). If something
invokes either of these ioctls and the NIC isn't initialized, punt
to the default ieee80211_ioctl() routine.
resides on. Fix the special case of the filesystem fragment size not
evenly dividing the size of the provider. Fixing the general case
probably requires better superblock validation (left as an exercise to
the reader).
settings and is an older version of the same design used for ICH SpeedStep.
It is only known to be available on PIIX4 chipsets.
Many thanks to Bruno Ducrot for writing the driver and Jon Noack for
testing.
Submitted by: Bruno Ducrot
Fix a race condition between kern_wait() and thread_stopped().
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.
Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after : 4 days
Problem is in kern_wait(), parent process steps through children list,
once a child process is skipped, and later even if the child is stopped,
parent process still sleeps in msleep(), the race happens if parent
masked SIGCHLD.
Submitted by : Peter Edwards peadar.edwards at gmail dot com
MFC after : 4 days
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).
Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb
While we wait for holds to be released, print a list of who holds us
back once per second.
Use the new KPI from GEOM instead of vfs_mount.c calling g_waitidle().
Use the new KPI also from ata.
With ATAmkIII's newbusification, ata could narrowly miss the window
and ad0 would not exist when we tried to mount root.
mac_check_proc_wait(), which control the ability to wait4() specific
processes. This permits MAC policies to limit information flow from
children that have changed label, although has to be handled carefully
due to common programming expectations regarding the behavior of
wait4(). The cr_seeotheruids() check in p_canwait() is #if 0'd for
this reason.
The mac_stub and mac_test policies are updated to reflect these new
entry points.
Sponsored by: SPAWAR, SPARTA
Obtained from: TrustedBSD Project
dc0: MII without any PHY!
We have to enable the connection to the MII first. Doing so fixes the
problem cards without breaking the older, working cards.
Bad card provided by: deischen
- ncr53c9x.c:
1.108: Remove unreachable break after return and goto statements.
1.109: avoid strong words; use 'screw' instead
1.110: Fix some typos. From Tom Cosgrove via jmc@openbsd.
1.114: nuke trailing whitespace
1.107 was already merged, 1.112 and 1.113 are not relevant for FreeBSD.
1.111 is a functional change and will be merged later.
- ncr53c9xreg.h:
1.12: DMA, not dma nor Dma.
1.13: Fix some typos. From Tom Cosgrove via jmc@openbsd.
1.14: nuke trailing whitespace
- ncr53c9xvar.h:
1.43: Fix some typos. From Tom Cosgrove via jmc@openbsd.
1.44: Constify.
1.42 and 1.46 were already merged, 1.45 is not relevant for FreeBSD.
- Merge esp_sbus.c rev. 1.31 from NetBSD: nuke trailing whitespace.
Rev. 1.28 and 1.30 were already merged, 1.29 is not relevant for FreeBSD.
- Remove unused headers.
- Use BUS_PROBE_DEFAULT.
- Use __func__ instead of hardcoded function names in error messages.
- Correct some comments.
- Correct some function declarations to match their prototypes.
- Some style(9) fixes (don't use function calls in initializers; indentation).
- Zero the allocated structs to avoid problems with uninitialized members.
- Remove the ifdef'ed out SBus interrupt priority code and the hook for
ncr53c9x_reset(), remove the unused SBus interrupt priority member from
esp_softc. On FreeBSD setting the SBus interrupt priority is entirely done
in sbus(4) and the reset function isn't even really used in NetBSD.
- s,dma,DMA, in comments.
- Make the code fit in 80 columns.
- Merge lsi64854.c rev. 1.25 from NetBSD: nuke trailing whitespace.
- Update NetBSD RCS IDs according to what was actually already merged.
- Remove dv_name from the lsi64854_softc and use device_printf() instead.
- Use __func__ instead of hardcoded function names in error messages.
- Use ulmin() instead of min() for comparing the DMA sizes as the values
involved actually are represented by 64bit unsigned instead of 32bit
unsigned. As far as I can't tell this doesn't make a difference in
practice though.
- Some style(9) fixes (mainly indentation).
- Remove unnecessary braces.
to see if additional write requests will arrive that can be coalesced and
clustered with earlier ones. When doing so, it must determine whether
the two requests are made by credentials with the same access writes, so
as not to coalesce improperly. NFSW_SAMECRED() implements a test of two
credentials using a binary compare.
Replace NFSW_SAMECRED() macro with nfsrv_samecred() function, which is
aware of the contents and layout of a struct ucred, rather than a simple
binary compare. While the binary compare works when ucred is simply a
zero'd and embedded 'struct ucred' in the NFS descriptor, it will work
less well when the ucred associated with an NFS descriptor is "real", so
has defined and populated reference count, mutex, etc.
MFC after: 1 week
Obtained from: TrustedBSD Project
this buffer anyway so the constraint that it had to be DMA capable only
caused pain when devices failed to aquire the memory. Use a regular
malloc instead with sndbuf_setup.
Approved by: tanimura (mentor)
at their old location in sys/dev/esp after they were repo-copied to
sys/sparc64/sbus at rev. 1.1:
sys/dev/esp/lsi64854.c rev. 1.2
sys/dev/esp/lsi64854var.h rev. 1.2
Add some style(9) touch ups; style(9) states that new code should follow
these conventions and, well, this is a new driver.
Tested on: i386, sparc64
Reviewed by: scottl
with the attaching of the children done in the bus attach function like
it's supposed to be.
- In the bus probe nomatch function print the resources of the children
like it's done in the other sparc64 specific bus drivers.
- For the clock frequency IVAR use the per-child values and fall back to
the bus default in case a child doesn't have the respective property
instead of always using the bus default so a child driver doesn't need
to obtain the per-child value itself (see also the commit message of
sys/dev/esp/esp_sbus.c rev. 1.7).
- Add support for pass-through allocations. The comment preceding
sbus_alloc_resource() wasn't quite correct, we need to support pass-
through allocations for the 'espdma' and 'ledma' (pseudo-)busses which
hang off of the SBus in Ultra 1 machines. There can also be actual
bridges like the SBus-to-PCMCIA bridge on the SBus and the XBox (SBus
extension box) probably also involves one.
- Use auto-generated typedefs for the prototypes of the device interface
functions.
- Style(9) fixes (mainly don't use function calls in initializers).
- Use __func__ instead of hardcoded function names in error messages.
- Try to make error messages sound uniform.
- Try to keep the code within 80 columns.
- Correct some typos.
- Correct some function declarations to match their prototypes.
- Remove unused headers, macros and variables.
- Remove a bzero() superfluous due to allocating with M_ZERO.
- Use FBSDID.
Since the name cache is case-sensitive and msdosfs isn't,
creating a file 'foo' won't invalidate a negative entry for 'FOO'.
There are similar problems related to 8.3 filenames.
A better solution is to override VOP_LOOKUP with a method that
canonicalizes the name, then calls vfs_cache_lookup(). Unfortunately,
it's not quite that simple because vfs_cache_lookup() will call
msdosfs_lookup() on a cache miss, and msdosfs_lookup() needs a way to
get at the original component name.
control socket poll() (select()), fstat(), and accept() operations,
required for some policies:
poll() mac_check_socket_poll()
fstat() mac_check_socket_stat()
accept() mac_check_socket_accept()
Update mac_stub and mac_test policies to be aware of these entry points.
While here, add missing entry point implementations for:
mac_stub.c stub_check_socket_receive()
mac_stub.c stub_check_socket_send()
mac_test.c mac_test_check_socket_send()
mac_test.c mac_test_check_socket_visible()
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
of the socket label to thread-local storage, and replace it with
conditional acquisition based on debug.mpsafenet. Acquire the socket
lock around the copy operation.
In mac_set_fd(), replace the unconditional acquisition of Giant with
the conditional acquisition of Giant based on debug.mpsafenet. The socket
lock is acquired in mac_socket_label_set() so doesn't have to be
acquired here.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR, SPARTA
Don't use atomic ops to increment interrupt stats.
On sparc64 this reduces delay until tick interrupts are service by 1/10th
on average. In turn this reduces the clock drift caused by these delays
so there's less drift which has to be compensated in tick_hardclock().
This includes switching from atomically incrementing the global cnt.v_intr
to the asm equivalent of PCPU_LAZY_INC(cnt.v_intr) in exception.S
- Correct some comments to match the registers actually used.
- Correct some format specifiers, interrupt levels passed in are u_int.
- Use FBSDID.
Ok'ed by: jhb
- Fix NULL pointer dereferences caused when an ithread or a handler is
NULL which happens when a stray interrupt triggers after the respective
device interrupt was torn down.
- Remove the critical section around INTR_FAST handlers which actually
was a nested critical section. Both tl0_intr() and tl1_intr() already
enter a critical section for calling intr_execute_handlers().
MFC after: 3 days
call tick_stop() again after tick_init() as tick interrupts already
have been disabled as part of tick_init().
- In spinlock_enter() replace the magic value for PIL TICK with the
respective macro.
- Use FBSDID.
SpitFire erratum #54) which can cause writes to the TICK_CMPR register
to fail. This seems to fix the dying clocks problem reported by jhb@
and kris@. [1]
- In tick_start() don't reset the tick counter of the boot processor to
zero. It's initially reset in _start() and afterwards but _before_
tick_start() is called on the BSP the APs synchronise with the tick
counter of the BSP in mp_startup(). Resetting the tick counter of the
BSP in tick_start() probably also was the cause of problems seen when
using the CPU tick counter as timecounter on SMP machines.
Not resetting the tick counter of the BSP in mp_startup() makes the
tick counters and tick interrupts between the BSP and APs be pretty
much in sync as it's supposed to be. This also means there's no longer
a real reason to have separate tick_start() and tick_start_ap() so
merge them and zap tick_start_ap(). This is also a first step in
simplifying the interface to the tick counters in preparation to use
alternate clock hardware where available.
- Switch to the algorithm used on FreeBSD/ia64 for updating the tick
interrupt register and which compensates the clock drift caused by
varying delays between when the tick interrupts actually trigger and
when they are serviced. Not compensating the clock drift mainly hurts
interactive performance especially when using WITNESS. [2]
For further information about the algorithm also see the commit log
of sys/ia64/ia64/interrupt.c rev. 1.38.
On sparc64 the sysctls for monitoring the behaviour of the tick
interrupts are machdep.tick.adjust_edges, machdep.tick.adjust_excess,
machdep.tick.adjust_missed and machdep.tick.adjust_ticks.
- In tick_init() just use tick_stop() for stopping the tick interrupts
until a proper handler is set up later. This also stops the system
tick interrupt on USIII systems earlier.
- In tick_start() check for a rough upper limit of HZ.
- Some minor changes, e.g. use FBSDID, remove unused headers, etc.
Info obtained from: Linux [1]
Ok'ed by: marcel [2]
Additional testing by: kris (earlier version of the workaround), jhb
X-MFC after: 3 days [1]
of system calls to manipulate elements of the process credential,
including:
setuid() mac_check_proc_setuid()
seteuid() mac_check_proc_seteuid()
setgid() mac_check_proc_setgid()
setegid() mac_check_proc_setegid()
setgroups() mac_check_proc_setgroups()
setreuid() mac_check_proc_setreuid()
setregid() mac_check_proc_setregid()
setresuid() mac_check_proc_setresuid()
setresgid() mac_check_rpoc_setresgid()
MAC checks are performed before other existing security checks; both
current credential and intended modifications are passed as arguments
to the entry points. The mac_test and mac_stub policies are updated.
Submitted by: Samy Al Bahra <samy@kerneled.org>
Obtained from: TrustedBSD Project
than defaulting the cmode argument to vn_open() to 0. Supply a default
argument of ALQ_DEFAULT_CMODE (0600) in current callers.
Discussed with/pointed out by: hmp
Reveiwed by: jeff, hmp
MFC after: 3 days
unw_step(). Both errors denote the end of a stack trace (i.e. no
prior frame), but are otherwise not error conditions.
Have db_trace() return 0 when the trace ends due to one of these
return codes as they are really normal termination conditions.
This change especially improves the output of the "show thread"
command in DDB when there are threads in fork_trampoline() and
previously db_trace() would return an error, causing the show
command to emit '***'.
- Split core DRM routines back into their own module, rather than using the
nasty templated system like before.
- Development-class R300 support in radeon driver (requires userland pieces, of
course).
- Mach64 driver (haven't tested in a while -- my mach64s no longer fit in the
testbox). Covers Rage Pros, Rage Mobility P/M, Rage XL, and some others.
- i915 driver files, which just need to get drm_drv.c fixed to allow attachment
to the drmsub device. Covers i830 through i915 integrated graphics.
- savage driver files, which should require minimal changes to work. Covers the
Savage3D, Savage IX/MX, Savage 4, ProSavage.
- Support for color and texture tiling and HyperZ features of Radeon.
Thanks to: scottl (much p4 handholding)
Jung-uk Kim (helpful prodding)
PR: [1] kern/76879, [2] kern/72548
Submitted by: [1] Alex, lesha at intercaf dot ru
[2] Shaun Jurrens, shaun at shamz dot net
than WIN_CHARS bytes, we shift the suffix (previous substrings) upwards
by the amount this substring exceeds its WIN_CHARS slot. Profiling shows
this change is indistinguishable from the previous code at 95% confidence.
This bug would result in attempts to access or create files or directories
with multi-byte characters returning an error but no data loss.
Reported and tested by: avatar
MFC after: 3 days
of physical addresses. The pages containing these physical addresses will
not be added to the free list and thus will effectively be ignored by the
VM system. This is mostly useful for the case when one knows of specific
physical addresses that have bit errors (such as from a memtest run) so
that one can blacklist the bad pages while waiting for the new sticks of
RAM to arrive. The physical addresses of any ignored pages are listed in
the message buffer as well.
This allows to attach to the children (ATA devices) even without a
driver being attached. This allows atapi-cam to do its work both
with and without the pure ATAPI driver being present.
ATA patches by /me
ATAPI-cam pathes by Thomas
IPv6 support. The header in IPv6 is more complex then in IPv4 so we
want to handle skipping over it in one location.
Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
MCA state requires a spin lock, which requires a valid curthread.
This change allows SMP kernels to boot into multi-user again.
While here, update the copyright notice and use __FBSDID for the
revision string.
in flight in SACK recovery.
Found by: Noritoshi Demizu
Submitted by: Mohan Srinivasan <mohans at yahoo-inc dot com>
Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>
Raja Mukerji <raja at moselle dot com>
disabling interrupts before updating the saved pil in the thread. If we
save the value first then it can be clobbered if an interrupt comes in
and the interrupt handler tries to acquire a spin lock.
Submitted by: marius
Specifically, if the BIOS has programmed an IRQ for a device that doesn't
match the list of valid IRQs for the link, use it anyway as some BIOSes
don't correctly list the valid IRQs in the $PIR. Also, allow the user
to specify an IRQ that $PIR claims is invalid as an override, but emit a
warning in that case.
when using an APIC. This simplifies the APIC code somewhat and also allows
us to be pedantically more compliant with ACPI which mandates no use of
mixed mode.
printf's during a verbose boot is more intuitive (the BAR listings and
interrupt routing info now comes after the config header dump rather than
just before it).
MAP_SHARED so that the entry point gets executed un-conditionally.
This may be useful for security policies which want to perform access
control checks around run-time linking.
-add the mmap(2) flags argument to the check_vnode_mmap entry point
so that we can make access control decisions based on the type of
mapped object.
-update any dependent API around this parameter addition such as
function prototype modifications, entry point parameter additions
and the inclusion of sys/mman.h header file.
-Change the MLS, BIBA and LOMAC security policies so that subject
domination routines are not executed unless the type of mapping is
shared. This is done to maintain compatibility between the old
vm_mmap_vnode(9) and these policies.
Reviewed by: rwatson
MFC after: 1 month
Save a memory dereference in the ISR by passing this in directly.
Calling pps_capture is MP safe for all other operations on struct
pps_state, so there's no need to aquire the lock before we do this,
even from a fast ISR. Avoid dereferencing sc->ppbus until after
pps_capture is called as well. These actions reduce somewhat the
cache effects that cause variance in interrupt times. On an
especially slow test machine (300MHz Cyrix GXm), this reduces the
interrupt latency about about 10% (from 21us to 19us) and helps a
little with the variance (although most of the variance seems to be
caused by lots of interrupt masking).
This also happens fixes one or two of bde's style issues.
i.e. checking to see if a cluster was every less than 48 bytes,
a rather unlikely case.
Check return value of m_dup_pkthdr() calls.
Found by: Coverity
Reviewed by: rwatson (mentor), Keiichi Shima (for Kame)
Approved by: rwatson (mentor)
we start turning any of them back on again. This works around a bug in
some BIOSen that alias two different link devices for APIC vs ATPIC modes
onto the same physical hardware link.
Submitted by: njl
Tested by: Antoine Brodin antoine dot brodin at laposte dot net
Specifically, sleepq_broadcast() uses td_slpq for its private pending
queue of threads that it is going to wake up after it takes them off the
sleep queue. The problem is that if one of the threads is actually not
asleep yet, then we can end up with td_slpq being corrupted and/or the
thread being made runnable at the wrong time resulting in the td_sleepqueue
== NULL assertion failures occasionally reported under heavy load.
The fix is to stop being so fancy and ditch the whole pending queue bit.
Instead, sleepq_remove_thread() and sleepq_resume_thread() were merged
into one function that requires the caller to hold sched_lock. This
fixes several places that unlocked sched_lock only to call a function
that then locked sched_lock, so even though sched_lock is now held
slightly longer, removing the extra lock acquires (1 pair instead of 3
in some cases) probably makes it an overall win if you don't include the
fact that it closes a race. This is definitely a 5.4 candidate.
PR: kern/79693
Submitted by: Steven Sears stevenjsears at yahoo dot com
MFC after: 4 days
was satisified for the rest of the kernel on the i386 build except for
these two files. Rather than adding a submarine include to pcb.h, I've
added proc.h here.
I forgot to include these with the original commit. Sorry folks.
period value. I suppose the BT adapter driver should be
fixed, but more importantly we should protect against
dividing by zero.
PR: kern/75603
MFC after: 1 week
of the kernel address space already. Intel recommend this anyway, because
using a non-4GB limit adds an additional clock cycle to address generation.
We were able to install 4GB segments into the LDT, so any limits we imposed
on %cs and %ds were academic anyway. More importantly, this allows us to
make a page in the kernel readable to user applications, for holding things
like the signal trampoline and other fun things.
Move the user %cs/%ds segments from the LDT to the GDT. There was no good
reason for them to be there anyway. The old LDT entries are still there
but we can now relax the restriction that prevented users from emptying
the default LDT entries.
Putting user and kernel %cs and %ds together allows us to access the fast
sysenter/sysexit/syscall/sysret instructions. syscall/sysret in particular
require that the user/kernel segments be laid out this way. Reserve a slot
specifically for NDIS while here.
Create two user controllable slots in the GDT that are context switched
with the (kernel) thread. This allows user applications to set two
user privilige selectors to arbitary values. Create
i386_set_fsbase(void *base) and friends. (get/set, fs/gs). For i386,
%gs is used by tls and the thread libraries and this means that user
processes no longer have to have the cost of having a custom LDT, and
we will no longer to do a ldt switch when activating a kthread/ithread in
the usual case any more.
In other words, we can now set the base address for %fs and %gs to arbitary
addresses without the pain of messing with ldt segments.
of the __pcb_spare longs. Except that fields were changed and one of the
spare values was used and the __pcb_spare field was reduced from two to one
long. Now VM86 bios calls can trash the first 4 bytes of the next page
following the kernel stack/pcb. This Is Bad(TM). This bug has been
present in 5.2-release and onwards, and is still in RELENG_5.
Instead of tempting fate and trying to use "spare" fields, explicitly
reserve them.
pcib_route_interrupt interface. Since there's only one interrupt pin
in the CardBus form factor, everybody gets to share it. Implement
cbb_route_interrupt to return the interrupt we have.
Suggested by: bms
Otherwise, busses that implement the pcib interface that forget to
implement pcib_route_interrupt would return EIO, which the caller
interprets as 'use interrupt 6'. This is likely the cause of much of
the grief that we had when I enabled power modes for the cardbus
bridge, since the card needed to reroute the interrupt to it and it
was getting 6 which was d by the pccbb sanity checks.
Also, move the -I stuff to the centralized kern.pre.mk. However, it
might be better to add these flags to files.conf. This is a short
term fix to fix the broken builds on my machine (I don't have a valid
/sys link).
guard against NULL t_modem entry. Otherwise, driver doesn't have t_modem
callback implemented(such like sys/dev/usb/ucycom.c) would panic when
someone opens the driver's associated tty device.
Reviewed by: phk, sam (mentor)
case. There are bugs in some which didn't unlock in the ISDOTDOT case
to begin with that need to be addressed seperately. This simplifies
things anyway.
- Fix relookup() to prevent it from vrele()'ing the dvp while the vp
is locked. Catch up to other lookup changes.
Sponsored by: Isilon Systems, Inc.
Reported by: Peter Wemm
Enhanced SpeedStep (that is, a follow-up of it called Foxton). Until
we actually have support for that, we build to catch regressions in
the framework.
Triggered by: njl
- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between
PRU_CONNECT and PRU_CONNECT2. (for LOCAL_CONNWAIT)
Obtained from: NetBSD (with some changes)
3ware's 9xxx series controllers. This corresponds to
the 9.2 release (for FreeBSD 5.2.1) on the 3ware website.
Highlights of this release are:
1. The driver has been re-architected to use a "Common Layer"
(all tw_cl* files), which is a consolidation of all OS-independent
parts of the driver. The FreeBSD OS specific portions of the
driver go into an "OS Layer" (all tw_osl* files).
This re-architecture is to achieve better maintainability, consistency
of behavior across OS's, and better portability to new OS's (drivers
for new OS's can be written by just adding an OS Layer that's specific
to the OS, by complying to a "Common Layer Programming Interface" API.
2. The driver takes advantage of multiple processors.
3. The driver has a new firmware image bundled, the new features of which
include Online Capacity Expansion and multi-lun support, among others.
More details about 3ware's 9.2 release can be found here:
http://www.3ware.com/download/Escalade9000Series/9.2/9.2_Release_Notes_Web.pdf
Since the Common Layer is used across OS's, the FreeBSD specific include
path for header files (/sys/dev/twa) is not part of the #include pre-processor
directive in any of the source files. For being able to integrate twa into
the kernel despite this, Makefile.<arch> has been changed to add the include
path to CFLAGS.
Reviewed by: scottl
1 Move the debug.clock_adjust_* sysctls to debug.clock.adjust_* to
make it easier to get only the clock statistics.
2 Make the sysctls read-only [suggested by Marius].
3 When determining the new clock adjustment, we checked for an error
either larger than 12.5% or smaller than 12.5%. We left out an error
of exactly 12.5%. For errors larger than 12.5% we adjust the clock
reload value in such a way that the next clock interrupt would be
early (as in premature). For errors less than 12.5% we stopped the
adjustment.
The current algorithm doesn't benefit from excluding an error of
exactly 12.5%. Change the code to stop adjusting the clock if the
error is *not* larger than 12.5% [suggested by Marius].
Discussed with: marius@
o don't pre-assign key index to the global key table entries so device
has a chance to decide what to use
o make ieee80211_crypto_newkey take the desired flags as an argument
instead of wacking the key structure directly; this eliminates a
bunch of code warts
o add a new flag IEEE80211_KEY_GROUP to indicate a key is a WPA Group
key so devices don't need to guess (temporarily add this flag in the
ioctl code until we can get wpa_supplicant+hostapd updated)
o shuffle IEEE80211_KEY_* bits to move flags used internally to the high
nibble of the flags word
Reviewed by: Tai-hwa Liang
problems here, it became clear we were being too complex.
o Don't keep track of resources in two places
o Use resource_list_purge instead of rolling our own
o Just reassign the ownership of the resource, rather than freeing it
and reallocating it.
o Fix compile problems when sizeof(u_long) != sizeof(int)
in BSD and MBR classes, ie. if provider below us uses the same metadata,
don't create labels based on the metadata.
This allows to create labels on geoms with rank != 1 without hacks.
Tested by: Chris Elsworth <chris@shagged.org> on sparc64
OK'ed by: phk
MFC after: 2 weeks
this code:
o rid is stored in the resource, so don't bother keeping track of it here.
o Implement memory space
o Don't try to activate 'memory card' CFEs. This is type memory, as opposed
to the memory resource.
resource_list_find. Check to make sure that rle is not NULL and panic
if it is (but it appears that resource_list_add already panics, so I'm
not entirely sure it is necessary now).
Add a test to make sure we have a interrupt resource when we're
disabling it. This is also a cannot happen, but the extra care
shoudln't hurt.
Found by: Coventry tool via sam@
whether or not the receive pipe is stopped. This ensures that we
do not attempt to start the same transfer twice, and it allows
ucomstop() to skip the restarting of the read pipe if it was not
originally running, such as when called indirectly from ucomreadcb().
PR: kern/79420
MFC after: 1 day
with E may be called with a shared lock held. This list really could
be made per filesystem if we had any filesystems which differed from
ffs in locking guarantees. VFS itself is not sensitive to this except
where vgone() etc. are concerned.
Sponsored by: Isilon Systems, Inc.
potential lock order reversal. Also, don't unlock the vnode if this
fails, lockmgr has already unlocked it for us.
- Restructure vget() now that vn_lock() does all of VI_DOOMED checking
for us and also handles the case where there is no real lock type.
- If VI_OWEINACT is set, we need to upgrade the lock request to EXCLUSIVE
so that we can call inactive. It's not legal to vget a vnode that hasn't
had INACTIVE called yet.
Sponsored by: Isilon Systems, Inc.
as I'd like to get rid of the vxthread.
- Handle lock requests which don't actually want a lock as this is a
much more convenient place to handle this condition than in vget().
These requests simply want to know that VI_DOOMED isn't set.
- Correct a test at the end of vn_lock, if error !=0 should be
if error == 0, this has been broken since I comitted the VI_DOOMED
changes, but no one ran into it because vget() duplicated this
functionality.
Sponsored by: Isilon Systems, Inc.
layer, but with a twist.
The twist has to do with the fact that Microsoft supports structured
exception handling in kernel mode. On the i386 arch, exception handling
is implemented by hanging an exception registration list off the
Thread Environment Block (TEB), and the TEB is accessed via the %fs
register. The problem is, we use %fs as a pointer to the pcpu stucture,
which means any driver that tries to write through %fs:0 will overwrite
the curthread pointer and make a serious mess of things.
To get around this, Project Evil now creates a special entry in
the GDT on each processor. When we call into Windows code, a context
switch routine will fix up %fs so it points to our new descriptor,
which in turn points to a fake TEB. When the Windows code returns,
or calls out to an external routine, we swap %fs back again. Currently,
Project Evil makes use of GDT slot 7, which is all 0s by default.
I fully expect someone to jump up and say I can't do that, but I
couldn't find any code that makes use of this entry anywhere. Sadly,
this was the only method I could come up with that worked on both
UP and SMP. (Modifying the LDT works on UP, but becomes incredibly
complicated on SMP.) If necessary, the context switching stuff can
be yanked out while preserving the convention calling wrappers.
(Fortunately, it looks like Microsoft uses some special epilog/prolog
code on amd64 to implement exception handling, so the same nastiness
won't be necessary on that arch.)
The advantages are:
- Any driver that uses %fs as though it were a TEB pointer won't
clobber pcpu.
- All the __stdcall/__fastcall/__regparm stuff that's specific to
gcc goes away.
Also, while I'm here, switch NdisGetSystemUpTime() back to using
nanouptime() again. It turns out nanouptime() is way more accurate
than just using ticks(). On slower machines, the Atheros drivers
I tested seem to take a long time to associate due to the loss
in accuracy.
systems that boot with this value at the lowest setting. Change the
default boot config back to "leave frequency as BIOS set it". Also, fix
buglet where acpi_throttle wouldn't be used if p4tcc was present but
disabled by the user.
MFC after: 1 week
a return instruction. (The latter is discouraged by the Opteron
optimization manual because it disables branch prediction for the return
instruction.)
Reviewed by: bde
Affects to people WITH an AD1888 codec, the system will output to the port
labeled "speaker" instead of microphone. System will work the same in
multiple operating systems.
If people are currently using their systems with this codec they will need
to swap their output ports.
I have _not_ checked audio input or line input (basically, I have checked
nothing other than line-out).
I believe this is an appropriate change, it makes us consistent with
documentation, and other operating systems. Furthermore, this feature
(playing) is the vast majority of sound activities, so if this makes is
right for playback and wrong for recording... playback is more important,
and we can fix recoding in the future without worries of screwing people
again in the future (since we'll be "right" on the playback).
Submitted by: David Cross
setting ts_recent to an arbitrary value, stopping further
communication between the two hosts.
- If the Echoed Timestamp is greater than the current time,
fall back to the non RFC 1323 RTT calculation.
Submitted by: Raja Mukerji (raja at moselle dot com)
Reviewed by: Noritoshi Demizu, Mohan Srinivasan
a reassembly queue state structure, don't update (receiver) sack
report.
- Similarly, if tcp_drain() is called, freeing up all items on the
reassembly queue, clean the sack report.
Found, Submitted by: Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp>
Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com),
Raja Mukerji (raja at moselle dot com).
(Fix for kern/78226).
Submitted by : Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp>
Reviewed by : Mohan Srinivasan (mohans at yahoo-inc dot com),
Raja Mukerji (raja at moselle dot com).
The main reason for doing this is that the ELF dump handler expects
the thread list to be fixed while the dump header is generated, so an
upcall that occurs at the wrong time can lead to buffer overruns and
other Bad Things.
Another solution would be to grab sched_lock in the ELF dump handler,
but we might as well single-thread, since the process is about to die.
Furthermore, I think this should ensure that the register sets in the
core file are sequentially consistent.
when vrele() acquires the directory lock in the wrong order. Fix this
via the following changes:
- Keep the directory locked after VOP_LOOKUP() until we've determined
what we're going to do with the child. This allows us to remove the
complicated post LOOKUP code which determins whether we should lock or
unlock the parent. This means we may have to vput() in the appropriate
cases later, rather than doing an unsafe vrele.
- in NDFREE() keep two flags to indicate whether we need to unlock vp or
dvp. This allows us to vput rather than vrele in the appropriate
cases without rechecking the flags. Move the code to handle dvp after
we handle vp.
- Remove some dead code from namei() that was the result of changes to
VFS_LOCK_GIANT().
Sponsored by: Isilon Systems, Inc.
try to reasseble the packet from the fragments queue with the only
fragment, finish with the first fragment as soon as we create a queue.
Spotted by: Vijay Singh
o Drop the fragment if maxfragsperpacket == 0, no chances we
will be able to reassemble the packet in future.
Reviewed by: silby
ip.portrange.last and there is the only port for that because:
a) it is not wise; b) it leads to a panic in the random ip port
allocation code. In general we need to disable ip port allocation
randomization if the last - first delta is ridiculous small.
PR: kern/79342
Spotted by: Anjali Kulkarni
Glanced at by: silby
MFC after: 2 weeks
instructs the driver to avoid using Keyboard Interface Test command.
This command causes problems with some non-compliant hardware, resulting
in machine being abruptly powered down early in the boot process.
Particularly it's known that HP ZV5000 and Compaq R3000Z notebooks
are affected by this problem.
Due to popularity of those models this patch is good MFC5.4 candidate.
PR: 67745
Submitted by: Jung-uk Kim jkim at niksun.com
MFC after: 1 days
protocol. RFCOMM is a SOCK_STREAM protocol not SOCK_SEQPACKET. This was a
serious bug caused by cut-and-paste. I'm surprised it did not bite me before.
Dunce hat goes to me.
MFC after: 3 days
EA bit is set in hdr->length (16-bit length). This currently has no effect
on the rest of the code. It just fixes the debug message.
MFC After: 3 weeks
we have a non-NULL args.rule. If the same packet later is subject to "tee"
rule, its original is sent again into ipfw_chk() and it reenters at the same
rule. This leads to infinite loop and frozen router.
Assign args.rule to NULL, any time we are going to send packet back to
ipfw_chk() after a tee rule. This is a temporary workaround, which we
will leave for RELENG_5. In HEAD we are going to make divert(4) save
next rule the same way as dummynet(4) does.
PR: kern/79546
Submitted by: Oleg Bulyzhin
Reviewed by: maxim, andre
MFC after: 3 days
to root cause on exactly how this happens.
- If the assert is disabled, we presently try to handle this case, but the
BUF_UNLOCK was missing. Thus, if this condition ever hit we would leak
a buf lock.
Many thanks to Peter Holm for all his help in finding this bug. He really
put more effort into it than I did.
the register values coming back from sigreturn(2). Normally this wouldn't
matter because the 32 bit environment would truncate the upper 32 bits
and re-save the truncated values at the next trap. However, if we got
a fast second signal and it was pending while we were returning from
sigreturn(2) in the signal trampoline, we'd never have had a chance to
truncate the bogus values in 32 bit mode, and the new sendsig would get
an EFAULT when trying to write to the bogus user stack address.
Functional changes:
- Cut struct source_hookinfo. Just use hook_p pointer.
- Remove "start_now" command. "start" command now requires number of
packets to send as argument. "start" command actually starts sending.
Move the code that actually starts sending from ng_source_rcvmsg()
to ng_source_start().
- Remove check for NG_SOURCE_ACTIVE in ng_source_stop(). We can be called
with flag cleared (see begin of ng_source_intr()).
- If NG_SEND_DATA_ONLY() use log(LOG_DEBUG) instead of printf(). Otherwise
we will *flood* console.
- Add ng_connect_t method, which sends NGM_ETHER_GET_IFNAME command
to "output" hook. Cut ng_source_request_output_ifp(). Refactor
ng_source_store_output_ifp() to use ifunit() and don't muck through
interface list.
- Add "setiface" command, which gives ability to configure interface
in case when ng_source_connect() failed. This happens, when we are not
connected directly to ng_ether(4) node.
- Remove KASSERTs, which can never fire.
- Don't check for M_PKTHDR in rcvdata method. netgraph(4) does this
for us.
Style:
- Assign sc_p = NG_NODE_PRIVATE(node) in declaration, to be
consistent with style of other nodes.
- Sort variables.
- u_intXX -> uintXX.
- Dots at ends of comments.
Sponsored by: Rambler
libalias.
In /usr/src/lib/libalias/alias.c, the functions LibAliasIn and
LibAliasOutTry call the legacy PacketAliasIn/PacketAliasOut instead
of LibAliasIn/LibAliasOut when the PKT_ALIAS_REVERSE option is set.
In this case, the context variable "la" gets lost because the legacy
compatibility routines expect "la" to be global. This was obviously
an oversight when rewriting the PacketAlias* functions to the
LibAlias* functions.
The fix (as shown in the patch below) is to remove the legacy
subroutine calls and replace with the new ones using the "la" struct
as the first arg.
Submitted by: Gil Kloepfer <fgil@kloepfer.org>
Confirmed by: <nicolai@catpipe.net>
PR: 76839
MFC after: 3 days
- newbus plumbing. Each atapicam bus is a child off of a parent ata channel
bus. This is somewhat of a hack, but allows the ata core to be completely
free of atapicam knowledge.
- No more global lists of softc's and no more groping around in internal ata
structures on each command.
- Giant-free operation of the completion handler.
- Per-bus mutex for protecting the busy list and synchronizing detach.
- Lots of streamlining and dead code elimination, better adherence to the
CAM locking protocol.
This feature still requires that the appropriate atapi-* driver be present
for each atapi device that you want to talk to (i.e. atapi-cd for cdroms).
It does work both compiled into the kernel and as a loadable module.
Reviewed by: thomas, sos
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
to see what features they may support before calling identify/probe/attach.
This is necessary because the ACPI 3.0 spec requires driver support be
advertised before running any methods. For now, the flags are as specified
in for the _PDC and _OSC methods but we can support private flags as needed.
Add an implementation of this for acpi_cpu. It checks all its children
(notably cpufreq drivers) and calls the _PDC method to report the results.
instances in a given devclass. This is useful for systems that want to
call code in driver static methods, similar to device_identify().
Reviewed by: dfr
MFC after: 2 weeks
one to become available for one second and then return ENFILE. We
can run out of vnodes, and there must be a hard limit because without
one we can quickly run out of KVA on x86. Presently the system can
deadlock if there are maxvnodes directories in the namecache. The
original 4.x BSD behavior was to return ENFILE if we reached the max,
but 4.x BSD did not have the vnlru proc so it was less profitable to
wait.
down. If we have dirty pages, the putpages routine will need to know
what the vnode's object is so that it may write out dirty pages.
Pointy hat: phk
Found by: obrien
in a devclass. All the other uses of maxunit are correct and this one was
safe since it checks the return value of devclass_get_device(), which would
always say that the highest unit device doesn't exist.
Reviewed by: dfr
MFC after: 3 days
completed I/O requests here.
- First allocate all needed bios, so if any of allocations fail, we can
free memory before sending any I/O requests down.
Reported by: Pawel Malachowski
MFC after: 3 days
- Don't intermingle direct calls to lockmgr and indirect calls through
VOPs. This will be important in the future.
- Dont lock the devvp's interlock just to release it on the next line by
passing LK_INTERLOCK to lockmgr.
- Restructure ffs_snapshot_unmount so we don't call free() with the
devvp's interlock locked.
because it may change identities while we're sleeping on the lock.
Otherwise we may bail out of ffs_sync() early due to an error from
deadfs.
- Collapse a VOP_UNLOCK, vrele into a single vput().
two bugs.
- ffs_disk_prewrite was pulling the vp from the buf and checking for
COPYONWRITE, when really it wanted the vp from the bufobj that we're
writing to, which is the devvp. This lead to us skipping the copy on
write to all file data, which significantly broke snapshots for the
last few months.
- When the SOFTUPDATES option was not included in the kernel config we
would also skip the copy on write check, which would effectively disable
snapshots.
- Remove an invalid mp_fixme().
Debugging tips from: mckusick
Reported by: iedowse, others
Discussed with: phk
generate dirty bufs even with a locked vnode, 100 retries is not that
many. This should probably change from a retry count to an abort when
we are no longer cleaning any buffers.
- Don't call vprint() while we still hold the vnode locked. Move the call
to later in the function.
- Clean up a comment.
implementations inspired by the ones in DragonFly. Unlike the
DragonFly versions, these have a small data cache footprint, and my
tests show that they're never slower than the old code except when the
charset or the span is 0 or 1 characters. This implementation is
generally faster than DragonFly until either the charset or the span
gets in the ballpark of 32 to 64 characters.
compiler features tests. This is ok, since machine/ieeefp.h is an internal
interface. But floatingpoint.h is a public interface and some ports use it,
so include sys/cdefs.h in the amd64 and i386 version of floatingpoint.h.
Note: some architectures don't provide recursive inclusion protection in
floatingpoint.h, namely alpha and ia64. Except for this part and now the
include of sys/cdefs.h, all those files are equal (from a compiler POV),
so they could be moved to only one version in src/include/.
Approved by: joerg
these at the moment, but applications that test for them will now
have a better chance of compiling.
I have intentionally omitted errnos that are only good for STREAMS,
since apps that use STREAMS won't compile anyway. The exception is
EPROTO, which was apparently intended for STREAMS, but worth having
anyway because Linux (mis)uses it for other things.
specific code will migrate to these files to augment or replace the
version in i386/include and/or i386/linux. This should, in the
fullness of time, allow many of the #ifdef PC98 in the tree.
# These files are in the public domain because there is insufficient
# creative content in them. When you customize them, please add a
# copyright notice and license.
OK'd in principle by: nyan@
creating the /dev/dpti%d entry that the software expects. This is just
a band-aid until either someone (hopefully) rewrites the utilities, or all
asr/dpt cards in existance get blasted into the sun.
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
vnodes and anonymous memory. Note that mmaping a cdev directly does not
currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
cdev the ioctl is acting on rather than trying to find a suitable vnode
to map from.
Reviewed by: alc, arch@
pc98 machines because (a) it is PCIe or PCI-X (b) there's a BIOS that
must run at boot which assumes IBM-AT compatible boot environment.
Noticed by: scottl
There are too many questions in freebsd-amd64@ about how to enable Linux
support that it seems a required piece of functionality. Thus we should
just have it on by default.
series of controllers. Areca provides a CLI and HTTP management tool for
FreeBSD/i386 and FreeBSD/amd64 on their website. Many thanks to Areca for
their support of FreeBSD. Thanks also to Mike Tansca and Sentex Communications
for donating hardware.
Obtained from: Erich Chen <erich at areca com tw>
ndis_timercall() in NdisMInitializeTimer(), we can't use the raw
function pointer. This is because ntoskrnl_run_dpc() expects to
invoke a function with Microsoft calling conventions. On i386,
this works because ndis_timercall() is declared with the __stdcall
attribute, but this is a no-op on amd64. To do it correctly, we
have to generate a wrapper for ndis_timercall() and us the wrapper
instead of of the raw function pointer.
Fix this by adding ndis_timercall() to the funcptr table in subr_ndis.c,
and create ndis_findwrap() to extract the wrapped function from the
table in NdisMInitializeTimer() instead of just passing ndis_timercall()
to KeInitializeDpc() directly.
checks, including cpuid_is_k7(), will catch CPUs that really don't support
this method.
Submitted by: Bruno Ducrot
Tested by: Jari Kirma (kirma cs.hut.fi)
on filesystems which safely support them. It appears that many
network filesystems specifically are not shared lock safe.
Sponsored by: Isilon Systems, Inc.
since simply unlocking a mutex does not ensure that one of the waiters
will run and acquire it. We're more likely to reacquire the mutex
before anyone else has a chance. It has also bit me three times now, as
it's not safe to drop the interlock before sleeping in many cases.
Sponsored by: Isilon Systems, Inc.
objdump --disassemble when disassembling itself in userland. I've added
the cmovCC instruction group and tweaked a bunch of size sensitive array
indexes to either fix my mistakes and/or force it to work by any means
necessary.
I'm committing this because it is usable enough to see what is going on
when single stepping via ddb.
It might still tell lies, but its lies will be far more subtle now. I'm
not sure that this is a good thing or not.
instructions as it was when I dropped it back in May 31, 2003. I'm
committing this as an intermediate stage because back then I thought I
understood what I was doing with this file.
an ap in 11g with protection enabled
o correct rate selection when operating in 11g with protection when no
packets have been sent yet (from John Bicket)
o track api change to get first descriptor and use it to collect the frame
length for calculating the state bin
o add more debugging and shuffle some existing debugging to give more info
o bump version to distinguish bug fixes
to the rate control module for tx complete processing; this enables
rate control algorithms to extract the packet length for xmits that
require multiple descriptors
o ATA is now fully newbus'd and split into modules.
This means that on a modern system you just load "atapci and ata"
to get the base support, and then one or more of the device
subdrivers "atadisk atapicd atapifd atapist ataraid".
All can be loaded/unloaded anytime, but for obvious reasons you
dont want to unload atadisk when you have mounted filesystems.
o The device identify part of the probe has been rewritten to fix
the problems with odd devices the old had, and to try to remove
so of the long delays some HW could provoke. Also probing is done
without the need for interrupts, making earlier probing possible.
o SATA devices can be hot inserted/removed and devices will be created/
removed in /dev accordingly.
NOTE: only supported on controllers that has this feature:
Promise and Silicon Image for now.
On other controllers the usual atacontrol detach/attach dance is
still needed.
o Support for "atomic" composite ATA requests used for RAID.
o ATA RAID support has been rewritten and and now supports these
metadata formats:
"Adaptec HostRAID"
"Highpoint V2 RocketRAID"
"Highpoint V3 RocketRAID"
"Intel MatrixRAID"
"Integrated Technology Express"
"LSILogic V2 MegaRAID"
"LSILogic V3 MegaRAID"
"Promise FastTrak"
"Silicon Image Medley"
"FreeBSD PseudoRAID"
o Update the ioctl API to match new RAID levels etc.
o Update atacontrol to know about the new RAID levels etc
NOTE: you need to recompile atacontrol with the new sys/ata.h,
make world will take care of that.
NOTE2: that rebuild is done differently from the old system as
the rebuild is now done piggybacked on read requests to the
array, so atacontrol simply starts a background "dd" to rebuild
the array.
o The reinit code has been worked over to be much more robust.
o The timeout code has been overhauled for races.
o Support of new chipsets.
o Lots of fixes for bugs found while doing the modulerization and
reviewing the old code.
Missing or changed features from current ATA:
o atapi-cd no longer has support for ATAPI changers. Todays its
much cheaper and alot faster to copy those CD images to disk
and serve them from there. Besides they dont seem to be made
anymore, maybe for that exact reason.
o ATA RAID can only read metadata from all the above metadata formats,
not write all of them (Promise and Highpoint V2 so far). This means
that arrays can be picked up from the BIOS, but they cannot be
created from FreeBSD. There is more to it than just the missing
write metadata support, those formats are not unique to a given
controller like Promise and Highpoint formats, instead they exist
for several types, and even worse, some controllers can have
different formats and its impossible to tell which one.
The outcome is that we cannot reliably create the metadata of those
formats and be sure the controller BIOS will understand it.
However write support is needed to update/fail/rebuild the arrays
properly so it sits fairly high on the TODO list.
o So far atapicam is not supported with these changes. When/if this
will change is up to the maintainer of atapi-cam so go there for
questions.
HW donated by: Webveveriet AS
HW donated by: Frode Nordahl
HW donated by: Yahoo!
HW donated by: Sentex
Patience by: Vife and my boys (and even the cats)
carp_carpdev_state_locked() is called every time carp interface is attached.
The first call backs up flags of the first interface, and the second
call backs up them again, erasing correct values.
To solve this, a carp_sc_state_locked() function is introduced. It is
called when interface is attached to parent, instead of calling
carp_carpdev_state_locked. carp_carpdev_state_locked() calls
carp_sc_state_locked() for each sc in chain.
Reported by: Yuriy N. Shkandybin, sem
queues lock in vm_object_backing_scan(). Updates to the page's PG_BUSY
flag and busy field are synchronized by the containing object's lock.
Testing the page's hold_count and wire_count in vm_object_backing_scan()'s
OBSC_COLLAPSE_NOWAIT case is unnecessary. There is no reason why the held
or wired pages cannot be migrated to the shadow object.
Reviewed by: tegge
filesystem modules must be recompiled. (Since struct vnode has
already changed in 6-CURRENT, there's little advantage to leaving
the unused fields around.)
vnodes whose names it caches, so we no longer need a `generation
number' to tell us if a referenced vnode is invalid. Replace the use
of the parent's v_id in the hash function with the address of the
parent vnode.
Tested by: Peter Holm
Glanced at by: jeff, phk
except for places where people forget to update one of them. We now
collect only one set of stats for both of these routines. Other
changes in this commit include:
- Start acquiring Giant again in vn_fullpath(), since it is required
when crossing a mount point.
- Expand the scope of the cache lock to avoid dropping it and
picking it up again for every pathname component. This also
makes it trivial to avoid races in stats collection.
- Assert that nc_dvp == v_dd for directories instead of returning
an error to userland when this is not true. AFAIK, it should
always be true when v_dd is non-null.
- For vn_fullpath(), handle the first (non-directory) vnode
separately.
Glanced at by: jeff, phk
returns error. In this case mbuf has already been freed. [1]
- Remove redundant declaration.
PR: kern/78893 [1]
Submitted by: Liang Yi [1]
Reviewed by: sam
MFC after: 1 day
to cache_lookup(). This allows us to acquire the vnode interlock before
dropping the cache lock. This protects the vnodes identity until we
have locked it.
Sponsored by: Isilon Systems, Inc.
We don't need a mknod(2) call
No tricky install documentation
Kernel leave them dev_t alone
Hey Kernel leave them cdevsw alone
All in all it's just another struct in src/sys
All in all you're just another struct in src/sys
Don't remove the now unused element from cdev yet, wait until
we have a better reason to bump the version.
There is now no longer any upper limit on how many device drivers
a FreeBSD kernel can have.
only allow proper values. ENTROPYSOURCE is a maxval+1, not an
allowable number.
Suggested loose protons in the solution: phk
Prefers to keep the pH close to seven: markm
acquire shared locks on intermediate directories.
- For the LASTCN, we may have to LK_UPGRADE the parent directory before
we lookup the last component.
- Acquire VFS_ROOT and dp locks based on the cn_lkflag.
Sponsored by: Isilon Systems, Inc.
vhold()s us.
- Avoid an extra mutex acquire and release in the common case of vgonel()
by checking for OWEINACT at the start of the function.
- Fix the case where we set OWEINACT in vput(). LK_EXCLUPGRADE drops our
shared lock if it fails.
Sponsored by: Isilon Systems, Inc.
ExAllocatePoolWithTag(), not malloc(), so it should be released
with ExFreePool(), not free(). Fix a couple if instances of
free(fh, ...) that got overlooked.
- On amd64, InterlockedPushEntrySList() and InterlockedPopEntrySList()
are mapped to ExpInterlockedPushEntrySList and
ExpInterlockedPopEntrySList() via macros (which do the same thing).
Add IMPORT_FUNC_MAP()s for these.
- Implement ExQueryDepthSList().
alloc and free routine pointers in the lookaside list with pointers
to ExAllocatePoolWithTag() and ExFreePool() (in the case where the
driver does not provide its own alloc and free routines). For amd64,
this is wrong: we have to use pointers to the wrapped versions of these
functions, not the originals.
zero'ing their length (copied from m_adj where this code came from
after the equivalent change there has had time to soak)
Noticed by: Coverity Prevent analysis tool
This adds support for the SiS intergrated NIC on some Athlon64 motherboards.
The MAC address is stored in the APC CMOS RAM and this fixes the
sis driver ending up with a 00:00:00:00:00:00 MAC address.
Submitted by: Stasys Smailys <ssmailys@komvista.lt>
nll_obsoletelock field in the lookaside list structure is only defined
for the i386 arch. For amd64, the field is gone, and different list
update routines are used which do their locking internally. Apparently
the Inprocomm amd64 driver uses lookaside lists. I'm not positive this
will make it work yet since I don't have an Inprocomm NIC to test, but
this needs to be fixed anyway.
- Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT
or LOCKPARENT set. You can't complete any of these operations without
at least a reference to the parent. Many filesystems check for this case
even though it isn't possible in the current system.
- Only unlock the directory if this is a DOTDOT lookup. Previously this
code could have deadlocked if there was a DOTDOT lookup with LOCKPARENT
set and another thread was locking the other way up the tree.
Sponsored by: Isilon Systems, Inc.
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.
Sponsored by: Isilon Systems, Inc.
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.
Sponsored by: Isilon Systems, Inc.
handled in vfs_lookup.c. This code was missing PDIRUNLOCK use prior
to the removal of PDIRUNLOCK in rev 1.73 of vfs_lookup.c.
Sponsored by: Isilon Systems, Inc.
rely on ufs to always leave the parent locked except in the ISDOTDOT
case. Adjust asserts to deal with these changes.
Sponsored by: Isilon Systems, Inc.
- In the ISDOTDOT case we have to unlock the dvp before locking the child,
if this fails we must relock dvp before returning an error. This was
missing before.
Sponsored by: Isilon Systems, Inc.
- Network filesystems are written with a special idiom that checks the
cache first, and may even unlock dvp before discovering that a network
round-trip is required to resolve the name. I believe dvp is prevented
from being recycled even in the forced unmount case by the shared lock
on the mount point. If not, this code should grow checks for VI_DOOMED
after it relocks dvp or it will access NULL v_data fields.
Sponsored by: Isilon Systems, Inc.
calling VOP_LOOKUP(). Rather than having each filesystem check the
LOCKPARENT flag, we simply check it once here and unlock as required.
The only unusual case is ISDOTDOT, where we require an unlocked vnode
on return. Relocking this vnode with the child locked is allowed since
the child is actually its parent.
- Add a few asserts for some unusual conditions that I do not believe can
happen. These will later go away and turn into implementations for these
conditions.
Sponsored by: Isilon Systems, Inc.
case where filesystems legitimately need to unlock the directory vp is
in the DOTDOT case, which we can explicitly check for in lookup().
Furthermore, allowing filesystems to unlock dvp can lead to lock order
reversals in lookup() when we vrele the dvp while the child is still
locked.
Sponsored by: Isilon Systems, Inc.
acpi_bus_alloc_gas() to delete the resource it set if alloc fails. Then,
change acpi_perf to delete the resource after releasing it if alloc fails.
This should make probe and attach both fully restartable if either fails.
and AMD Cool&Quiet PowerNow! (k8) cpufreq control. This driver is enabled
for both i386 and amd64 architectures. It has both acpi and legacy BIOS
attachments. Thanks to Bruno Ducrot for writing this driver and Jung-uk
Kim for testing.
Submitted by: Bruno Ducrot (ducrot:poupinou.org)
may help with various interdependencies between subsystems. More testing
is needed to understand what the underlying issues are here.
Tested by: Juho Vuori
MFC after: 2 days
variables in internal blocks.
Also, go ahead and fail if we can't load the firmware. It should have
failed like this, but never did (firmware loads generally don't fail).
some of which are rather serious:
- Use the device sysctl tree instead of rolling our own.
- Don't create a bus_dmamap_t to pass to bus_dmamem_alloc(), it is
bus_dmamem_alloc() that creates it itself. The DMA map created
by the driver was overwritten and its memory was leaked.
- Fix resource handling bugs in the error path of ixgb_dma_alloc().
- Don't use vtophys() to get the base address of the TX and RX rings
when busdma already gave us the correct address to use!
- Remove now useless includes and the alpha_XXX_dmamap() hack.
- Don't initialize if_output to ether_output(), ether_ifattach() does
it for us already.
- Add proper module dependencies on ether and pci.
Unfortunately, I'm not lucky enough to own an ixgb(4) card, nor a
machine with a bus where to plug it in and I couldn't find anyone able
to test these patches, so they are only build-tested and I won't MFC
them for 5.4-RELEASE.
This ensures that we explore EHCI busses before their companion
controllers' busses, so that ports connected to full/low speed
devices will be properly routed to the companion controllers by the
time the OHCI/UHCI exploration occurs.
work on SMP" saga. After several weeks and much gnashing of teeth,
I have finally tracked down all the problems, despite their best
efforts to confound and annoy me.
Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
miniport! It used to be that NDIS drivers relied on the NDIS library
itself for all their locking and serialization needs. Transmit packet
queues were all handled internally by NDIS, and all calls to
MiniportXXX() routines were guaranteed to be appropriately serialized.
This proved to be a performance problem however, and Microsoft
introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
still supports serialized miniports, but recommends that all new drivers
written for Windows XP and later be deserialized. Apparently Atheros
wasn't listening when they said this.
This means (among other things) that we have to serialize calls to
MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
that are triggered via the NdisMInitializeTimer() routine. It finally
dawned on me why NdisMInitializeTimer() takes a special
NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
the timer callback must be serialized, and it's only by saving the
miniport block handle that we can get access to the serialization
lock during the timer callback.
Problem number two: haunted hardware. The thing that was _really_
driving me absolutely bonkers for the longest time is that, for some
reason I couldn't understand, my test machine would occasionally freeze
or more frustratingly, reset completely. That's reset and in *pow!*
back to the BIOS startup. No panic, no crashdump, just a reset. This
appeared to happen most often when MiniportReset() was called. (As
to why MiniportReset() was being called, see problem three below.)
I thought maybe I had created some sort of horrible deadlock
condition in the process of adding the serialization, but after three
weeks, at least 6 different locking implementations and heroic efforts
to debug the spinlock code, the machine still kept resetting. Finally,
I started single stepping through the MiniportReset() routine in
the driver using the kernel debugger, and this ultimately led me to
the source of the problem.
One of the last things the Atheros MiniportReset() routine does is
call NdisReadPciSlotInformation() several times to inspect a portion
of the device's PCI config space. It reads the same chunk of config
space repeatedly, in rapid succession. Presumeably, it's polling
the hardware for some sort of event. The reset occurs partway through
this process. I discovered that when I single-stepped through this
portion of the routine, the reset didn't occur. So I inserted a 1
microsecond delay into the read loop in NdisReadPciSlotInformation().
Suddenly, the reset was gone!!
I'm still very puzzled by the whole thing. What I suspect is happening
is that reading the PCI config space so quickly is causing a severe
PCI bus error. My test system is a Sun w2100z dual Opteron system,
and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
plugged into a 100Mhz PCI slot. It's possible that this combination of
hardware causes a bus protocol violation in this scenario which leads
to a fatal machine check. This is pure speculation though. Really all I
know for sure is that inserting the delay makes the problem go away.
(To quote Homer Simpson: "I don't know how it works, but fire makes
it good!")
Problem number three: NdisAllocatePacket() needs to make sure to
initialize the npp_validcounts field in the 'private' section of
the NDIS_PACKET structure. The reason if_ndis was calling the
MiniportReset() routine in the first place is that packet transmits
were sometimes hanging. When sending a packet, an NDIS driver will
call NdisQueryPacket() to learn how many physical buffers the packet
resides in. NdisQueryPacket() is actually a macro, which traverses
the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
of the results in the 'private' section of the NDIS_PACKET. It also
sets the npp_validcounts field to TRUE To indicate that the results are
now valid. The problem is, now that if_ndis creates a pool of transmit
packets via NdisAllocatePacketPool(), it's important that each time
a new packet is allocated via NdisAllocatePacket() that validcounts
be initialized to FALSE. If it isn't, and a previously transmitted
NDIS_PACKET is pulled out of the pool, it may contain stale data
from a previous transmission which won't get updated by NdisQueryPacket().
This would cause the driver to miscompute the number of fragments
for a given packet, and botch the transmission.
Fixing these three problems seems to make the Atheros driver happy
on SMP, which hopefully means other serialized miniports will be
happy too.
And there was much rejoicing.
Other stuff fixed along the way:
- Modified ndis_thsuspend() to take a mutex as an argument. This
allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
avoid any possible race conditions with other routines that
use the dispatcher lock.
- Fixed KeCancelTimer() so that it returns the correct value for
'pending' according to the Microsoft documentation
- Modfied NdisGetSystemUpTime() to use ticks and hz rather than
calling nanouptime(). Also added comment that this routine wraps
after 49.7 days.
- Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
all the MSCALL() goop.
- For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
function. This is because it's supposed to be _stdcall on the x86
arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
On amd64, all routines use the same calling convention so we can
just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
and it will work. (The _fastcall attribute is a no-op on amd64.)
- Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
just macros) and use them for interrupt handling. This allows us to
move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.
- Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
when computing mdl_size instead of uint32_t, so that it matches the
MmSizeOfMdl() routine.
- Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
subr_ndis.c.
- Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.
- Get rid of the "wait for link event" hack in ndis_init(). Now that
I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
This should fix the witness panic a couple of people have reported.
- Use MSCALL1() when calling the MiniportHangCheck() function in
ndis_ticktask(). I accidentally missed this one when adding the
wrapping for amd64.
count of valid frequencies and use that as the final package count, don't
give up when the first invalid state is found. Also, add 0x9999 and expand
our upper check to >= 0xffff Mhz [2].
Submitted by: Bruno Ducrot, Jung-uk Kim [2]
succeed if there was no media in the drive.
This was broken in rev 1.72 when the media check was added to cdioctl().
For now, check the ioctl group to decide whether to check for media or not.
(We only need to check for media on CD-specific ioctls.)
Reported by: bland
MFC after: 3 days
found it guilty in putting the card into unusable state after UP->DOWN->UP
media status change.
Looks like some of register writes in this functions mess up PHY interface.
No visible regressions has been found after commenting this code out -
the card properly handles forceful local mode changes and auto-detects changes
made remotely (tested with Auto, 10HD, 10FD, 100HD, 100FD).
Sponsored by: PBXpress Inc.
MFC after: 3 days
add more work are forced to process two worklist items first.
However, processing an item may generate additional work, causing the
unlucky thread to recursively process the worklist. Add a per-thread
flag to detect this situation and avoid the recursion. This should
fix the stack overflows that could occur while removing large
directory trees.
Tested by: kris
Reviewed by: mckusick
results in connectivty to MacOSX hosts via fwip.
Thanks to Apple's Arulchandran Paramasivam <arulchandranp@apple.com> for
letting us know what we were doing wrong.
Reviewed by: dfr
MFC After: 7 days
now always allocates a new vnode.
- Define a new function, vnlru_free, which frees vnodes from the free list.
It takes as a parameter the number of vnodes to free, which is
wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
called from getnewvnode(). For now, getnewvnode() still tries to reclaim
a free vnode before creating a new one when we are near the limit.
- Define a function, vdestroy, which handles the actual release of memory
and teardown of locks, etc. This could become a uma_dtor() routine.
- Get rid of minvnodes. Now wantfreevnodes is 1/4th the max vnodes. This
keeps more unreferenced vnodes around so that files which have only
been stat'd are less likely to be kicked out of the system before we
have a chance to read them, etc. These vnodes may still be freed via
the normal vnlru_proc() routines which may some day become a real lru.
of the device id.
- Use BAR2 rather than BAR0 for the Rocketport UPCI 8O card. I suspect
that other UPCI cards might need to use BAR2 as well.
Tested by: wsk at gddsn dot org dot cn
MFC after: 1 week
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
with the wrong language parameter when retrieving the device serial
number. This invalid request caused some devices not to work at
all.
PR: usb/79190
Submitted by: Hans Petter Selasky <hselasky@c2i.net>
Instead, explicitly enable them when we setup the interrupt handler.
Also, move the setting of stathz and profhz down to the same place so
that the code flow is simpler and easier to follow.
- Don't setup an interrupt handler for IRQ0 if we are using the lapic timer
as it doesn't do anything productive in that case.
Replace a KASSERT of LINUX_IFNAMSIZ == IFNAMSIZ with a preprocessor
check and #error message. This will prevent nasty suprises if users
change IFNAMSIZ without updating the linux code appropriatly.
lockmgr locks that this thread owns. This is complicated due to
LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock.
Sponsored by: Isilon Systes, Inc.
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
these filesystems will support shared locks until they are explicitly
modified to do so. Careful review must be done to ensure that this
is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
lookup to do shared locks on the root. Filesystems are free to ignore
flags and instead acquire an exclusive lock if they do not support
shared locks.
Sponsored by: Isilon Systems, Inc.
before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path
because we may violate some lock order with another locked vnode if
we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the
vnode with VI_OWEINACT. This case should be very rare.
- Clear VI_OWEINACT in vinactive() and vbusy().
- If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.
Sponsored by: Isilon Systems, Inc.
necessary since we disable the shared locks in vfs_cache, but it is
prefered that the option not leak out into filesystems when it is
disabled.
Sponsored by: Isilon Systems, Inc.
config option have now been fixed. All filesystems are properly locked
and checked via DEBUG_VFS_LOCKS. Remove the workaround code.
Sponsored by: Isilon Systems, Inc.
non-maskable).
- The NFS client needs to guard against spurious wakeups
while waiting for the response. ltrace causes the process
under question to wakeup (possibly from ptrace()), which
causes NFS to wakeup from tsleep without the response being
delivered.
Submitted by: Mohan Srinivasan
disables tag queuing temporarily in order to allow controllers a window
to safely perform transfer negotiation with non-compliant devices. Before
this change, CAM would restore the queue depth to the controller specified
maximum or device quirk level rather than any depth determined by reactions
to QUEUE FULL/BUSY events or an explicit user setting.
During device probe, initialize the flags field for XPT_SCAN_BUS.
The uninitialized value often confused CAM into not bothering to
issue an AC_FOUND_DEVICE async event for new devices. The reason
this bug wasn't reported earlier is that CAM manually announces
devices after the initial system bus scans.
MFC: 3 days
svr4_do_getmsg(). In principle this bug could disclose data from
kernel memory, but in practice, the SVR4 emulation layer is probably
not functional enough to cause the relevant code path to be executed.
In any case, the emulator has been disconnected from the build since
5.0-RELEASE.
Found by: Coverity Prevent analysis tool
to kmem_alloc(). Failure to do this made it possible for user
processes to cause a hard lock on i386 kernels. I believe this only
affects 6-CURRENT on or after 2005-01-26.
Found by: Coverity Prevent analysis tool
Security: Local DOS
with the IP_HDRINCL option set. Without this change, a Linux process
with access to a raw socket could cause a kernel panic. Raw sockets
must be created by root, and are generally not consigned to untrusted
applications; hence, the security implications of this bug are
minimal. I believe this only affects 6-CURRENT on or after 2005-01-30.
Found by: Coverity Prevent analysis tool
Security: Local DOS
validation error in procfs/linprocfs that can be exploited by local
users to cause a kernel panic. All versions of FreeBSD with the patch
referenced in SA-04:17.procfs have this bug, but versions without that
patch have a more serious bug instead. This problem only affects
systems on which procfs or linprocfs is mounted.
Found by: Coverity Prevent analysis tool
Security: Local DOS
FreeBSD based on aue(4) it was picked by OpenBSD, then from OpenBSD ported
to NetBSD and finally NetBSD version merged with original one goes into
FreeBSD.
Obtained from: http://www.gank.org/freebsd/cdce/
NetBSD
OpenBSD
be pass-thru mode, when traffic is not copied by ng_tee, but passed thru
ng_netflow.
Changes made:
- In ng_netflow_rcvdata() do all necessary pulluping: Ethernet header,
IP header, and TCP/UDP header.
- Pass only pointer to struct ip to ng_netflow_flow_add(). Any TCP/UDP
headers are guaranteed to by after it.
- Merge make_flow_rec() function into ng_netflow_flow_add().
be pass-thru mode, when traffic is not copied by ng_tee, but passed thru
ng_netflow.
Changes made:
- In ng_netflow_rcvdata() do all necessary pulluping: Ethernet header,
IP header, and TCP/UDP header.
- Pass only pointer to struct ip to ng_netflow_flow_add(). Any TCP/UDP
headers are guaranteed to by after it.
- Merge make_flow_rec() function into ng_netflow_flow_add().
modern CPUs that have multiple VID#s that aren't detectable via public
methods. We use the control value from acpi_perf as the id16 for setting
a given frequency.
Add two another workarounds for carp(4) interfaces:
- do not add connected route when address is assigned to carp(4) interface
- do not add connected route when other interface goes down
Embrace workarounds with #ifdef DEV_CARP
(like an EC/SMbus controller) to access the EC address space. Access
is synchronized by the EcLock/Unlock routines in EcSpaceHandler().
Tested by: Hans Petter Selasky
configure_final(), assert that "cold" is true in usb_cold_explore()
when there are busses to explore. When USB is kldloaded after boot,
usb_cold_explore() will still get invoked but the list of busses
to explore in that case should always be empty.
transfer, which lead to panics or page faults. For example if a
transfer timed out, another thread could come along and attempt to
abort the same transfer while the timeout task was sleeping in
the *_abort_xfer() function.
Add an "aborting" flag to the private transfer state in each host
controller driver and use this to ensure that the abort is only
executed once. Also prioritise normal abort requests over timeouts
so that the callback is always given a status of USB_CANCELLED even
if the timeout-initiated abort began first.
The crashes caused by this bug were mainly reported in connection
with lpd printing to a USB printer.
PR: usb/78208, usb/78986
inevitable component in Sun Exx00 machines and provides serial ports,
NVRAM and TOD amongst others which are handled by uart(4) and eeprom(4)
respectively). This driver currently only prints out information about
the chassis on attach and allows to blink the 'Cycling' LED (which is
duplicated on the front panel) of the clock board just like fhc(4) does
for the other boards. The device name for the LED is /dev/led/clockboard.
Obtained from: OpenBSD
Tested by: joerg
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in fhc_alloc_resource().
- In case the board model can't be determined just print "unknown model"
so the physical slot number is reported in any case.
- Add support for blinking the 'Cycling' LED of boards on a fhc(4) hanging
of off the nexus (i.e. all boards except the clock board) via led(4).
All boards have at least 3 controllable status LEDs, 'Power', 'Failure'
and 'Cycling'. While the 'Cycling' LED is suitable for signaling from
the OS the others are better off being controlled by the firmware.
The device name for the 'Cycling' LED of each board is /dev/led/boardX
where X is the physical slot number of the board. [1]
Obtained from: OpenBSD [1]
Tested by: joerg [1]
bus_generic_rl_release_resource() for the bus_release_resource() method
instead of a local copy.
- Correctly handle pass-through allocations in central_alloc_resource().
This is mentioned in the Handbook but it is not as obvious to new
users why bpf is needed compared to the other largely self-explanatory
items in GENERIC.
PR: conf/40855
MFC after: 1 week
last in the list rather than first.
This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.
w/o problems than I was before... This simply brings back the knote_delete
as knlist_delete which will also drop the knote's, instead of just clearing
the list and seeing _ONESHOT...
Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoopse..
MFC after: 1 week
Prodded by: ambrisko and dwhite
same value as the previous ioctls so no binary change. Also, make a few
style changes to reduce diffs to my tree.
Loosely based on code from: Hans Petter Selasky
system have been attached, but no later. This ensures that we do
not explore ohci or uhci busses before the companion echi controller
has been initialised, so it should fix the problem of multi-speed
USB devices getting attached as USB 1 devices first and then
re-attached as USB 2.
Some further changes are needed on architectures that do not currently
allow hooks to be inserted before configure_final() - alpha, ia64,
powerpc and sparc64. On these architectures the exploration will
now be delayed until the usb kthread runs.
alignment restrictive, and help performance on some ethernet cards which
currently copy the entire packet a couple bytes to get the packet aligned
properly...
Wordsmithing by: dwhite
Obtained from: NetBSD (code only)
I'll clean it up later: rwatson
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts. If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure. The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.
It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.
Glanced at by: jhb
Do our best to plug some memory leaks (VPD data, jumbo memory buffer,...).
Log if we cannot free because memory still in use[1].
Change locking to avoid ''acquiring duplicate lock of same
type: "network driver"'' and potential deadlock. Also seems to fix LOR #063.
[1] This change does not solve problems if buffers are still in use when
unloading if_sk.ko. There is ongoing work which will address jumbogram
allocations in a more general way.
PR: kern/75677 (with changes, no mii fixes in here)
Tested by: net, Antoine Brodin (slightly different version)
Approved by: rwatson (mentor)
MFC after: 5 days
Obtained from: NetBSD if_sk.c rev. 1.11
* Take PHY out of reset for Yukon Lite Rev. A3.
Submitted by: postings on net@ in thread "skc0: no PHY found", 2005-02-22
Tested by: net
Approved by: rwatson (mentor)
MFC after: 5 days
if the interface is marked RUNNING.
Obtained from: NetBSD if_sk.c rev. 1.12
* Don't initialize the card (and start an autonegotiation) every time the IP
address changes. Makes 'dhclient sk0' invocations way faster and more
consistant. i.e. one DHCPREQUEST elicits the DHCPACK.
Obtained from: OpenBSD if_sk.c rev. 1.56
* Additional locking changes in sk_ioctl.
PR: kern/61296 should see improvements by the last two.
Approved by: rwatson (mentor)
MFC after: 5 days
doing that in bfe_stop().
This should fix a panic recently reported on -current occuring when taking
device down then up. In the original implementation, an "ifconfig bfe0 down"
triggers bfe_stop(), which also destroys all TX/RX descriptor dmamaps. Hence
the subsequent "ifconfig bfe0 up" would force the device to use those
already-released dmamap and thus panic the kernel.
PR: kern/77804
Submitted by: Frank Mayhar <frank at exit dot com>
Reviewed by: dmlb, sam (mentor)
Tested by: Phil <pcasidy at casidy dot com>, myself
MFC after: 1 week
session in tprintf(). SESSRELE() needs to properly dispose of the
sessions mutex.
Add sessrele() which does the proper cleanup and have SESSRELE() call it.
Use SESSRELE also in pgdelete().
Found by: Coverity (ID:526)
a given page and, if the pmap is the current pmap, write back the associated
cache line.
Use pmap_wb_page in pmap_qenter() instead of inconditionally
write back/invalidating the data cache.
o Use IP_NPX in preference to hard coded value to write 0 to clear busy#
o Use md macro for a full reset of the npx
o Use IRQ_NPX in preference to hard coded value for each platform.
# The other two ifdefs in this file are hard to remove
rman_resource_resournce_bound wrt end parameter. The end parameter
here was the same as the start. However, it should be start + count -
1, so make it that instead.
it to get better hashing in vfs_hash.
In case of an insert collision in vfs_hash_insert(), put the loosing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.
Drop the VI_HASHED flag.
where having this disabled was actually hurting us, since so many
BIOSes include legacy USB emulation that takes control of all usb
ports and only the ehci driver knows how to disable it.
instead of failing.
When looking for a region to allocate, we used to check to see if the
start address was < end. In the case where A..B is allocated already,
and one wants to allocate A..C (B < C), then this test would
improperly fail (which means we'd examine that region as a possible
one), and we'd return the region B+1..C+(B-A+1) rather than NULL.
Since C+(B-A+1) is necessarily larger than C (end argument), this is
incorrect behavior for rman_reserve_resource_bound().
The fix is to exclude those regions where r->r_start + count - 1 > end
rather than r->r_start > end. This bug has been in this code for a
very long time. I believe that all other tests against end are
correctly done.
This is why sio0 generated a message about interrupts not being
enabled properly for the device. When fdc had a bug that allocated
from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the
0x3f8-0x3ff that it wanted. Now when fdc has the same bug, sio0 fails
to allocate its ports, which is the proper behavior. Since the probe
failed, we never saw the messed up resources reported.
I suspect that there are other places in the tree that have weird
looping or other odd work arounds to try to cope with the observed
weirdness this bug can introduce. These workarounds should be located
and eliminated.
Minor debug write fix to match the above test done as well.
'nice' by: mdodd
Sponsored by: timing solutions (http://www.timing.com/)
I think all we really need is -fno-sse2.
I really don't like cluttering up the compiler invocation,
but this bigger hammer will fix reported problems for now.
to mistakes from day 1, it has always had semantics inconsistent with
SVR4 and its successors. In particular, given argument M:
- On Solaris and FreeBSD/{alpha,sparc64}, it clobbers the old flags
and *sets* the new flag word to M. (NetBSD, too?)
- On FreeBSD/{amd64,i386}, it *clears* the flags that are specified in M
and leaves the remaining flags unchanged (modulo a small bug on amd64.)
- On FreeBSD/ia64, it is not implemented.
There is no way to fix fpsetsticky() to DTRT for both old FreeBSD apps
and apps ported from other operating systems, so the best approach
seems to be to kill the function and fix any apps that break. I
couldn't find any ports that use it, and any such ports would already
be broken on FreeBSD/ia64 and Linux anyway.
By the way, the routine has always been undocumented in FreeBSD,
except for an MLINK to a manpage that doesn't describe it. This
manpage has stated since 5.3-RELEASE that the functions it describes
are deprecated, so that must mean that functions that it is *supposed*
to describe but doesn't are even *more* deprecated. ;-)
Note that fpresetsticky() has been retained on FreeBSD/i386. As far
as I can tell, no other operating systems or ports of FreeBSD
implement it, so there's nothing for it to be inconsistent with.
PR: 75862
Suggested by: bde
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
no one else attempts to grow a dependency on them.
- Now that objects with pages hold the vnode we don't have to do unlocked
checks for the page count in the vm object in VSHOULDFREE. These three
macros could simply check for holdcnt state transitions to determine
whether the vnode is on the free list already, but the extra safety
the flag affords us is probably worth the minimal cost.
- The leafonly sysctl and code have been dead for several years now,
remove the sysctl and the code that employed it from vtryrecycle().
- vtryrecycle() also no longer has to check the object's page count as
the object holds the vnode until it reaches 0.
Sponsored by: Isilon Systems, Inc.
is inserted.
- In vm_page_remove() drop the backing vnode when the last page
is removed.
- Don't check the vnode to see if it must be reclaimed on every
call to vm_page_free_toq() as we only check it now when it is
actually required. This saves us two lock operations per call.
Sponsored by: Isilon Systems, Inc.
that they set v->v_vnlock. This is true for all filesystems in the
tree.
- Remove all uses of LK_THISLAYER. If the lower layer is locked, the
null layer is locked. We only use vget() to get a reference now.
null essentially does no locking. This fixes LOOKUP_SHARED with
nullfs.
- Remove the special LK_DRAIN considerations, I do not believe this is
needed now as LK_DRAIN doesn't destroy the lower vnode's lock, and
it's hardly used anymore.
- Add one well commented hack to prevent the lowervp from going away
while we're in it's VOP_LOCK routine. This can only happen if we're
forcibly unmounted while some callers are waiting in the lock. In
this case the lowervp could be recycled after we drop our last ref
in null_reclaim(). Prevent this with a vhold().
that we free that resource. All the other resources are freed in
their own routine, but since we haven't saved a pointer to this one,
it is leaked. This is the failure case that lead to the sio ports
that weren't working, I think.
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.
as suggested by Matt's comment. Also fix some style and paranoia issues.
The entire function could benefit from review by a VFS guru.
MFC after: 6 weeks
Since we used an sbuf of size resid to accumulate dirents, we would end
up returning one byte short when we had enough dirents to fill or exceed
the size of the sbuf (the last byte being lost to bogus NUL termination)
causing the next call to return EINVAL due to an unaligned offset. This
went undetected for a long time because I did most of my testing in
single-user mode, where there are rarely enough processes to fill the
4096-byte buffer ls(1) uses. The most common symptom of this bug is that
tab completion of /proc or /compat/linux/proc does not work properly when
many processes are running.
Also, a check near the top would return EINVAL if resid was smaller than
PFS_DELEN, even if it was 0, which is frequently the case and perfectly
allowable. Change the test so that it returns 0 if resid is 0.
MFC after: 2 weeks
to get from (mount + inode) to vnode. These tables are mostly
copy&pasted from UFS, sized based on desiredvnodes and therefore
quite large (128K-512K). Several filesystems are buggy enough that
they allocate the hash table even before they know if they will
ever be used or not.
Add "vfs_hash", a system wide hash table, which will replace all
the per-filesystem hash-tables.
The fields we add to struct vnode will more or less be saved in
the respective filesystems inodes.
Having one central implementation will save code and will allow us
to justify the complexity of code to dynamically (re)size the hash
at a later point.
to use only the holdcnt to determine whether a vnode may be recycled,
simplifying the V* macros as well as vtryrecycle(), etc.
Sponsored by: Isilon Systems, Inc.
vtryrecycle(). All obj refs also ref the vnode.
- Consistently use v_incr_usecount() to increment the usecount. This will
be more important later.
Sponsored by: Isilon Systems, Inc.
cleared if the host controller retries the transfer and is successful,
but we were interpreting these bits as indicating a fatal error.
Ignore these error bits, and instead use the HALTED bit to determine
if the transfer failed. Also update the USBD_STALLED detection to
ignore these bits.
Obtained from: OpenBSD
the filesystem. Check that rather than VI_XLOCK.
- VOP_INACTIVE should no longer drop the vnode lock.
- The vnode lock is required around calls to vrecycle() and vgone().
Sponsored by: Isilon Systems, Inc.
vnode lock. Remove the c_lock and use the vn lock in its place.
- Keep the coda lock functions so that the debugging information is
preserved, but call directly to the vop_std*lock routines for the real
functionality.
Sponsored by: Isilon Systems, Inc.
the filesystem. Check that rather than VI_XLOCK.
- Shorten ffs_reload by one step. The old check for an inactive vnode
was slightly racey, and the code which deals with still active vnodes
is not much more expensive.
Sponsored by: Isilon Systems, Inc.
required.
- In ufs_close(), don't do the EAGAIN vrele hack, the top layer now calls
vn_start_write before the lock is acquired as it should.
Sponsored by: Isilon Systems, Inc.
- Also in ufs_inactive, don't acquire the vnode interlock where it isn't
strictly needed. Also owning the vnode interlock while calling vprint()
will cause locking assertions to trip.
Sponsored by: Isilon Systems, Inc.
on an unlinked file. We can't know if this is the case until after we
have the lock.
- Lock the vnode in vn_close, many filesystems had code which was unsafe
without the lock held, and holding it greatly simplifies vgone().
- Adjust vn_lock() to check for the VI_DOOMED flag where appropriate.
Sponsored by: Isilon Systems, Inc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
writing ops without suspending. This could suspend the vlruproc which
should not be a problem under normal circumstances.
- Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
where it was used.
- Acquire a lock before calling vgone() as it now requires it.
- Move the acquisition of the vnode interlock from vtryrecycle() to
getnewvnode() so that if it fails we don't drop and reacquire the
vnode_free_list_mtx.
- Check for a usecount or holdcount at the end of vtryrecycle() in case
someone grabbed a ref while we were recycling. Abort the recycle, and
on the final ref drop this vnode will be placed on the head of the free
list.
- Move the redundant VOP_INACTIVE protection code into the local
vinactive() routine to avoid code bloat.
- Keep the vnode lock held across calls to vgone() in several places.
- vgonel() no longer uses XLOCK, instead callers must hold an exclusive
vnode lock. The VI_DOOMED flag is set to allow other threads to detect
a vnode which is no longer valid. This flag is set until the last
reference is gone, and there are no chances for a new ref. vgonel()
holds this lock across the entire function, which greatly simplifies
logic.
_ Only vfree() in one place in vgone() not three.
- Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
in the LK_NOWAIT case. In other cases, check after we have slept and
acquired an exlusive lock. This will simulate the old vx_wait()
behavior.
Sponsored by: Isilon Systems, Inc.
against recycling.
- Modify VSHOULDFREE, VCANRECYCLE, etc. now that certain flags are no
longer important. Remove VMIGHTFREE as it is only used in one place.
Sponsored by: Isilon Systems, Inc.
between passes over a QH. Previously the accesses to a QH were
bunched together in time, so the interval was often much longer
than intended. This now appears to match the diagrams in the EHCI
spec, so remove the XXX comment.
to stll be able to mount NFS root as prescribed by DCHP configuration. Since
pxeboot is using TFTP to get to the files, pxeboot can not rely on NFS to
provide it a root directory hande as a side effect. pxeboot has to make RPC
mount call itself.
a serial console anyway because input-device is set to keyboard and
output-device is set to screen but no keyboard is plugged in don't
assume that a device node for the input-device alias exists. While
this is true for RS232 keyboards (the node of the SCC and UART
respectively which controls the keyboard doesn't disappear when no
keyboard is plugged in) this assumption breaks for USB keyboards.
It's most likely also not true for PS/2 keyboards but OFW doesn't
reliably switch to a serial console when the potential keyboard is
a PS/2 one which isn't plugged in so this couldn't be verified
properly.
Reported by: Will Andrews <will@csociety.org>, obrien
MFC after: 1 week
so that the socket lock is held over the test-and-set removal of the
accept filter option during connect, and the two socket mutex regions
(transition to connected, perform accept filter) are combined.
from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it
now matches do_setopt_accept_filter(). Slightly reformulate the logic to
match the optimistic allocation of storage for the argument in advance,
and slightly expand the coverage of the socket lock.
OR the physical address with alpha_XXX_dmamap_or to get the DMA address,
like the name of the variable suggests. However, while we were doing
this correctly in the alpha_XXX_dmamap() macro, the busdma code added
the variable to the physical address instead of or'ing it. Fortunately
and if my math is not entirely wrong, you would need more than 128GB of
RAM and a device able to do DMA in 64bits to experience the bug.
Spotted by: cognet
long filename. Each substring is indexed by the windows ID, a
sequential one-based value. The previous code was extremely slow,
doing a malloc/strcpy/free for each substring.
This code optimizes these routines with this in mind, using the ID
to index into a single array and concatenating each WIN_CHARS chunk
at once. (The last chunk is variable-length.)
This code has been tested as working on an FS with difficult filename
sizes (255, 13, 26, etc.) It gives a 77.1% decrease in profiled
time (total across all functions) and a 73.7% decrease in wall time.
Test was "ls -laR > /dev/null".
Per-function time savings:
mbnambuf_init: -90.7%
mbnambuf_write: -18.7%
mbnambuf_flush: -67.1%
MFC after: 1 month
socket lock around knlist_init(), so don't.
Hard code the setting of the socket reference count to 1 rather than
using soref() to avoid asserting the socket lock, since we've not yet
exposed the socket to other threads.
This removes two mutex operations from each socket allocation.
so that the socket does not generate SIGPIPE, only EPIPE, when a write
is attempted after socket shutdown. When the option was introduced in
2002, this required the logic for determining whether SIGPIPE was
generated to be pushed down from dofilewrite() to the socket layer so
that the socket options could be considered. However, the change in
2002 omitted modification to soo_write() required to add that logic,
resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when
the socket was written to using write() or related generic system calls.
This change adds the EPIPE logic to soo_write(), generating a SIGPIPE
signal to the process associated with the passed uio in the event that
the SO_NOSIGPIPE option is not set.
Notes:
- The are upsides and downsides to placing this logic in the socket
layer as opposed to the file descriptor layer. This is really fd
layer logic, but because we need so_options, we have a choice of
layering violations and pick this one.
- SIGPIPE possibly should be delivered to the thread performing the
write, not the process performing the write.
- uio->uio_td and the td argument to soo_write() might potentially
differ; we use the thread in the uio argument.
- The "sigpipe" regression test in src/tools/regression/sockets/sigpipe
tests for the bug.
Submitted by: Mikko Tyolajarvi <mbsd at pacbell dot net>
Talked with: glebius, alfred
PR: 78478
MFC after: 1 week
to syncrhonize access to the data as a result. This makes the pps
less likely to miss the 1ms pulse that I'm feeding it, but not
entirely reliable yet on my 133MHz P5.
Reviewed by: phk
unknown (since my sony vaio didn't :-(.
Instead, fix the problem described by 1.49 in a different way: just
add the two calls I'd hoped I'd avoid in 1.49 by doing the (wrong)
gymnastics there. While 1.49 is a good direction to go in, each step
of the way should work :-(.
resources. When allocating 6 ports for a 4 port range isa code
returns an error. I'm not sure yet why this is the case, but suspect
it is just a non-regularity in how the resource allocation code works
which should be corrected. Use 1 as the ports size in this case.
However, in the hints case, we have to specify the length, so use 6 in
that case. I believe that this is also acpi friendly.
Also, complain when we can't allocate FDOUT register space. Right now we
silently fail when we can't. This failure is referred to above.
When there's no resource for FDCTL, go ahead and allocate one by hand.
Many PNPBIOS tables don't list this resource, and our hints mechanism also
doesn't cover that range. If we can't allocate it, whine, but fake up
something. Before, we were always bogusly faking it and no one noticed
the sham (save the original author who has now fixed his private shame).
per-connection and globally. This eliminates potential DoS attacks
where SACK scoreboard elements tie up too much memory.
Submitted by: Raja Mukerji (raja at moselle dot com).
Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com).
asks that each buffer be (2048 * 256) bytes long. I suspect that alignment
isn't a real requirement since busdma only recently started honoring it. The
size is also bogus. Fix both of these and stop busdma from trying to
exhaust the system memory pool with bounce pages.
Submitted by: Kevin Oberman
MFC After: 7 days
- Fix a bug in the same condition where we forgot to drop the ACPI pcib
lock. This fixes hangs after the pcib0 attach on some machines.
Tested by: sos (2)
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.
Suggested by: alfred
Add support for passing in a mutex. If NULL is passed a global
subr_unit mutex is used.
Add alloc_unrl() which expects the mutex to be held.
Allocating a unit will never sleep as it does not need to allocate
memory.
Cut possible range in half so we can use -1 to mean "out of number".
Collapse first and last runs into the head by means of counters.
This saves memory in the common case(s).
ever working correctly: the code was linking the QHs together but
then immediately overwriting the "next" pointers. Oops. Also
initialise qh_endphub, since the EHCI spec says that we should
always set the pipe multiplier field to something sensible.
This appears to make basic split transactions work, so enable split
transactions for control, bulk and interrupt pipes (split isochronous
transfers are not yet implemented). It should now be possible to
use USB1 devices even when they are connected through a USB2 hub.
seem to be necessary anymore, and it prevents tasting a valid drive
when booting with geom_vinum already loaded, since SCSI disks set their
sectorsize not until first opening them.
from an mbuf into the fxp_encap() function, as done in other drivers.
- Don't waste time calling bus_dmamap_load_mbuf() if we know the mbuf
chain is too long to fit in a TX descriptor, call m_defrag() first.
- Convert fxp(4) to use bus_dmamap_load_mbuf_sg().
for the duration of the send() call. Such approach may be less than ideal
in threading environment, when several threads share the same socket and it
might happen that several of them are calling linux_send() at the same time
with and without SO_NOSIGPIPE set.
However, such race condition is very unlikely in practice, therefore this
change provides practical improvement compared to the previous behaviour.
PR: kern/76426
Submitted by: Steven Hartland <killing@multiplay.co.uk>
MFC after: 3 days
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.
The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.
What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.
Along the way, I discovered a few other problems:
- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
wrong.)
- Similarly, the NDIS interrupt handler, which is essentially a
DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
has been fixed accordingly.
- MiniportQueryInformation() and MiniportSetInformation() run at
DISPATCH_LEVEL, and each request must complete before another
can be submitted. ndis_get_info() and ndis_set_info() have been
fixed accordingly.
- Turned the sleep lock that guards the NDIS thread job list into
a spin lock. We never do anything with this lock held except manage
the job list (no other locks are held), so it's safe to do this,
and it's possible that ndis_sched() and ndis_unsched() can be
called from DISPATCH_LEVEL, so using a sleep lock here is
semantically incorrect. Also updated subr_witness.c to add the
lock to the order list.
- "options" is followed by the characters \040\011, not \011\011.
Correct both my own sins and those of others.
- Comment blocks start and end with an empty line ^#$.
- Remove non-standard comments added in my last commit.
Requested by: njl
Correctness confirmed by: bde
since there are often significant holes in the memory map due to the
kernel, loader and OFW data structures not being included: Maxmem is
the highest available, so can be misleading.
them all, otherwise the driver will be useless and will only confuse user
as manual page says nothing about the need to enable one of those frame
types explicitly in the kernel config.
PR: kern/47152
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 3 days
as we can't rely on a trap happening, as it is done normally.
While I'm there, uncomment the call to cpu_dcache_wbinv_range() in
pmap_kenter_internal, as we don't call cpu_dcache_wbinv_all() there anymore.
asynchronously by different threads. Thus, declare as volatile the
reference count that is accessed through m_ext's pointer, ref_cnt.
Revert the previous change, revision 1.144, that casts as volatile a
single dereference of ref_cnt.
Reviewed by: bmilekic, dwhite
Problem reported by: kris
MFC after: 3 days
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.
Change fhc(4) to use IRQ numbers instead of RIDs for allocating the
IRQs of children. This works similar to e.g. sbus(4), i.e. add the
IRQ resources as fully specified to the resource lists of the children,
allocate them like normal. When establishing the interrupt search the
interrupt maps of the children for a matching INO to determine which
map we need to write the fully specified interrupt number to and to
enable the mapping (before the RID was used to indicate which interrupt
map to use).
- dev/puc/puc.c:
Revert rev. 1.38, with the above change fhc(4) no longer needs special
treatment for allocating IRQs.
Thanks to: joerg for providing access to an E3500
modulating the STPCLK# pin based on the duty cycle. Since p4tcc uses the
same mechanism (but internal to the CPU), we triggered a hang on some
systems at low frequencies when both were in use. Now, disable
acpi_throttle when p4tcc is also present.
Tested by: Kevin Oberman
- Use FBSDID.
- Remove unused macro.
- Use auto-generated typedefs for the prototypes of the bus and device
interface functions.
- Terminate the output of device_printf(9) with a newline char.
- Honour the return values of malloc(), OF_getprop(), etc.
- Use __func__ instead of hardcoded function names.
- Print the physical slot number and the board model on attach.
MFC after: 1 month
- Use FBSDID.
- Remove an unused include.
- Use auto-generated typedefs for the prototypes of the device interface
functions.
- Terminate the output of device_printf(9) with a newline char.
- Honour the return value of malloc(3).
MFC after: 1 month
with no associated data. Also revert previous changes that allocate off
of the stack instead of using malloc, as it's not needed. Many thanks to
LSI for investigating and fixing these problems.
Submitted by: rajeshpr @ lsil . com
panic with the NDISulator if you did "ifconfig ndis0 10.0.0.1/24,"
whereas "ifconfig ndis0 10.0.0.1/24 up" worked fine. The double fault
was caused by the ifconfig thread running out of kernel stack space.
(This was partly due to the NDIsulator using a couple of big buffers on
the stack, but even after fixing that the double fault persisted.)
It turns out that ndis_init() is called in both cases, but in the first
case the code path passes through ieee80211_ioctl(), and it turns out
ieee80211_ioctl() consumes a whopping 2400 bytes of stack space.
Apparently, gcc -O2 causes the ieee80211_ioctl_get80211() routine to
be inlined into ieee80211_ioctl(), and for some reason which I do not
fully understand, this causes ieee80211_ioctl() to consume an extra 2K
of stack space.
To prevent this overly agressive optimization, ieee80211_ioctl_get80211()
is now declared with __attribute__ ((noinline)). With this change,
ieee80211_ioctl() now only reserves about 200 bytes of stack instead of 2400.
create kernel threads and call rfork(2) with RFTHREAD flag set in this case,
which puts parent and child into the same threading group. As a result
all threads that belong to the same program end up in the same threading
group.
This is similar to what linuxthreads port does, though in this case we don't
have a luxury of having access to the source code and there is no definite
way to differentiate linux_clone() called for threading purposes from other
uses, so that we have to resort to heuristics.
Allow SIGTHR to be delivered between all processes in the same threading
group previously it has been blocked for s[ug]id processes.
This also should improve locking of the same file descriptor from different
threads in programs running under linux compat layer.
PR: kern/72922
Reported by: Andriy Gapon <avg@icyb.net.ua>
Idea suggested by: rwatson
precision when IP packet may travel through internet for several seconds.
Also uptime measured in milliseconds overflows every 48+ days.
But we have to do same to keep compatibility with Cisco and flow-tools.
Make a macro MILLIUPTIME, which does overflowable multiplication to 1000.
Requested by: Sergey Ryabin, Oleg Bulyzhin
MFC after: 1 week
both consuming 1K of stack space. This is unfriendly. Allocate the buffers
off the heap instead. It's a little slower, but these aren't performance
critical routines.
Also, add a spinlock to NdisAllocatePacketPool(), NdisAllocatePacket(),
NdisFreePacketPool() and NdisFreePacket(). The pool is maintained as a
linked list. I don't know for a fact that it can be corrupted, but why
take chances.
a libalias application (e.g. natd, ppp, etc.) to crash. Note: Skinny support
is not enabled in natd or ppp by default.
Approved by: secteam (nectar)
MFC after: 1 day
Secuiryt: This fixes a remote DoS exploit
call vector which was added in rev. 1.52. This change was done way before
sparc64 switched to a 64-bit time_t so all binaries are expected to have
been recompiled by now.
aid for ABI breakages caused by system call changes. These changes were
done way before sparc64 switched to a 64-bit time_t so all binaries are
expected to have been recompiled by now.
a vlan interface attached to a fxp(4) card when it has not been
initialized yet. We now set the links from our internel TX descriptor
structure to the TX command blocks at attach time rather than at init
time. While I'm here, slightly improve the style in fxp_attach().
PR: kern/78112
Reported by: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk> and others
Tested by: flz, Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>
MFC after: 1 week
place.
This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.
By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.
Submitted by: netchild
Reviewed by: various developers on arch@, some time ago
We need to be able to test for the (possible) non-existence of the
FPSWA code.
PR: ia64/77591
Submitted by: Christian Kandeler (christian dot kandeler at hob dot de)
MFC after: 1 day
set the interrupt handler to be INTR_MPSAFE now that xpt_done() can be
called without Giant. Giant is still on the top half of the driver and
the timeout handlers.
timer case:
- Remove the virtual fooclock interrupt counters as they have served their
purpose.
- Adjust the dividers for the different clock such that profhz is now a
multiple of stathz as in the non-lapic case, and the timer now runs at
hz * 2 rather than hz * 3. With the new divisors, the default clock
rates are:
kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.
with shared IRQs in case the bus code, MD interrupt code, etc. permits.
Together with sys/sparc64/sparc64/intr_machdep.c rev. 1.21 this fixes
an endless loop in uart_intr() when using the second NS16550 on the ISA
bus of sparc64 machines.
- Destroy the hardware mutex on detach and in case attaching fails.
Approved by: marcel
for this are the on-board SCCs and UARTs that use a shared IRQ. [1]
- Rework the interrupt counting code to account for shared interrupts. [1]
- In case ithread_add_handler() failed in inthand_add() just return with
the error code instead of setting up a non-fast handler regardless or
setting up a non-fast handler instead of a fast handler. I can't think
of a situation where the former behaviour would do the right thing.
Reviewed by: marcel [1]
Based on: sys/i386/i386/intr_machdep.c [1]
- Use FBSDID.
- Use uintXX_t instead of u_intXX_t.
- Be consistent with white-space.
- Mark some globals as static.
- Add a missing prototype.
- Remove a unused variable.
- etc.
Failure to do this will result in following ata_pio_read() calls walking
off the end of the read buffer.
This resolves the "memory modified after free" panics common with Thinkpads
and CD/DVD drives.
Submitted by: Nate Lawson <nate AT root.org>
idle the 'mask' variable could be set to 0, resulting in the timeout loop
running for the full 31 seconds.
Handling this case eliminates long hangs on resume on some systems.
Submitted by: Nate Lawson <nate AT root.org>
and the X1034A (quad HME; QFE) cards the X1033A (single HME) don't have a
PCI-PCI-bridge so we can't rely on the PCI slot number being useable as
index for the network address to read from the VPD on the latter. Use
the end tag to determine whether it is a QFE VPD with 4 NAs and only use
the slot number as index in this case.
- Remove a useless check.
Prodded by: joerg
Additional testing by: joerg
MFC after: 1 day
malloc(sizeof(device_object), ...) by mistake. Correct this, and
rename "dobj" to "drv" to make it a bit clearer what this variable
is supposed to be.
Spotted by: Mikore Li at Sun dot comnospamplzkthx
attached to a parent interface we use its mutex to lock the softc. This
means that in several places like carp_ioctl() we lock softc conditionaly.
This should be redesigned.
To avoid LORs when MII announces us a link state change, we schedule
a quick callout and call carp_carpdev_state_locked() from it.
Initialize callouts using NET_CALLOUT_MPSAFE.
Sponsored by: Rambler
Reviewed by: mlaier
- In carp_send_ad_all() walk through list of all carp interfaces
instead of walking through list of all interfaces.
Sponsored by: Rambler
Reviewed by: mlaier
its return value and free resources if function returns error. Plug
several memory leaks with this change.
Submitted by: archie
Found by: Coverity Prevent analysis tool
o usb_subr.c, add delta 1.119:
Move usb_get_string() and make it public.
o usbdi.c, bring on par with 1.106, this includes:
- Make an iterator abstraction for looping through all descriptors.
- Whine about not being able to figure out default language if we are debugging.
- Move usb_get_string() and make it public.
o usbdi.h, bring on par with 1.64, this includes:
- Make an iterator abstraction for looping through all descriptors.
- Move usb_get_string() and make it public.
o usbdi_util.c, bring on par with 1.42, this includes:
- Add usbd_get_protocol().
- Use NULL instead of 0.
- Fix (mostly harmless) typo.
- Move utility routine from uirda.c to usbdi_util.c.
o usbdi_util.h, bring on par with 1.31, this includes:
- Add usbd_get_protocol().
- Move utility routine from uirda.c to usbdi_util.c.
MFC after: 3 days
is particularly useful when VESA is available (either `options VESA'
or load the vesa module), as BIOSes in some notebooks may correctly
save and restore LCD panel settings using VESA in cases where calling
the video BIOS POST is not effective. On some systems it may also
be necessary to set the hw.acpi.reset_video sysctl to 0.
Also save the DAC state, increase the maximum save state size from
4k to 8k, and refuse to save the VESA state if the BIOS reports it
is larger than the maximum size we can handle.
It doesn't appear that anything currently uses this code, but it
turns out to be capable of restoring some notebook displays to a
working state after a suspend-resume cycle.
if either INVARIANTS or INVARIANT_SUPPORT is defined so that kernel modules
that want to use mtx_assert() only need to define INVARIANTS.
MFC after: 1 week
attempting to change the BPF filter on a BPF descriptor at the same
time: retrieve the old filter pointer under the same locked region
as setting the new pointer.
MFC after: 3 days
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
understand uint8_t af identifier.
Submitted by: Frank Volf [1]
yet I only changed one of them. So when we loaded drivers, we'd fail
to allocate resources correct.
This pointed out that we were doing the wrong thing when we failed to
attach a child. We released all the resources and almost deleted the
child. Instead, we should keep the resources allocated so when/if a
driver is loaded, we can go w/o having to allocate them. We use
pci_cfg_save/restore to restore the BARs with these resources.
This seems to fix the problems that we were seeing that I thought
might have magically gone away in the last revision of cardbus.c (but
really didn't).
Noticed by: avatar (nicely done!)
shared-last-sector problem.
After this change, even if there is more than one provider with the same
last sector, the proper one will be chosen based on its size.
It still doesn't fix the 'c' partition problem (when da0s1 can be confused
with da0s1c) and situation when 'a' partition starts at offset 0
(then da0s1a can be confused with da0s1 and da0s1c). One can use '-h'
option there, when creating device or avoid sharing last sector.
Actually, when providers share the same last sector and their size is equal,
they provide exactly the same data, so the name (da0s1, da0s1a, da0s1c)
isn't important at all.
- Provide backward compatibility.
- Update copyright's year.
MFC after: 1 week
- Add debug.watchdog tunable, so we can specify watchdog CPU from loader
which will help to debug hangs on boot.
- Remove 'U' from debug.watchdog sysctl definition, so if we set it to '-1'
it really shows '-1'.
- Fix comment.
ignore the sack options in that segment. Else we'd end up
corrupting the scoreboard.
Found by: Raja Mukerji (raja at moselle dot com)
Submitted by: Mohan Srinivasan
m_pullup. icmp6_notify_error continued to use the old pointer,
which after the m_pullup is not suitable as a packet header any
longer (see m_move_pkthdr).
and this is what causes the kernel panic in sbappendaddr later on.
PR: kern/77934
Submitted by: Gerd Rausch <gerd@juniper.net>
MFC after: 2 days
code requires it to be 0 when a jumbo payload option is contained.
PR: kern/77934
Submitted by: Gerd Rausch <gerd@juniper.net>
Obtained from: KAME
MFC after: 2 days
testing it to know whether we should enable the 82503 serial mode...
Move code to the right location and disallow the use of the 82503
serial mode if the sc->revision field is 0 again. This makes fxp(4)
work correctly with ATMEL 350 93C46 cards (3 port 82559 based with a
82555 PHY), as well as with the older ATMEL 220 93C46 (same flavour)
and with the even older 10Mbps-only 82557 cards with the 82503 serial
interface.
Tested by: Andre Albsmeier <andrer@albsmeier.net>, krion
MFC after: 2 weeks
SMP systems. It appears all drivers except ichss should attach to each
CPU and that settings should be performed on each CPU. Add comments about
this. Also, add a guard for p4tcc's identify method being called more than
once.
debug.cpufreq.lowest tunable and sysctl. Some systems seem to have problems
with the lowest frequencies so setting this prevents them from being
available or used.
introducing the disk formats for _RuneLocale and friends.
The disk formats do not have (useless) pointers and have 32-bit
quantities instead of rune_t and long. (htonl(3) only works
with 32-bit quantities, so there's no loss).
Bootstrap mklocale(1) when necessary. (Bootstrapping from 4.x
would be trivial (verified), but we no longer provide pre-5.3
source upgrades and this is the first commit to actually break
it.)
ARM_TP_ADDRESS, where the tp will be stored. On CPUs that support it, a cache
line will be allocated and locked for this address, so that it will never go
to RAM. On CPUs that does not, a page is allocated for it (it will be a bit
slower, and is wrong for SMP, but should be fine for UP).
The tp is still stored in the mdthread struct, and at each context switch,
ARM_TP_ADDRESS gets updated.
Suggested by: davidxu
uart(4) to support the Zilog 8530 SCCs which hang off of a FireHose
bus on Sun E4000/E5000 class machines.
Beside the fact that a puc_fhc.c would just be a copy of puc_sbus.c
with s,sbus,fhc,g the reason why the declaration for fhc(4) was
sticked into puc_sbus.c is that both of these front-ends for puc(4)
will go away once there is a scc(4).
Discussed with: marcel
Tested by: hrs, kris
MFC after: 3 days
both a scc(4) is under way and fhc(4) will be change to use INOs this
shouldn't stay in HEAD for too long but we need a MFC-able solution
for FreeBSD 5.4.
Discussed with: marcel
Tested by: hrs, kris
MFC after: 3 days
that describe a buffer of variable size). The problem is, allocating
MDLs off the heap is slow, and it can happen that drivers will allocate
lots and lots of lots of MDLs as they run.
As a compromise, we now do the following: we pre-allocate a zone for
MDLs big enough to describe any buffer with 16 or less pages. If
IoAllocateMdl() needs a MDL for a buffer with 16 or less pages, we'll
allocate it from the zone. Otherwise, we allocate it from the heap.
MDLs allocate from the zone have a flag set in their mdl_flags field.
When the MDL is released, IoMdlFree() will uma_zfree() the MDL if
it has the MDL_ZONE_ALLOCED flag set, otherwise it will release it
to the heap.
The assumption is that 16 pages is a "big number" and we will rarely
need MDLs larger than that.
- Moved the ndis_buffer zone to subr_ntoskrnl.c from kern_ndis.c
and named it mdl_zone.
- Modified IoAllocateMdl() and IoFreeMdl() to use uma_zalloc() and
uma_zfree() if necessary.
- Made ndis_mtop() use IoAllocateMdl() instead of calling uma_zalloc()
directly.
Inspired by: discussion with Giridhar Pemmasani
clock time to uptime because wall clock time may go backwards.
This is a change in the API which will impact SNMP agents who are using
ifi_epoch to set RFC2233's ifCounterDiscontinuityTime. None are know to
exist today. This will not impact applications that are using the
<index, epoch> tuple to verify interface uniqueness except that it
eliminates a race which could lead to a false assumption of uniqueness.
Because this is a behavior change, bump __FreeBSD_version.
Discussed with: re (jhb, scottl)
MFC after: 3 days
Pointed out by: pkh (way back at EuroBSDCon)
Pointy hat: brooks
o change the mapping arrays to have a zero offset rather than base 1;
this eliminates lots of signo adjustments and brings the code
back inline with the original netbsd code
o purge use of SVR4_SIGTBLZ; SVR4_NSIG is the only definition for
how big a mapping array is
o change the mapping loops to explicitly ignore signal 0
o purge some bogus code from bsd_to_svr4_sigset
o adjust svr4_sysentvec to deal with the mapping table change
Enticed into fixing by: Coverity Prevent analysis tool
Glanced at by: marcel, jhb
owned by a process when it forks, and creates a matching set of references
for the child process, as prescribed by POSIX.
In order to avoid races with other threads in the parent process during
fork(), it is necessary to allocate a temporary reference list while
holding the sem_lock, then transfer those references to the new process
once the sem_lock is released. The implementation is inefficient but
appears functional; in order to improve the efficiency, it will be
necessary to modify the existing structures and logic, which generally
rely on O(n) operations over the global set of semaphores.
PAGE_SIZE.
Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition
(he has been proposing to replace it with PAGE_SIZE everywhere), not only
this reduced the diff significantly, but prevents code obfuscation and also
allows to increase/decrease this parameter easily if needed.
PR: kern/64196
Submitted by: Magnus Bäckström <b@etek.chalmers.se>
doesn't change functionality, but makes code more logical.
Obtained from: DrafonFlyBSD
o Use VOP_GETATTR() to obtain actual size of file and parse no more than that.
Previously, we parsed MAXSHELLCMDLEN characters regardless of the actual file
size. This makes the following working:
$ printf '#!/bin/echo' > /tmp/test.sh
$ chmod 755 /tmp/test.sh
$ /tmp/test.sh
Previously, attempts to execve() that shell script has been failing with bogus
ENAMETOOLONG.
PR: kern/64196
Submitted by: Magnus B.ckstr.m <b@etek.chalmers.se>
- Simplify CARP_LOG() and making it working (we don't have addlog in FreeBSD).
- Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when
net.inet.carp.log > 1
- Use CARP_DEBUG to log state changes of carp interfaces.
After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc
argument. Remove it.
Sponsored by: Rambler
script. Otherwise it's possible to panic kernel by constructing a shell
script with first line not ending in '\n'.
Also, treat '\0' as line terminating character, which may me useful in
some situations.
Submitted by: gad
- store assigned PCI addresses at cninit time for later mmap range
check
- implement set_border to scrub X remnants when switching back to VTYs
- implement mmap, only allowing addresses within the range of the
console adapter.
for a given device in some circumstances, so move the PDO creation
to the attach routine so we don't end up creating two PDOs.
Also, when we skip the call to ndis_convert_res() in if_ndis.c:ndis_attach(),
initialize sc->ndis_block->nmb_rlist to NULL. We don't explicitly zero
the miniport block, so this will make sure ndis_unload_driver() does
the right thing.
when we create a PDO, the driver_object associated with it is that
of the parent driver, not the driver we're trying to attach. For
example, if we attach a PCI device, the PDO we pass to the NdisAddDevice()
function should contain a pointer to fake_pci_driver, not to the NDIS
driver itself. For PCI or PCMCIA devices this doesn't matter because
the child never needs to talk to the parent bus driver, but for USB,
the child needs to be able to send IRPs to the parent USB bus driver, and
for that to work the parent USB bus driver has to be hung off the PDO.
This involves modifying windrv_lookup() so that we can search for
bus drivers by name, if necessary. Our fake bus drivers attach themselves
as "PCI Bus," "PCCARD Bus" and "USB Bus," so we can search for them
using those names.
The individual attachment stubs now create and attach PDOs to the
parent bus drivers instead of hanging them off the NDIS driver's
object, and in if_ndis.c, we now search for the correct driver
object depending on the bus type, and use that to find the correct PDO.
With this fix, I can get my sample USB ethernet driver to deliver
an IRP to my fake parent USB bus driver's dispatch routines.
- Add stub modules for USB support: subr_usbd.c, usbd_var.h and
if_ndis_usb.c. The subr_usbd.c module is hooked up the build
but currently doesn't do very much. It provides the stub USB
parent driver object and a dispatch routine for
IRM_MJ_INTERNAL_DEVICE_CONTROL. The only exported function at
the moment is USBD_GetUSBDIVersion(). The if_ndis_usb.c stub
compiles, but is not hooked up to the build yet. I'm putting
these here so I can keep them under source code control as I
flesh them out.
right for certain MAP_FIXED mappings on ia64 but it will work fine for all
other mappings and works fine on amd64.
Requested by: ps, Christian Zander
MFC after: 1 week
- In kern_ndis.c:ndis_unload_driver(), test that ndis_block->nmb_rlist
is not NULL before trying to free() it.
- In subr_pe.c:pe_get_import_descriptor(), do a case-insensitive
match on the import module name. Most drivers I have encountered
link against "ntoskrnl.exe" but the ASIX USB ethernet driver I'm
testing with wants "NTOSKRNL.EXE."
- In subr_ntoskrnl.c:IoAllocateIrp(), return a pointer to the IRP
instead of NULL. (Stub code leftover.)
- Also in subr_ntoskrnl.c, add ExAllocatePoolWithTag() and ExFreePool()
to the function table list so they'll get exported to drivers properly.
panic at kmem_alloc() via malloc(9).
PR: kern/77748
Submitted by: Wojciech A. Koszek
OK'ed by: brooks
Security: local DoS, a sample code in the PR.
MFC after: 3 days
mastering on all other interfaces:
- call carp_carpdev_state() on initialize instead of just setting to INIT
- in carp_carpdev_state() check that interface is UP, instead of checking
that it is not DOWN, because a rebooted machine may have interface in
UNKNOWN state.
Sponsored by: Rambler
Obtained from: OpenBSD (partially)
when trim'ing space off the back of a chain; this is indirect
solution to a potential null ptr deref
Noticed by: Coverity Prevent analysis tool (null ptr deref)
Reviewed by: dg, rwatson
vn_extattr_rm. This is meant to catch conditions where IO_NODELOCKED
has been specified without the vnode being locked.
Discussed with: rwatson
MFC after: 1 week
object on to the zone allocator. It should be noted that uma_zalloc(9)
uses bzero to zero out the object so there probably wont be any
real performance benefit. If UMA grows the ability to supply
zeroed zones more efficiently in the future, we will not have to
modify all the existing consumers.
Discussed with: rwatson,julian
MFC after: 1 week
and a machine-independent though inefficient InterlockedExchange().
In Windows, InterlockedExchange() appears to be implemented in header
files via inline assembly. I would prefer using an atomic.h macro for
this, but there doesn't seem to be one that just does a plain old
atomic exchange (as opposed to compare and exchange). Also implement
IoSetCancelRoutine(), which is just a macro that uses InterlockedExchange().
Fill in IoBuildSynchronousFsdRequest(), IoBuildAsynchronousFsdRequest()
and IoBuildDeviceIoControlRequest() so that they do something useful,
and add a bunch of #defines to ntoskrnl_var.h to help make these work.
These may require some tweaks later.
a bugfix of clearing the On-Demand flag when going back to 100%. It
has been tested and works on an IBM R32. Note original work done by
Ted Unangst and sobomax@.
the previous one failed and there are more than one plex in the volume.
This could have led to a flood of error messages on the console and
probably a deadlock in certain situations.
has been set. Assert that this is the case so that we catch filesystems
who are using naked VOP_LOCKs in illegal cases.
Sponsored by: Isilon Systems, Inc.
considered to be as good as an exclusive lock, although there is still a
possibility of someone acquiring a VOP LOCK while xlock is held.
Sponsored by: Isilon Systems, Inc.
chipset. Add support for this card. Office Max has them on sale and
I was surprised that we didn't have it in our supported list when I
plugged it in...
IRQ 0 and not an ExtINT pin. The MADT enumerators ignore the PC-AT flag
and ignore overrides that map IRQ 0 to pin 2 when this quirk is present.
- Add a block comment above the quirks to document each quirk so that we
can use more verbose descriptions quirks.
MFC after: 2 weeks
ACPI MADT only does if the PC-AT flag is set), then don't assume that pin 0
on the first I/O APIC is an ExtINT pin. Instead, assume that it is ISA
IRQ 0.
with the kernel compile time option:
options IPFIREWALL_FORWARD_EXTENDED
This option has to be specified in addition to IPFIRWALL_FORWARD.
With this option even packets targeted for an IP address local
to the host can be redirected. All restrictions to ensure proper
behaviour for locally generated packets are turned off. Firewall
rules have to be carefully crafted to make sure that things like
PMTU discovery do not break.
Document the two kernel options.
PR: kern/71910
PR: kern/73129
MFC after: 1 week
so that parent interface is not left in promiscous mode after carp
interface is destroyed.
This is not perfect, since promisc counter is added when carp
interface is assigned an IP address. However, when address is removed
parent interface is still in promiscuous mode. Only removal of
carp interface removes promisc from parent. Same way in OpenBSD.
Sponsored by: Rambler
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent. There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".
Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().
rename idestroy_dev() to destroy_devl() for consistency.
Add LIST_ENTRY de_alias to struct devfs_dirent.
Remove v_specnext from struct vnode.
Change si_hlist to si_alist in struct cdev.
String new devfs vnodes' devfs_dirent on si_alist when
we create them and take them off in devfs_reclaim().
Fix devfs_revoke() accordingly. Also don't clear fields
devfs_reclaim() will clear when called from vgone();
Let devfs_reclaim() call dev_rel() instead of vgonel().
Move the usecount tracking from dev_rel() to devfs_reclaim(),
and let dev_rel() take a struct cdev argument instead of vnode.
Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
devfs_reclaim()) when they are no longer used. (This
should maybe happen in devfs_close() instead.)
that NFS ever started using it. Long time ago I added the necessary
vhold()/vdrop() calls to replace it, but forgot to remove the v_id code.
Do it now.
It is _never_ OK to find a vnode from a struct cdev because you have
no way of telling if you get the right one. You might be in jail or
chroot for instance.
The fundamental problem is that we get only the lower 8 bits of the
minor device number so there is no guarantee that we can actually
find the disk device in question at all.
This was probably a bigger issue pre-GEOM where the upper bits
signaled which slice were in use.
The secondary problem is how we get from (partial) dev_t to vnode.
The correct implementation will involve traversing the mount list
looking for a perfect match or a possible match (for truncated
minor).
hosts to share an IP address, providing high availability and load
balancing.
Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.
FreeBSD port done solely by Max Laier.
Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)
address is not supplied, then jail IP is choosed and in_pcbbind() is called.
Since udp_output() does not save local addr after call to in_pcbconnect_setup(),
in_pcbbind() is called for each packet, and this is incorrect.
So, we shall treat jailed sockets specially in udp_output(), we will save
their local address.
This fixes a long standing bug with broken sendto() system call in jails.
PR: kern/26506
Reviewed by: rwatson
MFC after: 2 weeks
loopback interface. Nobody have explained me sense of this check.
It breaks connect() system call to a destination address which is
loopback routed (e.g. blackholed).
Reviewed by: silence on net@
MFC after: 2 weeks
modes, systems may take longer. If the status values don't match, try
matching just the lowest 8 bits if no bits above 8 are set in the desired
value. The IBM R32 has other bits set in the status register that are
irrelevant to the expected value.
the switch. Other interim tests (i.e., for minimum runtime) could
invalidate the start time. This fixes transitions to cooler states in that
now they go to the next active state (_AC0 -> _AC1) instead of going
straight to off (_AC0 -> off).
Submitted by: Alexandre "Sunny" Kovalenko (Alex.Kovalenko / verizon.net)
locks held, specify the ACPI_ISR flag to keep it from acquiring any more
mutexes (which could potentially sleep.) This should fix "could sleep"
warning messages on the following path:
msleep()
AcpiOsWaitSemaphore()
AcpiUtAcquireMutex()
AcpiDisableGpe()
EcGpeHandler()
AcpiEvGpeDispatch()
AcpiEvGpeDetect()
AcpiEvGpeDetect()
AcpiEvSciXruptHandler()
a socket from a regular socket to a listening socket able to accept new
connections. As part of this state transition, solisten() calls into the
protocol to update protocol-layer state. There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.
This change does the following:
- Pushes the socket state transition from the socket layer solisten() to
to socket "library" routines called from the protocol. This permits
the socket routines to be called while holding the protocol mutexes,
preventing a race exposing the incomplete socket state transition to TCP
after the TCP state transition has completed. The check for a socket
layer state transition is performed by solisten_proto_check(), and the
actual transition is performed by solisten_proto().
- Holds the socket lock for the duration of the socket state test and set,
and over the protocol layer state transition, which is now possible as
the socket lock is acquired by the protocol layer, rather than vice
versa. This prevents additional state related races in the socket
layer.
This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another. Similar changes are likely require
elsewhere in the socket/protocol code.
Reported by: Peter Holm <peter@holm.cc>
Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod: gnn
possible that the same packet would show up multiple times. This poses some
constraints on the TBD locking for snc(4) (see comment).
Obtained from: DragonFlyBSD
Submitted by: Joerg Sonnenberger
Reviewed by: rwatson
was a bad idea, but since it is done like this in the vendor source we keep
it around for older versions. As a safe guard against future misuse we don't
even define CALLOUT_INITIALIZER anymore.
This fixes ALTQ after callout_init_mtx() and takes altq_var.h off the vendor
branch.
Submitted by: Divacky Roman <xdivac02NOstud.fit.vutbrSPAMcz> (w/ changes)
on the previous generation of Pentium-M processors (Banias). Support for
Dothan and later processors involves working with acpi_perf(4) to extract
information about supported states. This driver should work on MP systems
including HTT. It is experimental and may have a few bugs but has been
tested to not crash at least.
Thanks to Colin Percival for his initial work on this driver.
only call the protocol's pru_rcvd() if the protocol has the flag
PR_WANTRCVD set. This brings that instance of pru_rcvd() into line with
the rest, which do check the flag.
MFC after: 3 days
very slow process, especially for large file systems that is just
recovered from a crash.
Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash. With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.
Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior. When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.
Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively. Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.
This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.
Reviewed by: mckusick
MFC After: 1 week
of the global UNIX domain socket mutex: no protection is needed that
early in the setup of the UNIX domain socket and socket structures.
MFC after: 3 days
preliminary support for using the GCC-compatibility of ICC was committed
but couldn't be tested at that time due to problems with ICC itself. Since
ICC 8.1 it's possible to use its GCC-compatibility under FreeBSD and it
turned out that a typedef for __gnuc_va_list is required in that case.
Revert the part of rev. 1.8 which #ifdef'ed out __gnuc_va_list for ICC.
MFC after: 1 week
patch from kan@).
Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close. This is not yet a generally safe function, but for this very
specific use it is safe. This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
a packet has VLAN mbuf tag attached. This is faster to check than
m_tag_locate(), and allows us to use the tags in non-vlan(4) VLAN
producers.
The first argument to VLAN_OUTPUT_TAG() is now unused but retained
for backward compatibility.
While here, embellish a fix in rev. 1.174 of if_ethersubr.c -- it
now checks for packets with VLAN (mbuf) tags, and it should now
be possible to bridge(4) on vlan(4)'s whose parent interfaces
support VLAN decapsulation in hardware.
Reviewed by: sam
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement freebsd32_execve()
without using the stackgap.
- Fix freebsd32_adjtime() to call adjtime() rather than utimes(). Still
uses stackgap for now.
- Use kern_setitimer(), kern_getitimer(), kern_select(), kern_utimes(),
kern_statfs(), kern_fstatfs(), kern_fhstatfs(), kern_stat(),
kern_fstat(), and kern_lstat().
Tested by: cokane (amd64)
Silence on: amd64, ia64
pointers in argv and envv in userland and use that together with
kern_execve() and exec_free_args() to implement linux_execve() for the
amd64/linux32 ABI without using the stackgap.
- Implement linux_nanosleep() using the recently added kern_nanosleep().
- Use linux_emul_convpath() instead of linux_emul_find() in
exec_linux_imgact_try().
Tested by: cokane
Silence on: amd64
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_stat(), kern_lstat(), kern_fstat(), kern_access(),
kern_truncate(), kern_pathconf(), kern_execve(), kern_select(),
kern_setitimer(), kern_getitimer(), kern_statfs(), and kern_fstatfs().
Silence on: alpha@
like a valid range. We already do this in the memory case (although
the code there is somewhat different than the I/o case because we have
to deal with different kinds of memory). Since most laptops don't
have non-subtractive bridges, this wasn't seen in practice.
Evidentally the Compaq R3000 hits this problem with PC Cards.
Some minor style fixes while I'm here.
Submitted by: Jung-uk Kim
mounted (is it Joliet, RockRidge, High Sierra) based on bootverbose.
Most file systems don't generate log messages based on details of the
file system superblock, and these log messages disrupt sysinstall output
during a new install from CD. We may want to explore exposing this
status information using nmount() at some point.
MFC after: 3 days
Note that this results in more kernel virtual memory being reserved for
temporary storage of the args. The args temporary space is allocated out
of exec_map (a submap of kernel_map). This will use roughly 4MB of KVM.
OK'ed by: dg
copy op to shift arguments on the stack instead of transfering each
argument one by one through a register. Probably doesn't affect overall
operation, but makes the code a little less grotty and easier to update
later if I choose to make the wrapper handle more args. Also add
comments.
so->so_options when solisten() will succeed, rather than setting it
conditionally based on there not being queued sockets in the completed
socket queue. Otherwise, if the protocol exposes new sockets via the
completed queue before solisten() completes, the listen() system call
will succeed, but the socket and protocol state will be out of sync.
For TCP, this didn't happen in practice, as the TCP code will panic if
a new connection comes in after the tcpcb has been transitioned to a
listening state but the socket doesn't have SO_ACCEPTCONN set.
This is historical behavior resulting from bitrot since 4.3BSD, in which
that line of code was associated with the conditional NULL'ing of the
connection queue pointers (one-time initialization to be performed
during the transition to a listening socket), which are now initialized
separately.
Discussed with: fenner, gnn
MFC after: 3 days
driver. This used to be handled by cpufreq_drv_settings() but it's
useful to get the type/flags separately from getting the settings.
(For example, you don't have to pass an array of cf_setting just to find
the driver type.)
Use this new method in our in-tree drivers to detect reliably if acpi_perf
is present and owns the hardware. This simplifies logic in drivers as well
as fixing a bug introduced in my last commit where too many drivers attached.
are equal to PCCARD_TPCE_FS_MEMSPACE_NONE, memspace will be zero, so
testing for this case inside of the if statement results in dead code.
We'd fail to set a value to zero that's already zero (since it is
initialized to 0 indirectly) with this code being there. Well, except
in the very rare case that we have a card that has a defualt entry
that includes a memory space followed by one that has no memory space
(these are extremely rare, I don't recall ever having seen one :-).
Fix this by setting num_memspace to 0 in a more appropriate place.
Submitted by: Coverity Prevent analysis tool
than the generic ne-2000 string. This should have no effect on the
actual support of the parts, just reporting what the part was.
Also, rename a few functins and symbols to reflect a more generic
part support that grew out of the early specific support.
soref() to also covering the update of so_state. While no other user
threads can update the socket state here as it's not yet hooked up to
the file descriptor array yet, the protocol could also frob the
socket state here, leading to a lost update to the so_state field.
No reported instances of this bug (as yet).
MFC after: 3 days
connection status before inserting the new socket into the listen
socket's accept queue, or there might be a race in which another thread
wakes up when the accept lock is released, and sees the socket before its
state is set correctly. The wakeup still occurs after the accept lock is
released. There have been no diagnoses of this bug in real-world systems
(as yet).
MFC after: 3 days
or just offering info. In the former case, we don't probe/attach to allow
the ACPI driver precedence. A refinement of this would be to actually
use the info provided by acpi_perf(4) to get the real CPU clock rates
instead of estimating them but since all systems that support both
acpi_perf(4) and ichss(4) export the control registers to acpi_perf(4),
it can just handle the registers on its own.
loaded, the tick interrupt enabled and a handler that resets the tick
counter on every tick interrupt. While this isn't documented this can
cause DELAY() to wait for a value the tick counter will not reach when
used in early boot, i.e. before cpu_initclocks() is called, depending
on when in the cycle DELAY() is called, the delay value and the value
the tick compare register is set to. The excessive use of DELAY() in
uart(4) when probing Sun keyboards seems to always manage to trigger
this, resulting in a hang during boot.
Disable the tick interrupt in tick_init(), which is called early in
sparc64_init(), until the interrupt is enabled again in tick_start(),
called by cpu_initclocks(), with our own handler. This fixes the hang
during probing Sun keyboards on AXi boards and Ultra 10, with other
machines like Ultra 5 probably being affected but not tested.
Additional testing by: Matthias Muthmann
MFC after: 1 week
aic7xxx.c:
Allow print_reg() to be called with a NULL column.
aic79xx.c:
Correct new usage of SCB_GET_TAG().
aic7xxx.c:
Fix stray ahd that snuck in here.
statement from some files, so re-add it for the moment, until the
related legalese is sorted out. This change affects:
sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly
came from kmem_map for those two. Previously it would set it back
to NULL for UMA_ZONE_REFCNT zones and although this was probably not
fatal, it added MORE code for no reason.
for now) exactly the same as KfAcquireSpinLock() and KfReleaseSpinLock().
I implemented the former as small routines in subr_ntoskrnl.c that just
turned around and invoked the latter. But I don't really need the wrapper
routines: I can just create an entries in the ntoskrnl func table that
map KeAcquireSpinLockRaiseToDpc() and KeReleaseSpinLock() to
KfAcquireSpinLock() and KfReleaseSpinLock() directly. This means
the stubs can go away.
close holes in detecting busfrees that occur after a packetized target
transitions to a non-packetized phase. The most common case where this
occurs is when a target is externally reset so the controller believes
a packetzied negotiation agreement is still in effect. Unfortunately,
disabling this feature seems to cause problems for the 7901B. Re-enable
ehanced busfree detection for this part until I can get my hands on a
samble to figure out if the old workaround is necessary and, if so, how
to make it work correctly.
This flag means "wait for all pending requests before returning to userland".
There are pending events for sure, because we just created new provider and
other classes want to taste it, but we cannot answer on I/O requests until
we're here.
Ville-Pertti Keinonen (will at exomi dot comohmygodnospampleasekthx)
deserves a big thanks for submitting initial patches to make it
work. I have mangled his contributions appropriately.
The main gotcha with Windows/x86-64 is that Microsoft uses a different
calling convention than everyone else. The standard ABI requires using
6 registers for argument passing, with other arguments on the stack.
Microsoft uses only 4 registers, and requires the caller to leave room
on the stack for the register arguments incase the callee needs to
spill them. Unlike x86, where Microsoft uses a mix of _cdecl, _stdcall
and _fastcall, all routines on Windows/x86-64 uses the same convention.
This unfortunately means that all the functions we export to the
driver require an intermediate translation wrapper. Similarly, we have
to wrap all calls back into the driver binary itself.
The original patches provided macros to wrap every single routine at
compile time, providing a secondary jump table with a customized
wrapper for each exported routine. I decided to use a different approach:
the call wrapper for each function is created from a template at
runtime, and the routine to jump to is patched into the wrapper as
it is created. The subr_pe module has been modified to patch in the
wrapped function instead of the original. (On x86, the wrapping
routine is a no-op.)
There are some minor API differences that had to be accounted for:
- KeAcquireSpinLock() is a real function on amd64, not a macro wrapper
around KfAcquireSpinLock()
- NdisFreeBuffer() is actually IoFreeMdl(). I had to change the whole
NDIS_BUFFER API a bit to accomodate this.
Bugs fixed along the way:
- IoAllocateMdl() always returned NULL
- kern_windrv.c:windrv_unload() wasn't releasing private driver object
extensions correctly (found thanks to memguard)
This has only been tested with the driver for the Broadcom 802.11g
chipset, which was the only Windows/x86-64 driver I could find.
reported to the sender - in the case where the sender sends data
outside the window (as WinXP does :().
Reported by: Sam Jensen <sam at wand dot net dot nz>
Submitted by: Mohan Srinivasan
an unused pageq queue reference in the page structure to stash a pointer
to the MemGuard FIFO. Using the page->object field caused problems
because when vm_map_protect() was called the second time to set
VM_PROT_DEFAULT back onto a set of pages in memguard_map, the protection
in the VM would be changed but the PMAP code would lazily not restore
the PG_RW bit on the underlying pages right away (see pmap_protect()).
So when a page fault finally occured and the VM noticed the faulting
address corresponds to a page that _does_ have write access now, it
would then call into PMAP to set back PG_RW (i386 case being discussed
here). However, before it got to do that, an assertion on the object
lock not being owned would get triggered, as the object of the faulting
page would need to be locked but was overloaded by MemGuard. This is
precisely why MemGuard cannot overload page->object.
Submitted by: Alan Cox (alc@)
the rate for the 100% state once. Afterwards, use that value for deriving
states. This should fix the problem where the calibrated frequency was
different once a switch was done, giving a different set of levels each
time. Also, properly search for the right cpufreqX device when detaching.
parent cpu device before passing it to pcpu_find(). Get the ivars from the
child, not parent cpu device. These bugs would cause a panic when
dereferencing the pcpu ivar, but weren't present in the acpi attachment
which it seems most people are using.
and into the bus front ends. For ISA and C-BUS cards, we always need
to grab it. For PC Card, already committed, we need to do some sanity
checking on the data that's in the ROMs before we decide that they are
OK to use. The PC Card code has already been committed and is
independent of this code (which also has to work on NE-1000 cards,
assuming that those cards still work :-).
with the latest changes. They actually have valid ROM data at location
0 of memory, just like a real NE-2000 ISA card. Use this data, if
the ROM passes a few basic tests, as an additional source for the MAC
address. Prefer the CIS over this source, but have it take precidence
over falling back to reading the attribtue memory.
o Minor cleanup of a few devices that we match on based on CIS string.
Remove the SACK "initburst" sysctl.
- Fix bugs in SACK dupack and partialack handling that can cause
large bursts while in SACK recovery.
Submitted by: Mohan Srinivasan
override the current freq level temporarily and restore it when the
higher priority condition is past. Note that only the first overridden
value is saved. Callers pass NULL to CPUFREQ_SET to restore the saved
level. Priorities are not yet used so this commit should have no effect.
- refactor ngd_constructor, so that make_dev() is called without
any locks held, since it mallocs memory with M_WAITOK flag.
- rename global mtx, to have name different to per-node mtx
MFC after: 2 weeks
removes netgraph node and unwraps Ethernet interface.
This gives us ability to unload ng_ether.ko, when all interfaces
are detached, making ng_ether(4) developers happy.
Reviewed by: ru
driver did VLAN decapsulation in hardware, we were passing a frame
as if it came for the parent (non-VLAN) interface. Stop this from
happening.
Reminded by: glebius
Security: This could pose a security risk in some setups
o Use SYSCTL_IN() macro instead of direct call of copyin(9).
Submitted by: ume
o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where
most of tcp sysctls live.
o There are net.inet[6].tcp[6].getcred sysctls already, no needs in
a separate struct tcp_ident_mapping.
Suggested by: ume
bridge between OLDCARD and NEWCARD for drivers to inquire after the
function number (eg, 0, 1, 2). Nobody ever used it, so retire it
with honors. NEWCARD never implemented it, and the same information
can be obtained by the pccard_get_function_number().
MFC After: 3 days
proper way, or at least the same way that NetBSD and Linux do things
(I've been unable to obtain datasheets for these parts to know for
sure). This has some marginal improvement in the DL10022 and DL10019
cards that I have. Also, report which type, exactly.
# There's one or two ed cards that I have which still don't work, but I think
# that's due to MII losage on the card that's not presently compensated
# for in the MII drivers.
tree since 2003/02/20, and I recently cleaned it up. I'd even closed
the PR that I obtained this from Fri Jul 18 23:25:08 MDT 2003 since
I looked at my p4 tree.
PR: 46889
Submitted by: HASEGAWA Tomoki
memory disk is larger than the number of available sf_bufs, this improves
performance on SMPs by eliminating interprocessor TLB shootdowns. For
example, with 6656 sf_bufs, the default on my test machine, and a 256MB
swap-backed memory disk, I see the command
"dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default,
shared mappings, and ~587MB/sec with CPU private mappings.
base transfer speed to CAM. The actual value used (40MB/s) is fairly
arbitrary, but assumes the same 33% overhead as was implied by the
1MB/s figure we used for USB1 devices.
are not added to the list(s) of available settings. However, other drivers
can call the CPUFREQ_DRV_SETTINGS() method on those devices directly to
get info about available settings.
Update the acpi_perf(4) driver to use this flag in the presence of
"functional fixed hardware." Thus, future drivers like Powernow can
query acpi_perf for platform info but perform frequency transitions
themselves.
on dev.cpu.0 will affect all of the CPUs together. In the future,
independent control will be supported but this is good enough for now.
Check that the timecounter isn't TSC before switching (from Colin Percival.)
former is callable from user space and the latter from the kernel one. Make
kernel version take additional argument which tells if the respective call
should check for additional restrictions for sending signals to suid/sugid
applications or not.
Make all emulation layers using non-checked version, since signal numbers in
emulation layers can have different meaning that in native mode and such
protection can cause misbehaviour.
As a result remove LIBTHR from the signals allowed to be delivered to a
suid/sugid application.
Requested (sorta) by: rwatson
MFC after: 2 weeks
throttling, neglecting to do this kept the sysctls from appearing.
Attach an acpi_throttle device to each CPU that supports it.
Don't add a device if the P_BLK is invalid or if _PTC is not present.
This removes extraneous probe/attach failure messages on some machines.
Make the cpu throttle state local to the softc to account for partial
successes when changing the clock rate on MP machines.
for nodes hanging off of Central (untested), FireHose (untested) and
PCI (tested) busses.
- Add an additional parameter to OF_decode_addr() which specifies the
index of the register bank to decode.
These should allow to eventually add support for the Z8530 hanging off of
FireHose to uart(4) and to write support for PCI-based graphics adapters.
Suggested by: tmm (back in '03)
which will finally lead to kernel panic.
Security: This prevents a local (root-launched) DoS
Submitted by: Wojciech A. Koszek [dunstan at freebsd czest pl]
PR: 77421
MFC After: 1 week
o Add a fallback location for the MAC address. Most of the early ne2000
PC Cards were built from the same parts, so most of them have the same
address in the CIS to grab the MAC from. Use this address as our
fallback if we don't find anything better.
o Add printf, in bootverbose, noting the MAC addresses that we find along
the way.
# Better sanity checking of the MAC address is needed. Will have to
# investigate using/creating a centralized function to do this as a number
# of other PC Card drivers each have their own ad-hoc tests.
a definite setup was broken: two ng_ksockets are connected to each other,
connect()ed to different remote hosts, and bind()ed to different local
interfaces. In this case one ng_ksocket is fooled with tag from the other
one.
Put node id into tag. In rcvdata method utilize tag only if it has our
own id inside or id equals zero. The latter case is added to support
packets send by some third, not ng_ksocket node.
MFC after: 1 week
1. Dependency on netgraph module was broken (wrong version).
2. Netgraph node type was never destroyed on unload. This
was masked by problem #1.
Fixed both by using NETGRAPH_INIT(). Now netgraph node type
is created on module load, as in the rest of netgraph modules.
This information will be very useful for people who are tuning applications
which have a dependence on IPC mechanisms.
The following OIDs were documented:
Message queues:
kern.ipc.msgmax
kern.ipc.msgmni
kern.ipc.msgmnb
kern.ipc.msgtlq
kern.ipc.msgssz
kern.ipc.msgseg
Semaphores:
kern.ipc.semmap
kern.ipc.semmni
kern.ipc.semmns
kern.ipc.semmnu
kern.ipc.semmsl
kern.ipc.semopm
kern.ipc.semume
kern.ipc.semusz
kern.ipc.semvmx
kern.ipc.semaem
Shared memory:
kern.ipc.shmmax
kern.ipc.shmmin
kern.ipc.shmmni
kern.ipc.shmseg
kern.ipc.shmall
kern.ipc.shm_use_phys
kern.ipc.shm_allow_removed
kern.ipc.shmsegs
These new descriptions can be viewed using sysctl -d
PR: kern/65219
Submitted by: Dan Nelson <dnelson at allantgroup dot com> (modified)
No objections: developers@
Descriptions reviewed by: gnn
MFC after: 1 week
at some offset. Unlike newer cards, the MAC address wasn't part of
the CIS as a specific FUNCE. These older cards were having their MAC
address show up as 0:2:4:6:8:a because that's what's in the ROM
locations that would be there in a real ne2000.
This patch allows one to specify the offset for the MAC address for
these cards. Specify one for the IBM Ethernet II card, as it is one
that has this problem. One shouldn't specify this unless the MAC
address really isn't in the CIS at all.
Side note: The novell probe likely shouldn't read the MAC address, and
that should be moved to the bus specific attach routine(s), maybe as a
convenience function in if_ed_novell.c.
My IBM Ethernet II (aka Info Mover) now has a believable MAC address.
with net byte order. Change byte order to net in ng_ipfw_input(), change
byte order to host before ip_output(), do not change before ip_input().
In collaboration with: ru
suid application. The problem is that Linux applications using old Linux
threads (pre-NPTL) use signal 32 (linux SIGRTMIN) for communication between
thread-processes. If such an linux application is installed suid or sgid
and security.bsd.conservative_signals=1 (default), then permission will be
denied to send such a signal and the application will freeze.
I believe the same will be true for native applications that use libthr,
since libthr uses SIGTHR for implementing conditional variables.
PR: 72922
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 2 weeks
are NOVELL NE2000 with just a tiny quirk that's non vendor specific.
Instead, use the chip_type of DL100XX instead. This is more inline
with how the AX88190 support was added, and seems a little cleaner.
can retransmit on TX underrun and set TOK in addition to TUND. Also add a
check to prevent overflow of the addressable threshold.
This fixes some reports of rl(4) slowness, believed to be related to ALTQ
before.
PR: kern/61448
Submitted by: Tim Draegen-Gilman <timNOeudaemonSPAMnet> (with changes)
MFC after: 1 week
list, set `curr_callout' to NULL. This ensures that we won't attempt
to cancel the current callout if the original callout structure
gets recycled while we wait to acquire Giant.
This is reported to fix an intermittent syscons problem that was
introduced by revision 1.96.
pessmize the error recover path through edintr by calling these
functions, rather than expanding it inline. This error path already
does a lot in it, so an extra function call will be lost in the noise.
It also happens rarely.
while (complicated-expr)
continune;
in preference to
while (complicated-expr);
since the code generated is identical, and the former is easier to read,
especially for complicated-expr that reach to the end of the line...
a little bit of complexity but performance requirements lacking (this is
a debugging allocator after all), it's really not too bad (still
only 317 lines).
Also add an additional check to help catch really weird 3-threads-involved
races: make memguard_free() write to the first page handed back, always,
before it does anything else.
Note that there is still a problem in VM+PMAP (specifically with
vm_map_protect) w.r.t. MemGuard uses it, but this will be fixed shortly
and this change stands on its own.
Older cards have it reversed.
Also, use some already defined values instead of magic numbers.
PR: 73324
Submitted by: arne_woerner@yahoo.com
MFC after: 1 week
do not need to perform an extra memory fetch in the Packet (Mbuf+Cluster)
constructor to initialize the reference counter anymore. The reference
counts are located in a separate memory region (in the slab header,
because this zone is UMA_ZONE_REFCNT), so the memory fetch resulted very
often in a cache miss. Additionally, and perhaps more significantly,
optimize the free mbuf+cluster (packet) case, which is very common, to
no longer require an atomic operation on free (to verify the reference
counter) if the reference on the cluster has never been increased (also
very common). Reduces an atomic on mbuf free on average.
Original patch submitted by: Gerrit Nagelhout <gnagelhout@sandvine.com>
fix the problem with device discovery seen by some people.
2. Change to make 3ware CLI/3DM work on amd64.
3. Fix a potential problem that could cause the driver to do strlen(NULL) when
using older firmware.
Reviewed by:scottl
probing the novell ne[12]000 cards. It should be its own thing, ala
how we do the dl100xx support doing its own thing at the right time.
For the moment, it is just a function, which makes the mainline of the
generic probe easier to follow.
Also, correct a couple of comments that looked wrong.
# there may be a bug in setting up gwether, in that we set
# sc->rec_page_stop based on memsize, rather than sc->mem_size, so if
# these two are different, then the rec_page_stop will be wrong. I'm
# hesitant to fix it without real hardware to test with. Since
# gwether isn't in the hardware list of the man page nor in the commit
# messages, it is hard to know for sure.
invalidate pending io and dependencies. However, vinvalbuf() rightfully
does not call vnode_pager_setsize() for us. We must do this here. This
could potentially have caused numerous kinds of bugs, but it was
specifically causing msync() deadlocks because msync() was writing
flushing pages that should not have been valid.
Sponsored by: Isilon Systems, Inc.
Reported by: kkenn
and wd80x3 support. Make the obscure ISA cards optional, and add
those options to NOTES on i386 (note: the ifdef around the whole code
is for module building). Tweak pc98 ed support to include wd80x3 too.
Add goo for alpha too.
The affected cards are the 3Com 3C503, HP LAN+ and SIC (whatever that
is). I couldn't find any of these for sale on ebay, so they are
untested. If you have one of these cards, and send it to me, I'll
ensure that you have no future problems with it...
Minor cleanups as well by using functions rather than cut and paste
code for some probing operations (where the function call overhead is
lost in the noise).
Remove use of kvtop, since they aren't required anymore. This driver
needs to get its memory mapped act together, however, and use bus
space. It doesn't right now.
This reduces the size of if_ed.ko from about 51k to 33k on my laptop.
The difference is that the callout function installed via the
ng_callout() method is guaranteed to NOT fire after the shutdown
method was run (when a node is marked NGF_INVALID). Also, the
shutdown method and the callout function are guaranteed to NOT
run at the same time, as both require the writer lock. Thus
we can safely ignore a zero return value from ng_uncallout()
(callout_stop()) in shutdown methods, and go on with freeing
the node.
The said revision broke the node shutdown -- ng_bridge_timeout()
is no longer fired after ng_bridge_shutdown() was run, resulting
in a memory leak, dead nodes, and inability to unload the module.
Fix this by cancelling the callout on shutdown, and moving part
responsible for freeing a node resources from ng_bridge_timer()
to ng_bridge_shutdown().
Noticed by: ru
Submitted by: glebius, ru
Giant held. In camisr(), move the ccb_bioq elements to a temporary local list
and then process the elements off of that list. This enables the list to be
processed by only taking the ccb_bioq_lock once and only for a very short
time.
ccb_bioq_lock is a leaf mutex, so it's fine to call xpt_done() with other
locks held. This is just a very minor step in the work to lock CAM, but
it allows us to avoid some messy locking/unlock dances in certain drivers.
4 mutex operations per I/O requests.
- Use only one mutex to protect both (incoming and outgoing) queue.
As MUTEX_PROFILING(9) shows, there is no big contention for this lock.
- Protect sc_queue_count with queue mutex, instead of doing atomic
operations on it.
- Remove DROP_GIANT()/PICKUP_GIANT() - ggate is marked as MPSAFE and no
Giant there.
same as the LINKSYS COMBO_ECARD (which also seems to be the same as
another linksys product that also has a modem, but I can't find that
one at the moment). Remove the PCM100, since it is now no longer
used.
o The COMBO_ECARD comes in many flavors, it seems, so probe both the DL10019
and the AX88x90 on it. Since this seems to work with no ill effects, maybe
the probing should happen more generally rather than being table driven.
Need to think more about this.
o Remove PCM100 because it is duplicative (the ETHERFAST is the pcm100 and
apparently has the same IDs). It was here for NetBSD because they match
up an expected MAC address OID, but since we don't bother with that, we
don't need to be so finely discriminating.
o Minor style nit.
if_ed_isa.c, and they seem to not be helpful anymore.
o Fix style issues from de-Pification.
o change from _isa_ to _cbus_ to the largest extent possible to reflect that
this is really for cbus, not isa.
o Use ANSI function definitions.
o Use ed_clear_memory
o eliminate kvtop
which will help to debug hangs on boot.
- Remove 'U' from debug.watchdog sysctl definition, so if we set it to '-1'
it really shows '-1'.
- Fix comment.
Reviewed by: rwatson
behaviour of chflags within a jail. If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.
This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.
Discussed with: csjp, rwatson
MFC after: 2 weeks
Give FFS vnodes a specific bufwrite method which contains all the
background write stuff and then calls into the default bufwrite()
for the rest of the job.
Remove all the background write related stuff from the normal bufwrite.
This drags the softdep_move_dependencies() back into FFS.
Long term, it is worth looking at simply copying the data into
allocated memory and issuing the bio directly and not create the
"shadow buf" in the first place (just like copy-on-write is done
in snapshots for instance). I don't think we really gain anything
but complexity from doing this with a buf.
rather than forwarding interrupts from the clock devices around using IPIs:
- Add an IDT vector that pushes a clock frame and calls
lapic_handle_timer().
- Add functions to program the local APIC timer including setting the
divisor, and setting up the timer to either down a periodic countdown
or one-shot countdown.
- Add a lapic_setup_clock() function that the BSP calls from
cpu_init_clocks() to setup the local APIC timer if it is going to be
used. The setup uses a one-shot countdown to calibrate the timer. We
then program the timer on each CPU to fire at a frequency of hz * 3.
stathz is defined as freq / 23 (hz * 3 / 23), and profhz is defined as
freq / 2 (hz * 3 / 2). This gives the clocks relatively prime divisors
while keeping a low LCM for the frequency of the clock interrupts.
Thanks to Peter Jeremy for suggesting this approach.
- Remove the hardclock and statclock forwarding code including the two
associated IPIs. The bitmap IPI handler has now effectively degenerated
to just IPI_AST.
- When the local APIC timer is used we don't turn the RTC on at all, but
we still enable interrupts on the ISA timer 0 (i8254) for timecounting
purposes.
Windows DRIVER_OBJECT and DEVICE_OBJECT mechanism so that we can
simulate driver stacking.
In Windows, each loaded driver image is attached to a DRIVER_OBJECT
structure. Windows uses the registry to match up a given vendor/device
ID combination with a corresponding DRIVER_OBJECT. When a driver image
is first loaded, its DriverEntry() routine is invoked, which sets up
the AddDevice() function pointer in the DRIVER_OBJECT and creates
a dispatch table (based on IRP major codes). When a Windows bus driver
detects a new device, it creates a Physical Device Object (PDO) for
it. This is a DEVICE_OBJECT structure, with semantics analagous to
that of a device_t in FreeBSD. The Windows PNP manager will invoke
the driver's AddDevice() function and pass it pointers to the DRIVER_OBJECT
and the PDO.
The AddDevice() function then creates a new DRIVER_OBJECT structure of
its own. This is known as the Functional Device Object (FDO) and
corresponds roughly to a private softc instance. The driver uses
IoAttachDeviceToDeviceStack() to add this device object to the
driver stack for this PDO. Subsequent drivers (called filter drivers
in Windows-speak) can be loaded which add themselves to the stack.
When someone issues an IRP to a device, it travel along the stack
passing through several possible filter drivers until it reaches
the functional driver (which actually knows how to talk to the hardware)
at which point it will be completed. This is how Windows achieves
driver layering.
Project Evil now simulates most of this. if_ndis now has a modevent
handler which will use MOD_LOAD and MOD_UNLOAD events to drive the
creation and destruction of DRIVER_OBJECTs. (The load event also
does the relocation/dynalinking of the image.) We don't have a registry,
so the DRIVER_OBJECTS are stored in a linked list for now. Eventually,
the list entry will contain the vendor/device ID list extracted from
the .INF file. When ndis_probe() is called and detectes a supported
device, it will create a PDO for the device instance and attach it
to the DRIVER_OBJECT just as in Windows. ndis_attach() will then call
our NdisAddDevice() handler to create the FDO. The NDIS miniport block
is now a device extension hung off the FDO, just as it is in Windows.
The miniport characteristics table is now an extension hung off the
DRIVER_OBJECT as well (the characteristics are the same for all devices
handled by a given driver, so they don't need to be per-instance.)
We also do an IoAttachDeviceToDeviceStack() to put the FDO on the
stack for the PDO. There are a couple of fake bus drivers created
for the PCI and pccard buses. Eventually, there will be one for USB,
which will actually accept USB IRP.s
Things should still work just as before, only now we do things in
the proper order and maintain the correct framework to support passing
IRPs between drivers.
Various changes:
- corrected the comments about IRQL handling in subr_hal.c to more
accurately reflect reality
- update ndiscvt to make the drv_data symbol in ndis_driver_data.h a
global so that if_ndis_pci.o and/or if_ndis_pccard.o can see it.
- Obtain the softc pointer from the miniport block by referencing
the PDO rather than a private pointer of our own (nmb_ifp is no
longer used)
- implement IoAttachDeviceToDeviceStack(), IoDetachDevice(),
IoGetAttachedDevice(), IoAllocateDriverObjectExtension(),
IoGetDriverObjectExtension(), IoCreateDevice(), IoDeleteDevice(),
IoAllocateIrp(), IoReuseIrp(), IoMakeAssociatedIrp(), IoFreeIrp(),
IoInitializeIrp()
- fix a few mistakes in the driver_object and device_object definitions
- add a new module, kern_windrv.c, to handle the driver registration
and relocation/dynalinkign duties (which don't really belong in
kern_ndis.c).
- made ndis_block and ndis_chars in the ndis_softc stucture pointers
and modified all references to it
- fixed NdisMRegisterMiniport() and NdisInitializeWrapper() so they
work correctly with the new driver_object mechanism
- changed ndis_attach() to call NdisAddDevice() instead of ndis_load_driver()
(which is now deprecated)
- used ExAllocatePoolWithTag()/ExFreePool() in lookaside list routines
instead of kludged up alloc/free routines
- added kern_windrv.c to sys/modules/ndis/Makefile and files.i386.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().
Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
The "business class upgrade" was implemented in UFS's VOP_LOCK
implementation ufs_lock() which is the wrong layer, so move it to
ffs_lock().
Also, as long as we have not abandonned advanced vfs-stacking we
should not preclude it from happening: instead of implementing a
copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at
vop_stdlock() at the bottom.
This allows stacked or partitioned filesystems to say "Continue
the normal resolution from here", for instace from FFS to UFS.
Use VNASSERT() instead of KASSERT().
on my P3, microbenchmarks show the unrolled version is 78x faster. In
actual use (recursive ls), this gives an average of 9% improvement in
system time and 2% improvement in wall time.
Make the special hp versions match the general ones. Also use fixed
types in the WD80x3_generic probe, and change callers' arrays to
match. Fix a couple of minor style issues by using newstyle function
definitions in a couple places.
if_ed and rename it to ed_detach(). Tell other busses to use this
routine for detach.
Since I don't actually have any non-pccard ed hardware I can test
with, I've only tested with my pccards.
More improvements in this area likely are possible.
Prodded by: rwatson
copying data to a temporary buffer before the I/O, but also copying that
temporary buffer back to the original data location after the I/O. When
you're dumping kernel heap and stack and protected pages, this is very
very bad.
A belated thanks to Robert Watson for donating hardware for this (and future)
work.
MFC after: 3 days
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_access(), kern_execve(), kern_mkfifo(), kern_mknod(),
kern_setitimer(), kern_getrusage(), kern_utimes(), kern_unlink(),
kern_chdir(), kern_chmod(), kern_chown(), kern_symlink(), kern_readlink(),
kern_select(), kern_statfs(), kern_fstatfs(), kern_stat(), kern_lstat(),
kern_fstat().
- Drop the unused 'uap' argument from spx_open().
- Replace a stale duplication of vn_access() in xenix_access() lacking
recent additions such as MAC checks, etc. with a call to kern_access().
the semantics in that the returned filename to use is now a kernel
pointer rather than a user space pointer. This required changing the
arguments to the CHECKALT*() macros some and changing the various system
calls that used pathnames to use the kern_foo() functions that can accept
kernel space filename pointers instead of calling the system call
directly.
- Use kern_open(), kern_access(), kern_msgctl(), kern_execve(),
kern_mkfifo(), kern_mknod(), kern_statfs(), kern_fstatfs(),
kern_setitimer(), kern_stat(), kern_lstat(), kern_fstat(), kern_utimes(),
kern_pathconf(), and kern_unlink().
duplicating the contents of the same functions inline.
- Consolidate common code to convert a BSD statfs struct to a Linux struct
into a static worker function.
structure in the struct pointed to by the 3rd argument for IPC_STAT and
get rid of the 4th argument. The old way returned a pointer into the
kernel array that the calling function would then access afterwards
without holding the appropriate locks and doing non-lock-safe things like
copyout() with the data anyways. This change removes that unsafeness and
resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of
code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
under an alternate prefix and determines which filename should be used.
This is basically a more general version of linux_emul_convpath() that
can be shared by all the ABIs thus allowing for further reduction of
code duplication.
reboot. Safter the reboot the TCC is usually in the Automatic mode, in which
reading current performance level is likely to produce bogus results make sure
to switch it to the On-Demand mode and set to some known performance level.
Unfortunately there is no reliable way to check that TCC is in the Automatic
mode. Reading bit 4 of ACPI Thermal Monitor Control Register produces 0
regardless of the current mode.
MFC after: 1 week
callout is first initialised, using a new function callout_init_mtx().
The callout system will acquire this mutex before calling the callout
function and release it on return.
In addition, the callout system uses the mutex to avoid most of the
complications and race conditions inherent in asynchronous timer
facilities, so mutex-protected callouts have much simpler semantics.
As long as the mutex is held when invoking callout_stop() or
callout_reset(), then these functions will guarantee that the callout
will be stopped, even if softclock() had already begun to process
the callout.
Existing Giant-locked callouts will automatically pick up the new
race-free semantics. This should close a number of race conditions
in the USB code and probably other areas of the kernel too.
There should be no change in behaviour for "MP-safe" callouts; these
still need to use the techniques mentioned in timeout(9) to avoid
race conditions.
practice (which we seem to mostly follow in the tree). Move the
$FreeBSD$ tag to its more proper place after all copyright and license
notices. Add '-' to the copyright notice for Christian E. Hopps so my
copyright script picks it up.
resulting in a size_t due to C's rules of arithmetic. Rather than
bogusly cast the result to a uint8_t, fix the printf format specifier
to have a 'z' modifier which tells the compiler that the sizes really
do match.
It turns out that change 1.75 was incorrect to assume that this
'really' was a 8bit quantity. It isn't. Although the hardware
appears to limit things to < 256, it would be a bug that should be
caught by debug printf it it were. Casting it to uint8_t would have
lost this useful information.
Aslo add 'z' to a nearby debug statement that's never compiled in.
frequency as a percentage of the base rate and do not change the base
rate directly. The cpufreq framework combines these with absolute drivers
to produce synthesized levels made of one or more settings.
They have nothing at all to do with CIS parsing.
Remove some unused funce parsing: nothing used the results.
Use more of pccard_cis.h's deifnitions for the cardbus specific cis
parsing we do. More work is needed in this area.
This reduces the size of the cardbus module by 380 bytes or so...
doing it in the cpu driver. The previous code was incorrect anyway since
this value controls Px states, not throttling as the comment said. Since
we didn't support Px states before, there was no impact. Also, note that
we delay the write to SMI_CMD until after booting is complete since it
sometimes triggers a change in the frequency and we want to have all
drivers ready to detect/handle this.
- Bring IPsec support from the ports collection [1].
- Bring -o ("once only") option from the ports
collection [2].
- Adopt the Makefile framework into
usr.bin/nc/Makefile.
- Add a knob to control whether to build nc(1),
NO_NETCAT.
- Bump __FreeBSD_version so ports collection can
detect this change.
Original patchset are contributed to the ports collection by:
[1] nectar, [2] joerg.
Note: WARNS?=6 patchset spined off in this commit, in order not
to take too many files off the vendor branch.
uses the i8237 without trying to emulate the PC architecture move
the register definitions for the i8237 chip into the central include
file for the chip, except for the PC98 case which is magic.
Add new isa_dmatc() function which tells us as cheaply as possible
if the terminal count has been reached for a given channel.
utility:
The tcpdrop command drops the TCP connection specified by the
local address laddr, port lport and the foreign address faddr,
port fport.
Obtained from: OpenBSD
Reviewed by: rwatson (locking), ru (man page), -current
MFC after: 1 month
devclass. As pointed out by dfr@, devclasses don't have to share the same
linkage if multiple drivers have the same name. Newbus should match the
devclasses based on name and allocate non-conflicting unit numbers.
millisecond it is calibrating. Suggested by jhb@ and bde@. Don't clobber
the tsc_freq with the new value since it isn't accurate enough for
timecounters and the timecounter system as a whole needs support for
changing rates before we do this. Subtract 0.5% from our measurement
to account for overhead in DELAY. Note that this interface is for
estimating the clockrate and needs to work well at runtime so doing a full
calibration including disabling interrupts for a second is not feasible.
the PERF_CTRL register in our probe method so that we can tell earlier
that another driver should handle this device due to FFixedHW. This avoids
scaring users when attach failed when we really wanted probe to fail.
type. This is needed if the resource is to be released later. The RID is
still also present, though less necessary since rman_get_rid() can be used
to obtain it from the resource.
settings as exported via the ACPI _PSS method. OEMs use this interface
to encapsulate chipset or processor-specific methods (e.g., SpeedStep or
Powernow) and export their settings in a standard way. On systems that
have valid ACPI Performance states and a hardware-specific driver (e.g.,
ichss), acpi_perf(4) is preferred.
select the CPU frequency level (say for cooling). The driver interface
allows hardware drivers to announce themselves as capable of adjusting
an individual frequency setting.
interrupts, read from the interrupt status register to clear any pending
interrupts. Otherwise in some rare cases the RTC would never fire any
interrupts as it constantly thinks it has an interrupt pending.
PR: i386/17800
PR: kern/76776
Submitted by: Jose M. Alcaide jose at we dot lc dot ehu dot es
MFC after: 2 weeks
- Add buffer size limitations (overflow will not be possible anymore).
- Add 'visible' option, which will allow for passphrase reading in the
future.
- Remove special treatment of '@' and '#', those two are only confusing.
Discussed with: rwatson
MFC after: 2 weeks
release) the sio spin mutex, as use of synchronization primitives in
the debugger can result in substantial problems. With this patch in
place entering the debugger via a serial console is made
substantially more reliable.
MFC after: 1 week
Tested by: kris
Discussed with: bde
tond and not fromnd. This could lead us to leak Giant, or unlock it
twice, depending on the filesystems involved. renames within a single
filesystem would not have caused any problems.
Sponsored by: Isilon Systems, Inc.
AUE_NULL. This is a place-holder to allow other audit infrastructure
to be introduced, such as an updated syscalls.master file format,
while the license on the real audit_kevents.h is fixed.
The cause of "Duplicate mbuf free panic" is in the programming
error of hme_load_txmbuf(). The code path of the panic is the
following.
1. Due to unknown reason DMA engine was freezed. So TX descritors
of HME become full and the last failed attempt to transmit a
packet had set its associated mbuf address to hme_txdesc
structure. Also the failed packet is requeued into interface
queue structure in order to retrasmit it when there are more
available TX descritors.
2. Since DMA engine was freezed, if_timer starts to decrement its
counter. When if_timer expires it tries to reset HME. During
the reset phase, hme_meminit() is called and it frees all
associated mbuf with descriptors. The last failed mbuf is also
freed here.
3. After HME reset completed, HME starts to retransmit packets
by dequeing the first packet in interface queue.(Note! the
packet was already freed in hme_meminit()!)
4. When a TX completion interrupt is posted by the HME, driver
tries to free the successfylly transmitted mbuf. Since the
mbuf was freed in step2, now we get "Duplicate mbuf free panic".
However, the real cause is in DMA engine freeze. Since no fatal
errors reported via interrupts, there might be other cause of
the freeze. I tried hard to understand the cause of DMA engine
freeze but couldn't find any clues. It seems that the freeze
happens under very high network loads(e.g. 7.5-8.0 MB/s TX speed).
Though this fix is not enough to eliminate DMA engine freeze it's
better than panic.
Reported by: jhb via sparc64 ML
executed. This appears to violate most of the UNIX-ish standards.
One example quote from:
http://www.opengroup.org/onlinepubs/009695399/functions/exec.html
Upon successful completion, the exec functions shall mark for update
the st_atime field of the file. If an exec function failed but was
able to locate the process image file, whether the st_atime field is
marked for update is unspecified. Should the exec function succeed,
the process image file shall be considered to have been opened with
open().
This appears to take care of it for ufs filesystems, doing the necessary
sanity checks (read-only filesystem, etc) without violating any other
standards (setting atime for any open appears to be allowed in any standards
I could find).
Noticed by: cperciva
Reviewed by: kan, rwatson
CIS, weren't actually used anywhere (other than the generic PC Card
code when certain variables are defined). They aren't used in NetBSD
either. Make things simpler by removing them. Change PLANEX_2 to
PLANEX and tweak wi and owi to use that instead. The PLANEX id seems
to actually be pci ID assigned to planex, not its pcmcia id. Ooops.
I don't know if this is a reporting error from where this entry came
from, or if it is a mistake on PLANEX's part. I suspect the latter,
as ACTIONTEC and NEWMEDIA made the same mistake (although new media
may be because it uses an advansys chip inside). Make a note of this
in the file. The 0xc entires may be JEITA assigned, so note that as
well.
# This leaves just 3 entries that are totally unknown: airvast, archos
# and edimax although the arivast number is the same assigned to
# avertec in usb...
This driver implements "unaddressed listen only mode", which is what
printers and plotters commonly do on GP-IB busses.
This means that you can capture print/plot like output from your
instruments by configuring them as necessary (good luck!) and
cat -u /dev/gpib0l > /tmp/somefile
Since there is no way to know when no more output is comming you
will have to ctrl-C the cat process when it is done (that is why
the -u is important).
before entering ng_netflow. In this case it will have not NULL m_pkthdr.rcvif.
However, it will enter ng_iface soon with another index. So let in_ifIndex
value configured by user override m_pkthdr.rcvif.
Reported by: Damir Bikmuhametov
MFC after: 1 week
all reserved, as the lisence makes clear), and strike the third clause
(now this is a 2-clause liberal BSDL as are the rest of files I hold
copyright over).
The presence or absence of a keyboard does not change whether an
UART is designed as a keyboard port or not and thus whether we
can use the port as a TTY or not.
We now call sunkbd_attach() even when we didn't previously find
a keyboard. Emit a useful message stating that no keyboard was
found, but don't do anything else.
MFC after: 5 days
a UMA zone instead. This should eliminate a bit of the locking
overhead associated with with malloc and reduce the memory
consumption associated with each new state.
Reviewed by: rwatson, andre
Silence on: ipfw@
MFC after: 1 week
second; since the default hz has changed to 1000 times a second,
this resulted in unecessary work being performed.
MFC after: 2 weeks
Discussed with: phk, cperciva
General head nod: silby
ServeRAID 4 - 7 models right now. Support for older cards is possible, but
I don't have any hardware to experiment with.
Thanks to Jack Hammer at Adaptec for providing debugging hints.
Sponsored by: ImproWare AG, Switzerland
sb_state shouldn't be erased, when socket buffer is flushed by sorflush().
When sb_state was bzero'ed, a recently set SBS_CANTRCVMORE flag was cleared.
If a socket was shutdown(SHUT_RD), a subsequent read() would block on it.
Reported by: Ed Maste, Gerrit Nagelhout
Reviewed by: rwatson
o Disable ofw_console(4), sab(4) and zs(4).
sab(4) and zs(4) are disabled because the hardware controlled by
them is handled by uart(4)+puc(4) and the latter combination is
functionally complete and up to date.
ofw_console(4) is disabled because it doesn't claim the device it
controls (through OFW) and thus interferes with puc(4)+uart(4),
which has sufficient knowledge to extract the necessary information
from OFW to setup the console. Put differently, ofw_console(4) is
not a proper device driver and can only do harm. Its functionality
is completely handled by uart(4).
This commit makes uart(4) the default driver for serial ports.
MFC after: 2 weeks
engineering the pending interrupt sources from the current
state of the controller. For channel A we can always read the
interrupt pending register (RR3). For channel B we can read
the interrupt vector register (RR2) because it contains the
modified vector and thus includes the interrupt source.
Since we currently need puc(4) for the Z8530, we know that
the interrupt handler for both channels will be called and
thus that RR3 will always be read at least once, even if ch A
has no pending interrupt.
NOTE: The modified interrupt vector has no value that represent
a lack of pending interrupt for channel B. That is, the
value read when no interrupts are pending is the same as the
value for the special receive condition. Fortunately, we don't
actually have to depend on that interrupt source. This does
mean that we need to properly handle the overflow condition,
when we read received character from the chip.
o The DSR signal is represented by the SYNC bit in the external
status register (RR0). We now properly track DSR.
o It's save to enable the external/status interrupt source. We
now get interrupts when line signals (DSR, DCD or CTS) change.
Problems fixes:
o interrupt storms.
o blocked open(2).
o lack of (hardware) flow control.
o unable to report DSR.
MFC after: 5 days
providing special version of CDIOCREADSUBCHANNEL ioctl(), which assumes that
result has to be placed into kernel space not user space. In the long run
more generic solution has to be designed WRT emulating various ioctl()s
that operate on userspace buffers, but right now there is only one such
ioctl() is emulated, so that it makes little sense.
MFC after: 2 weeks
copies arguments into the kernel space and one that operates
completely in the kernel space;
o use kernel-only version of execve(2) to kill another stackgap in
linuxlator/i386.
Obtained from: DragonFlyBSD (partially)
MFC after: 2 weeks
called in "open", causing mmap() to fail.
Where possible, pass size of file to vnode_create_vobject() rather
than having it find it out the hard way via VOP_LOOKUP
Reviewed by: phk
Add minor2unit() in addition to dev2unit() and unit2minor().
If it wasn't such a hazzle we should redefine minor numbers in
the kernel without the gap for the major number, but it's not worth
the bother (yet).
a process return to userspace if it had pending GEOM events.
We need to have the same check in the exit pass to catch the case
where a GEOM related filedescriptor is not explicitly closed by
the process.
Bumped into by: people using dd(1) to build releases, nanobsd etc.
thought. I'm unsure why I thought this was the case, but it
definitely isn't for this card. If another card with the other ID
makes an appearance, then we'll add a second entry for it.
# With this change my Olicom OC2220 is now working again, since I make
# this commit with that device. :-)
EtherJet, the interrupt is selected in the eeprom based on the layout
of the PC Card board. Since this is encoded into the EEPROM, and has
no relationship to the IRQ that the pccard bridge routes the PC Card's
interrupt pin to.
As such, stop writing to that register. This gets my EtherJet working.
# The eeprom reading code appears to be totally wrong for my EtherJet
# card. This causes the card to bogusly detect the media options
# available.
significant clean up and optimizations:
- don't call bioq_disksort() on every command, the hardware will do that for
us.
- remove all of the complicated bio deferral code. bio's that can't be
serviced immediately can just wait on the bioq.
- Only reserve one command object for doing control commands to the card.
This simplifies a lot of code and significantly reduces the size of the
command struct.
- Allocate commands out of a slab instead of embedding them into the softc.
- Call the command action method directly instead of having ips_get_free_cmd()
call it indirectly.
MFC After: 1 week
mmap() on NTFS files was hosed, returning pages offset from the
start of the disk rather than the start of the file. (ie, "cp" of
a 1-block file would get you a copy of the boot sector, not the
data in the file.) The solution isn't ideal, but gives a functioning
filesystem.
Cached vnode lookup was also broken, resulting in vnode haemorrhage.
A lookup on the same file twice would give you two vnodes, and the
resulting cached pages.
Just recently, mmap() was broken due to a lack of a call to
vnode_create_vobject() in ntfs_open().
Discussed with: phk@
null-modem tty device emulate the speed settings faithfully.
The speed is emulated independently for the two directions, using
the slower of the local sides ispeed and the remote sides ospeed.
The emulated speed takes settings of bits/char, parity and stopbit
into account.
Inspired by: The BSD-DK Editor Celebrity Deathmatch Contest
i386_{get,set}_ioperm() and make those APIs visible in the kernel namespace;
o use i386_{get,set}_ldt() and i386_{get,set}_ioperm() instead of sysarch()
in the linuxlator, which allows to kill another two stackgaps.
MFC after: 2 weeks
the interface when going to toggle VLAN support for
internal reasons. If the IFCAP_VLAN_HWTAGGING bit is
cleared, we should rely on the (re)init routine to turn
VLAN support off and never touch the relevant hardware bits.
This applies to other capability bits, too. The user
obviously has a reason for clearing a capability bit,
e.g., if his particular NIC is buggy and hangs if a
certain hardware capability is turned on even for a
fraction of a second.
The flag adapter->em_insert_vlan_header still is set or
reset irrespective of the IFCAP_VLAN_HWTAGGING setting,
as before, in order to handle the case when a user sets
promiscuous mode on an interface first and later turns
its IFCAP_VLAN_HWTAGGING bit on.
This change might look orthogonal to rev#1.85, but in fact
it is not. It introduces bugfixes that hopefully will make
implementing the general scheme mentioned in the commit
message of rev#1.85 easier.
configuration: it appears to work properly in the non-promiscuous case, but
we've not yet implemented a more general solution that maintains full
functionality with promiscuous mode enabled. While my hope is that we can
get one implemented soon, this will improve functionality substantially in
the mean time.
MFC after: 3 days
AX88190 ones, but that one only minorly):
o don't set flags in the match routine. They appear to be cleared
when probe/attach is called. Before this change, they were
always treated as a simple ne2000, which would fail to get the
right NIC address.
o Lookup device again in the probe routine and probe based on the
cards that you see.
o Detect and report the DL10022 seprately from the DL10019 cards.
While I'm here:
o remove a bad printf
o change another bad printf to device_printf.
o minor style(9) formatting tweaks.
# note: a lot of OEM entries are in the ed_pccard_products such that we can
# likely remove, or collapse, many of them.
This makes all of my DL100xx cards at least probe the ethernet address
correctly, which it wasn't doing before. I can't seem to locate my
AX88xxx based cards, so those haven't been tested, but they were
busted before the change so they can't be any worse now...
address, and additional information. Then the printing of the
ethernet address was moved into ether_attach, and so we were printing
orphaned information about the card. Now the probe message is
prefixed by edX:. Prepare for it to move under bootverbose, but don't
move it there yet (the || 1 trick).
from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrapper
of that syscall.
MFC after: 2 weeks
versions of the Racore PC Card Ethernet card. Rearrange to reflect
this reality. This ejects IODATA from 0x1bf, which belongs to Racore.
Thanks to Wilko for providing me with a dumpcis for the DEPCM card.
Also, added Nextcom Nexthawk card from NetBSD
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.
MFC after: 2 weeks
attributes in casts (i.e. foo = (__stdcall sometype)bar). This only
happens in two places where we need to set up function pointers, so
work around the problem with some void pointer magic.
missed that when the vnode bypass was introduced.
Deal with zero length transfers before we even get to fo_ops->fo_read().
Found by: Slawa Olhovchenkov <slwzxy.spb.ru@zxy.spb.ru>
PR: 75758
It reports itself as SCSI-3 but doesnt like getting probed on high luns
because it hangs hard after finding itself again on lun 32...
Suggested by: Kenneth Merry
with NFS.
We are moving responsibility for creating the vnode_pager object into
the filesystems which own the vnode, and this is one of the places
we have to cover.
We call vnode_create_vobject() directly because we own the vnode.
If we can get the size easily, pass it as an argument to save the
call to VOP_GETATTR() in vnode_create_vobject()
the name Sande^H^H^H^H^Hvnode_create_vobject().
Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.
Call that new function from vop_stdcreatevobject().
Make vnode_pager_alloc() private now that its only user came home.
o mark rx frames including FCS in the payload with the
IEEE80211_RADIOTAP_F_FCS flag
o remove hack to copy 802.11 headers with padding out of line; instead mark
the frames with IEEE80211_RADIOTAP_F_DATAPAD and require applications to
do the work
o split precalculated radiotap flags into tx+rx now that they can be different
Note the full usefulness of these changes depends on updates to applications
that process radiotap data.
o don't reclaim any previous beacon state in ath_beacon_alloc; do it
explicitly in ath_newstate
o reference count the node held in the beacon frame state block
o process ibss merge more intelligently; let the state machine do the
right thing instead of explicitly setting the new bssi id
o explicitly stop tx dma before doing beacon setup to handle the ibss
merge case
USB device support):
- Convert all of my locally chosen function names to their actual
Windows equivalents, where applicable. This is a big no-op change
since it doesn't affect functionality, but it helps avoid a bit
of confusion (it's now a lot easier to see which functions are
emulated Windows API routines and which are just locally defined).
- Turn ndis_buffer into an mdl, like it should have been. The structure
is the same, but now it belongs to the subr_ntoskrnl module.
- Implement a bunch of MDL handling macros from Windows and use them where
applicable.
- Correct the implementation of IoFreeMdl().
- Properly implement IoAllocateMdl() and MmBuildMdlForNonPagedPool().
- Add the definitions for struct irp and struct driver_object.
- Add IMPORT_FUNC() and IMPORT_FUNC_MAP() macros to make formatting
the module function tables a little cleaner. (Should also help
with AMD64 support later on.)
- Fix if_ndis.c to use KeRaiseIrql() and KeLowerIrql() instead of
the previous calls to hal_raise_irql() and hal_lower_irql() which
have been renamed.
The function renaming generated a lot of churn here, but there should
be very little operational effect.
short to unsigned short.
- Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > U_SHRTMAX.
Before this change setting somaxconn to smth above 32767 and calling
listen(fd, -1) lead to a socket, which doesn't accept connections at all.
Reviewed by: rwatson
Reported by: Igor Sysoev
- Use VFS_LOCK_GIANT() rather than directly acquiring giant in places
where giant is only held because vfs requires it.
Sponsored By: Isilon Systems, Inc.
- Remove some KASSERTs which are invalid if the appropriate lock is
not held.
- Slightly restructure bremfree() so that it is more sane.
- Change the flush code in bdwrite() to avoid acquiring a mutex
whenever possible.
- Change the flush code in bdwrite() to avoid holding the bufobj mutex
while calling buf_countdeps(). This introduces a lock-order
relationship with the softdep lock that can not otherwise be resolved.
- Don't set B_DONE until bufdone() is complete, otherwise another
processor may believe the buf is done before it is.
- Only acquire Giant if the caller has set b_iodone. Don't grab giant
around normal bufdone() calls.
Sponsored By: Isilon Systems, Inc.
to off.
- Protect access to mnt_kern_flag with the mointpoint mutex.
- Remove some KASSERTs which are not legal checks without the appropriate
locks held.
- Use VCANRECYCLE() rather than rolling several slightly different
checks together.
- Return from vtryrecycle() with a recycled vnode rather than a locked
vnode. This simplifies some locking.
- Remove several GIANT_REQUIRED lines.
- Add a few KASSERTs to help with INACT debugging.
Sponsored By: Isilon Systems, Inc.
- Protect access to mnt_kern_flag with the mountpoint mutex.
- Use the appropriate nd flags to deal with giant in vn_open_cred().
We currently determine whether the caller is mpsafe by checking
for a valid fdidx. Any caller coming from user-space is now
mpsafe and supplies a valid fd. No kenrel callers have been
converted to mpsafe, so this check is sufficient for now.
- Use VFS_LOCK_GIANT instead of manual giant acquisition where
appropriate.
Sponsored By: Isilon Systems, Inc.
require it.
- Track the status of Giant with the nd flag HASGIANT.
- Release giant on return of namei() callers are not marked MPSAFE as
they already own giant.
Sponsored By: Isilon Systems, Inc.
vnode lock is much simpler than I originally thought it would be.
Now, the cache lock is always acquired before the vnode lock.
- Provide some gotos in __getcwd() to simplify the unlocking a bit.
- Move Giant acquisition down into __getcwd().
Sponsored By: Isilon Systems, Inc.
if the lockmgr interlock is dropped after the caller's interlock
is dropped.
- Change some lockmgr KTRs to be slightly more helpful.
Sponsored By: Isilon Systems, Inc.
- Expand the scope of lk to cover not only interrupt races, but also
top-half races, which includes many new uses over global top-half
only data.
- Get rid of interlocked_sleep() and use msleep or BUF_LOCK where
appropriate.
- Use the lk mutex in place of the various hand rolled semaphores.
- Stop dropping the lk lock before we panic.
- Fix getdirtybuf() callers so that they reacquire access to whatever
softdep datastructure they were inxpecting in the failure/retry
case. Previously, sleeps in getdirtybuf() could leave us with
pointers to bad memory.
- Update handling of ffs to be compatible with ffs locking changes.
Sponsored By: Isilon Systems, Inc.
- Use the buffer lock on the superblock buf to serialize calls to
sbupdate.
- Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel.
Sponsored By: Isilon Systems, Inc.
it is now quite naturally protected by the ufsmount mutex.
- Use the ufs lock to protect various fields in struct fs, primarily the
cg summary needs protection to avoid allocation races. Several
functions have been slightly re-arranged to reduce the number of
lock operations.
- Adjust several functions (blkfree, freefile, etc.) to accept a
ufsmount as an argument so that we may access the ufs lock.
Sponsored By: Isilon Systems, Inc.
any per-instance global data that is not already protected by a
buf or vnode lock. Presently, only fields in ffs's struct fs utilize
this lock.
- Sort some ufsmount members so that fields used for quotas are grouped
together. This is in anticipation of quota locking.
Sponsored By: Isilon Systems, Inc.
caller may not be holding Giant, and namei() should acquire it as
necessary. HASGIANT is used to indicate when namei() is returning
with a reference to a vnode that requires giant, and giant is locked.
- Add the macro NDHASGIANT() which can be used in conjunction with
VFS_UNLOCK_GIANT() in callers who have marked the nd with MPSAFE.
Sponsored By: Isilon Systems, Inc.
must be held when any vnode owned by the filesystem is manipulated.
- Add VFS_LOCK_GIANT and VFS_UNLOCK_GIANT macros which are used to
conditionally lock and unlock Giant based on a particular mountpoint.
NetBSD went this route a while ago. FreeBSD originally tried this to
cope with multifunction cards. However, it turns out that we're
better off not worrying about the function number, and instead worry
about the function type for the function. This has worked well in
NetBSD, and all FreeBSD's relevant drivers have been converted.
# I'll rework the macros that specify them shortly, as soon as I can
# come up with a good, compatible way to deal...
Use the correct number of handles for multihandle returns.
Very, very, rarely on some SMP systems we've seen an 'unstable' type
in the response queue. I dunno whether or not it's a bug in our
handling, or whether there's a cache incoherency issue, but
try to guard against it.
MFC after: 2 weeks
have seen in the isa pnp case where a resource buts up against
0xffffffff. This would only impact when the board was booted without
ACPI.
Submitted by: Ed Maste (freebsd-stable <20050103145720.GA90754@sandvine.com>)
MFC After: 5 days
its ability to automatically scan and attach luns for modern storage
which has luns in the 0..1000 range, not 0..7.
The correct thing would be to do REPORT LUNS for devices whose LUN0
version shows a version >= SCSI3, but lacking that we should be able
to search higher than LUN 7 if we're >= SCSI3 with no ill effects.
This change keeps all of the QUIRK_HILUNS quirks, obeys the QUIRK_NOLUNS,
and introduces a QUIRK_NOHILUNS which will keep searches above LUN 7
happening for devices that report >= SCSI3 compliance. I doubt the latter
will be needed, but you never know.
This allowed me to randomly scan and attach > 500 disks at a time in
a situation where quirking for QUIRK_HILUNS wasn't practical (the
vendor id and product id changes of the virtualization changes
constantly).
Reviewed by: ken@freebsd.org, scottl@freebsd.org, gibbs@freebsd.org
MFC after: 2 weeks
witness_proc_has_locks(), as they are unused, which results in a compiler
error. This problem was introduced with the implementation of "show
alllocks".
Spotted by: Artem Kuchin <matrix at itlegion dot ru>
frame includes FCS (requires applications to be updated, but since
we weren't doing the out-of-line FCS stuff anyway app changes
were needed already)
o add a flag to indicate padding exists between the 802.11 header and
the payload (e.g. for Atheros cards)
o diff reducation against netbsd
MFC after: 1 week
in mddestroy() to properly free already allocated memory.
This fixes a panic when we want to create too big memory backed device
with preallocate memory (-o reserve).
- Remove redundant { }.
MFC after: 1 week
address, nor do we need the alignment requirements, so eliminate them.
This likely means that we can now collapse some of the entries as we
have no need of them anymore (they match other entries and were there
only to get the right attr memory offset of the enet addr).
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.
Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example. If you are planning to use it, for now you must:
1) Put "options DEBUG_MEMGUARD" in your kernel config.
2) Edit src/sys/kern/kern_malloc.c manually, look for
"XXX CHANGEME" and replace the M_SUBPROC comparison with
the appropriate malloc type (this might require additional
but small/simple code modification if, say, the malloc type
is declared out of scope).
3) Build and install your kernel. Tune vm.memguard_divisor
boot-time tunable which is used to scale how much of kmem_map
you want to allott for MemGuard's use. The default is 10,
so kmem_size/10.
ToDo:
1) Bring in a memguard(9) man page.
2) Better instrumentation (e.g., boot-time) of MemGuard taking
over malloc types.
3) Teach UMA about MemGuard to allow MemGuard to override zone
allocations too.
4) Improve MemGuard if necessary.
This work is partly based on some old patches from Ian Dowse.
cards work. These changes depend on the expanded funce parsing that
just was committed to pccard_cis.c. In NetBSD the ethernet address
was read out of attr memory directly. We rely on the kernel pccard
parser to pulll this information out of what appears to be an obsolete
funce with the information in it.
# I'm still getting the no rx interrupt sometimes with some hub/switches
# for reasons unknown... But usually only one and only when dhclient
# runs.
as type 0, rather than the usualy type 4. Assume that this format is
from an old standard and go with it. The Fujitsu FMV-186A and Silicom
Ethernet cards I have both have tuples with this format, and they are
both pretty old cards.
# if somebody knows for sure, please let me know.
in BSD class, ie. if provider below us uses the same metadata, don't
create slices based on the metadata.
This allows to create slices on geoms with rank != 1 without hacks.
Discussed with: phk
Approved by: phk
MFC after: 2 weeks
aware of any fe based cards that do anything except network (well,
maybe the fujitsu scsi/lan card, but I've only seen two of those on
ebay in the last 3 years).
replacement address for an rdr rule. Some rdr rules have no address family
(when the replacement is a table and no other criterion implies one AF).
In this case, pf would fail to select a replacement address and drop the
packet due to translation failure.
Found by: Gustavo A. Baratto
virtual COM port. This makes the use of the Dell OpenManage tools on FreeBSD
considerably easier, and is based on Chuck Cranor's original patch for 4.6.
Reviewed by: imp
Tested by: dpk at dpk dot net
MFC after: 1 week
bridge in the device tree which lacks the mandatory (also by the OFW PCI
bus binding spec) "reg" property. Change the code to just ignore nodes
missing the "reg" property instead of panicing when encountering such a
node. Also ignore nodes without a "name" property (guaranteed by the OFW
PCI bus binding spec). This brings the behaviour of the MD OFW PCI code
regarding such incomplete nodes in line with the EBus and the SBus code.
Tested by: Cyril Tikhomiroff <tikho@anor.net>
MFC after: 1 month
to elide. This is a somewhat more convenient way of specifying in
e.g. make.conf a list of modules you know you will never need.
PR: kern/76225
Submitted by: David Yeske <dyeske@yahoo.com>
MFC after: 2 weeks
now use a pool mutex to manage the reference counts. This fixes races
resulting in use-after-free.
Tested by: kris, David Cornejo dave at dogwood dot com
Reported by: bmilekic's MemGuard
MFC after: 1 week
o rework pll setup code to follow h/w specification
o add hint.hifn.X.pllconfig to specify reference clock setup
requirements; default is pci66 which means the clock is
derived from the PCI bus clock and the card resides in a
66MHz slot
Tested on 7955 and 7956 cards; support for 7954 cards not enabled
since we have no cards to test against.
In collaboration with Poul-Henning Kamp.
Reviewed by: phk
MFC after: 1 week
Rather than have a twisty maze of special case allocations, move
instead to a data driven allocation. This should be the most robust
way to cope with the resource problems that the multiplicity of ways
of encoding 5 registers that have the misfortune of not being a power
of 2 nor contiguous.
Also, make it less impossible that pccard will work. I've not been able
to get my libretto floppy working, but it now fails later than before.
phk and I had similar ideas on this during the 5.3 release cycle, but
it wasn't until recently that I could test more than one allocation
scenario.
MFC After: 1 month (5.4 if possible, 5.5 if not)
and tweaks. The code was actually quite broken because it discarded the
upper bits of the 64 bit division. We only had a 50% chance of scaling up
the blocksize for large NFS client mounts when it was needed. For 5.x and
beyond, this was harmless because we could represent the result in either
case. For 4.x this was a big problem though. (4.x also has a df(1) bug to
compound the problem)
happen on the first management frame received from a neighbor; we assume
any merge candidate will send more frames and those should be processed
with a suitable table entry.
Stepped on by: Tai-hwa Liang
interrupts that have a trigger mode of conforming. This fixes problems on
some older machines that still route PCI devices via ISA interrupts when
using an I/O APIC.
Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 month
instead of burying that in the atpic(4) code as atpic(4) is not the only
user of elcr(4). Change the elcr(4) code to export a global elcr_found
variable that other code can check to see if a valid ELCR was found.
MFC after: 1 month
producers rather than consumers as new-bus resources only handle consumed
resources. We already do this for the other ACPI resource types that
support the producer/consumer attribute.
Hold a lock on the table instead of futzing with reference counts which
was potentially dangerous except drivers were quiescent while we did this
so the table contents never changed. Disable the hack logic for removing
scan candidates with multiple association failures; it's never done the
right thing and will be fixed correctly with background scanning goes in.
object (/) rather than the pci bus object when walking the _PRT to force
attach devices. We already look up relative to the root object when doing
interrupt routing.
Suggested by: njl
For such devices, we require _PRS to exist and we warn if any of the
resources in _PRS are not IRQ resources (since we'll have no way of knowing
which of those resources to use without a working _CRS). When it does
come time to set resources, we build up a resource buffer from scratch
as we do for devices with _CRS that only have IRQ resources.
- Fix a bug with setting extended IRQ resources where we set the IRQ value
in the wrong resource structure meaning that whichever IRQ was listed in
_PRS was used instead. This might fix some weird issues on certain boxes
where IRQs > 16 don't seem to work when using ACPI.
- Fix a bug with how we walked the resource buffer after _SRS to call
config_intr() in that the 'end' variable was not properly updated, so we
could either terminate the loop early or loop after the end of the
buffer.
Tested by: pjd
not we're going to process the frame; this makes the counters reflect frames
actually processes instead of received (discarded frames were already counted)
o increase the max per-frame tx descriptor count and the number of tx
buffers for forthcoming fast frame support
o correct the max scatter/gather count; it cannot be larger than the
max(tx,rx,beacon) descriptor counts
(fix imported from madwifi by Takanori Watanabe)
o eliminate save/restore of pci registers handled by the system
o eliminate duplicate zero of the softc (noted by njl)
o consolidate common code
MFC after: 1 week
and always has been, but the system call itself returns
errno in a register so the problem is really a function of
libc, not the system call.
Discussed with : Matthew Dillion <dillon@apollo.backplane.com>
(calling a __dead2 function such as panic() at the end of a function), the
saved %eip on the stack will actually not be part of the function that
executed a call instruction but instead will be the first instruction of
the next function in the text. This happens with dblfault_handler() and
syscall() for example. Work around this in the one place it matters by
looking at the saved %eip - 1 to determine the calling function when we
check for "magic" frames.
MFC after: 2 weeks
provides truer debugger stack traces. For those that want to stick with
-O2 kernel builds, one should probably add -fno-optimize-sibling-calls
so that each stack frame as a frame pointer.
It is semi-promissed by the Release Engineers that when RELENG_6 is
created we go back to -O2.
Desired by: scottl, jhb
they both happen before pipe backing allocation occurs. Previously,
a pipe memory shortage would cause a panic due to a KNOTE call
on an uninitialized si_note.
Reported by: Peter Holm
MFC after: 1 week
for the vast majority of our cards. However, they are critically
needed to distinguish different fe based PC Cards (the FMV-182 from
the 182A) which need to be treated differently (the ethernet address
is loaded not from the standard CIS-based ethernet tuples, but from
differing locations in attribute space based on the version string in
CIS3. This should have no impact for other users of this function.
unhappiness lately.
As far as I can tell, no files that have made it safely to disk
have been endangered, but stuff in transit has been in peril.
Pointy hat: phk
- Introduce the amr_io_lock to control access to command queues, bio queues,
and the hardware.
- Eliminate the taskqueue and do all completion processing in the ithread.
- Assign a static slot number to each command instead of doing a linear
search for free slots each time a command is needed.
- Modify the interrupt handler to more closely match what Linux does, for
safety.
and BBO is BO's backing object. Now, suppose that O and BO are being
collapsed. Furthermore, suppose that BO has been marked dead
(OBJ_DEAD) by vm_object_backing_scan() and that either
vm_object_backing_scan() has been forced to sleep due to encountering
a busy page or vm_object_collapse() has been forced to sleep due to
memory allocation in the swap pager. If vm_object_deallocate() is
then called on BBO and BO is BBO's only shadow object,
vm_object_deallocate() will collapse BO and BBO. In doing so, it adds
a necessary temporary reference to BO. If this collapse also sleeps
and the prior collapse resumes first, the temporary reference will
cause vm_object_collapse to panic with the message "backing_object %p
was somehow re-referenced during collapse!"
Resolve this race by changing vm_object_deallocate() such that it
doesn't collapse BO and BBO if BO is marked dead. Once O and BO are
collapsed, vm_object_collapse() will attempt to collapse O and BBO.
So, vm_object_deallocate() on BBO need do nothing.
Reported by: Peter Holm on 20050107
URL: http://www.holm.cc/stress/log/cons102.html
In collaboration with: tegge@
Candidate for RELENG_4 and RELENG_5
MFC after: 2 weeks
Without this fix, when ACLs are set via tunefs(8) on the root file system,
they are removed on boot when 'mount -a' is called, because mount(8)
called for the root file system always add MNT_UPDATE flag and MNT_UPDATE
flag isn't perfect.
Now, one cannot remove ACLs stored in superblock (configured with tunefs(8))
via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still
possible to mount file system which doesn't have ACLs in superblock via
'mount -o acls <file system>' or /etc/fstab's 'acls' option.
Reported by: Lech Lorens/pl.comp.os.bsd
Discussed with: phk, rwatson
Reviewed by: rwatson
MFC after: 2 weeks
calls MiniportQueryInformation(), it will return NDIS_STATUS_PENDING.
When this happens, ndis_get_info() will sleep waiting for a completion
event. If two threads call ndis_get_info() and both end up having to
sleep, they will both end up waiting on the same wait channel, which
can cause a panic in sleepq_add() if INVARIANTS are turned on.
Fix this by having ndis_get_info() use a common mutex rather than
using the process mutex with PROC_LOCK(). Also do the same for
ndis_set_info(). Note that Pierre's original patch also made ndis_thsuspend()
use the new mutex, but ndis_thsuspend() shouldn't need this since
it will make each thread that calls it sleep on a unique wait channel.
Also, it occured to me that we probably don't want to enter
MiniportQueryInformation() or MiniportSetInformation() from more
than one thread at any given time, so now we acquire a Windows
spinlock before calling either of them. The Microsoft documentation
says that MiniportQueryInformation() and MiniportSetInformation()
are called at DISPATCH_LEVEL, and previously we would call
KeRaiseIrql() to set the IRQL to DISPATCH_LEVEL before entering
either routine, but this only guarantees mutual exclusion on
uniprocessor machines. To make it SMP safe, we need to use a real
spinlock. For now, I'm abusing the spinlock embedded in the
NDIS_MINIPORT_BLOCK structure for this purpose. (This may need to be
applied to some of the other routines in kern_ndis.c at a later date.)
Export ntoskrnl_init_lock() (KeInitializeSpinlock()) from subr_ntoskrnl.c
since we need to use in in kern_ndis.c, and since it's technically part
of the Windows kernel DDK API along with the other spinlock routines. Use
it in subr_ndis.c too rather than frobbing the spinlock directly.
the last action of kern_exit(). Instead, it is a MD callout to cleanup
per-process state during exit.
- Add notes of concern to Alpha and ia64 about the possible need to drop
fp state in cpu_thread_exit() rather than in cpu_exit() since it is
per-thread state rather than per-process.
- ip_fw_chk() returns action as function return value. Field retval is
removed from args structure. Action is not flag any more. It is one
of integer constants.
- Any action-specific cookies are returned either in new "cookie" field
in args structure (dummynet, future netgraph glue), or in mbuf tag
attached to packet (divert, tee, some future action).
o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}()
to a switch structure, so that the functions are more readable, and a future
actions can be added with less modifications.
Approved by: andre
MFC after: 2 months
and KASSERT coverage.
After this check there is only one "nasty" cast in this code but there
is a KASSERT to protect against the wrong argument structure behind
that cast.
Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical
kernel with no change in performance.
We also now run the checking and tracing on VOP's which have been layered
by nullfs, umapfs, deadfs or unionfs.
Add new (non-inline) VOP_FOO_AP() functions which take a "struct
foo_args" argument and does everything the VOP_FOO() macros
used to do with checks and debugging code.
Add KASSERT to VOP_FOO_AP() check for argument type being
correct.
Slim down VOP_FOO() inline functions to just stuff arguments
into the struct foo_args and call VOP_FOO_AP().
Put function pointer to VOP_FOO_AP() into vop_foo_desc structure
and make VCALL() use it instead of the current offsetoff() hack.
Retire vcall() which implemented the offsetoff()
Make deadfs and unionfs use VOP_FOO_AP() calls instead of
VCALL(), we know which specific call we want already.
Remove unneeded arguments to VCALL() in nullfs and umapfs bypass
functions.
Remove unused vdesc_offset and VOFFSET().
Generally improve style/readability of the generated code.
so we need to acquire Giant in netgraph methods, so that we don't
race with line discipline methods. Remove NET_NEEDS_GIANT.
- Packets coming into node from netgraph are queued in ifqueue
attached to node private data.
- Mutex in struct ifqueue is used to lock not only the queue, but
the whole private data, and tp->t_lsc field.
- tp->t_lsc pointer is used to indicate whether line discipline is
attached to netgraph or not.
- Use FLG_DIE flag to indicate that node may be destroyed.
(This protection doesn't work, and it didn't before. Must be redesigned.)
- Increment ngt_unit atomically, removing mutex.
- Acquire Giant, when executing ngt_start() from netgraph context.
- Acquire Giant, when {,de}registering line discipline.
- Uncomment forcing queue mode on peers hook, since this is reasonable.
- Force queue mode on our hook, to avoid acquiring Giant when coming from
network stack. We may already hold some mutexes at this point.
Cleanups:
- Use callout_pending() instead of our own flag.
- Remove spl(9) calls. Now we can use return() instead of ERROUT().
style(9):
- Sort includes.
- Sparse initializer for struct linesw.
- Remove some empty lines, sort declarations.
Reviewed by: julian, phk
MFC after: 1 month
of len in tcp_output(), in the case where the FIN has already been
transmitted. The mis-computation of len is because of a gcc
optimization issue, which this change works around.
Submitted by: Mohan Srinivasan
interrupt is wired up to all the I/O APICs in the system. If the system
has only one I/O APIC, then just act as if the entry specified that APIC.
We still don't try to handle global entries in a system with multiple I/O
APICs.
Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 week
up its pending error state, which may be set in some rare conditions resulting
in connect() syscall returning that bogus error and making application believe
that attempt to change association has failed, while it has not in fact.
There is sockets/reconnect regression test which excersises this bug.
MFC after: 2 weeks
errno can be tampered potentially by nested signal handle.
Now all error codes are returned in negative value, positive value are
reserved for future expansion.
external source (i.e., _STA). The previous case only handled calls
occurring within AML. This should fix Toshibas, among others. Thanks
to Robert Moore of Intel for the fix.
MFC after: 2 days
the given providers. Without even one of the configured components there
should be no way to get the secret.
Supported by: WHEEL Sp. z o.o.
http://www.wheel.pl
- Use callout_pending() instead of our own flags.
- Remove home-grown protection of node, which has a scheduled
callout().
- Remove spl(9) calls.
Tested by: bz
TAILQ_FOREACH_SAFE().
Loose the error pointer argument and return any errors the normal way.
Return EAGAIN for the case where more work needs to be done.
e.g. at the loader:
set hint.pcib.1.skipslot=26
This allows undocumented and problematic hardware on some systems
to be ignored, for instance, the USB keyboard/mouse that shows up
on a 12" albook that doesn't exist nor do anything other than eat up
the syscons keyboard. Another one is the unused USB cell in the old
366MHz iBook that locks up the machine when probed.
In a way this is temporary, since there are better fixes for the
above problems, but will be useful in the meantime by allowing
a keyboard to be used to help debug said fixes :)
- while here remove some trailing white space
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.
If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.
Discussed with: rwatson
read the ethernet address from the attribute space hasn't been
implemented. Also add flags for the MBH10302. The flags and maddr
fields will be used when reading from the attribute space...
recursion from the VM is handled (and the calling code that allocates
buckets knows how to deal with it), we do not want to prevent allocation
from the slab header zones (slabzone and slabrefzone) if uk_recurse is
not zero for them. The reason is that it could lead to NULL being
returned for the slab header allocations even in the M_WAITOK
case, and the caller can't handle that (this is also explained in a
comment with this commit).
The problem analysis is documented in our mailing lists:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current
(see entire thread for proper context).
Crash dump data provided by: Peter Holm <peter@holm.cc>
This is just a workaround for a know problem with Motorola E1000
phone. Something is wrong with the configuration of L2CAP/RFCOMM
channel. Even though we set L2CAP MTU to 132 bytes (default RFCOMM
MTU 127 + 5 bytes RFCOMM frame header) and the phone accepts it,
the phone still sends oversized L2CAP packets. It appears that the
phone wants to use bigger (667 bytes) RFCOMM frames, but it does
not segment them according to the configured L2CAP MTU. The 667
bytes RFCOMM frame size corresponds to the default L2CAP MTU of
672 bytes (667 + 5 bytes RFCOMM frame header).
This problem only appears if connection was initiated from the
phone. I'm not sure who is at fault here, so for now just put
workaround in place. Quick look at the spec did not reveal any
anwser.
Tested by: Jes < jjess at freebsd dot polarhome dot com >
MFC after: 3 days
allows my 3com cards to work again. It appears that this code was
once there, but I removed it when I added the alignment issues.
MFC After: 5 days
PR: 70639 (and likely others)
o Implement a shiny new algorithm to keep track of finger movement at
slow speeds. This dramatically reduces the level of questionable
language from users trying to resize windows.
o Properly catch the many extra buttons and dials which manufacturers
are known to screw onto Synaptics touchpad controllers. Currently,
up to seven buttons are known to work, more should work too.
o Add a number of sysctls allowing one to tune the driver to taste in
a simple way:
# Should the extra buttons act as axes or as middle button
hw.psm.synaptics.directional_scrolls
# These control the 'stickiness' at low speeds
hw.psm.synaptics.low_speed_threshold
hw.psm.synaptics.min_movement
hw.psm.synaptics.squelch_level
PR: kern/75725
Submitted by: Jason Kuri <jay@oneway.com>
MFC after: 1 month
fe1: <EAGLE Technology NE200 ETHERNET LAN MBH10302 04>
As reported by Sean Shapira. This appears to be working. Eagle used
Fujitsu's vendor number, with a product number of 4 (which is the same
as the vendor number, which is a little suspect). Since there's no
apparent conflict, go ahead and use it.
Submitted by: Sean Shapira
card, and works with that driver. However, Eagle is using Fujitsu's
vendor number and a product code of 4, which seems a little odd.
Still, there's no conflicts...
1/ doesn't matter on most of our architectures
2/ will never happen unless we start queueing multiple trasactions
to a single endpoint at one time (which we do not allow yet).
If anyone has a big_endian machine with EHCI they might check this
if they are having problems with EHCI but it's unlikely even there..
Submitted by: Hans Petter Selasky <hselasky@c2i.net>
MFC after: 3 days
turns out that LINE30_ROW was always defined, not LINE30. I confused
this for LINE30 and did the unifdef -DLINE30 using that mistaken
belief. This corrects that problem.
Submitted by: nyan-san
won't exist for EBus. Just fail the allocation by returning NULL.
Now drivers that are MI can try resources that the driver knows may
be used by the device.
structures in IPX/SPX -- primarily, sequence numbering, PCB lists,
and PCBs for IPX raw sockets, IPX datagram sockets, and IPX/SPX.
As such, remove remove NET_NEEDS_GIANT() for IPX, and remove the
assertion of Giant in the ipxintr() IPX input path.
Note that IPX/SPX is not fully MPSAFE, and that there are some
problems with IPX/SPX locking that will require some further work.
However, it is now safe enough to run in general without the Giant
lock.
MFC after: 4 weeks
portion of IPX/SPX:
- Protect IPX PCB lists with the IPX PCB list mutex, in particular
when calling PCB and PCB list manipulation routines in ipx_pcb.c.
- Protect both IPX PCB state and SPX PCB state using the IPX PCB
mutex.
- Generally annotate locking, as well as adding liberal use of lock
assertions to document locking requirements.
- Where possible, use unlocked reads when reading integer or smaller
sized socket options on SPX sockets.
- De-spl throughout.
Notes:
- spx_input() expects both the list mutex and PCB mutex to be held
on entry, but will release both on return. Because sonewconn() is
called from spx_input(), it may actually drop one PCB lock and
acquire another during generation of a new connection, meaning the
caller is not in a position to unlock the PCB mutex.
MFC after: 3 weeks
were derived from more complex TCP versions of the same:
- spx_close(), spx_disconnect(), spx_drop(), and spx_usrclosed() all
always free's the spxpcb invalidating the argument, so a return
value is not required to indicate if it has.
- Annotate that the cb arguments to each of these functions is
invalidated via a comment.
- When tearing down a pcb due to sonewconn() having failed, mark the
cb as NULL; later, when deciding whether to store trace information
due to SO_DEBUG, check that cb is not NULL before dereferencing or
a NULL pointer dereference may occur.
MFC after: 3 weeks
When processing socket options against IPX PCBs, generally protect
PCB fields using the IPX PCB mutex. Where possible, use unlocked
reads on integer values to avoid locking overhead.
MFC after: 3 weeks
protocol methods relating to IPX. Conditionally acquire the PCB list
lock in the send operation only if the socket requires binding in order
to use the requested address.
Remove spl's generally no longer required during these accesses.
MFC after: 3 weeks
the IPX-related PCB routines. In general, the list lock is required
to iterate the PCB list, either for read or write; the PCB lock is
required to access or modify a PCB. To change the binding of a PCB,
both locks must be held.
MFC after: 3 weeks
IPX PCB lists. Add macros to initialize, destroy, lock, unlock,
and assert the mutex. Initialize the mutex when IPX is started.
Add per-IPX PCB mutexes, ipxp_mtx in struct ipxpcb, to protect
per-PCB IPX/SPX state. Add macros to initialize, destroy, lock,
unlock, and assert the mutex. Initialize the mutex when a new
PCB is allocated; destroy it when the PCB is free'd.
MFC after: 2 weeks
copyright/license header and was only used by 30line.h. It appears
that the copyright/license in 30line.h covers the old contents
module.h anyway, so this simplifies things a little while cleaning up
one obscure potential license confusion...
Revired by: nyan-san
- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
of link state change to a node connected to "lower" hook.
Reviewed by: sam
MFC after: 2 weeks
place device objects in \ (in this case, PCI links.) Work around this by
starting our probe from \. To avoid attaching system scope objects,
explicitly skip them. (I think it's an ACPI-CA bug that \_SB and \_TZ have
device and thermal object types.) Thanks to pjd@ for testing.
MFC after: 2 weeks
before deciding to do more expensive locking to account for process
exit. This acceptable minor race avoids two mutex operations in
that highly common case of accounting not being enabled.
MFC after: 2 weeks
- Fix the MP Table pci bridge drivers to not probe the configuration table
unless we actually have one. Machines using a default configuration do
not have such a table.
- Only allow default configuration types of 5 (ISA + PCI) and 6 (EISA +
PCI) as the others are not likely to work. Types 1 through 4 use an
external APIC (probably with 80486 processors) which we certainly do not
support, and type 7 uses an MCA bus which has not been tested with the
new MP Table code.
- Correct the fact that the single I/O APIC in a default configuration has
an ID of 2, not 0.
- Fix off by one errors in setting the bus types from the default_data[]
arrays for default configurations.
- Explicitly configure each of the 16 interrupt pins on the sole I/O APIC
when using a default configuration. This is especially helpful for type
6 (EISA + PCI) since the EISA interrupts need to have their polarity
programmed based on the values in the ELCR.
Much thanks to the submitter and tester who endured several rounds of
testing to get this fixed.
MFC after: 1 week
Tested by: Georg Schwarz georg dot schwarz at freenet dot de
cuts to the chase and fills in a provided s/g list. This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred. This is just for amd64 and
i386 at the moment, other arches will be coming shortly.
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache(). Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().
Reported by: Peter Holm and Kris Kennaway
from 4.x kernel config files. User's wishing to upgrade from 4.x to 6
will need to go through 5.x, or grab this script from there. These
scripts will remain in RELENG_5...
queue to the free queue. With this change, if a page from the cache
queue belongs to a locked object, it is simply skipped over rather
than moved to the inactive queue.
SI_SUB_INIT_IF but before SI_SUB_DRIVERS. Make Netgraph(4)
framework initialize at SI_SUB_NETGRAPH level.
This does not address the bigger problem: MODULE_DEPEND
does not seem to work when modules are compiled in the
kernel, but it fixes the problem with Netgraph Bluetooth
device drivers reported by a few folks.
PR: i386/69876
Reviewed by: julian, rik, scottl
MFC after: 3 days
and if the client (erroneously) reads the RPC length as 0 bytes, the
client can loop around in the socket callback. Explicitly check for
the length being 0 case and teardown/re-connect.
Submitted by: Mohan Srinivasan
turn it back on. Specifically, the actual changes are now less intrusive
in that the _get_spin_lock() and _rel_spin_lock() macros now have their
contents changed for UP vs SMP kernels which centralizes the changes.
Also, UP kernels do not use _mtx_lock_spin() and no longer include it. The
UP versions of the spin lock functions do not use any atomic operations,
but simple compares and stores which allow mtx_owned() to still work for
spin locks while removing the overhead of atomic operations.
Tested on: i386, alpha
to be the same as Boca Research Turbo Serial 654 (4 serial port).
While add the 8 port variants as well.
Submitted by: sten@blinkenlights.nl
PR: kern/75793
MFC after: 1 week
The main changes are:
1. Use of multiple bus dma tags.
2. Timing of CAM requests by the driver.
3, Firmware interface change relating to retrieving AEN's.
4. Removal of twa_intrhook.
5. Bundling of latest firmware with BBU capability.
Reviewed by:re
Approved by:re
interface as well. This is not an expected revision id per the
datasheet, but unfortunately there are such cards out there with
a 82557 chipset, and they want to use the 82503.
PR: kern/75739
Reported by: Andre Albsmeier <andre.albsmeier@siemens.com>
- Mark mount, unmount and nmount MPSAFE.
- Add a stub for _umtx_op().
- Mark open(), link(), unlink(), and freebsd32_sigaction() MPSAFE.
Pointy hats to: several
(we ignore it).
- Remove code used for handling spoil events, as spoiling is not possible
anymore, because we keep consumers open for writing all the time.
MFC after: 4 days
After disscussing things I have decided to take the easy and
consistent 90% solution instead of aiming for the very involved 99%
solution.
If we allow forceful unmounts of DEVFS we need to decide how to handle
the devices which are in use through this filesystem at the time.
We cannot just readopt the open devices in the main /dev instance since
that would open us to security issues.
For the majority of the devices, this is relatively straightforward
as we can just pretend they got revoke(2)'ed.
Some devices get tricky: /dev/console and /dev/tty for instance
does a sort of recursive open of the real console device. Other devices
may be mmap'ed (kill the processes ?).
And then there are disk devices which are mounted.
The correct thing here would be to recursively unmount the filesystems
mounte from devices from our DEVFS instance (forcefully) and if
this succeeds, complete the forcefully unmount of DEVFS. But if
one of the forceful unmounts fail we cannot complete the forceful
unmount of DEVFS, but we are likely to already have severed a lot
of stuff in the process of trying.
Event attempting this would be a lot of code for a very far out
corner-case which most people would never see or get in touch with.
It's just not worth it.
3C920B-EMB-WNM Integrated Fast Ethernet Controller
Submitter reports that the card appears to autonegotiate properly, and
operate well with high levels of NFS traffic.
PR: 75253
Submitted by: "Oleg V. Nauman" <oleg at reis dot zp dot ua>
MFC after: 2 weeks
only one set is needed for either endianess. This also completes them for
big endian archs and fixes the compilation of libncp, netncp, etc. there.
Reviewed by: bp, rwatson
Compile tested on: i386, sparc64
MFC after: 1 week
o Move the sysctls under debug.psm.* and hw.psm.* making them a bit
clearer and more consistent with other drivers.
o Remove the debug.psm_soft_timeout sysctl. It was introduced many
moons ago in r1.64 but never referenced anywhere.
o Introduce hw.psm.tap_threshold and hw.psm.tap_timeout to control
the behaviour of taps on touchpads. People might like to fiddle
with these if tapping seems to slow or too fast for them.
o Add debug.psm.loglevel as a tunable so that verbosity can be set
easily at boot-time (to watch probes and such) without having to
compile a kernel with options PSM_DEBUG=N.
that the RFC 793 specification for accepting RST packets should be
following. When followed, this makes one vulnerable to the attacks
described in "slipping in the window", but it may be necessary in
some odd circumstances.
spx_reass() to increase atomicity across multiple operations on the
socket buffer when iterating over the SPX fragment reassembly list
for the ipxpcb, as well a to reduce the number of locking operations.
record loop for ACK'd data, rather than relying on lokcing in
sbdroprecord() and sowwakeup(), reducing the number of lock operations
as well as eliminating a possible race against the head of the send
buffer mbuf chain. Use the _locked variants of sbdroprecord() and
sowwakeup().
the peer address by using M_WAITOK in ipx_setpeeraddr() to prevent
allocation failure. The socket reference used to reach these calls
will prevent the ipxpcb from being released prematurely.
properly handle the case where a connection is disconnected. The
queue(9)-enabled version of this code broke from the inner but not
outer loop, and so potentially frobbed an ipxpcb flag after the ipxpcb
was free'd, which might be picked up later by the malloc debugging
code. Properly break from the loop context and avoid touching the
cb/ipxpcb after free.
connection rates, which is causing problems for some users.
To retain the security advantage of random ports and ensure
correct operation for high connection rate users, disable
port randomization during periods of high connection rates.
Whenever the connection rate exceeds randomcps (10 by default),
randomization will be disabled for randomtime (45 by default)
seconds. These thresholds may be tuned via sysctl.
Many thanks to Igor Sysoev, who proved the necessity of this
change and tested many preliminary versions of the patch.
MFC After: 20 seconds
expects a locked route reference. This removes a panic that occurs
when connected ipxpcb is closed and its route free'd, and may have been
present since the route locking took place.
MFC after: 2 weeks
o implement double-extended and single precision loads and stores,
o implement double precision stores,
o replace the machdep.unaligned_print sysctl with debug.unaligned_print
and change the default value to 0,
o replace the machdep.unaligned_sigbus sysctl with debug.unaligned_test,
o Remmove the fillfd() function. The function is trvial enough for
inline assembly.
The debug.unaligned_test sysctl is used to test the emulation of
misaligned loads and stores. When PSR.ac is 0, the CPU will handle
misaligned memory accesses itselfi and we don't get an exception
for it. When PSR.ac is 1, the process needs to be signalled and we
should not emulate. The sysctl takes effect when PSR.ac is 1 and
tells us that we should emulate and not send a signal.
PR: 72268
MFC after: 1 week
function provided by the driver limits allocations to the page size,
i.e. 4KB on i385 and 8KB on typical 64 bit processors. Since amd64
has 64 bit pointers, but only 4KB pages, an array of pointers that
just fits into one page on all the other processors, does require
2 pages on amd64.
In order to make this driver useful on amd64, the allocation unit
has been increased to 2 pages on amd64 and contigmalloc() is used
instead of malloc(). All other processor types are unaffected by
this change. This modification has only been compile-tested on
amd64, yet, but should just work (FLW).
machine; instead use the intended entry points. There's still
too much incestuous knowledge about the internals of the
802.11 layer but this at least fixes adhoc mode.
o ic_inact_auth is a bad name, it's the inactivity threshold
for being associated but not authorized; use it that way
o reset ni_inact when switching inactivity thresholds to
minimize the race against the timer (don't want to lock
for this stuff)
o change the inactivity probe threshold from a one-shot to
cover a range: when below this threshold but not expired
send a probe each inactivity interval; should probably
guard against the interval being turned way down as this
could cause us to spam the net with probes
we're at it:
o WPA/802.11i has a unicast key and a group key; in station mode
everything is sent with the unicast key--we were consulting the
destination mac address and incorrectly using the group key
o (perpetuate fallback use of the default tx key to maintain
compatibility with the way wpa_supplicant works)
o correct EAPOL encryption logic to check unicast key instead
of assuming other state implies this
o move QoS encapsulation up to before enmic work so TKIP has the
information required to calculate the pseudo-header
o do not do QoS-encapsulation of EAPOL frames as some ap's do the
wrong thing with such frames (may need to revisit this if ap's
start dropping non-QoS frames from stations assoc'd with QoS)
o move ieee80211_mbuf_adjust closer to its caller
unloaded, cleanup, or return ebusy of that's inconvenient.' The
default module hanlder for newbus will now call this when we get a
MOD_QUIESCE event, but in the future may call this at other times.
This shouldn't change any actual behavior until drivers start to use it.
suggested by Peter Edwards. This seems to fix my fxp problems and
likely will fix his as well. Use DELAY rather than *sleep because we
can be called from any context.
o catch one place where we were not using ath_chan_change to
switch channels; this fixes a problem where the channel
settings were not being correctly reported in captured packets
o return unique channel identification in the channel flags;
ethereal gets confused if you return merged flags (e.g. ofdm,
cck, and 2Ghz) (this is workaround and should be removed if
we can ever cleanup radiotap consumers)
o correct short/long preamble flag state for rx and treat tx
the same--use a new hwflags array that gives us the data
based on the h/w rate index/cookie
o add gross hack to handle radiotap capture of frames that
come in with hardware padding; should be replaced by a
flag in the radiotap header and more smarts in the apps
that decode radiotap data
o lintval is in ms; must convert to TU's for passing to the hal
o roundup to calculate nexttbtt (should look at current tsf and pull the
calculated nextbtt forward but this'll do for now)
o don't or- in HAL_BEACON_RESET_TSF when doing station timer setup; this
is not needed and messes up the sleep timer calcs, though it's unclear
if it mattered as the hal masks these values before use
Submitted by: Thorsten von Eicken
the netperf branch but for some reason didn't trigger a build failure
locally when I merged to CVS and omitted it. Presumably driver error.
Pointed out by: cperciva, tinderbox
schedulers a bit to ensure more correct handling of priorities and fewer
priority inversions:
- Add two functions to the sched(9) API to handle priority lending:
sched_lend_prio() and sched_unlend_prio(). The turnstile code uses these
functions to ask the scheduler to lend a thread a set priority and to
tell the scheduler when it thinks it is ok for a thread to stop borrowing
priority. The unlend case is slightly complex in that the turnstile code
tells the scheduler what the minimum priority of the thread needs to be
to satisfy the requirements of any other threads blocked on locks owned
by the thread in question. The scheduler then decides where the thread
can go back to normal mode (if it's normal priority is high enough to
satisfy the pending lock requests) or it it should continue to use the
priority specified to the sched_unlend_prio() call. This involves adding
a new per-thread flag TDF_BORROWING that replaces the ULE-only kse flag
for priority elevation.
- Schedulers now refuse to lower the priority of a thread that is currently
borrowing another therad's priority.
- If a scheduler changes the priority of a thread that is currently sitting
on a turnstile, it will call a new function turnstile_adjust() to inform
the turnstile code of the change. This function resorts the thread on
the priority list of the turnstile if needed, and if the thread ends up
at the head of the list (due to having the highest priority) and its
priority was raised, then it will propagate that new priority to the
owner of the lock it is blocked on.
Some additional fixes specific to the 4BSD scheduler include:
- Common code for updating the priority of a thread when the user priority
of its associated kse group has been consolidated in a new static
function resetpriority_thread(). One change to this function is that
it will now only adjust the priority of a thread if it already has a
time sharing priority, thus preserving any boosts from a tsleep() until
the thread returns to userland. Also, resetpriority() no longer calls
maybe_resched() on each thread in the group. Instead, the code calling
resetpriority() is responsible for calling resetpriority_thread() on
any threads that need to be updated.
- schedcpu() now uses resetpriority_thread() instead of just calling
sched_prio() directly after it updates a kse group's user priority.
- sched_clock() now uses resetpriority_thread() rather than writing
directly to td_priority.
- sched_nice() now updates all the priorities of the threads after the
group priority has been adjusted.
Discussed with: bde
Reviewed by: ups, jeffr
Tested on: 4bsd, ule
Tested on: i386, alpha, sparc64
- ipx_pcbnotify(), which is never called.
- ipx_rtchange(), which is never called, is incomplete inplemented, and
also #ifdef notdef.
- spx_fixmtu(), which is never called, is incompletely implemented, and
also #ifdef notdef.
other comments. Clarify that the next two things needed for SMP are
two lines.
- Expand mii abbreviation to miibus for clarity in the USB ethernet
comment.
is the case for most other sysctls in the System V IPC message queue
implementation.
PR: 75541
Submitted by: Sergiy Vyshnevetskiy <serg at vostok dot net>
MFC after: 2 weeks
generic bridge support was biting us more than it helped, whenever a new chipset
came out from a vendor and misprogramming it caused strange hangs or corruption.
[2] Add a large number of PCI IDs based on what the linux drivers support.
Note that the new PCI IDs haven't been tested, they're just *likely* to work.
In particular the VIA AGP 8x chipsets are concerning, due to lack of testing,
possible issues (kern/69953), and not having a nice "does this bridge say it
would do 8x" function. However, this shouldn't make the situation worse, since
these chips would have probed in the past anyway.
more general than previous. It also lets me implement cancelable point
in thread library. Also in theory, umtx_lock and umtx_unlock can
be implemented by using umtx_wait and umtx_wake, all atomic operations
can be done in userland without kernel's casuptr() function.
to remove a transaction from the async schedule. The previous method didn't
work well and led to the hardware writing to free'd buffers etc, as
it didn't always know that the transaction had been aborted.
Written after consultation with David Brownell who wrote the Linux
EHCI driver.
As part of this give the sqh structure a "previous" pointer.
MFC after: 1 week
rather than a softc pointer (with the bus structure at the start).
This is a non-functional change. It just helps when reading the code to
know that the ehci, ohci and uhci drivers share the bus structure, not the
entire softc.
of lock types in the kernel. This results in an increase of witness
data usage from ~145k to ~280k on i386 for kernels with
'options WITNESS'.
- Remove the unused witness malloc bucket.
Submitted by: Michal Mertl mime at traveller dot cz (1)
This allows boot to proceed on a real system until the issue
of calling back into certain OpenFirmware calls (e.g. finddevice)
in thread context is understood.
(this commit only affects psim users, of which I think I am the
only one...)
Silence on: net@, current@, hackers@.
No objections: joerg
Requested by: by many (mostly Cronyx) users for a long long time.
MFC after: 10 days
PR: kern/21771, kern/66348
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.
MFC after: 5 days
system from an AP at runtime (i.e., calling cpu_reset from ddb). Someday,
if we move to an NMI for stopping cpus instead, we can do away with this.
Requested by: jhb
- Remove the sched_add wrapper that used sched_add_internal() as a backend.
Its only purpose was to interpret one flag and turn it into an int. Do
the right thing and interpret the flag in sched_add() instead.
- Pass the flag argument to sched_add() to kseq_runq_add() so that we can
get the SRQ_PREEMPT optimization too.
- Add a KEF_INTERNAL flag. If KEF_INTERNAL is set we don't adjust the SLOT
counts, otherwise the slot counts are adjusted as soon as we enter
sched_add() or sched_rem() rather than when the thread is actually placed
on the run queue. This greatly simplifies the handling of slots.
- Remove the explicit prevention of migration for ithreads on non-x86
platforms. This was never shown to have any real benefit.
- Remove the unused class argument to KSE_CAN_MIGRATE().
- Add ktr points for thread migration events.
- Fix a long standing bug on platforms which don't initialize the cpu
topology. The ksg_maxid variable was never correctly set on these
platforms which caused the long term load balancer to never inspect
more than the first group or processor.
- Fix another bug which prevented the long term load balancer from working
properly. If stathz != hz we can't expect sched_clock() to be called
on the exact tick count that we're anticipating.
- Rearrange sched_switch() a bit to reduce indentation levels.
and threads currently holding sleep mutexes (and spin mutexes for
curthread). This can be quite useful in looking for a lock condition
summary for a system, as it avoids manually iterating through threads
and processes to find all the interesting locks.
NB: "alllocks" is up there with "lockedvnods" for a bad argument for
show.
MFC after: 2 weeks
ADVANCELOGIC->AVANCELOGIC (nothing in the tree uses it, so safe to do)
sort HAGIWARA vendor entry
sort ACTIONTAR vendor entry
Minor change to SYSTEMTALKS vendor entry.
lower the priority of the returning thread to a user priority before
calling into thread_userret() which would call wakeup() which in turn would
cause the returning thread to eventually context switch rather than
completing its slice. Allowing this thread to complete its slice first
yields a 15% performance improvement in super-smack on my dual opteron with
4BSD.
Add $NetBSD$ in a comment at the top
Update copyright dates
Update header comment
Add some of the entries not present in FreeBSD's usbdevs file
Harmonize some descriptions with NetBSD where NetBSD's were shorter
More work needs to happen here, as there's many conflicting vendor
names. There's also more harmonization that can happen before that
problem is tackled.
This was inspired by recent discussions, but none of the patches
posted were consulted to produce this commit. Other, similar ones
will follow.
cases for tcp_input():
While it is true that the pcbinfo lock provides a pseudo-reference to
inpcbs, both the inpcb and pcbinfo locks are required to free an
un-referenced inpcb. As such, we can release the pcbinfo lock as
long as the inpcb remains locked with the confidence that it will not
be garbage-collected. This leads to a less conservative locking
strategy that should reduce contention on the TCP pcbinfo lock.
Discussed with: sam
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
One of a set of patches submitted by Kazuhito HONDA
to make the usb audio driver a lot more capable.
PR: 75274
Submitted by: Kazuhito HONDA (kazuhito at ph dot noda dot tus dot ac dot jp)
Obtained from: NetBSD (indirectly)
MFC after: 2 weeks
flag and busy field with the global page queues lock to synchronizing their
access with the containing object's lock. Specifically, acquire the
containing object's lock before reading the page's PG_BUSY flag and busy
field in vm_fault().
Reviewed by: tegge@
though these aren't used yet.
- Add missing function prototypes for some static functions.
- Allow lvt_mode() to handle an LVT entry with a delivery mode of fixed.
- Consolidate code duplicated in lapic_init() and lapic_setup() to program
the spurious vector register of a local APIC in a static lapic_enable()
function.
- Dump the timer, thermal, error, and performance counter LVT entries
during lapic_dump().
- Program LVT pins (currently only LINT0 and LINT1) after the local
APIC has been software enabled via lapic_enable() since otherwise the
LVT programming will not be able to unmask LVT sources.
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.
Remove a bogus comment from pmap_enter_quick().
Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
Currently this is only used to initiailize the TPR to 0 during initial
setup.
- Reallocate vectors for the local APIC timer, error, and thermal LVT
entries. The timer entry is allocated from the top of the I/O interrupt
range reducing the number of vectors available for hardware interrupts
to 191. Linux happens to use the same exact vector for its timer
interrupt as well. If the timer vector shared the same priority queue
as the IPI handlers, then the frequency that the timer vector will
eventually be firing at can interact badly with the IPIs resulting in
the queue filling and the dreaded IPI stuck panics, hence it being located
at the top of the previous priority queue instead.
- Fixup various minor nits in comments.
multiple MIB entries using sysctl in short order, which might
result in unexpected values for tcp_maxidle being generated by
tcp_slowtimo. In practice, this will not happen, or at least,
doesn't require an explicit comment.
MFC after: 2 weeks
substitute for a global mutex protecting the socket count and
generation number.
The observation that soreceive_rcvoob() can't return an mbuf
chain is a property, not a bug, so remove the XXXRW.
In sorflush, s/existing/previous/ for code when describing prior
behavior.
For SO_LINGER socket option retrieval, remove an XXXRW about why
we hold the mutex: this is correct and not dubious.
MFC after: 2 weeks
unnecessary use of a global variable and simplify the return case.
While here, use ()'s around return values.
In sodealloc(), remove a comment about why we bump the gencnt and
decrement the socket count separately. It doesn't add
substantially to the reading, and clutters the function.
MFC after: 2 weeks
After this change, when component is disconnected because of an I/O error,
it will not be connected and synchronized automatically, it will be logged
as broken and skipped. Autosynchronization can occur, when component is
disconnected (on orphan event) and connected again - there were no I/O
error, so there is no need to not connected the component, but when there were
writes while it wasn't connected, it will be synchronized.
This fix cases, when component is disconnected because of I/O error and can be
connected again and again.
- Bump version number.
- Add version change history.
- Implement backward compatibility mechanism. After this change when metadata in
old version is detected, it is automatically upgraded to the new (current)
version.
occur between a reader and a writer that results in a panic upon close,
e.g.,
"panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"
Reviewed by: rwatson@
MFC after: 2 weeks
methods:
Read can see O_NONBLOCK and O_DIRECT.
Write can see O_NONBLOCK, O_DIRECT and O_FSYNC.
In addition O_DIRECT is shadowed as IO_DIRECT for now for backwards
compatibility.
fcntl.h.
This is in preparation for making the flags passed to device drivers be
consistently from fcntl.h for all entrypoints.
Today open, close and ioctl uses fcntl.h flags, while read and write
uses vnode.h flags.
while doing g_(read|write)_data() (e.g. BSD). This can cause a deadlock
in MIRROR class. Not sure if this is safe to drop the topology lock in BSD
class, so change the code in MIRROR class to avoid this deadlock.
GigE controller, so handle this.
- Use the outbound window 0 if the PCI mem requested is in its range, instead
of inconditionally use the outbound window 1.
This should be enough to get FreeBSD/arm to work on the IQ80321 board as well.
Reported and tested by: Jia-Shiun Li <jiashiun at gmail dot com>
This is part of an ongoing cycle of commits on all the BSDs to
merge the USB vendor and device defintions..
A merge from OpenBSD is still pending.
Submitted by: barry bouwsma (freebsd-misuser@NOSPAM.dyndns.dk)
Obtained from: NetBSD
MFC after: 1 week
multiple IRQs (which is nonsense for _CRS) when the link hasn't been
programmed. Before, this was a KASSERT. A ServerWorks system was
seen returning IRQs of 0, 2 in response to _CRS before link setup.
Thanks to sam@ for quick testing and turnaround on this.
Tested by: sam
datasheet says it is only valid for such chipsets and shouldn't be used
with others. This fixes some 82559 based cards which otherwise only
work at 10Mbit.
MFC after: 5 days
Tested by: krion
In contrast to OpenBSD we enable jumbo frame support
depending on MTU setting (like done for xmac).
Approved by: pjd (mentor)
Obtained from: OpenBSD if_sk.c r1.52 (YU_SMR_MFL_JUMBO flag)
Tested by: Heinz Knocke <knockefreebsd at o2 dot pl>
MFC after: 5 days
Keeping consumers open when device is closed is very hard. We need to
open consumers sometimes to update metadata, etc.
Many hacks was introduced in the past to made it possible. You cannot
be sure that you can open consumer for writing always, even if you think
it should be allowed. If one of the mirror components is for example da0
and you try to open it, you can get EPERM when da0s1 is opened for reading
(because BSD class opens consumers (da0) with an extra 'e' bit set).
Waiting for the events queue to be empty may do the trick, but it makes
code much uglier (as you cannot always call g_waitidle()), it doesn't
solve all edge cases and it can introduce deadlocks if there are events
in the queue that wait for gmirror.
I removed those hacks. Now all consumers are open r1w1e1 always, even if
device is closed. Maybe it is less clean from GEOM perspective, but simpify
code a lot and make it much more reliable.
The only issue was retaste event which is sent when we close consumers
opened for writing. I ignore retaste event by not detaching consumer
immediately (so retaste event is not send to my class) and sending event
right after it to detach and destroy consumer.
prevents a possible endless loop in pf_get_sport() with 'static-port'
ICMP state entries use the ICMP ID as port for the unique state key. When
checking for a usable key, construct the key in the same way. Otherwise,
a colliding key might be missed or a state insertion might be refused even
though it could be inserted. The second case triggers the endless loop,
possibly allowing a NATed LAN client to lock up the kernel.
PR: kern/74930
Reported and tested by: Hugo Silva, Srebrenko Sehic
MFC after: 3 days
- Implement arm_mask_irqs and arm_unmask_irqs
- Provide the available physical address range after pmap_bootstrap allocated
things, instead or before, or bad things happen.
call mmap() to create a shared space, and then initialize umtx on it,
after that, each thread in different processes can use the umtx same
as threads in same process.
2. introduce a new syscall _umtx_op to support timed lock and condition
variable semantics. also, orignal umtx_lock and umtx_unlock inline
functions now are reimplemented by using _umtx_op, the _umtx_op can
use arbitrary id not just a thread id.
there is some hope for the 32-bit management utilities to run. I've used
the cli successfully, but 3dm2 doesn't work for other reasons. Of course,
a native binary of the 3dm2 and cli would be much better, but that doesn't
exist.
o don't encapsulate on tx; the chip expect a raw frame w/o the crypto header
o clear the WEP bit in the 802.11 header on rx so the 802.11 layer doesn't
try to strip the crypto header
o clobber the "drop unencoded frames" state bit when privacy is enabled so
rx'd frames we pass up to the 802.11 layer are not discarded as unencrypted
This stuff will need to be redone if anyone decides to add WPA support.
of sillyrenames (which were limited to 58 per pid per directory,
for no good reason). The new format of sillyrenames looks like
.nfs.0000b31a.00d24.4
^^^^^^^^ ^^^^^
ticks pid
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
port during the device probe as this can cause hangs on some machines,
specifically Compaq R3000Z series amd64 laptops. The flag is bit 3, or
0x8.
PR: amd64/67745
Reported by: Neil Winterbauer newntrbr at ucla dot edu, many others
Tested by: ade, astrodog at gmail dot com, many others
MFC after: 1 week
- NFS direct IO completely bypasses the buffer and page caches.
If a file is open for direct IO all caching is disabled.
- Direct IO for Directories will be addressed later.
- 2 new NFS directio related sysctls are added. One is a knob to
disable NFS direct IO completely (direct IO is enabled by default).
The other is to disallow mmaped IO on a file that has at least one
O_DIRECT open (see the comment in nfs_vnops.c for more details).
The default is to allow mmaps on a file that has O_DIRECT opens.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.
Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
pointers to an integer via uintptr_t.
Fix an apparent bug that caused a compile failure.
ieee80211_iterate_nodes() takes ic->ic_sta as its first argument on the
onoe module. It had just 'ic' here in the same context, which was a
mismatched argument.
of a sizeof, need to use %z to get the correct type on all our platforms.
Also, convert integers<->pointers via uintptr_t.
(I think Sam's instructions were for me to commit this. If I
misunderstood, then I apologize in advance.)
nice of 0. Doing so can cause an infinite loop because they should be
running, but a nice -20 process could prevent them from doing so.
- Add a new flag KEF_PRIOELEV to flag a thread that has had its priority
elevated due to priority propagation. If a thread has had its priority
elevated, we assume that it must go on the current queue and it must
get a slice.
- In sched_userret() if our priority was elevated and we shouldn't have
a timeslice, yield here until we should.
Found/Tested by: glebius
which holds on to just the data structure and the mutex. (The
existing refcount (fd_refcnt) holds onto the open files in the
descriptor.)
The fd_holdcnt is protected by fdesc_mtx, fd_refcnt by FILEDESC_LOCK.
Add fdhold(struct proc *) which gets a hold on the filedescriptors of
the specified proc..
Add fddrop(struct filedesc *) which drops the fd_holdcnt and if zero
destroys the mutex and frees the memory.
Initialize the fd_holdcnt to one in fdinit(). Normal operations on
the filedesc structure will not change it.
In fdfree() use fddrop() to dispose of the mutex and structure. Hold
the FILEDESC_LOCK() until we have cleaned out the contents and carefully
set the fields to null values during cleanup.
Use fdhold()/fddrop() in mountcheckdirs() and sysctl_kern_file().
for ensuring that a process' filedesc is not shared with anybody.
Use it in the two places which previously had private implmentations.
This collects all fd_refcnt handling in kern_descrip.c
to better keep track of the total amoutn transferred during a
transfer. Seems similar to some code in the NetBSD version.
I notice they have incorporated matches from him so I don't know which
direction it went.
Submitted by: damien.bergamini@free.fr
Obtained from: patches to make the ueagle driver work
MFC after: 1 week
You could turn this off by debug.mpsafenet=0 for full network
stack or via debug.{cp|cx|ctau}.mpsafenet for cp(4), cx(4) and
ctau(4) accordingly.
MFC after: 10 days
nice value above 0, set it to 0 so that it may proceed with haste.
This is especially important on ULE, where adjusting the priority
does not guarantee that a thread will be granted a greater time slice.
Now only things that are different between us and NetBSD show up.
Means that these files are more of NetBSD style in some places but
since thay are NetBSD files, um, that's ok.
Obtained from: NetBSD
MFC after: 1 week
do things correctly from an aliasing perspective. Put the
vop_generic_args element as the first element for all the vop_*_args
and adjust the code to take the address of that instead of the
structure.
OK'd based on a vague description by: phk
specifically, vm_pgmoveco():
1. If vm_pgmoveco() sleeps on a busy page, it must redo the look up
because the page may have been freed.
2. If the receive buffer is copy-on-write due to, for example, a fork,
then although the first vm object in the shadow chain may not contain
a page there may still be one from a backing object that is mapped.
Thus, a pmap_remove() is required for the new page rather than the
backing object's page to been seen by the application.
Also, add some comments to vm_pgmoveco() and update some assertions.
Tested by: ken@
pointer to eliminate the hundreds of warnings that we have in tree at
the moment.
# Chances are good that all the struct vop_*_args should have, as its
# first element, the struct vop_generic_args, and when necessary to
# reference it, we just take its address rather than going through
# this double case.
the ISA and CBUS (called isa on pc98) attachments. Eliminate all PC98
ifdefs in the process (the driver in pc98/pc98/mse.c was a copy of the one
in i386/isa/mse.c with PC98 ifdefs). Create a module for this driver.
I've tested this my PC-9821RaS40 with moused. I've not tested this on i386
because I have no InPort cards, or similar such things. NEC standardized
on bus mice very early, long before ps/2 mice ports apeared, so all PC-98
machines supported by FreeBSD/pc98 have bus mice, I believe.
Reviewed by: nyan-san
"vm_fault: fault on nofault entry, addr: %lx" panic. The problem was a
stale PTE in the TLB that marked the page as not present, even though
we had a good PTE in the VHPT. We typically don't yet insert PTEs in
the TLB. We do that lazily. The CPU will look for the PTE in the VHPT
when there's no PTE in the TLB. Unfortunately this doesn't handle the
case of the stale PTE in the TLB. The quick fix is to invalidate the
TLB (sloppily) when the VHPT doesn't contain a valid PTE. This is also
the only case that may cause a PTE in the TLB that marks a page as
non-present.
four different locations on a prospective filesystem.
If we found none, we forgot to invalidate the four buffers, thus the
following sequence would fails:
(md0 = blank disk)
mount /dev/md0 /mnt
(fails, no superblocks)
newfs /dev/md0
(writes using physio which does not go through buffercache).
mount /dev/md0 /mnt
(still fails, the four cached buffers still contain no superblocks)
Found by: ru
between object code generated without the flag but it makes sense and might
make a difference in the future.
PR: kern/53008
Submitted by: Jens Rehsack rehsack at liwing de
without Open Firmware:
- The PCI data structure of some HME PROMs contains a non-zero interface
revision in the class code. Thus remove the checks for matching class
code and PCI data structure length and revsion. These were pretty much
useless anyway as we only really need the pointer to the VPD which is
located before the structure length and revision fields.
- On Sun QFE (Quad FastEthernet) cards read the Nth MAC-address for the
Nth HME controller instead of always the first one for all four HMEs. [1]
- Improve the comment describing the used VPD format to better reflect
reality.
- Minor clean-up.
Prodded by: joerg [1]
ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code
has been changed.
Requested by: phk
on ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.
Requested by: phk
ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.
Requested by: phk
on ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.
Requested by: phk
Andre:
First lets get major new features into the kernel in a clean and nice way,
and then start optimizing. In this case we don't have any obfusication that
makes later profiling and/or optimizing difficult in any way.
Requested by: csjp, sam
either src or dst) fails. This closes a potential data loss case
(where the fsync failed with ENOSPC, for example).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
Kick off a readahead only when sequential access is detected. This
eliminates wasteful readaheads in random file access.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
mechanism used by pfil. This shared locking mechanism will remove
a nasty lock order reversal which occurs when ucred based rules
are used which results in hard locks while mpsafenet=1.
So this removes the debug.mpsafenet=0 requirement when using
ucred based rules with IPFW.
It should be noted that this locking mechanism does not guarantee
fairness between read and write locks, and that it will favor
firewall chain readers over writers. This seemed acceptable since
write operations to firewall chains protected by this lock tend to
be less frequent than reads.
Reviewed by: andre, rwatson
Tested by: myself, seanc
Silence on: ipfw@
MFC after: 1 month
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week
completely. For some reason (that I am still curious about) we started to no
longer manage to finish the initialization before the timeouts run the first
time leading to panics when using uninitialized mutex etc.
The root of this problem is that we currently first link a domain to the
domains list and only later initialize the domain's protocols. This should
be reworked in the future, but with the current API it is not possible in
all situations. We settle with this lazy fix for now.
Tested by: gnn, ru, myself
stepped the process to the system call), we need to clear the trap flag
from the new frame unless the debugger had set PF_FORK on the parent.
Otherwise, the child will receive a (likely unexpected) SIGTRAP when it
executes the first instruction after returning to userland.
Reviewed by: bde
MFC after: 3 days
[Changes listed only since last public release 0.9.12.14; for changes
prior to that consult the CVS logs at http://madwifi.sourceforge.net]
o reorg directory structure to have a single set of public binary builds
shared by all systems
o support for new parts (all shipping pci/cardbus parts to this date work)
o new capabilities for identifying various chip features
o set/get tx power cap for supporting 802.11h information element
o revised api for set/get tx queue properties
o support for updating CTS in frames when doing packet bursting
o support for querying which tx queues have pending interrupts
here but it includes completed 802.11g, WPA, 802.11i, 802.1x, WME/WMM,
AP-side power-save, crypto plugin framework, authenticator plugin framework,
and access control plugin frameowrk.
security.mac.portacl.autoport_exempt
This sysctl exempts to bind port '0' as long as IP_PORTRANGELOW hasn't
been set on the socket. This is quite useful as it allows applications
to use automatic binding without adding overly broad rules for the
binding of port 0. This sysctl defaults to enabled.
This is a slight variation on the patch submitted by the contributor.
MFC after: 2 weeks
Submitted by: Michal Mertl <mime at traveller dot cz>
the PCI bus. We presently have no drivers for these devices, so they
are powered down. This is undesirable behavior since it breaks the
system when the base peripherals go away suddenly in the middle of
boot.
# if we ever get generic drivers for memory and/or base peripherals, then
# we can remove the tests here.
revision 1.55, the address parameter to vnode_pager_addr() was changed
from an unsigned 32-bit quantity to a signed 64-bit quantity. However,
an out-of-range check on the address was not updated. Consequently,
memory-mapped I/O on files greater than 2GB could cause a kernel panic.
Since the address is now a signed 64-bit quantity, the problem resolution
is simply to remove a cast.
Reviewed by: bde@ and tegge@
PR: 73010
MFC after: 1 week
as this may cause deadlocks.
This should fix kern/72123.
Discussed with: jhb
Tested by: Nik Azim Azam, Andy Farkas, Flack Man, Aykut KARA
Izzet BESKARDES, Jens Binnewies, Karl Keusgen
Approved by: sam (mentor)
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:
cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()
nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.
ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()
Remove vfs_omount() method, all filesystems are now converted.
Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.
Change rootmounting to use DEVFS trampoline:
vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.
Remove now unnecessary getdiskbyname().
kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.
Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.
Remove rootdev variable, it doesn't give meaning in a global context and
was not trustworth anyway. Correct information is provided by
statfs(/).
These devices should be probed first because they are at fixed
locations and cannot be turned off. ISA PNP devices, on the other
hand, can be turned off and often can be flexible in the resources
they use. Probe them last, as always.
eg. if the firmware load fails. Shortish MFC timeout so this can be merged
before the 4.11 freeze.
PR: kern/34306
Submitted by: gibbs
Approved by: gibbs, imp (mentor)
MFC after: 5 days
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Same comment as msdosfs applies: It would be nice if we had generic option
names for charset conversions.
Use vfs_mountefrom(). Rely on vfs_mount.c calling VFS_STATFS().
- Do not put/remove node references, since this no longer
needed.
- Remove timerActive flag, use callout flags.
- Schedule next callout after doing current one.
Reviewed by: archie
Approved by: julian (mentor)
the sx lock was used previously because we might sleep allocating
additional memory by using auto-extending sbufs. However, we no longer
do this, instead retaining the user-submitted rule string, so mutexes
can be used instead. Annotate the reason for not using the sbuf-related
rule-to-string code with a comment.
Switch to using TAILQ_CONCAT() instead of manual list copying, as it's
O(1), reducing the rule replacement step under the mutex from O(2N) to
O(2).
Remove now uneeded vnode-related includes.
MFC after: 2 weeks
- Change the cached mtime to a 'struct timespec' from a
time_t. Improving the precision of the cached mtime tightens up
NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open()
(unless the file is marked modified). This further improves
NFS' "close-to-open" consistency.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Add a vfs_cmount() function which converts omount argument stucture
to nmount arguments.
Convert vfs_omount() to vfs_mount() and parse nmount arguments.
This is 100% compatible with existing userland.
Later on, but before userland gets converted to nmount we may want
to revisit the names of the mountoptions, for instance it may make
sense to use consistent options for charset conversion etc.
vnode EXCLUSIVE lock. This prevents threads from adding pages to
the vnode while an invalidation is in progress, closing potential
races. In the bioread() path, callers acquire the SHARED vnode lock
- so while an invalidate was in progress, it was possible to fault
in new pages onto the vnode causing the invalidation to take a while
or fail. We saw these races at Yahoo! with very large files+heavy
concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing
the invalidation closes all these races.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.
Also add function for setting the stat.f_mntfromname field.
socket callbacks or similar callers, from both the NFS client and the
server.
Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And
nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
is safe to turn off the nfsnode's NMODIFIED flag.
- Move the check for signals to the top of the loop where we loop
around the dirty buffers on the vnode, scheduling writes. This
ensures that we'll break ouf of the flush operation on reception of
a signal.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Root filessytems (like NFS) don't have an associated disk device,
and even if they had, the exact semantics would be filesystem
dependent and should be implemented there.
userland and a dedicated system call to get replies.
The vnode-bypass of fifos broke this into a panic.
Ditch all the magic and create a device /dev/nfslock instead, and
use that for both directions apart from the shorter path, this is
also faster because the device driver runs Giant free using the
vnode bypass.
Noticed by: marcel
actually is a property of the northbridge and applies to all PCI/PCI-X/PCIe
devices in the system, though only PCIe devices will respond to registers
higher than 256. This uses per-CPU pools of temporary mappings so that
the whole 256MB of configuration space doesn't have to be mapped all at
once. While the sf_buf API was considered for this, the fact that it
requires sleep locks and can return failure made it unsuitable for this use.
For now only the Intel Grantsdale and Lindenhurst (925 and 752x) chipsets are
supported. Since there doesn't appear to be a compatible way to determine
northbridge support, new chipsets will have to be explicitely added in the
future.
zero-copy receive of jumbo frames. This eliminates the need for the
jumbo frame allocator implemented in kern/uipc_jumbo.c and sys/jumbo.h.
Remove it.
Note: Zero-copy receive of jumbo frames did not work without these changes;
I believe there was insufficient locking on the jumbo vm object.
Tested by: ken@
Discussed with: gallatin@
properly support bounce buffers and resource shortages. This allows the
driver to work properly and reliably with more than 4GB of RAM. Of the
three data paths that exist in the driver, (block, CAM, ioctl), the ioctl
path has not been well tested with these changes due to difficulty with
finding an application that uses it that actually works.
Sponsored by: The FreeBSD Foundation and FreeBSD Systems, Inc.
and annotate that nfs_mountroot assumes it is OK to step on the
values in the global NFSv3 diskless structure as the mountroot
function is called during a serialized part of the boot, before
any other NFS client activity occurs.
MFC after: 2 weeks
doesn't. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.
Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.
No pathos on: current@
tcpip_fillheaders()
tcp_discardcb()
tcp_close()
tcp_notify()
tcp_new_isn()
tcp_xmit_bandwidth_limit()
Fix a locking comment in tcp_twstart(): the pcbinfo will be locked (and
is asserted).
MFC after: 2 weeks
inp->inp_moptions pointer, so that ip_getmoptions() can perform
necessary locking when doing non-atomic reads.
Lock the inpcb by default to copy any data to local variables, then
unlock before performing sooptcopyout().
MFC after: 2 weeks
to implement the sanity check should have been changed when we converted
the implementation of vm_pindex_t from 32 to 64 bits. (Thus, RELENG_4 is
not affected.) The consequence of this error would be a legimate write to
an extremely large file being treated as an errant attempt to write meta-
data.
Discussed with: tegge@
modifications to the inpcb IP options mbuf:
- Lock the inpcb before passing it into ip_pcbopts() in order to prevent
simulatenous reads and read-modify-writes that could result in races.
- Pass the inpcb reference into ip_pcbopts() instead of the option chain
pointer in the inpcb.
- Assert the inpcb lock in ip_pcbots.
- Convert one or two uses of a pointer as a boolean or an integer
comparison to a comparison with NULL for readability.
- Always check that index number passed from userland
is <= NG_NETFLOW_MAXIFACES. [1]
- Increase NG_NETFLOW_MAXIFACES up to 512. [2]
Noticed by: Roman Palagin [1]
Requested by: Yuri Y. Bushmelev [2]
MFC after: 1 week
header. pf finds the first TCP/UDP/ICMP6 header to filter by traversing
the header chain. In the case where headers are skipped, the protocol
checksum verification used the wrong length (included the skipped headers),
leading to incorrectly mismatching checksums. Such IPv6 packets with
headers were silently dropped.
Discovered by: Bernhard Schmidt
MFC after: 1 week
and that I've verified things seem to basically work. I was able to
boot and hot plug usb devices. Please let me know if this causes
problems for anybody.
The push down of giant has proceeded to the point that this will start
to matter more and more.
anywhere in the DAG. This includes configurations that are not
allowed by the EFI specification.
o Reject a GPT partition table if it's not preceeded by a PMBR.
There's no need to preserve the MBR partitioning anymore as GPT
is mature and with the first bullet extending the applicability
of GPT, it's better to be a bit more strict.
If we are resuming non-MPSAFE drivers, they need Giant held for them.
This may fix some obscure suspend/resume problems. It has fixed keyrate
setting problems that were triggered by cardbus (MPSAFE) changing the
ordering for syscons resume (non-MPSAFE). Also, add some asserts that
Giant is held in our suspend/resume and shutdown methods.
Found by: iedowse
MFC after: 2 days
because we know it then and we need it when inserting a component which
wasn't destroyed while device was running.
Reported by: Michael Handler <handler@grendel.net>
MFC after: 1 week
and if so call it.
The cmount method will gather and interpret omount() style arguments,
and issue a kern_[v]mount() call to execute the corresponding nmount
operation.
- Initialize sc->pcn_type during ATTACH as softc contents may not surivive
from PROBE.
- Print out chip-id to assist with ongoing pcn(4) debugging efforts.
non-standard BIOSen. We used to implement this in local patches but
now that ACPI-CA has merged/re-implemented most of our fixes, they were
no longer needed and we just needed to turn this knob on. Also, remove
an unnecessary cast.
Tested by: phk
we really want vs. the size changing 'long' (i386 vs. AMD64).
This fixes the problem with DRM with Radeon's on AMD64.
Submitted by: Jung-uk Kim <jkim@niksun.com>
back on again in resume. Override the default of D3 with the value the
BIOS specifies in _SxD, if present. Skip serial devices (PNP05xx) since
they seem to hang when set to D3 and may require special driver support.
Also, skip non-type 0 PCI devices (i.e., bridges) since our we don't yet
save/restore their config space and that seems to be necessary.
If this gives you trouble with suspend/resume, you can disable the new
ACPI and PCI power behavior separately with these tunables & sysctls:
debug.acpi.do_powerstate
hw.pci.do_powerstate
Approved by: imp (pci)
Tested by: acpi@ (numerous)
instead for the time being. Intel should fix this.
Note that if this commit is correct, it is made on the vendor branch.
We expect the Intel folks to fix it, and we don't want to unnecessarily
take files off the vendor branch.
Approved by: njl
MFC after: 1 week
initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for
all vop_()s.
Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.
Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
in the _PRS or _CRS of link devices. If faced with multiple DPFs in a
_PRS, we just use the first one. We assume that if _CRS has DPF tags they
only contain a single set since multiple DPFs wouldn't make any sense. In
practice, the only DPFs I've seen so far for link devices are that the one
IRQ resource is surrounded by a DPF tag pair for no apparent reason, and
this should handle that case fine now.
- Only allocate link structures for IRQ resources for link devices rather
than allocating a link structure for every resource.
Reviewed by: njl
Tested by: phk
in the error cases, causing panics.
Adapted from similar fix to NFSv3 mkdir submitted by Mohan Srinivasan mohans
at yahoo-inc dot com
Approved by: alfred
should not return ERESTART after it caught a signal, otherwise
thr_wake() call will be lost, also a timeout wait should not be
restarted. Final, using wakeup not wakeup_one to be safeness.
they would leave enough elements on the stack that if you escaped to the
loader prompt and then typed 'setenv', it would pull in all of the leaked
junk and cause an exception in the environment. There still seems to be
3 leaked elements, but they don't appear to be coming from this file.
a deadlock (with NFS exclusive vnode locks enabled). Lookup
grabs the parent's lock and wants to lock child. Readdirplus
locks the child and wants to lock parent (for loading the attrs
for ".."). The fix is to not load the attrs for ".." in
readdirplus.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
This closes a major hole in close-to-open consistency support.
Added a new sysctl so that this can be disabled for single NFS
client applications with very large amounts of mmap'ed IO (for
performance).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
returned back to df from a statfs call. Causing df to print negative
values.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.
This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.
Tested on: alpha, amd64, i386, ia64, sparc64
@sys/dev/acpica/acpi_pci_link.c:153" panic by backing out rev 1.37 in the SMP
case. It appears that on a dual-proc machine the assertions in the rev 1.37
commit log hold true.
Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potential dangerous. I choose to
only give a warning when doing so.
Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).
Submitted by: se (with changes)
Reviewed by: julian, glebius, se
PR: kern/73321 (partly)
call net_add_domain(). Calling this function too early (or late) breaks
assertations about the global domains list.
Actually it should be forbidden to call net_add_domain() outside of
SI_SUB_PROTO_DOMAIN completely as there are many places where we traverse
the domains list unprotected, but for now we allow late calls (mostly to
support netgraph). In order to really fix this we have to lock the domains
list in all places or find another way to ensure that we can safely walk the
list while another thread might be adding a new domain.
Spotted by: se
Reviewed by: julian, glebius
PR: kern/73321 (partly)
lock collision.
2. Fix two race conditions. One is between _umtx_unlock and signal,
also a thread was marked TDF_UMTXWAKEUP by _umtx_unlock, it is
possible a signal delivered to the thread will cause msleep
returns EINTR, and the thread breaks out of loop, this causes
umtx ownership is not transfered to the thread. Another is in
_umtx_unlock itself, when the function sets the umtx to
UMTX_UNOWNED state, a new thread can come in and lock the umtx,
also the function tries to set contested bit flag, but it will
fail. Although the function will wake a blocked thread, if that
thread breaks out of loop by signal, no contested bit will be set.
observations lead me to believe that the convetion for pc98 boot
loaders is to have a jump unstruction, followed by a string, followed
by code. The jump usually doesn't have a nop after it and usually the
string is NUL terminated, but Grub/98 breaks both of these rules.
# I looked for, but failed to find the Minux boot blocks for PC-9801 port.
512. If I had an audio cdrom in my cd player when I booted my system,
I'd get a panic from geom because you can't read 8192 bytes from an
audio cdrom.
Remove XXX comment about IPL1 and replace it with some information
from my soon to be published web page on the pc98 disk layout. The
IPL1 test was the result of an observation of a disk with FreeBSD's
boot0 program. It was testing part of an area what appears to be
reserved for a boot loader name, which comes after a jump over this
area. I don't yet know if it is required to be any specific jump
instruction, or if the destination has to be location 11. [1]
[1] FreeBSD Press No. 13, page 115, poorly translated by myself. The
picture there shows offset 8 as the destination of the jump, but
FreeBSD's boot0 program has three padding NULs after the IPL1 name and
uses a 16-bit 'jmp' instruction.
resource lists. It used to be sized based only on _CRS, hence _PRS could
perform an out-of-bounds access if it was larger (i.e., when there are
dependent functions). Add asserts to detect this case. Note, this is
only a temporary fix and I believe _PRS and _CRS should have separate
arrays.
Also, fix a typo where the wrong irq was being check for the APIC case.
Submitted by: tegge
to do a window update to the peer (thru an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
soreceive(), then pass in M_DONTWAIT to m_copym(). Also fix up error
handling for the case where m_copym() returns failure.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
that the exclusive lock is already held, then we call panic. Don't
clobber internal lock state before panic'ing. This change improves
debugging if this case were to happen.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
Approved by: Robert Watson <rwatson@freebsd.org>
Add locking to the IPv6 scoping code.
All spl() like calls have also been removed.
Cleaning up the handling of ifnet data will happen at a later date.
correct open/close behaviour of filesystems:
When an ioctl to modify the MBR arrives, we cannot take for granted that
we have the consumer open.
The symptom is that one cannot run 'boot0cfg -s2 /dev/ad0' in single-user
mode because / is the only open partition in only open r1w0e1.
If it is not, we attempt to increase the write count by one and
decrease it again afterwards.
Presumably most if not all other slices suffer from the same problem.
the address of a channel on a SCC, it returns 0 on failure. [1]
- Hardcode channel 1 for the keyboard on Z8530, the information present
in the Open Firmware device tree doesn't allow to determine this via
uart_cpu_channel(). This makes the keyboard (if one backs out rev. 1.5
of sys/dev/puc/puc_sbus.c and has both keyboard and mouse plugged in to
avoid the hang that revision works around) and consequently syscons(4)
on Ultra 2 work. There's a problem with the keyboard LEDs similar to
the one on Ultra 60 (LEDs don't get lit under X) though, instead of
lighting just a specific single one all get lit and can't be turned off
again. [1]
- Add comments about what uart_cpu_channel() and uart_cpu_getdev_keyboard()
do and their constraints.
- Improve the comments about what uart_cpu_getdev_[console,dbgport]() do,
they don't return an address (as in bus) but an Open Firmware package
handle.
Reviewed by: marcel (modulo the comments) [1]
instead acquire it conditionally in closef() if it is required for
advisory locking. This removes Giant from the close() path of sockets
and pipes (and any other objects that don't acquire Giant in their
fo_close path, such as kqueues). Giant will still be acquired twice for
vnodes -- once for advisory lock teardown, and a second time in the
fo_close method. Both Poul-Henning and I believe that the advisory lock
teardown code can be moved into the vn_closefile path shortly.
This trims a percent or two off the cost of most non-vnode close
operations on SMP, but has a fairly minimal impact on UP where the cost
of a single mutex operation is pretty low.
pointer updates: test available space while holding the socket buffer
mutex, and continue to hold until until the pointer update has been
performed.
MFC after: 2 weeks
o Remove a bogus comment that relates to alpha.
o s/u_int64_t/uint64_t/g
o Add bi_spare2 to make the internal padding explicit.
o Move BOOTINFO_MAGIC after the field it applies to.
changing the Makefile, fail the creation of loader.efi when there are
unresolved symbols in loader.sym. This avoids silently creating a
faulty EFI binary.
operation (by subtracting the absolute result from 0), don't test
for overflow.
This avoids an arithmetic exception when dividing LONG_MIN by 1:
This is the only case that causes overflow, and the resulting value
is correct under 2's compliment arithmetic.
PR: 72024
Approved by: dwmalone@
Obtained from: NetBSD
MFC after: 4 days
normal PPP compression, as a workaround for certain (arguably) broken
Linux PPP implementations that can't handle this particular case.
MFC after: 1 week
NetBSD got activated. NetBSD has an additional change in
their mii.c rev 1.26 which got missed with that merger:
: When probing for a PHY, look at the EXTSTAT bit in the BMSR, as well,
: not just the media mask. This prevents PHYs/TBIs that only support
: Gigabit media from slipping through the cracks.
With this GE only ones like from the SK-9844 are detected again.
PR: i386/63313, i386/71733, kern/73725
Tested by: matt baker <matt at sevenone dot com>, Jin Guojun <jin at george dot lbl dot gov>
Approved by: rwatson (mentor)
Obtained from: NetBSD mii.c rev 1.26
MFC after: 1 week
This socket option allows processes query a TCP socket for some low
level transmission details, such as the current send, bandwidth, and
congestion windows. Linux provides a 'struct tcpinfo' structure
containing various variables, rather than separate socket options;
this makes the API somewhat fragile as it makes it dificult to add
new entries of interest as requirements and implementation evolve.
As such, I've included a large pad at the end of the structure.
Right now, relatively few of the Linux API fields are filled in, and
some contain no logical equivilent on FreeBSD. I've include __'d
entries in the structure to make it easier to figure ou what is and
isn't omitted. This API/ABI should be considered unstable for the
time being.
Null_open() was only here to handle MNT_NODEV, but since that does
not affect any filesystems anymore, it could only have any effect
if you nullfs mounted a devfs but didn't want devices to show up.
If you need that, there are easier ways.
window was 0 bytes in size. This may have been the cause of unsolved
"connection not closing" reports over the years.
Thanks to Michiel Boland for providing the fix and providing a concise
test program for the problem.
Submitted by: Michiel Boland
MFC after: 2 weeks
Historically, our contigmalloc1() and contigmalloc2() assumes
that a page in PQ_CACHE can be unconditionally reused by busying
and freeing it. Unfortunatelly, when object happens to be not
NULL, the code will set m->object to NULL and disregard the fact
that the page is actually in the VM page bucket, resulting in
page bucket hash table corruption and finally, a filesystem
corruption, or a 'page not in hash' panic.
This commit has borrowed the idea taken from DragonFlyBSD's fix
to the VM fix by Matthew Dillon[1]. This version of patch will
do the following checks:
- When scanning pages in PQ_CACHE, check hold_count and
skip over pages that are held temporarily.
- For pages in PQ_CACHE and selected as candidate of being
freed, check if it is busy at that time.
Note: It seems that this is might be unrelated to kern/72539.
Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1]
Reminded by: Matt Dillon
Reworked by: alc
MFC After: 1 week
and assume that the BIOS has set it up for us. This allows folks with a
serial-aware BIOS to set the BIOS to speeds above 9600 and allow boot0 to
just use the existing settings.
- Purge some gratuitous cpp comments as per style(9).
Submitted by: Danny Braniss danny at cs dot huji dot ac dot il (1)
MFC after: 1 month
assembler to cpp(1) comment conversions. This allows btx to compile again
when BTX_SERIAL is defined.
Reported by: Danny Braniss danny at cs dot huji dot ac dot il
MFC after: 1 month
'binat from ... to ... -> (if)' are used, where the interface
is dynamic.
Discovered by: kos(at)bastard(dot)net
Analyzed by: Pyun YongHyeon
Approved by: mlaier (mentor)
MFC after: 1 week
contents of the tcpcb are read and modified in volume.
In tcp_input(), replace th comparison with 0 with a comparison with
NULL.
At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in
tcp_input(), assert 'headlocked'. Try to improve consistency between
various assertions regarding headlocked to be more informative.
MFC after: 2 weeks
Printf() a warning if if_attachdomain() is called more than once on an
interface to generate some noise on mailing lists when this occurs.
Fix up style in if_start(), where spaces crept in instead of tabs at
some point.
MFC after: 1 week
MFC note: Not the printf().
When a series of traces is included in a bug report, this will make it
easier to tie the trace information back to ps or threads output,
each of which will show the pid or the tid, but usually not both.
- Use a new-bus device driver for the ACPI PCI link devices. The devices
are called pci_linkX. The driver includes suspend/resume support so that
the ACPI bridge drivers no longer have to poke the links to get them
to handle suspend/resume. Also, the code to handle which IRQs a link is
routed to and choosing an IRQ when a link is not already routed is all
contained in the link driver. The PCI bridge drivers now ask the link
driver which IRQ to use once they determine that a _PRT entry does not
use a hardwired interrupt number.
- The new link driver includes support for multiple IRQ resources per
link device as well as preserving any non-IRQ resources when adjusting
the IRQ that a link is routed to.
- The entire approach to routing when using a link device is now
link-centric rather than pci bus/device/pin specific. Thus, when
using a tunable to override the default IRQ settings, one now uses
a single tunable to route an entire link rather than routing a single
device that uses the link (which has great foot-shooting potential if
the user tries to route the same link to two different IRQs using two
different pci bus/device/pin hints). For example, to adjust the IRQ
that \_SB_.LNKA uses, one would set 'hw.pci.link.LNKA.irq=10' from the
loader.
- As a side effect of having the link driver, unused link devices will now
be disabled when they are probed.
- The algorithm for choosing an IRQ for a link that doesn't already have an
IRQ assigned is now much closer to the one used in $PIR routing. When a
link is routed via an ISA IRQ, only known-good IRQs that the BIOS has
already used are used for routing instead of using probabilities to
guess at which IRQs are probably not used by an ISA device. One change
from $PIR is that the SCI is always considered a viable ISA IRQ, so that
if the BIOS does not setup any IRQs the kernel will degenerate to routing
all interrupts over the SCI. For non ISA IRQs, interrupts are picked
from the possible pool using a simplistic weighting algorithm.
Tested by: ru, scottl, others on acpi@
Reviewed by: njl
release the pipe mutex before calling fsetown(), as fsetown()
may block. The sigio code protects the pipe sigio data using
its own mutex, and the pipe reference count held by the caller
will prevent the pipe from being prematurely garbage-collected.
Discovered by: imp
is a PAL ID, while PCPU_GET(cpuid) is a FreeBSD CPU ID. The FreeBSD CPU
ID of the BSP is always zero, so use that to see which CPU should run the
full clock functions.
structure, so assert the inpcb lock associated with the tcptw.
Also assert the tcbinfo lock, as tcp_timewait() may call
tcp_twclose() or tcp_2msl_rest(), which require it. Since
tcp_timewait() is already called with that lock from tcp_input(),
this doesn't change current locking, merely documents reasons for
it.
In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_rest()
is called, which requires that lock.
In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop()
is called, which requires that lock.
Document the locking strategy for the time wait queues in tcp_timer.c,
which consists of protecting the time wait queues in the same manner
as the tcbinfo structure (using the tcbinfo lock).
In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait
queues are modified.
In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait
queues may be modified.
In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait
queues may be modified.
MFC after: 2 weeks
but unlikely races that could be corrected by having tcp_keepcnt
and tcp_keepintvl modifications go through handler functions via
sysctl, but probably is not worth doing. Updates to multiple
sysctls within evaluation of a single addition are unlikely.
Annotate that tcp_canceltimers() is currently unused.
De-spl tcp_timer_delack().
De-spl tcp_timer_2msl().
MFC after: 2 weeks
on the tcpcb, but also calls into tcp_close() and tcp_twrespond().
Annotate that tcp_twrecycleable() requires the inpcb lock because it does
a series of non-atomic reads of the tcpcb, but is currently called
without the inpcb lock by the caller. This is a bug.
Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write
of the timewait structure/inpcb, and calls in_pcbdetach() which requires
the lock.
Assert the inpcb lock in tcp_twrespond(), as it performs multiple
non-atomic reads of the tcptw and inpcb structures, as well as calling
mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the
inpcb lock.
MFC after: 2 weeks
protects access to the ISN state variables.
Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize
timer-driven isn bumping.
Staticize internal ISN variables since they're not used outside of
tcp_subr.c.
MFC after: 2 weeks
It means, that node listens to flow control messages from downstreams
and removes link from list of active links whenever a LINK_IS_DOWN message
is received. If LINK_IS_UP message is received, then links is put
back into list of active links.
Approved by: julian (mentor), implicitly
MFC after: 1 week
o Implement some netgraph flow control:
- Whenever status of HDLC heartbeat from pear is timed out,
send NGM_LINK_IS_DOWN message.
- If HDLC link changes status from down to up, send
NGM_LINK_IS_UP message.
Approved by: julian (mentor), implicitly
MFC after: 1 week
- Let hme_start()/hme_init() acquire lock and then call
hme_start_locked()/hme_init_locked() respectivly.
- Teardown interrupt handler before hme_detach().
- Remove IFF_NEEDSGIANT flag and mark interrupt handler INTR_MPSAFE.
- Set callout handler to CALLOUT_MPSAFE.
- Add locks in hme MII interface.
Reviewed by: jake
Tested by: Julian C. Dunn <jdunn at opentrend dot net>
MFC after: 2 weeks
Consolidate all of the bounce tests into the BUS_DMA_COULD_BOUNCE flag.
Allocate the bounce zone at either tag creation or map creation to help
avoid null-pointer derefs later on. Track total pages per zone so that
each zone can get a minimum allocation at tag creation time instead of
being defeated by mis-behaving tags that suck up the max amount.
Allocate the bounce zone at either tag creation or map creation to help
avoid null-pointer derefs later on. Track total pages per zone so that
each zone can get a minimum allocation at tag creation time instead of
being defeated by mis-behaving tags that suck up the max amount.
without ever being changed to actually work with an i8251. Nobody is
working on this either at the moment, so it's not about to change
soon.
When the code necessary to support the i8251 is committed, this can
be reverted again.
not used and aliases for other defines.
o Add REG_DATA as an alias for com_data. Likewise for other register
defines.
o Add LCR_SBREAK and make CFCR_SBREAK an alias for it. Likewise for
the other LCR register bits that are known with the CFCR prefix.
o Add MCR_IE and make MCR_IENABLE an alias for it.
o Add LSR_TEMT and make LSR_TSRE an alias for it.
o Add LSR_THRE and make LSR_TXRDY as alias for it.
o Add FCR_ENABLE and make FIFO_ENABLE as alias for it. Likewise for
the other FCR register bits that are known with the FIFO prefix.
o Add EFR_CTS and make EFR_AUTOCTS an alias for it.
o Add EFR_RTS and make EFR_AUTORTS an alias for it.
This is a first step in cleaning up the definitions in this file.
USPACE_SVC_STACK_TOP, USPACE_SVC_STACK_BOTTOM, USPACE_UNDEF_STACK_TOP,
and USPACE_UNDEF_STACK_BOTTOM look wrong to me, so I'm leaving them
alone.
Reviewed by: arch@
mp_machdep.c was using UAREA_PAGES to allocate something that isn't a
U area, and that there seems to be an implicit assumption that the PCB
is just past the end of the kernel stack.
Reviewed by: arch@
to us was to help out the Linux port, but really just invited overflow.
In fact, the request sense timer was overflowing prior to this change making
it much shorter than intended.
aic_osm_lib.h:
Be more careful about overflow in all timer/timeout primitives.
from divert sockets.
- Remove div_disconnect() method, since it shouldn't be called now.
- Remove div_abort() method. It was never called directly, since protocol
doesn't have listen queue. It was called only from div_disconnect(),
which is removed now.
Reviewed by: rwatson, maxim
Approved by: julian (mentor)
MT5 after: 1 week
MT4 after: 1 month
setting the B_REMFREE flag in the buf. This is done to prevent lock order
reversals with code that must call bremfree() with a local lock held.
This also reduces overhead by removing two lock operations per buf for
fsync() and similar.
- Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock
has been acquired so that we may remove ourself from the free-list.
- Provide a bremfreef() function to immediately remove a buf from a
free-list for use only by NFS. This is done because the nfsclient code
overloads the b_freelist queue for its own async. io queue.
- Simplify the numfreebuffers accounting by removing a switch statement
that executed the same code in every possible case.
- getnewbuf() can encounter locked bufs on free-lists once Giant is removed.
Remove a panic associated with this condition and delay asserts that
inspect the buf until after it is locked.
Reviewed by: phk
Sponsored by: Isilon Systems, Inc.
to request from devices during the "long inquiry" portion of our probe.
This same bug was fixed in the 4.x stream a few years ago, but the fix
was never propogated to -current.
This fix is slightly different than in -stable:
o Use offsetof() instead of a hard coded constant so as the make
the code more self-explainatory.
o Round odd long inquiry lengths up so as to avoid tickling ignore
wide residue bugs in broken parallel SCSI devices running with a
wide transfer negotiation.
MFC: 3 days
after allowing more than one address with the same prefix.
Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru>
Submitted by: ru (also NetBSD rev. 1.83)
Pointyhat to: mlaier
queue a packet to the hardware... instead of when the hardware queue is
empty..
don't initalize cur_tx now that it doesn't need to be...
Pointed out by: bde
clock found on the ISA bus (some USIIe, USIIi and USIIIi models) and
EBus (USIII models) instead of a MK48Txx clock.
Testet by: Matthew T. Lager" <freebsd@trinetworks.com> on Sun Fire V100,
Xavier Beaudouin <kiwi@oav.net> on Netra X1 (initial version)
respective NetBSD driver for use with the genclock interface.
It's first use will be on sparc64 but it was also tested on alpha with
a preliminary patch to switch alpha to use the genclock code together
with this driver instead of the respective code in alpha/alpha/clock.c
and the rather MD mcclock(4). Using it on i386 and amd64 won't be that
hard but some changes/extensions to improve the genclock code in general
should be done first, e.g. add locking and make it easier to access the
NVRAM usually coupled with RTCs.
- The claim in the commit log of rev. 1.11 of dev/uart/uart_cpu_sparc64.c
etc. that UARTs are the only relevant ISA devices on sparc64 turned out
to be false. While there are sparc64 models where UARTs are the only
devices on the ISA bus there are in fact also low-cost models where all
devices traditionally found on the EBus are hooked up to the ISA bus.
There are also models that use a mix between EBus and ISA devices with
things like an AT keyboard controller and other rather interesting
devices that we might want to support in the futute hook up to the ISA
bus.
In order to not need to add sparc64 specific device_identify methods to
all of the respective ISA drivers and also not add OFW specific code to
the common ISA code make the sparc64 ISA bus code fake up PnP devices so
most ISA drivers probe their devices without further changes.
Unfortunately Sun doesn't adhere to the ISA bindings defined in IEEE
1275-1994 for the properties of most of the ISA devices which would
allow to obtain the vendor and logical IDs from their properties. So we
we just use a simple table which maps the name properties to PnP IDs.
This could be done in a more sophisticated way but I courrently don't
see the need for this. [1]
- Add the children with fully mapped and specified resources (in the OFW
sense) similar to what is done in the EBus code for the IRQ resources
of the children as adjusting the resources and the resource list entries
respectively in isa_alloc_resource() as done perviously causes trouble
with drivers which use rman_get_start(), pass-through or allocate and
release resources multiple times, etc.
Adjusting the resources might be better off in a bus_activate_resource
method but the common ISA code currently doesn't allow for an
isa_activate_resource(). [2]
With this change:
- ppbus(4) and lpt(4) attach and work (modulo ECP mode, which requires
real ISADMA code but it currently only consists of stubs on sparc64).
- atkbdc(4) and atkbdc(4) attach, no further testing done.
- fdc(4) itself attaches but causes a hang while attaching fd0 also
when is DMA disabled, further work in fdc(4) is required here as e.g.
fd0 uses the address of fd1 on sparc64 (not sure if sparc64 supports
more than one floppy drive at all).
All of these drivers previously caused panics in the sparc64 ISA code.
- Minor changes, e.g. use __FBSDID, remove a dupe word in a comment and
declare one global variable which isn't used outside of isa.c static.
o dev/uart/uart_cpu_sparc64.c and modules/uart/Makefile:
- Remove the code for registering the UARTs on the ISA bus from the
sparc64 uart_cpu_identify() again and rely on probing them via PnP.
Original idea by: tmm [1]
No objections by: tmm [1], [2]
away, instead only exit storming mode when an interrupt stops firing long
enough for the ithread to exit the loop and go back to sleep.
Tested by: macrus (cruder version)
MAC policies to perform object life cycle operations and access
control checks.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
message queues, shared memory segments, and semaphores), add a struct
label pointer, which will hold the MAC labels for the objects. As a
result of recent work to separate kernel and user space ABIs, this
should not break the ABI for applications using System V IPC, but will
require a rebuild of the ipcs monitoring tool.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
objects and operations:
- System V IPC message, message queue, semaphore, and shared memory
segment init, destroy, cleanup, create operations.
- System V IPC message, message queue, seamphore, and shared memory
segment access control entry points, including rights to attach,
destroy, and manipulate these IPC objects.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
I have in mind for the genclock interface):
- Recognize the MK48T18 as well (differs from the MK48T08 only in
packaging options and voltages).
- Allow MD code to provide functions for reading/writing NVRAM/RTC
locations.
If passed NULL, the old behaviour using bus_space_{read,write}_1() is
used. Otherwise, all access to the chip goes via the MD functions.
This is necessary for mvmeppc boards where the mk48txx NVRAM/RTC is
not directly addressable.
- Cleanup MI mk48txx(4) todclock driver:
- Prepare mk48txxvar.h and leave only register definitions in
mk48txxreg.h.
- Define struct mk48txx_softc as usual devices and allocate necessary
members in it.
- Change mk48txx_attach() to only take a device_t.
o While converting the sparc64 eeprom driver to the above changes:
- Remove some dead code and stale comments.
- Use the NVRAM size provided by the mk48txx driver instead of hardcoding
it as suggested by a comment.
- Add a comment about why it doesn't make much sense to read the hostid
directly from the NVRAM except for displaying it when attaching.
- Don't print the hostid if it reads all zero because it's stored
elsewhere.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
mtx_lock(&Giant) in the opencrypto code. (This may actually not be
needed, but better safe than sorry).
Devfs grabs Giant if the driver is marked as needing Giant.
Don't grab Giant in the upper syscall/wrapper code
NET_LOCK_GIANT in the socket code (sockets/fifos).
mtx_lock(&Giant) in the vnode code.
Devfs grabs Giant if the driver is marked as needing Giant.
FILEDESC_LOCK_FAST will just grab the interlocking mutex and hold
it. This should be used for simple modifications of a field.
FILEDESC_LOCK holds a (homegrown) sleepable lock which should be used
where sleeping is required.
The homegrown lock will probably be replaced with a generic type of lock
once we have found out how that should look.
Help and reviews by: rwatson
after boot so that PCI is initialized and we can probe for the problem
chipsets. Note that while probed but unusable states are disabled, they
aren't freed yet. In the future, it may make sense to detach them.
Tested by: Adam K Kirchoff <adamk at voicenet com>
MFC after: 2 days
- Have TS_ZOMBIE ttys return POLLHUP instead of POLLERR
- Remove unneeded POLLWRNORM (old bug)
- TS_ZOMBIE ttys will set POLLIN and POLLRDNORM
- Do not call selrecord in TS_ZOMBIE ttys
PR: kern/73821
Reviewed by: bde
MFC after: 4 weeks
a storm is detected, enter "storming" mode which throttles the interrupt
source such that the handlers are run once every clock tick. Previously
we allowed a full set of storm_threshold interations through the handler
before going back to sleep. Also, this currently will intentionally exit
storming mode once a second to see if the storm has passed.
Tested by: marcus
Discussed with: bde
backed out commits were trying to address: when cancelling the timeout
callout, also cancel the abort_task event, since it is possible that
the timeout has already fired and set up an abort_task.
also fix up handling and proding of the tx, _OACTIVE is now handled
better...
Submitted by: Peter Edwards (sk_jfree)
Obtained from: OpenBSD and/or NetBSD (tx prod)
instead of a vnode for it.
The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.
(vfs_cluster doesn't use the backing store argument.)
i386 to dev/acpi_support. In theory, these devices could be found
other than in i386 machines only as amd64 becomes more popular. These
drivers don't appear to do anything i386 specific, so move them to
dev/acpi_support. Move config lines to files so that those
architectures that don't support kernel modules can build them into
the kernel. At the same time, rename acpi_snc to acpi_sony to follow
the lead of all the other specialty devices.
connects to the keyboard and mouse and needs some special treatment.
Until this is fully understood, implemented and tested, simply avoid
probing the second Z8530. This is also what the zs(4) driver does.
table with console settings, we now only need to know at which
address the UART lives. Leaving the baudrate unspecified results
in us using the baudrate at which the UART operates. This removes
one parameter that can interfere with a successful installation
out of the box.
current baudrate setting. Use this ioctl() when we don't know the
baudrate of the sysdev (as represented by a 0 value). When the
ioctl() fails, e.g. when the backend hasn't implemented it or the
hardware doesn't provide the means to determine its current baudrate
setting, we invalidate the baudrate setting by setting it to -1.
None of the backends currently implement the new ioctl().
A baudrate we consider insane is silently replaced with 0. When the
baudrate is 0, we will not try to program the hardware. Instead we
leave the communication speed unaltered, maximizing the chance to
have a working console. Obviously this means we allow specifying a
0 baudrate for exactly that purpose.
- Because em_encap() can now fail in a way that leaves us without an
mbuf chain, potentially set *m_headp to NULL if that happens, so that
the caller can do the right thing. This case can occur when we try
to prepend the vlan header mbuf but can't allocate additional memory.
- Modify the caller of em_encap() to detect a NULL m_head and not try
to queue the mbuf if that happens.
- When em_encap() fails, make sure to call bus_dmamap_destroy() to
clean up.
but sk(4) is so prevalent on AMD64 motherboards we need to reduce the number
of round trips in the mailing lists trying to get sufficient information to
make sure we've got a handle on all the problems and are working towards
making sk(4) solid.
Submitted by: bz
isn't worth adding to the modules lists that we have to hard code for
this to work. Since we print PID right away, we have a trace point
already.
Minor knf while I'm here.
to a cdev and a devsw, doing all the relevant checks along the way.
Add the check to see if fp->f_vnode->v_rdev differs from our cached
fp->f_data copy of our cdev. If it does the device was revoked and
we return ENXIO.
Use this in all the places where sleeping with the lock held is not
an issue.
The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.
return EINVAL rather than setting error, and don't free sops
unconditionally. The first change was merged accidentally as part of
the larger set of changes to introduce MAC labels and access control,
and potentially lead to continued processing of a request even after
it was determined to be invalid. The second change was due to changes
in the semaphore code since the original work was performed.
Pointed out by: truckman
models of laptops, which are essentially the same as the normal
ones, as far as acpi_asus is concerned[1]
o Use the above as an excuse to reshuffle the mess I made of the
probe function when I originally wrote it.
Reported by: Soeren Larsen <soeren@whiteswan.dk>
and has been broken twice:
- in the beginning of div_output() replace KASSERT with assignment, as
it was in rev. 1.83. [1] [to be MFCed]
- refactor changes introduced in rev. 1.100: do not prepend a new tag
unconditionally. Before doing this check whether we have one. [2]
A small note for all hacking in this area:
when divert socket is not a real userland, but ng_ksocket(4), we receive
_the same_ mbufs, that we transmitted to socket. These mbufs have rcvif,
the tags we've put on them. And we should treat them correctly.
Discussed with: mlaier [1]
Silence from: green [2]
Reviewed by: maxim
Approved by: julian (mentor)
MFC after: 1 week
This makes it possible to have more than one address with the same prefix.
The first address added is used for the route. On deletion of an address
with IFA_ROUTE set, we try to find a "fallback" address and hand over the
route if possible.
I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to
in_ifscrub as it must be considered KAPI as it is not static in in.c. I will
clean this after the MFC.
Discussed on: arch, net
Tested by: many testers of the CARP patches
Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it>
Obtained from: WIDE via OpenBSD
MFC after: 1 month
to be modified and extended without breaking the user space ABI:
Use _kernel variants on _ds structures for System V sempahores, message
queues, and shared memory. When interfacing with userspace, export
only the _ds subsets of the _kernel data structures. A lot of search
and replace.
Define the message structure in the _KERNEL portion of msg.h so that it
can be used by other kernel consumers, but not exposed to user space.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
to be modified and extended without breaking the user space ABI:
Define _kernel wrapper data structures for the user-exposed data
structures that current server as the internal data structures for
the implementation:
- struct msqid_kernel wraps struct msqid_ds.
- struct semid_kernel wraps truct semid_ds.
- struct shmid_kernel wraps struct shmid_ds.
- Don't expose extern definition 'shmsegs' outside of sysv_shm.c.
Submitted by: Dandekar Hrishikesh <rishi_dandekar at sbcglobal dot net>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, SPAWAR, McAfee Research
promiscuous mode introduced in 1.45, which programs the em card not
to strip or prepend tags when in promiscuous mode without also
modifying behavior to manually prepend a vlan header in the event
that the card isn't doing it on transmit. Due to a feature of card
operation, if the global VLAN prepend/strip register isn't set,
setting the VLAN tag flag on individual packet descriptors will
cause the packet to be transmitted using ISL encapsulation rather
than 802.1Q VLAN encapsulation.
This fix causes em_encap() to prepend the header by tracking whether
the card is configured to temporarily disable prepending/stripping
due to promiscuous mode. As a result, entering promiscuous mode on
the parent em interface no longer causes vlans to appear to "wedge"
or transmit ISL-encapsulated frames, which typically will not be
configured/spoken by the other endpoints on the VLAN trunk. This
bug may also exist in other drivers, and the additional vlan
encapsulation logic should be abstracted and centralized in
if_vlan.c if so.
RELENG_5_3 candidate.
MFC after: 1 week
Tested by: pjd, rwatson
Reported by: astesin at ukrtelecom dot net
Reported by: Mike Tancsa <mike at sentex dot net>
Reported by: Iasen Kostov <tbyte at OTEL dot net>
reports of problems. The bug is probably that there are cases where
`xfer->timeout && !sc->sc_bus.use_polling' is not a suitable test
for an active timeout callout, so an explicit flag will be necessary.
Apologies for the breakage.
the tree. Small tweaks were made by myself to eliminate unnecessary
includes and some other minor issues. Last time I asked takawata-san
about this driver, he suggested I commit it.
Submitted by: takawata
of atomic_store_rel().
- Use the 80386 versions of atomic_load_acq() and atomic_store_rel() that
do not use serializing instructions on all UP kernels since a UP machine
does need to synchronize with other CPUs. This trims lots of cycles from
spin locks on UP kernels among other things.
Benchmarked by: rwatson
a bridge without a _PRT were a _PRT was needed. Instead, the warning in
dmesg is a false warning and only serves to cause unnecessary concern.
MFC after: 1 week
be made holding the NFS server mutex. To clean this up, introduce a
version of the function, nfsrv_access_withgiant(), that expects the
NFS server mutex to already have been dropped and Giant acquired.
Wrap nfsrv_access() around this. This permits callers to more
efficiently check access if they're in a code block performing VFS
operations, and can be substitited for the nfsrv_access() call that
triggered this bug.
PR: 73807, 73208
MFC after: 1 week
that the events queue is empty. In other case we're able to hit the race
where for example da0s1 is tasted by some other class, which means that
da0 is open with exclusive bit set, which means that we can't open da0
for writing if it is our component.
Reported by: Attila Nagy <bra@fsn.hu> (and somebody else sometime ago,
but I cannot find who it was)
transfer timeouts that typically cause a transfer to be completed
twice, resulting in panics and page faults:
o A transfer completion interrupt could arrive while an abort_task
event was set up, so the transfer would be aborted after it had
completed. This is very easy to reproduce. Fix this by setting
the transfer status to USBD_TIMEOUT before scheduling the
abort_task so that the transfer completion code will ignore it.
o The transfer completion code could execute concurrently with the
timeout callout, leaving the callout blocked (e.g. waiting for
Giant) while the transfer completion code runs. In this case,
callout_stop() does not prevent the callout from running, so
again the timeout code would run after the transfer was complete.
Handle this case by checking the return value from callout_stop(),
and ignoring the transfer if the callout could not be removed.
o Finally, protect against a timeout callout occurring while a
transfer is being aborted by another process. Here we arrange
for the timeout processing to ignore the transfer, and use
callout_drain() to ensure that the callout has really gone before
completing the transfer.
This was tested by repeatedly performing USB transfers with a timeout
set to approximately the same as the normal transfer completion
time. In the PR below, apparently this occurred by accident with a
particular printer and the default timeout.
PR: kern/71491
changes associated with adding System V IPC support. This will prevent
old modules from being used with the new kernel, and new modules from
being used with the old kernel.
appropriate for different tag requirements. With the former global pool,
bounce pages might get allocated that are appropriate for one tag, but not
appropriate for another, but the system had no way to distinguish between them.
Now zones with distinct attributes are created to hold pages, and each tag
that requires bouncing is associated with a zone. New zones are created as
needed if no existing zones can meet the requirements of the tag. Stats for
each zone are tracked via the hw.busdma sysctl node.
This should help drivers that are failing with mysterious data corruption.
MFC After: 1 week
includes the latter, but also declares variables which are defined
in kern/subr_param.c).
Change som VM parameters from quad_t to unsigned long. They refer to
quantities (size limits for text, heap and stack segments) which must
necessarily be smaller than the size of the address space, so long is
adequate on all platforms.
MFC after: 1 week
sensitive, but less excercised location (the watchdog). While here use the
*_start_locked function directly to avoid drop, grab, drop lock.
I have to be very careful with future ALTQ patches!
Found & reviewed by: rwatson
MFC after: 3 days
The tunable vfs.devfs.fops controls this feature and defaults to off.
When enabled (vfs.devfs.fops=1 in loader), device vnodes opened
through a filedescriptor gets a special fops vector which instead
of the detour through the vnode layer goes directly to DEVFS.
Amongst other things this allows us to run Giant free read/write to
device drivers which have been weaned off D_NEEDGIANT.
Currently this means /dev/null, /dev/zero, disks, (and maybe the
random stuff ?)
On a 700MHz K7 machine this doubles the speed of
dd if=/dev/zero of=/dev/null bs=1 count=1000000
This roughly translates to shaving 2usec of each read/write syscall.
The poll/kqfilter paths need more work before they are giant free,
this work is ongoing in p4::phk_bufwork
Please test this and report any problems, LORs etc.
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" for i386 will be committed at a later date.
This generates a KTR event for each critical section entered and exited.
It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.
retain the pcbinfo lock until we're done using a pcb in the in-bound
path, as the pcbinfo lock acts as a pseuo-reference to prevent the pcb
from potentially being recycled. Clean up assertions and make sure to
assert that the pcbinfo is locked at the head of code subsections where
it is needed. Free the mbuf at the end of tcp_input after releasing
any held locks to reduce the time the locks are held.
MFC after: 3 weeks
number of entries into bucket_zone_lookup(), which helps make more
clear the logic of consumers of bucket zones.
Annotate the behavior of bucket_init() with a comment indicating
how the various data structures, including the bucket lookup tables,
are initialized.
swapoff: failed to locate %d swap blocks
The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency. I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient. With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode. To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.
Reviewed by: tegge@
Expose some of the amd64-specific sysarch functions to allow alternative
implementations of the %fs/%gs code for TLS, threads, etc. USER_LDT does
not exist on the amd64 kernel, so we have to implement things other ways.
priority. The sleep queues don't get updated when the priority of
threads changes, so sleepq_signal() might not always wakeup the
highest priority thread. Updating the queues when thread priorities
change cannot be easily done due to lock orders, so instead we do an
O(n) walk of the queue for a sleepq_signal() operation instead of O(1).
On the other hand, adding a thread to a sleep queue now goes from O(n)
to O(1) so it ends up as an even tradeoff. The correctness here with
regards to priorities is actually fairly important. msleep() gives
interactive threads their priority "boost" after they are placed on the
queue, but before this fix that "boost" wasn't used to determine the
highest priority thread that sleepq_signal() awoke.
- Fix up some comments.
Inspired by: ups, bde
associated with each processor. This ID is inferred from the index
of the pcs structure in the hwprb.
- Give Alpha CPUs FreeBSD CPU IDs more like other architectures where the
boot processor is always CPU 0 and the other processors are numbered
1 ... N. List active CPUs in the system in cpu_mp_announce() as well.
Silence on: alpha@
thread is created rather than adjusting the priority in the main
function. (kthread_create() should probably take the initial priority
as an argument.)
- Only yield the CPU in the !PREEMPTION case if there are any other
runnable threads. Yielding when there isn't anything else better to do
just wastes time in pointless context switches (albeit while the system
is idle.)
- Tweak the updating of the ithread name in ithread_update() so that the
'+' and '*' characters for device names that were too short only get
added at the end after as many device names as possible were fit into
the allocated space. Prior to this, some long devices would result
in '+' chars showing up between two different devices rather than at the
end.
seconds of idling, DRITY flags are removed.
- If mirror is in idle state or is not open for writing, sleep without
timeout when waiting for I/O requests.
- Don't use atomic operations, for now sysctls are protected by Giant.
- Update debugs.
- Fix for good (I hope) force-stopping mirrors and some filure cases
(e.g. the last good component dies when synchronization is in progress).
Don't use ->nstart/->nend consumer's fields, as this could be racy,
because those fields are used in g_down/g_up, use ->index consumer's
field instead for tracking number of not finished requests.
Reported by: marcel
- After 5 seconds of idle time (this should be configurable) mark all
dirty providers as clean, so when mirror is not used in 5 seconds
and there will be power failure, no synchronization on boot is needed.
Idea from: sorry, I can't find who suggested this
- When there are no ACTIVE components and no NEW components destroy whole
mirror, not only provider.
- Fix one debug to show information about I/O request, before we change
its command.
- Eliminate an initialized but unused variable.
- Eliminate an unnecessary call to clear the page's PG_BUSY flag. (The
call to vm_page_rename() already clears the page's PG_BUSY flag through
its call to vm_page_remove().)
In order to avoid livelock, swapoff() skips over objects with a
nonzero pip count and makes another pass if necessary. Since it is
impossible to know which objects we care about, it would choose an
arbitrary object with a nonzero pip count and wait for it before
making another pass, the theory being that this object would finish
paging about as quickly as the ones we care about. Unfortunately,
we may have slept since we acquired a reference to this object.
Hack around this problem by tsleep()ing on the pointer anyway, but
timeout after a fixed interval. More elegant solutions are possible,
but the ones I considered unnecessarily complicate this rare case.
Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:
- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
the latter can be derived from the former.
- Replace swp_pager_find_dev() with something simpler. There's no
need to iterate over the entire list of swap devices just to determine
if a given block is assigned to the one we're interested in.
- Expand the scope of the swhash_mtx in a couple of places so that it
isn't released and reacquired once for every hash bucket.
- Don't drop the swhash_mtx while holding a reference to an object.
We need to lock the object first. Unfortunately, doing so would
violate the established lock order, so use VM_OBJECT_TRYLOCK() and
try again on a subsequent pass if the object is already locked.
- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
syslog(3) if we are a priveleged program (sshd, su, etc.).
- Make syslogd open an additional socket /var/run/logpriv, with 0600
permissions.
- In libc, try to use this socket.
- Do not loop forever if we are using this socket (partial backout of 1.31)
Reviewed by: dwmalone, Andrea Campi <andrea webcom it>
Approved by: julian (mentor)
MFC after: 1 month
out c->c_func, we can't take it after callout_stop(). To take it before
we need to acquire callout_lock, to avoid race. This commit narrows
down area where lock is held, but hack is still present.
This should be redesigned.
Approved by: julian (mentor)
specifiers.
This helps the port team to decide whether to use local patch for
applications that makes use of these GNU extensions (and hopefully we
can get rid of these patches finally)
Requested by: marcus
destined for a blackhole route.
This also means that blackhole routes do not need to be bound to lo(4)
or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.
Submitted by: james at towardex dot com
Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/>
MFC after: 3 weeks
udp_in6, and udp_ip6 to pass socket address state between udp_input(),
udp_append(), and soappendaddr_locked(). While file in the default
configuration, when running with multiple netisrs or direct ithread
dispatch, this can result in races wherein user processes using
recvmsg() get back the wrong source IP/port. To correct this and
related races:
- Eliminate udp_ip6, which is believed to be generated but then never
used. Eliminate ip_2_ip6_hdr() as it is now unneeded.
- Eliminate setting, testing, and existence of 'init' status fields
for the IPv6 structures. While with multiple UDP delivery this
could lead to amortization of IPv4 -> IPv6 conversion when
delivering an IPv4 UDP packet to an IPv6 socket, it added
substantial complexity and side effects.
- Move global structures into the stack, declaring udp_in in
udp_input(), and udp_in6 in udp_append() to be used if a conversion
is required. Pass &udp_in into udp_append().
- Re-annotate comments to reflect updates.
With this change, UDP appears to operate correctly in the presence of
substantial inbound processing parallelism. This solution avoids
introducing additional synchronization, but does increase the
potential stack depth.
Discovered by: kris (Bug Magnet)
MFC after: 3 weeks
motherboard, in practice the changes resulted in many false positives for
heavy network loads, etc. resulting in poor performance. Also, the
motherboard referenced in the 1.109 log has other problems and simply does
not seem to work with the APIC enabled even with the changes in 1.109. The
correct fix for that board seems to be to not use the APIC at all. One
thing kept from 1.109 is that throttled interrupts are now effectively
polled on every clock tick rather than just 10 times per second.
MFC after: 1 month
Tested by: Shunsuke SHINOMIYA shino at fornext dot org
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.
This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.
(ifconfig xl0 name foo) as well as some special interfaces such as the 6to4
tunnel.
Reported by: Ed Schouten <ed (at) il ! fontys , nl>
Tested by: freebsd-pf
PR: kern/72444
MFC after: 3 weeks
just a convenience function to be called from debuggers that gets
compiled in when EHCI_DEBUG is defined. Move its declaration to
make this more obvious.
supported for STAILQ via STAILQ_CONCAT().
(2) Maintain a count of the number of entries in the thread-local entropy
fifo so that we can keep the other fifo counts in synch.
MFC after: 3 weeks
MFC with: randomdev_soft.c revisions 1.5 and 1.6
Suggested by: jhb (1)
o Reduce the interrupt delay to 2 microframes.
o Follow the spec more closely when updating the overlay qTD in the QH.
o No need to generate an interrupt at the data part of a control
transfer, it's generated by the status transfer.
o Make sure to update the data toggle on short transfers.
o Turn the printf about needing toggle update into a DPRINTF.
o Keep track of what high speed port (if any) a device belongs to
so we can set the transaction translator fields for the transfer.
o Verbosely refuse to open low/full speed pipes that depend on
unimplemented split transaction support.
o Fix various typos in comments.
Obtained from: NetBSD
A complete rationale and discussion is given in this message
and the resulting discussion:
http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
Note that this commit removes only the functional part of T/TCP
from the tcp_* related functions in the kernel. Other features
introduced with RFC1644 are left intact (socket layer changes,
sendmsg(2) on connection oriented protocols) and are meant to
be reused by a simpler and less intrusive reimplemention of the
previous T/TCP functionality.
Discussed on: -arch
field created for line disciplne drivers private use. Also add NET_NEEDS_GIANT
warning. For whatever reason ng_tty(4) was fixed but ng_h4(4) was not :(
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.
MFC after: 1 month
Inspired by: kris
* Announce some more fields from ro area for better debugging of broken
sk(4)s on various boards.
Submitted by: Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
This is magic and no other operating system do so (i.e. Solaris, Tru64,
Linux, AIX, HP-UX, Irix, MacOS X, NetBSD).
Discussed on: current@
Reported by: S³awek ¯ak <zaks@prioris.mini.pw.edu.pl>
entered the interface start function, no packets were actually dequeued.
Therefore, keep a count of how many packets we really added onto the tx
chain, and initiate a transmit only if the count is non-zero.
for modules linked into the kernel or loaded very early, panics will
result otherwise, as the CV code it calls will panic due to its use
of a mutex before it is initialized.
keyboards to work if no PS/2 keyboard is attached. The position in the
menu was chosen to avoid moving option 6 (loader prompt). This should
be a no-op on non-i386/amd64 machines.
outside of the nice threshold due to a recently awoken thread with a
lower nice value. This further reduces the amount of time a positively
niced thread gets while running in conjunction with a workload that has
many short sleeps (ie buildworld).
under high load: only set function state to loop and continuing sending
if there is no data left to send.
RELENG_5_3 candidate.
Feet provided: Peter Losher <Peter underscore Losher at isc dot org>
Diagnosed by: Aniel Hartmeier <daniel at benzedrine dot cx>
Submitted by: mohan <mohans at yahoo-inc dot com>
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.
MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit
check for TD_ON_RUNQ() no longer means the thread is really on a run-
queue. I suspect this state should be re-evaluated as it must mean
something else now. This fixes ULE+KSE+PREEMPTION on UP x86.
This eliminates a bunch of vnode overhead (approx 1-2 % speed
improvement) and gives us more control over the access to the storage
device.
Access counts on the underlying device are not correctly tracked and
therefore it is possible to read-only mount the same disk device multiple
times:
syv# mount -p
/dev/md0 /var ufs rw 2 2
/dev/ad0 /mnt ufs ro 1 1
/dev/ad0 /mnt2 ufs ro 1 1
/dev/ad0 /mnt3 ufs ro 1 1
Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches
things in RAM) this is not possible with read-write mounts, and the system
will correctly reject this.
Details:
Add a geom consumer and a bufobj pointer to ufsmount.
Eliminate the vnode argument from softdep_disk_prewrite().
Pick the vnode out of bp->b_vp for now. Eventually we
should find it through bp->b_bufobj->b_private.
In the mountcode, use g_vfs_open() once we have used
VOP_ACCESS() to check permissions.
When upgrading and downgrading between r/o and r/w do the
right thing with GEOM access counts. Remove all the
workarounds for not being able to do this with VOP_OPEN().
If we are the root mount, drop the exclusive access count
until we upgrade to r/w. This allows fsck of the root
filesystem and the MNT_RELOAD to work correctly.
Set bo_private to the GEOM consumer on the device bufobj.
Change the ffs_ops->strategy function to call g_vfs_strategy()
In ufs_strategy() directly call the strategy on the disk
bufobj. Same in rawread.
In ffs_fsync() we will no longer see VCHR device nodes, so
remove code which synced the filesystem mounted on it, in
case we came there. I'm not sure this code made sense in
the first place since we would have taken the specfs route
on such a vnode.
Redo the highly bogus readblock() function in the snapshot
code to something slightly less bogus: Constructing an uio
and using physio was really quite a detour. Instead just
fill in a bio and ship it down.
At some point later the syncer will unlearn about vnodes and the filesystems
method called by the syncer will know enough about what's in bo_private to
do the right thing.
[1] Ok, I know, but I couldn't resist the pun.
buf->b-dev.
Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.
Assert copyright on vfs_bio.c and update copyright message to canonical
text. There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
IF INVARIANTS is defined, and in the rare case that we have
allocated some objects from the slab and at least one initializer
on at least one of those objects failed, and we need to fail the
allocation and push the uninitialized items back into the slab
caches -- in that scenario, we would fail to [re]set the
bucket cache's ub_bucket item references to NULL, which would
eventually trigger a KASSERT.
an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console. It's easily hit by accident
when frobbing other sysctls late at night.
the device is suspended or shutting down. This will need to be rethought
slightly if we implement suspend/resume support within vr(4).
This appears to fix the vr_shutdown() panic on SMP machines.
My theory here is there's a race somewhere during vr_detach() with
vr_intr() in the SMP case which was sometimes being triggered,
although quite why this was happening is unclear (vr_stop() also
explicitly disables interrupts by writing to the IMR register).
MFC-to-RELENG_5* candidate.
PR: kern/62889
Tested by: seb at struchtrup dot com
MFC after: 10 days
that was greater than 4G. I originally used the same values as i386 in
order to save opening a new PML4 page slot, but in the day of gigabytes
of memory, worrying about a 4K page seems futile. Moving from 8 to 32G
moves the page to a different index, it doesn't increase the number of
pages used.
Give ffs it's own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.
Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.
Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open(). The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
is ffs_copyonwrite() and the only place it can be called from is FFS which
would never want to call another filesystems copyonwrite method, should one
exist, so there is no reason why anything generic should know about this.
on UltraSPARC workstations. The driver is based on OpenBSD's SBus
cs4231 driver and heavily modified to incorporate into sound(4)
infrastructure. Due to the lack of APCDMA documentation, the DMA
code of SBus cs4231 came from OpenBSD's driver.
The driver runs without Giant lock and supports both SBus and EBus
based CS4231 audio controller. Special thanks to marius for providing
feedbacks during the driver writing. His feedback made it possible
to write hiccup free playback code under high system loads.
Approved by: jake (mentor)
Reviewed by: marius (initial version)
Tested by: marius, kwm, Julian C. Dunn(jdunn AT opentrend DOT net)
and release of the global page queues lock required to make the call.
Remove GIANT_REQUIRED from vm_hold_free_pages(). All of its VM operations
are properly synchronized.
count to prevent sockets from being garbage collected during
socket-specific system calls. This is the same approach used in
most VFS-specific system calls, as well as generic file descriptor
system calls such as read() and write().
To do this, add a utility function getsock(), which is logically
identical to getvnode() used for the same purpose in VFS. Unlike
fgetsock(), it returns with the file reference count elevated, but
no bump of the socket reference count. Replace matching calls to
fputsock() with fdrop().
This change is made to all socket system calls other than
sendfile() and accept(), but the approach should be applicable to
those system calls also.
This shaves about four mutex operations off of each of these
system calls, including send() and recv() variants, adding about
1% to pps on minimal UDP packets for UP using netblast, and 4% on
SMP.
Reviewed by: pjd
Extend it with a strategy method.
Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.
Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
trigger a socket creation race some some kind). Checking for non-NULL socket
and credential is not a bad idea anyway. Unfortunatly too late for the
release.
Reported & tested by: Gilbert Cao
MFC after: 2 weeks
vm_page_sleep_if_busy(). (The motivation being to transition
synchronization of the vm_page's PG_BUSY flag from the global page queues
lock to the per-object lock.)
two loops in agp_generic_bind_memory(). As an intended side-effect, all
of the calls to vm_page_wakeup() are now performed with the containing
vm object lock held.
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page. Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked. So, the busy flag might as well never be set.
This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.
This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe: Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.
Assert that topology and Giant is not held in g_waitidle().
or pru_attach is NULL. With loadable protocols the SPACER dummy protocols
have valid function pointers for all methods to functions returning just
EOPNOTSUPP. Thus the early abort check would not detect immediately that
attach is not supported for this protocol. Instead it would correctly
get the EOPNOTSUPP error later on when it calls the protocol specific
attach function.
Add testing against the pru_attach_notsupp() function pointer to the
early abort check as well.
the final set of traces -- someone with more busdma background
will probably want to review and expand this, as well as port to
other platforms. This tracing is sufficient to identify key
busdma events on i386, and in particular to draw attention to
bounce buffering events that may have a substantial performance
impact.
o Instead of locking and unlocking all over the place, use
lock assertions to make certain that the bfe lock is held
where necessary.
o Create locked and unlocked versions of bfe_init and bfe_start. These
functions can be called from outside the module and by functions
within the bfe module. The calls from outside the module don't
hold the bfe lock so the unlocked versions called by these functions
simple obtain the bfe lock and call the locked version.
- Fix a typo (scp) in the locking macros that only worked because in all the
instances in which it was called the softc pointer happened to be named 'sc'.
- Mark the interrupt MPSAFE
Tested by: matusita, Dario Freni <saturnero@gufi.org>
Silence from: -net, wpaul
as well as document the properties of the mac_policy_conf structure.
Warn about the ABI risks in changing the structure without careful
consideration.
Obtained from: TrustedBSD Project
Sponsored by: SPAWAR
without a mountpoint. In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it. Here were those hacks that I have curs'd I know not how
oft. Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?
Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
in the g_up and g_down threads. Each time a bio is propelled up and
down the stack, an event is generating showing the provider, offset,
and length, as well as thread wakeup and work status information.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.
Move v_numoutput from vnode to bufobj. Add renaming macro to
postpone code sweep.
modes on a tty structure. Both the ".init" and the current settings
are initialized allowing the function to be used both at attach and
open time.
The function takes an argument to decide if echoing should be enabled
by default. Echoing should not be enabled for regular physical
serial ports unless they are consoles, in which case they should
be configured by ttyconsolemode() instead.
Use the new function throughout.
right bits rather than piggy-backing on the V* rights defined in
vnode.h. The mac_bsdextended bits are given the same values as the V*
bits to make the new kernel module binary compatible with the old
version of libugidfw that uses V* bits. This avoids leaking kernel
API/ABI to user management tools, and in particular should remove the
need for libugidfw to include vnode.h.
Requested by: phk
is locked when vm_page_io_finish() is called on a page. This is to satisfy
a new, post-RELENG_5 assertion in vm_page_io_finish(). (I am in the
process of transitioning the responsibility for synchronizing access to
various fields/flags on the page from the global page queues lock to the
per-object lock.)
Tripped over by: obrien@
with a weak memory model or x86 + PAE (or more specifically, your
driver is using bounce pages) and you have had problems with em(4),
this may fix it. At least this is needed to have em(4) work properly
on FreeBSD/arm.
Original version by: cognet
Reviewed by: tackerman
Tested by: cognet
protocols: it is possible for sockets to be created and attached
to the divert protocol between the test for sockets present and
successful unload of the registration handler. We will need to
explore more mature APIs for unregistering the protocol and then
draining consumers, or an atomic test-and-unregister mechanism.
of protocols. The call to divert_packet() is done through a function pointer. All
semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to
install divert rules and natd will complain about 'protocol not supported'. Once
it is loaded both will work and accept rules and open the divert socket. The module
can only be unloaded if no divert sockets are open. It does not close any divert
sockets when an unload is requested but will return EBUSY instead.
Add constants for SPI protocol delays that are needed for
target mode.
aic7xxx.c:
Correct a target mode issue that caused an occassional
spurious REQ to be seen on the bus when performing manual
message processing (e.g. transfer rate negotiation).
Enforce phase change bus settle rules with explicit
delays when performing manual message processing in
target mode. The sequencer already did this for
"fast-path", target mode message processing.
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.
MFC after: 3 days
Bumped into by: jmg
protocols in inetsw[] and define initially eight spacer slots.
Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is
now declared and initialized in kern/uipc_domain.c.
With pr_proto_register() it has become possible to dynamically load protocols
within the PF_INET domain. However the PF_INET domain has a second important
structure called ip_protox[] that is derived from the 'struct protosw inetsw[]'
and takes care of the de-multiplexing of the various protocols that ride on
top of IP packets.
The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[]
array mux in a consistent and easy way. To register a protocol within
ip_protox[] the existence of a corresponding and matching protocol definition
in inetsw[] is required. The function does not allow to overwrite an already
registered protocol. The unregister function simply replaces the mux slot with
the default index pointer to IPPROTO_RAW as it was previously.
as the original logic did. This fixes a race with vr_intr() which was
masked on UP systems and manifested on SMP systems.
PR: kern/62889
MFC after: 1 day
families.
The protosw[] array of any particular protocol family ("domain") is of fixed size
defined at compile time. This made it impossible to dynamically add or remove any
protocols to or from it. We work around this by introducing so called SPACER's
which are embedded into the protosw[] array at compile time. The SPACER's have
a special protocol number (32767) to indicate the fact that they are SPACER's but
are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's
are provided in the protosw[] structure.
The pr_usrreqs structure is treated more special and contains pointers to dummy
functions only returning EOPNOTSUPP. This is needed because the use of those
functions pointers is usually not checked within the kernel because until now it
was assumed to be a valid function pointer. Instead of fixing all potential
callers we just return a proper error code.
Two new functions provide a clean API to register and unregister a protocol. The
register function expects a pointer to a valid and complete struct protosw including
a pointer to struct pru_usrreqs provided by the caller. Upon successful registration
the pr_init() function will be called to finish initialization of the protocol. The
unregister function restores the SPACER in place of the protocol again. It is the
responseability of the caller to ensure proper closing of all sockets and freeing
of memory allocation by the unloading protocol.
sys/protosw.h
o Define generic PROTO_SPACER to be 32767
o Prototypes for all pru_*_notsupp() functions
o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the
pru_*_notsupp() functions
o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New functions bodies for all pru_*_notsupp() functions
the ATA pccard locking function. This makes pccard devices like
Compact Flash cards work again.
PR: kern/72805
Submitted by: James E. Flemer <jflemer@alum.rpi.edu>
MFC in: 2 days
frames. BGE hardware with the rx alignment bug will still be handled by the
calls to m_adj() that already exist. m_adj() is probably better suited for
this task anyways. Just as with if_em, this saves a malloc + several locks
per packet and prevents unneeded data copying within busdma.
Since the e1000 DMA engines hava no constraints on the alignment of buffer
transfers, there is no reason to tell busdma that there is. This save a
minimum of 1 malloc call per packet, which translates to eliminating 4 locks.
It also means that buffers are not needlessly bounced when transfered. The
end result is a 38% improvement in pps in a 4 way bridging environment.
Obtained from: Sandvine, Inc.
(usually taking 20 seconds to transmit a packet).. no longer fall back
to only transmitting one packet (instead of the entire queue) after we
have processed the entire send queue... I have no idea why we didn't
start seeing this problem ~6 years ago when this code was introduced...
(sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
modes on a tty structure.
Both the ".init" and the current settings are initialized allowing
the function to be used both at attach and open time.
The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.
Use the new function throughout.
List of functional changes:
- Make a single device per single node with a single hook.
This gives us parrallelizm, which can't be achieved on a single
node with many devices/hooks. This also gives us flexibility - we
can play with a particular device node, not affecting others.
- Remove read queue as it is. Use struct ifqueue instead. This change
removes a lot of extra memcpy()ing, m_devget()ting and m_copymem()ming.
In ng_device_receivedata() we enqueue an mbuf and wake readers.
In ngdread() we take one mbuf from qeueue and uiomove() it to
userspace. If no mbuf is present we optionally block. [1]
- In ngdwrite() we create an mbuf from uio using m_uiotombuf().
This is faster then uiomove() into buffer, and then m_copydata(),
and this is much better than huge m_pullup().
- Perform locking of device
- Perform locking of connection list.
- Clear out _rcvmsg method, since it does nothing good yet.
- Implement NGM_DEVICE_GET_DEVNAME message.
- #if 0 ioctl method, while nothing is done here yet.
- Return immediately from ngdwrite() if uio_resid == 0.
List of tidyness changes:
- Introduce device2priv(), to remove cut'n'paste.
- Use MALLOC/FREE, instead of malloc/free.
- Use unit2minor().
- Use UID_ROOT/GID_WHEEL instead of 0/0.
- Define NGD_DEVICE_DEVNAME, use it.
- Use more nice macros for debugging. [2]
- Return Exxx, not -1.
style(9) changes:
- No "#endif" after short block.
- Break long lines.
- Remove extra spaces, add needed spaces.
[1] Obtained from: if_tun.c
[2] Obtained from: ng_pppoe.c
Reviewed by: marks
Approved by: julian (mentor)
MFC after: 1 month
failure in the NFS server would result in a leaked instance of the NFS
server subsystem lock. Liberally sprinkle assertions in all target
labels for error unwinding to assert the desired locking state.
RELENG_5_3 candidate.
MFC after: 3 days
Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
errors are in rarely executed paths.
1. Each time the retry_alloc path is taken, the PG_BUSY must be set again.
Otherwise vm_page_remove() panics.
2. There is no need to set PG_BUSY on the newly allocated page before
freeing it. The page already has PG_BUSY set by vm_page_alloc().
Setting it again could cause an assertion failure.
MFC after: 2 weeks
vm_page_io_finish(). The motivation being to transition synchronization of
the vm_page's busy field from the global page queues lock to the per-object
lock.
invalidate the TLB(s) if the old mapping wasn't used by the CPU. With
network interfaces that implement checksum off-loading, the old mapping is
almost never used by the CPU, only by the device driver for setting up the
DMA operation.
Reviewed by: tegge@
critical_exit as the process is getting scheduled to run. This is subotimal
but for now avoid the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.
Submitted by: davidxu
MFC After: 3 days
constrained to a small number of sessions by the small on-card memories found
in newer devices. This is really a stopgap solution as having session state
in main memory incurs a (small but noticeable) performance penalty. The better
solution is to manage session state so that it's cached on chip.
Obtained from: openbsd
* Get flags first, in case there is no devclass.
* Reset flags after each probe in case the next driver has no hints so it
doesn't inherit the old ones.
* Set them again before the winning probe.
Tested ok both with and without ACPI for ISA device flags.
Reviewed by: imp
MFC after: 1 day
providers for tasting. Before this hack, race below is possible:
SI_SUB_RAID (no not-fully-configured geoms, so don't block)
GEOM tasting (now geoms are created)
SI_SUB_MOUNT_ROOT (if root file system is placed on a mirror, it is
possible that this mirror is not fully configured yet)
There is a lot of work to do to avoid such hacks and I need a working
solution before 5.3, sorry.
Reported by: John Hay <jhay@icomtek.csir.co.za>
We have to use our own destroy_geom method, because default one, which
is a part of geom_slice is broken.
MT5 candidate.
PR: kern/72467
Submitted by: Vladimir Novoseltsev
The changes in the next commit would make the code totally unreadable
if the #ifdef'ing were maintained.
It might make a lot of sense to split if_cx.c in a netgraph related
and in a tty related file but I will not attempt that without hardware.
handle DMA addresses located above 1GB. The LBA(loop begin address)
register which holds DMA base address is 32bits register. But the
MSB 2bits are used for other purposes. This effectivly limits the
DMA address space up to 1GB.
Approved by: jake (mentor)
Reviewed by: truckman, matk
assign DMA address to the wrong address. It can cause system lockup
or other mysterious errors. Since most sound cards requires low DMA
address(BUS_SPACE_MAXADDR_24BIT) sndbuf_alloc() would fail when the
audio driver is loaded after long running of operations.
Approved by: jake (mentor)
Reviewed by: truckman, matk
- Implement dcons_ischar() and dcons_load_buffer().
- If loader passed a dcons buffer address, keep using it.
(We still need a patch to cheat memory management system.)
asynchronous. I realize that this means the custom application will
not work as written, but it is not okay to break most users of ugen(4).
The major problem is that a bulk read transfer is not an interrupt
saying that X bytes are available -- it is a request to be able to
receive up to X bytes, with T timeout, and S short-transfer-okayness.
The timeout is a software mechanism that ugen(4) provides and cannot
be implemented using asynchronous reads -- the timeout must start at
the time a read is done.
The status of up to how many bytes can be received in this transfer
and whether a short transfer returns data or error is also encoded
at least in ohci(4)'s requests to the controller. Trying to detect
the "maximum width" results in using a single buffer of far too
small when an application requests a large read.
Even if you combat this by replacing all buffers again with the
maximal sized read buffer (1kb) that ugen(4) would allow you to
use before, you don't get the right semantics -- you have to
throw data away or make all the timeouts invalid or make the
short-transfer settings invalid.
There is no way to do this right without extending the ugen(4) API
much further -- it breaks the USB camera interfaces used because
they need a chain of many maximal-width transfers, for example, and
it makes cross-platform support for all the BSDs gratuitously hard.
Instead of trying to do select(2) on a bulk read pipe -- which has
neither the information on desired transfer length nor ability to
implement timeout -- an application can simply use a kernel thread
and pipe to turn that endpoint into something poll-able.
It is unfortunate that bulk endpoints cannot provide the same semantics
that interrupt and isochronous endpoints can, but it is possible to just
use ioctl(USB_GET_ENDPOINT_DESC) to find out when different semantics
must be used without preventing the normal users of the ugen(4) device
from working.
The original idea was to use it for firmware upgrading and similar
operations. In real life almost all Bluetooth USB devices do not
need firmware download. If device does require firmware download
then ugen(4) (or specialized driver like ubtbcmfw(8)) should be
used instead.
MFC after: 3 days
in udp_input(), since the udbinfo lock is used to prevent removal of
the inpcb while in use (i.e., as a form of reference count) in the
in-bound path.
RELENG_5 candidate.
- Add a new _lock() call to each API that locks the associated chain lock
for a lock_object pointer or wait channel. The _lookup() functions now
require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
the associated queue structure internally via _lookup() rather than
accepting a pointer from the caller. For turnstiles, this means that
the actual lookup of the turnstile in the hash table is only done when
the thread actually blocks rather than being done on each loop iteration
in _mtx_lock_sleep(). For sleep queues, this means that sleepq_lookup()
is no longer used outside of the sleep queue code except to implement an
assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
lock is already required. For condition variables, this lets the
cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
while testing the waiters count. This means that the waiters count
internal to condition variables is no longer protected by the interlock
mutex and cv_broadcast() and cv_signal() now no longer require that the
interlock be held when they are called. This lets consumers of condition
variables drop the lock before waking other threads which can result in
fewer context switches.
MFC after: 1 month
it isn't printed if the IP address in question is '0.0.0.0', which is
used by nodes performing DHCP lookup, and so constitute a false
positive as a report of misconfiguration.
processes in jail could create raw sockets, additional access control
checks were added to raw IP sockets to limit the ways in which those
sockets could be used. Specifically, only the socket option IP_HDRINCL
was permitted in rip_ctloutput(). Other socket options were protected
by a call to suser(). This change was required to prevent processes
in a Jail from modifying system properties such as multicast routing
and firewall rule sets.
However, it also introduced a regression: processes that create a raw
socket with root privilege, but then downgraded credential (i.e., a
daemon giving up root, or a setuid process switching back to the real
uid) could no longer issue other unprivileged generic IP socket option
operations, such as IP_TOS, IP_TTL, and the multicast group membership
options, which prevented multicast routing daemons (and some other
tools) from operating correctly.
This change pushes the access control decision down to the granularity
of individual socket options, rather than all socket options, on raw
IP sockets. When rip_ctloutput() doesn't implement an option, it will
now pass the request directly to in_control() without an access
control check. This should restore the functionality of the generic
IP socket options for raw sockets in the above-described scenarios,
which may be confirmed with the ipsockopt regression test.
RELENG_5 candidate.
Reviewed by: csjp
Implement preemption between threads in the same ksegp in out of slot
situations to prevent priority inversion.
Tested by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
followed by "make depend" shouldn't do anything. It doesn't
seem to be a problem anymore, and if someone finds it to break
again, please contact me so we can work on a real fix.
Reviewed by: bde
- Sort kmod.mk knobs in the documentation section.
- Fixed misuses of the word "KLD" which stands for
"kernel ld", or "kernel linker", where kernel
module is meant.
- Removed redundant uses of ${.OBJDIR}.
- Whitespace and indentation fixes.
- CLEANFILES cleanup.
- Target redefinition protection (install.debug).
Submitted by: bde, ru
Reviewed by: ru, bde
- push all bridge logic from if_ethersubr.c into bridge.c
make bridge_in() return mbuf pointer (or NULL).
- call only bridge_in() from ether_input(), after ng_ether_input()
was optinally called.
- call bridge_in() from ng_ether_rcv_upper().
Long description: http://lists.freebsd.org/mailman/htdig/freebsd-net/2004-May/003881.html
Reported by: Jian-Wei Wang <jwwang at FreeBSD.csie.NCTU.edu.tw>
Tested by: myself, Sergey Lyubka
Reviewed by: sam
Approved by: julian (mentor)
MFC after: 2 months
interface with the network stack but are not yet sufficiently
synchronized to run without the Giant lock. It migh be possible
to mark the interfaces as IFF_NEEDSGIANT, but I'm unable to test
that configuration and am unfamiliar with the architecture of
i4b.
New devicename is ttyy{unit}{port}
No callout devices created as there is no modemcontrol on these ports.
Add data structure to represent each port to avoid excessive array use.
from within umass_ufi_transform(). This includes the 12-byte commands
FORMAT_UNIT, WRITE_AND_VERIFY, VERIFY, and READ_FORMAT_CAPACITIES
(sorted in numerical order).
Reviewed by: ken, scottl
MFC after: 2 weeks
md(8). The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.
There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
by the time that kldload(8) returns. Satisfy that by making the GEOM
module load event -- only when the kernel is !cold -- wait until the
GEOM module init function has finished instead of returning immediately.
This is the other half of fixing md(8) (actually, "mfs" in fstab(5))
that is similar to r1.128 of src/sys/dev/md/md.c. This bug would be
why RAM disks would often fail on boot and the first call to mdconfig(8)
would probably fail.
pjd has ideas for not requiring kldload(8) to work synchronously for
control devices that could make this obsolete.
Silence on: -arch
that conjures up the device node so it isn't true PNP. Noticed by jhb@.
* Add an attachment for esscontrol since it too uses ISA_PNP_PROBE.
* Move an attachment from snd_mss to snd_pnpmss. The latter is the real
PNP user.
sysctl routines and state. Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.
I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.
With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort. Things like netstat, ps, etc have a long way to go.
This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.
Extract the struct cdev pointer and the tty device from inside rather than
incorrectly casting the 'struct cdev *' pointer to a 'dev_t' int. Not
that this was particularly important since it was only used for reading
vmcore files.
that was fixed by this should not normally happen, and since I did not
record the traces of my failed build attempt that had been solved with
that change, it's not entirely clear whether it hadn't been a pilot
error on my end. In dubio pro reo. :-)
* Fix a bug where caches were flushed on non-C3 transitions.
* Be sure a working flush cache instruction is present before using it.
* Disable C3 completely if it isn't present.
well. This field is actually used by various netisr functions to determine
the availablility of the specified netisr. This uncomplete unregister leads
directly to a crash when the KLD unregistering the netisr is unloaded.
Submitted by: Sam <sah@softcardsystems.com>
MFC after: 3 days
remove previous entropy harvesting mutex names as they are no longer
present. Commit to this file was ommitted when randomdev_soft.c:1.5
was made.
Feet shot: Robert Huff <roberthuff at rcn dot com>
Sockets in the listen queues have reference counts of 0, so if the
protocol decides to disconnect the pcb and try to free the socket, this
triggered a race with accept() wherein accept() would bump the reference
count before sofree() had removed the socket from the listen queues,
resulting in a panic in sofree() when it discovered it was freeing a
referenced socket. This might happen if a RST came in prior to accept()
on a TCP connection.
The fix is two-fold: to expand the coverage of the accept mutex earlier
in sofree() to prevent accept() from grabbing the socket after the "is it
really safe to free" tests, and to expand the logic of the "is it really
safe to free" tests to check that the refcount is still 0 (i.e., we
didn't race).
RELENG_5 candidate.
Much discussion with and work by: green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>
may want to shut down here but the chance of BIOS vendors getting this
wrong is high. They're only supposed to announce this when all batteries
hit their critical level but past experience indicates we should be
conservative about this for now.
- Trade off granularity to reduce overhead, since the current model
doesn't appear to reduce contention substantially: move to a single
harvest mutex protecting harvesting queues, rather than one mutex
per source plus a mutex for the free list.
- Reduce mutex operations in a harvesting event to 2 from 4, and
maintain lockless read to avoid mutex operations if the queue is
full.
- When reaping harvested entries from the queue, move all entries from
the queue at once, and when done with them, insert them all into a
thread-local queue for processing; then insert them all into the
empty fifo at once. This reduces O(4n) mutex operations to O(2)
mutex operations per wakeup.
In the future, we may want to look at re-introducing granularity,
although perhaps at the granularity of the source rather than the
source class; both the new and old strategies would cause contention
between different instances of the same source (i.e., multiple
network interfaces).
Reviewed by: markm
reaching into the socket buffer. This prevents a number of potential
races, including dereferencing of sb_mb while unlocked leading to
a NULL pointer deref (how I found it). Potentially this might also
explain other "odd" TCP behavior on SMP boxes (although haven't
seen it reported).
RELENG_5 candidate.
and make it visible (same way as in OpenBSD). Describe usage in manpage.
This change is useful for creating custom free methods, which
call default free method at their end.
While here, make malloc declaration for mbuf tags more informative.
Approved by: julian (mentor), sam
MFC after: 1 month
when the spin lock in question isn't -- it's the critical_enter() that
KDB set. No more panic in DDB for console -> syscons -> tty -> knote
operations.
be used to announce various system activity.
The auxio device provides auxiliary I/O functions and is found on various
SBus/EBus UltraSPARC models. At present, only front panel LED is
controlled by this driver.
Approved by: jake (mentor)
Reviewed by: joerg
Tested by: joerg
state management corruption, mbuf leaks, general mbuf corruption,
and at least on i386 a first level splash damage radius that
encompasses up to about half a megabyte of the memory after
an mbuf cluster's allocation slab. In short, this has caused
instability nightmares anywhere the right kind of network traffic
is present.
When the polymorphic refcount slabs were added to UMA, the new types
were not used pervasively. In particular, the slab management
structure was turned into one for refcounts, and one for non-refcounts
(supposed to be mostly like the old slab management structure),
but the latter was almost always used through out. In general, every
access to zones with UMA_ZONE_REFCNT turned on corrupted the
"next free" slab offset offset and the refcount with each other and
with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).
Fix things so that the right type is used to access refcounted zones
where it was not before. There are additional errors in gross
overestimation of padding, it seems, that would cause a large kegs
(nee zones) to be allocated when small ones would do. Unless I have
analyzed this incorrectly, it is not directly harmful.
with acpi but the timer runs twice as fast. Note that the main problem
(system doesn't work properly with acpi disabled) should be fixed separately.
Changes:
* Add a quirk to disable the timer
* Merge the P5A and P5A-B quirks since they appear to be based on the
same ASL.
PR: i386/72450
Tested by: Kevin Oberman <oberman es.net>
MFC after: 3 days
We return ENOBUF to indicate the problem, which is an errno that should be
handled well everywhere.
Requested & Submitted by: green
Silently okay'ed by: The rest of the firewall gang
MFC after: 3 days
Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)
Reviewed by: tegge@
PR: i386/61852
problem I had but it's happening in code that is messing around with
register windows - I'm willing to live with that piece being sensitive
to this and it looks like the other problems we had reported lately
are not fixed by using -O instead of -O2.
Sorry for the churn. Looks like I need a second pointy hat. Someone
tells me they stack well. :-))))
This also adds support for bigger disks on the controller I have access to,
and maybe others if I understood the adhoc methods used on those.
Those with more PC98 bigdrive controllers it is hereby invited to add/fix
support for those in geom_pc98.c and not using #ifdef PC98 all over the place.
callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an
assertion regarding Giant if they enter other parts of the stack from
the callout.
MFC after: 3 days
Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
-O2 on kernel compiles after all. While working on adding a KASSERT
to sparc64/sparc64/rwindow.c I found that it was "position sensitive",
putting it above a call to flushw() instead of below caused corruption
of processes on the system. jake and jhb have both confirmed there is
no obvious explanation for that. The exact same kernel code does not
have the process corruption problem if compiled with -O instead of -O2.
There have been signs of similar issues floated on the sparc64@ mailing
list, lets see if this helps make them go away.
Note this isn't an optimal fix as far as the file format goes, if this
disgusts too many people I'll fix it the right way. Since compiling
with something other than -O is a known problem this format would prevent
a change to the default causing grief. And this may also help motivate
finding out what the compiler is doing wrong so we can shift back to
using -O2. :-)
My turn for the pointy hat... One of the florescent ones...
MFC after: 2 days
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.
Discussed with: davidxu, deischen, marcel
MFC after: 3 days
allocate unallocated memory resources from the top 32MB of the address
space rather than the top 2GB. While the latter works on some
chipsets, it fails badly on others. 32MB is more conservative and
matches what cheap harware from this era is hardwired to pass.
RAM. Many older, legacy bridges only allow allocation from this
range. This only appies to devices who don't have their memory
assigned by the BIOS (since we allocate the ranges so assigned
exactly), so should have minimal impact.
Hoewver, for CardBus bridges (cbb), they rarely get the resources
allocated by the BIOS, and this patch helps them greatly. Typically
the 'bad Vcc' messages are caused by this problem.
(and panic). To try to finish making BPF safe, at the very least,
the BPF descriptor lock really needs to change into a reader/writer
lock that controls access to "settings," and a mutex that controls
access to the selinfo/knote/callout. Also, use of callout_drain()
instead of callout_stop() (which is really a much more widespread
issue).
Discussion: this panic (or waning) only occurs when the kernel is
compiled with INVARIANTS. Otherwise the problem (which means that
the vp->v_data field isn't NULL, and represents a coding error and
possibly a memory leak) is silently ignored by setting it to NULL
later on.
Panicking here isn't very helpful: by this time, we can only find
the symptoms. The panic occurs long after the reason for "not
cleaning" has been forgotten; in the case in point, it was the
result of severe file system corruption which left the v_type field
set to VBAD. That issue will be addressed by a separate commit.
all other threads to suicide, problem is execve() could be failed, and
a failed execve() would change threaded process to unthreaded, this side
effect is unexpected.
The new code introduces a new single threading mode SINGLE_BOUNDARY, in
the mode, all threads should suspend themself at user boundary except
the singler. we can not use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful, suspending other threads at unknown
point and later resuming them from there and forcing them to exit at user
boundary may cause the process to start from a dirty state. If execve() is
successful, current thread upgrades to SINGLE_EXIT mode and forces other
threads to suicide at user boundary, otherwise, other threads will be resumed
and their interrupted syscall will be restarted.
Reviewed by: julian
table. acpidump(8) concatenates the body of the DSDT and SSDTs so an
edited ASL will contain all the necessary information. We can't use a
completely empty table since ACPI-CA reports this as a problem.
MFC after: 3 days
This really doesn't belong here but is preferred (for the moment) over
adding yet another mechanism for sending msgs from the kernel to user apps.
Reviewed by: imp
the raw values including for child process statistics and only compute the
system and user timevals on demand.
- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.
Submitted by: bde (mostly)
MFC after: 1 month
to control the packets injected while in sack recovery (for both
retransmissions and new data).
- Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
- Add a new sysctl (net.inet.tcp.sack.initburst) that controls the
number of sack retransmissions done upon initiation of sack recovery.
Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
turnstile chain lock until after making all the awakened threads
runnable. First, this fixes a priority inversion race. Second, this
attempts to finish waking up all of the threads waiting on a turnstile
before doing a preemption.
Reviewed by: Stephan Uphoff (who found the priority inversion race)
+ * 9: 0x3f0-0x3f3,0x3f4-0x3f5,0x3f7
This requires only one change to support. Rather than keying on the
size of the resource being 2, instead key off the end & 7 being 3.
This covers the same cases that the size of 2 would catch, but also
covers the new above case.
In addition, I think it is clearer to use the end in preference to the
size and start for case #8 as well. Turns two tests into one, and
catches no other cases.
Make minor commentary changes to deal with new case #9.
# This change is specifically minimal to allow easy MFC. A more
# extensive change will go into current once I've had a chance to test
# it on a lot of hardware...
with different file systems. This may cause ill things
with my previous fix. Now it translate fsid of direct child of
mount point directory only.
Pointed out by: Uwe Doering
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).
Note: Even with this change, the handling of a copy-on-write fault is
inefficient, in such cases pmap_enter() calls pmap_invalidate_page() twice.
Discussed with: bde@
PR: kern/16568
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)
Discussed extensively with KAME. The API author's intent isn't clear at this
point, so rather than remove the code entirely, #if 0 out and put a big
comment in for now. The IPV6_RECVPATHMTU sockopt is available if the
application wants to be notified of the path MTU to optimize packet sizes.
Thanks to JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp> for putting up
with my incessant badgering on this issue, and fenner for pointing out
the API issue and suggesting solutions.
data endpoints. The control endpoint doesn't need read/write/poll
operations, and more importantly, the thread counts should be
separate so that the control endpoint can properly reference itself
while deleting and recreating the data endpoints.
* Add some macros that handle referencing/releasing devices, and use them
for sleeping/woken-up and open/close operations as apppropriate.
* Use d_purge for FreeBSD, and a loop testing the open status for all
the endpoints for NetBSD and OpenBSD, so that when the device is
detached, the right thing always happens.
restart the current waiting transfer. If this isn't done, the device's
next transfer (that we would like to do a short read on) is going to
return an error -- for short transfer.
* For bulk transfer endpoints, restore the maximum transfer length each
time a transfer is done, or the first short transfer will make all the
rest that size or smaller.
* Remove impossibilities (malloc(M_WAITOK) == NULL, &var == NULL).
New device names are "{tty|cua}A$(card)$(port)[.init|.lock]"
Put a portname in the port structure if SI_DEBUG is defined to avoid
need to inspect minor number to construct name..
Constify some strings.
Remove duplicated DBG_ #defines.
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.
is a no-op on little endian architectures, but fixes getting the MAC
address for some dc(4) cards on big endian architectures.
This is a RELENG_5 candidate.
Tested by: gallatin (powerpc), marius (sparc64)
First version of the patch written by: gallatin
1. Process p1 is currently being swapped in.
2. Process p2 calls linux_ptrace(PTRACE_GETFPXREGS, p1_pid, ...)
3. After acquiring a reference to FIRST_THREAD_IN_PROC(p1),
p2 blocks in faultin() while p1 finishes being swapped in.
This means p2 won't get back the lock on p1 until after p1's
threads are runnable.
4. After p1 is swapped in, the first thread in p1 exits.
5. p2 now uses its dangling reference to p1's first thread.
after loading such a kernel, "module_path" will be set to
an insane value. Fixed example by providing an equivalent
setting. For the record, when automatically loading a
kernel (commands "boot" and "boot-conf"), the following is
tried, in this order:
path=/boot/${kernel} file=${bootfile}
path=/boot/${kernel} file=${kernel}
path=${kernel} file=${bootfile}
path=${kernel} file=${kernel}
path=${module_path} file=${kernel}
MP machines (hopefully). CPU timers are OK on UP machines but we
don't keep the timers in sync on MP machines so if the CPU's timer
is chosen as the primary timecounter it's possible for time to
not be monotonically increasing because different CPU's counters
may be used at different times. But the CPU's counters are otherwise
one of the higher quality counters available. So, on UP machines
we'll use a relatively high quality value but on MP machines we'll
use a quality that should prevent the CPU's counters from being chosen.
Requested by: green (who did the first version of the patch)
Reviewed by: marius, green
MFC after: 1 week
the structure space had been obtained from malloc() its contents is
random garbage. The choice of value being set is part of a larger effort
to solve some timecounter issues on MP machines (while working on that
we noticed this problem).
Noticed by: marius
Reviewed by: marius, green
MFC after: 3 days
the new subr_unit.c code.
For now assert Giant in ttycreate() and ttyfree(). It is not obvious that
it will ever pay off to lock these with anything else.
Allocation is always lowest free unit number.
A mixed range/bitmap strategy for maximum memory efficiency. In
the typical case where no unit numbers are freed total memory usage
is 56 bytes on i386.
malloc is called M_WAITOK but no locking is provided (yet). A bit of
experience will be necessary to determine the best strategy. Hopefully
a "caller provides locking" strategy can be maintained, but that may
require use of M_NOWAIT allocation and failure handling.
A userland test driver is included.
date: 2004/09/26 02:01:27; author: sam; state: Exp; lines: +0 -5
Correct handling of SADB_UPDATE and SADB_ADD requests. key_align may
split the mbuf due to use of m_pulldown. Discarding the result because
of this does not make sense as no subsequent code depends on the entire
msg being linearized (only the individual pieces). It's likely
something else is wrong here but for now this appears to get things back
to a working state.
Submitted by: Roselyn Lee
This change was also made in the KAME CVS repository as key.c:1.337 by
itojun.
people have reported problems (stickyness, aiming difficulty) which is proving
difficult to fix, so this will default to disable until sometime after 5.3R.
To enable Synaptics support, set the 'hw.psm.synaptics_support=1' tunable.
MT5 candidate.
Approved by: njl
longer than 'normal'. The cause is still being tracked down but
in the meantime there are machines where raising IPI_RETRIES does
help - it's not just a case of the machine staying locked up longer
and then panic-ing anyway. Several helpful folks on sparc64@ tried
a patch that helped figure out what to raise this number to.
Discussed on: sparc64@
MFC after: 3 days
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)
Reviewed by: tegge@
in the actual _FDE parsing. If the failure occurs earlier such as in
fdc_attach() then don't try to probe any drives.
MFC after: 3 days
Reviewed by: njl
Tested by: Christian Laursen xi at borderworlds dot dk
to make sure the pipe is ready. Some devices apparently don't support
the clear stall command however. So what happens when you issue such
devices a clear stall command? Typically, the command just times out.
This, at least, is the behavior I've observed with two devices that
I own: a Rio600 mp3 player and a T-Mobile Sidekick II.
It used to be that after the timeout expired, the pipe open operation
would conclude and you could still access the device, with the only
negative effect being a long delay on open. But in the recent past,
someone added code to make the timeout a fatal error, thereby breaking
the ability to communicate with these devices in any way.
I don't know exactly what the right solution is for this problem:
presumeably there is some way to determine whether or not a device
supports the 'clear stall' command beyond just issuing one and waiting
to see if it times out, but I don't know what that is. So for now,
I've added a special case to the error checking code so that the
timeout is once again non-fatal, thereby letting me use my two
devices again.
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a seperate (later) commit.
This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
forseeable future.
Suggested by: rwatson
A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by: rwatson, csjp
Tested by: -pf, -ipfw, LINT, csjp and myself
MFC after: 3 days
LOR IDs: 14 - 17 (not fixed yet)
This changes the naming of USB serial devices to: /dev/ttyU%d and
/dev/cuaU%d for call-in and call-out devices respectively. (Please
notice: capital 'U')
Please also note that we now have .init and .lock devices for USB
serial ports. These are not persistent across device removal. devd(8)
can be used to configure them on attachment time.
These changes also improve the chances of the system surviving if
the USB device is unplugged at an inconvenient time. At least we
do not rip things apart while there are any threads in the device
driver anymore.
Remove cdevsw, rely on the tty generic one.
Don't make_dev(), use ttycreate() which does all the magic.
In detach, do close procesing if we ripped things apart
while the device was open. Call ttyfree() once we're done
cleaning up.
generic way. This code will allow a similar amount of code to be
removed from most if not all serial port drivers.
Add generic cdevsw for tty devices.
Add generic slave cdevsw for init/lock devices.
Add ttypurge function which wakes up all know generic sleep
points in the tty code, and calls into the hw-driver if it
provides a method.
Add ttycreate function which creates tty device and optionally
cua device. In both cases .init/.lock devices are created
as well.
Change ttygone() slightly to also call the hw driver provided
purge routine.
Add ttyfree() which will purge and destroy the cdevs.
Add ttyconsole mode for setting console friendly termios
on a port.
is one, detect mbuf loops and stop, add an extra arg so you can only print
the first x bytes of the data per mbuf (print all if arg is -1), print
flags using %b (bitmask)...
No code in the tree appears to use m_print, and it's just a maner of adding
-1 as an additional arg to m_print to restore original behavior..
MFC after: 4 days
select(2), and discovered to my horror that ugen(4)'s bulk in/out support
is horribly lobotomized. Bulk transfers are done using the synchronous
API instead of the asynchronous one. This causes the following broken
behavior to occur:
- You open the bulk in/out ugen device and get a descriptor
- You create some other descriptor (socket, other device, etc...)
- You select on both the descriptors waiting until either one has
data ready to read
- Because of ugen's brokenness, you block in usb_bulk_transfer() inside
ugen_do_read() instead of blocking in select()
- The non-USB descriptor becomes ready for reading, but you remain blocked
on select()
- The USB descriptor becomes ready for reading
- Only now are you woken up so that you can ready data from either
descriptor.
The result is select() can only wake up when there's USB data pending. If
any other descriptor becomes ready, you lose: until the USB descriptor
becomes ready, you stay asleep.
The correct approach is to use async bulk transfers, so I changed
the read code to use the async bulk transfer API. I left the write
side alone for now since it's less of an issue.
Note that the uscanner driver has the same brokenness in it.
to 7422 since it appears that the 8169S can't transmit anything larger..
The 8169S can receive full jumbo frames, but we don't have an mru to let
the upper layers know this...
add fixup so that this driver should work on alignment constrained platforms
(!i386 && !amd64)
MFC after: 5 days
MAXWIN to the register window manipulation functions - rwindow_load()
calls rwindow_save() so this one addition should take care of both.
This should help find places that pcb_nsaved doesn't get initialized
properly.
Suggested by: jake
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.
Found by: tegge (for gmirror)
on anything but DEVFS and in this case it was not even used (see below).
Put the NFS4 vop method for fifo's behind "#if 0" because it is unused.
Add a XXX comment to say that I think the unusedness is a bug.
for the new thread. The rest of the fields in the pcb wind up being
written to before they're read as a normal part of the pcb usage but
this field may be read upon return to userland, having it be uninitialized
garbage is bad.
Submitted by: Andrew Belashov (bel at orel dot ru)
Reviewed by: jake
MFC after: 3 days
but it is possible:
1. Read data from good component for synchronization.
2. Write data to the same area.
3. Write synchronization data, which are now stale.
Found by: tegge
mutexes instead.
This closes the last (known) race issues in ATA which should fix
the various hangs etc seen on heavy loaded systems.
Change from using timeout functions to using callout functions in
the timeout code. This together with above closes the race that could
happen if timeout and device interrupt occured simultaniously.
Also fix the possible recursion in ata_reinit() on very dodgy
devices that could take us down in the probe.
the trapframe via kdb_frame, but kdb_frame was not initialized until
after the call to kdb_cpu_trap(). Ergo: kdb_cpu_trap() was moved too
far up.
Pointy hat: marcel
the mbuf due to use of m_pulldown. Discarding the result because of this
does not make sense as no subsequent code depends on the entire msg being
linearized (only the individual pieces). It's likely something else is wrong
here but for now this appears to get things back to a working state.
Submitted by: Roselyn Lee
I have from Broadcom does not give much information on these devices,
so the Broadcom Linux driver was used for clues to what these chips
support. It turns out they are similar to the 5705 with the 5751
being the PCI-Express version and needing special work-arounds and
settings.
Use kthread_exit() instead of falling through the end of the worker
thread's main function. Since kthread_exit() wakeup(9)s everyone
sleeping on the thread handle, drop the superfluous wakeup() call.
When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in
rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to
0 (accidently?) in rev. 1.130.2.20 in RELENG_4 only. Among with the
fact this knob is not documented it breaks POLA especially in bridge
environment.
OK'ed by: andre
Reviewed by: -current
frobbing the cdevsw.
In both cases we examine only the cdevsw and it is a good question if we
weren't better off copying those properties into the cdev in the first
place. This question will be revisited.
dev_refthread() will return the cdevsw pointer or NULL. If the
return value is non-NULL a threadcount is held which much be released
with dev_relthread(). If the returned cdevsw is NULL no threadcount
is held on the device.
use <machine/efi.h> for the necessary definitions. This makes the EFI
code in sys/boot/efi totally unused, except for pure EFI loaders. As
such, maintenance and porting (to IA-32) of the EFI code is made as easy
as possible.
because it was mostly irrelevant - except for the silly BIOS_PADDRTOVADDR
etc macros. Along the way of working around this, I missed a few things.
* Make syscons properly inherit the bios capslock/shiftlock/etc state like
i386 does. Note that we cannot inherit the bios key repeat rate because
that requires a bios call (which is impossible for us).
* Give syscons the ability to beep on amd64. Oops.
While here, make bios.c compile and add it to files.amd64.
PHYSADDR : Address of the physical memory
KERNPHYSADDR : Physical address where the kernel starts
KERNVIRTADDR : Virtual address of the kernel
STARTUP_PAGETABLE_ADDR : Where to put the page table at bootstrap
+ Xscale specific options
work out of the box too, but I have no hardware to test.
It works well enough to go multiuser. Network works, SATA does not, as I have
no drive to test.
Thanks to Intel for sending such a board.
Obtained from: NetBSD
Remove the cache state logic : right now, it provides more problems than it
helps.
Add helper functions for mapping devices while bootstrapping.
Reorganize the code a bit, and remove dead code.
Obtained from: NetBSD (partially)
tree:
- td_standin is (k + a) as it is only touched by either curthread or when
a thread is being created.
- td_upcall is (k + j)
- td_sticks is (k) rather than the earlier (j) note.
- td_uuticks and td_usticks are both (k).
- td_intrval is (j)
- Neither kg_nextupcall or kg_upquantum seem to be locked and that seems
to be on purpose, so mark those as (n).
It can be used to delay mounting root partition to give a chance to GEOM
providers to show up.
Now, when there is no needed provider, vfs_rootmount() function will look
for it every second and if it can't be find in defined time, it'll ask
for root device name (before this change it was done immediately).
This will allow to boot from gmirror device in degraded mode.
Users should move to the new geom_vinum implementation instead.
The refcount logic which is being added to devices to enable safe module
unloading and the buf/vm work also in progress would require a major rework
of the (old)-vinum code to comply with the new semantics.
The actual source files will not be removed until I have coordinated with
the geomvinum people if they need any bits repo-copied etc.
of the number of threads which are inside whatever is behind the
cdevsw for this particular cdev.
Make the device mutex visible through dev_lock() and dev_unlock().
We may want finer granularity later.
Replace spechash_mtx use with dev_lock()/dev_unlock().
A thread must hold mp while calling cv_signal(), cv_broadcast(), or
cv_broadcastpri() even though it isn't passed as an argument.
and is right with this claim.
While here remove a "\" from the macro -> __inline conversion.
Found by: csjp
MFC after: 4 days
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.
LOR id: 031
panic on hub detach bugs that have been reported. This work around
detaches the device before deleting it. This changes the detach order
from in-order to pre-order. This avoids uhub's deleting the children
after its subdevs has been deleted.
This is only a workaround. This leads to a strange condition in the
device tree where attached devices are children of detached ones. I
really don't know what that's supposed to mean, but does violate my
sense of POLA. Fortunately, the violation is short lived, which is
why I'm going ahead and committing the work around.
# We really need to consider life w/o the multiple nested layers of
# compatibility macros. They make finding bugs like this *MUCH*
# harder.
Patch by: iadowse
MT5 before: next_release(5.3-BETA5) (unless someting better comes along)
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.
See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.
Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.
MT5 Candidate
Better to kill all other threads than to panic the system if 2 threads call
execve() at the same time. A better fix will be committed later.
Note that this only affects the case where the execve fails.
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.
MFC after: 1 month
Tested on: i386, alpha, sparc64
Actually, it can even cause some problems, because GEOM requires sectorsize
to be more than 0 on first access, not on provider creation, so we can skip
valid providers by doing this check here.
Reported by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Sven Willenberger <sven@dmv.com>
Ask uma_zcreate() to align mbufs to MSIZE bytes (otherwise dtom() breaks)
As it happens, uma_zalloc_arg() always returned mbufs aligned to MSIZE
anyway, but that was an implementation side-effect....
KASSERT -> CTASSERT suggested by: dd@
Approved by: silence on -net
and 0x3f7. fdc_isa_alloc_resource() didn't work right in this case
(it accessed FDOUT correctly due to an overflow of the first resource.
It accesed FDSTS and FDDATA incorrectly via the second resource (which
wound up accessing FDOUT and the tape register at 0x3f3) and badly for
the CTL register (at location 0x3f4). This is a minimal fix that just
'eats' the first one if it covers two locations and has an offset of
0. This confusion lead the floppy driver to think there'd been a disk
change, which uncovered a deadlock in the floppy/geom code which lead
to a panic. These changes fix that by fixing the underlying resource
problem, but doesn't address the potential deadlock issue that might
still be there.
This is a minimal fix so it can more safely be merged into 5 w/o risk
for known working configurations (hence the use of the ugly goto,
which reduces case 8 to case 6 w/o affecting cases 1-7). A more
invasive fix that will handle more ACPI resource list diversity is in
the pipeline that should kill these issues once and for all, while
staying within the resources that we allocate.
Tested/Reported by: das
Reviewed by: njl
MFC before: re->next_release_name(5.3-BETA5);
registers are control bits or depending on the model contain additional
time bits with a different meaning than the lower ones. In order to
only read the desired time bits and not change the upper bits on write
use appropriate masks in the gettime and settime function respectively.
Due to the polarity of the stop oscillator bit and the fact that the
century bits aren't used on sparc64 not masking them didn't cause
problems so far.
- Fix two off-by-one errors in the handling of the day of week. The
genclock code represents the dow as 0 - 6 with 0 being Sunday but the
mk48txx use 1 - 7 with 1 being Sunday. In the settime function when
writing the dow to the clock the range wasn't adjusted accordingly but
the clock apparently played along nicely otherwise the second bug in
the gettime function which mapped 1 - 7 to 0 - 6 but with 0 meaning
Saturday would have been triggered. Fixing these makes the date being
stored in the same format Sun/Solaris uses and cures the "Invalid time
in real time clock. Check and reset the date immediately!" when the
date was set under Solaris prior to booting FreeBSD/sparc64. [1]
Looking at other clock drivers/code e.g. FreeBSD/alpha the former "bug",
i.e. storing the dow as 0 - 6 even when the clock uses 1 - 7, seems to
be common but might be on purpose for compatibility when multi-booting
with other OS which do the same. So it might make sense to add a flag
to handle the dow off-by-one for use of this driver on platforms other
than sparc64.
- Check the state of the battery on mk48txx that support this in the
attach function.
- Add a note that use of the century bit should be implemented but isn't
required at the moment because it isn't used on sparc64.
Problem noted by: joerg [1]
MT5 candidate.
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:
1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.
2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.
3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.
Reviewed by: tegge@
UMA_ZONE_NOFREE to guarantee type stability, so proc_fini() should
never be called. Move an assertion from proc_fini() to proc_dtor()
and garbage-collect the rest of the unreachable code. I have retained
vm_proc_dispose(), since I consider its disuse a bug.
too much kernel copying, but it is not the right way to do it, and it is
in the way for straightening out the buffer cache.
The right way is to pass the VM page array down through the struct
bio to the disk device driver and DMA directly in to/out off the
physical memory. Once the VM/buf thing is sorted out it is next on
the list.
Retire most of vnode method. ffs_getpages(). It is not clear if what is
left shouldn't be in the default implementation which we now fall back to.
Retire specfs_getpages() as well, as it has no users now.
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.
Analogous to the drive level, give each volume and plex a worker thread
that picks up and processes incoming and completed BIOs.
This should fix the data corruption issues that have come up a few
weeks ago and improve performance, especially of RAID5 plexes.
The volume level needs a little work, though.
to 4.0 and RELENG_3), the BTX mini-kernel used paging rather than flat
mode and clients were limited to a virtual address space of 16 megabytes.
Because of this limitation, boot2 silently masked all physical addresses
in any binaries it loaded so that they were always loaded into the first
16 Meg. Since BTX no longer has this limitation (and hasn't for a long
time), remove the masking from boot2. This allows boot2 to load kernels
larger than about 12 to 14 meg (12 for non-PAE, 14 for PAE).
Submitted by: Sergey Lyubka devnull at uptsoft dot com
MFC after: 1 month
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.
These normally only manifest if the ndis compat module is statically
compiled into a kernel image by way of 'options NDISAPI'.
Submitted by: Dmitri Nikulin
Approved by: wpaul
PR: kern/71449
MFC after: 1 week
driver. Trim its fingernails by removing some useless bits before
fixing the 'thread not terminated on detach' problem.
o dmacnt is no longer used now that we allocate at attach time. Remove
it from struct fdc_data.
o ISPNP was only ever set, but never tested. It used to be used for the
allocation routines to change how it allocated resources. Since that's
no longer necessary, retire the flag.
o ISPCMICA was only ever tested, but never set. GC it. This removes
a special case in determining the drive type. The drive type is
now set in fdc_pcmcia.c, so the hack isn't needed anymore. Sadly,
this isn't tested with a Y-E Data pcmcia floppy drive because there
are a number of other issues that preclude it from working.
o Fix ifdef for reading from the rtc. I'm of the opinion that this ifdef
should be moved into fdc_isa.c, but not today as ideally there'd be
other fixes to the probing of children. So now we just read it on
i386 ! pc98 (there's no #define for MACHINE_ARCH, just MACHINE, hence
this slightly inelegant kludge) and amd64. The PC98 exclusion likely
isn't meaningful since pc98 uses a different driver, but will be when
merging of the pc98 floppy code into this driver is complete (this is the
other reason I think this block of code belongs outside fdc.c).
All of these changes are safe to MT5.
most if not all of our tty drivers in the future.
Centralizing this stuff enables us to remove about 100 lines of
almost but not quite perfectly copy&paste code from each tty driver.
This is necessary so source upgrades use the correct binary.
MFC after: 3 days
For the record: Problem spotted by Scott Long, who mentioned
that source upgrades from 4.7 to recent 5.x and 6.0 are broken.
Detailed analysis shows that 4.7 has a broken make(1) binary.
A breakage was fixed in RELENG_4 in make/main.c,v 1.35.2.7 by
imp@, though the commit log erroneously stated "MFC 1.68"
while in fact it should have been spelled as "MFC 1.67".
copper NICs, a link change event is posted whenever MII autopolling is
toggled off and on, which happens whenever someone calls
bge_miibus_readreg() or bge_miibus_writereg() to access the PHY
registers. This means anytime someone called the SIOCGIFMEDIA ioctl
on a bge interface, the link would reset. Even a simple "ifconfig bge0"
would do it, though other apps like dhclient or the PPPoE daemon could
trigger it as well. An obvious symptom of this problem is lots of
"bgeX: gigabit link up" messages appearing on the console for no
apparent reason.
Through experimentation, I determined that when a real link change
event occurs, the BGE_MIMODE_AUTOPOLL in the BGE_MI_MODE register
is always set, so now if we have a copper NIC and an link change
event occurs and the BGE_MIMODE_AUTOPOLL bit is clear, we ignore
the event.
Note that this does not apply to the original BCM5700 chip since we
use a different method for sensing link changes with that chip (the
status block method was broken), nor to fiber optic NICs since they
don't use the GMII PHY access registers.
another way to misinterpret the spec. Also, always fall back to the hints
probe on any attach failure, not just when _FDE fails.
Thanks to imp and scottl for finding this.
Tested by: rwatson (minimally)
MFC after: 5 days
functions that can be called from enable/disable pf as well. This improves
switching from non-altq ruleset to altq ruleset (and the other way 'round)
by a great deal and makes pfctl act like the user would except it to.
PR: kern/71746
Tested by: Aurilien "beorn" Rougemont (PR submitter)
MFC after: 3 days
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is
larger than IPPROTO_MAX and was initializing memory beyond the array.
Catch all these kinds of errors by ignoring protocols that are higher than
IPPROTO_MAX or 0 (zero).
Add more comments ip_init().
This make "_NEC" devices appear as "NEC" which is more corrent.
The reason is tha NEC originally screwed up on the byteorder in the
model string, so now that they have realized that they prefixed the '_'
so that not every ATA driver on the planet would call them "EN C" :)
reserving it at use time is more miserly, low memory (< 16MB)
evaporates quickly on many systems, so there may not be any suitable
buffers available. This specifically doesn't use the newer, fancier
isa_dma_init to ease merging to 5.
Reviewed by: tegge, phk
the firmware status register on the card to see if the firmware is still
running. There is no way to recover from this, but at least it can give
a hint as whether the car has crashed (which happens all too often).
MFC after: 3 days
errors for the attachment process for the floppy controller. This is
a band-aide because it doesn't try any of the fallback methods when
_FDE isn't long enough, but should be sufficient for people
experiencing the dreaded mutex not initialized panic.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.
As the patch says:
This is not a pretty patch and only meant as a temporary
fix until a better solution is committed.
PR: i386/71715
Submitted by: Stephan Uphoff <ups@tree.com>
MFC after: 1 week
it travels through the IP stack. This wasn't much of a problem because IP
source routing is disabled by default but when enabled together with SMP and
preemption it would have very likely cross-corrupted the IP options in transit.
The IP source route options of a packet are now stored in a mtag instead of the
global variable.
and which takes a M_WAITOK/M_NOWAIT flag argument.
Add compatibility isa_dmainit() macro which whines loudly if
isa_dma_init() fails.
Problem uncovered by: tegge
problems:
1) Add locking for SMP, code provided by Alan Cox
2) While testing Alan's patches, I observed serious problems with
the jumbo buffer allocation code (machine crashed twice), so I gutted
it and rewrote the receive handler to use multiple chained descriptors.
Each RX descriptor gets a single 2K cluster, and the chip will fill in
as many as it needs to hold the complete packet.
User reports that this corrects the data corruption issues previously
observed and discussed on -current.
Note that this driver still needs to be hit with the busdma stick.
I intend to inflict said beating in the near future.
MFC after: 1 week
careful with the skip condition this time. Addresses are only not taken into
account if:
- The interface is POINTTOPOINT
- There is no route installed for the address
- The user specified noalias (:0)
and - We are looking at an IPv4 address.
This should be enough paranoia to not cause any false positives.
PR: misc/69954
Discussed with: yongari
MFC after: 4 days
o Allow for up to 3 resource I/O ranges to be given for the floppy
controller, rather than just two that are allowed for now.
o Make sure that we can work with either a base address of 0x3f0 or 0x3f2.
o Create new inline functions to access the YE DATA's unique BDCR register.
o Update pccard attachment to add the fd device.
o Do some minor style(9) polishing.
# I'm guessing that the fdc pccard attachment broke some time ago, since
# there are a number of issues with it still.
save to call if_attachdomain from if_attach() (as done for if_loop.c). We
will now end up with a properly initialized if_afdata array and the nd6
callout will no longer try to deref a NULL pointer.
Still this is a temp workaround and the locking for if_afdata should be
revisited at a later point.
Requested by: rwatson
Discussed with and tested by: yongari (a while ago)
PR: kern/70393
MFC after: 5 days
filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS
tagged packets to have ip_len and ip_off in host byte order instead of
network byte order.
PR: kern/71652
Submitted by: mlaier (patch)
and sent to the DIVERT socket while the original packet continues with the
next rule. Unlike a normally diverted packet no IP reassembly attemts are
made on tee'd packets and they are passed upwards totally unmodified.
Note: This will not be MFC'd to 4.x because of major infrastucture changes.
PR: kern/64240 (and many others collapsed into that one)
doesn't do this is beyond me, but that will be investigated later. This
results in programming the chip with the correct frequency, which in turn
allows devices to negotiate up to the full 20MB/s.
and the previously malloc'ed snapshot lock.
Malloc struct snapdata instead of just the lock.
Replace snapshot fields in cdev with pointer to snapdata (saves 16 bytes).
While here, give the private readblock() function a vnode argument
in preparation for moving UFS to access GEOM directly.
preparation for integration of p4::phk_bufwork. In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly. Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.
the loss of a page modified (PG_M) bit in a race between processors.
Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.
Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.
In collaboration with: tegge@
MT5 candidate
PR: 61852
calls in sb_cmd2() and sb_getmixer(). The lock has already be grabbed
before these functions are called.
This is a RELENG_5 candidate.
PR: 71189
Submitted by: stephane
MFC after: 3 days
the pseudo header. We really need the TCP packet length here. This happens
to end up in ip->ip_len in tcp_input.c, but here we should get it from the
len function variable instead.
Submitted by: yongari
Tested by: Nicolas Linard, yongari (sparc64 + hme)
MFC after: 5 days
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic. Use ke_pinned instead as was originally
done with Tor's patch.
Approved by: julian
value was only enough for 8GB of RAM, the new value can do 16GB. This still
isn't optimal since it doesn't scale. Fixing this for amd64 looks to be
fairly easy, but for i386 will be quite difficult.
Reviewed by: peter
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.
Submitted by: tegge (of course!) (with changes)
MFC after: 3 days
VT6122 gigabit ethernet chip and integrated 10/100/1000 copper PHY.
The vge driver has been added to GENERIC for i386, pc98 and amd64,
but not to sparc or ia64 since I don't have the ability to test
it there. The vge(4) driver supports VLANs, checksum offload and
jumbo frames.
Also added the lge(4) and nge(4) drivers to GENERIC for i386 and
pc98 since I was in the neighborhood. There's no reason to leave them
out anymore.
to device driver unloading. Adding these two now and MT5'ing them will
allow us to preserve binary compatibility on RELENG_5 when these facilities
are MFC'ed.
MT5 Candiate.
holds sndstat_lock across a call to uiomove(), which is not legal
to do with a mutex because of the possibility that the data transfer
could sleep because of a page fault. It is not possible to just
unlock the mutex for the uiomove() call without introducing another
locking mechanism to prevent the body of sndstat_read() from being
re-entered. Converting sndstat_lock to an sx lock is the least
complicated change.
This is a candidate for RELENG_5.
LOR: 030
MFC after: 4 days
example the maximum segment size is 64K while the boundary is set
to 8K due to controller limitations. It is impossible to NOT cross
the boundary for any segment size that's larger than the boundary.
So, once we inherited the boundary from the parent tag, make sure
to reduce the maximum segment size to the boundary if it was larger.
MT5 candidate.
branch prediction optimization for LINT, because the kernel was too
large. This commit now removes it altogether since it causes build
failures for GENERIC kernels and the various applicable trends are
such that one can expect that it these failure will cause more
problems than they're worth in the future. These trends include:
1. Alpha was demoted from tier 1 to tier 2 due to lack of active
support. The number of people willing to fix build breakages
is not likely to increase and those developers that do have the
gumption to test MI changes on alpha are not likely to spend
time fixing unexpected build failures first.
2. The kernel will only increase in size. Even though stripped-down
kernels do link without problems now, compiler optimizations (like
inlining) and new (non-optional) functionality will likely cause
stripped-down kernels to break in the future as well.
So, with my asbestos suit on, get rid of potential problems before
they happen.
MT5 candidate.
redundant at this point and should be retired). Don't free subdevs if
we don't attach any devices. This was leaving stale device_t's
around. Don't touch the device if it isn't attached since the name
isn't meaningful then. Switch from strncpy (properly used) to
strlcpy.
From a patch submitted by Peter Pentchev
device_t instances when no driver attaches. They are left around, and
we need to remember them.
# The usbd_device_handle->subdevs[] array likely is completely bogus
# at this point, but one change at a time, since its removal will need
# to have similar code replace it extracted from newbus.
Part of the patch submitted by Peter Pentchev after an excellent
analysis of the underlying problems.
MFC After: 1 week
across frames. Basically, if the current frame is for the
'dblfault_handler' function, then get the next %eip and %ebp values to use
from the original TSS of the thread that has the saved state when the
double fault triggered.
MFC after: 4 days
produced better results for a test program I had here, it didn't
substantially change the number of crashes that I saw. Both the old
code and the new code seemed to produce the same crashes from the usb
layer. Since the new code also solves a close() crash, go with it
until the underlying issues wrt devices going away can be addressed.
it only if we weren't UP before. In some cases xl_init causes long media
re-negotiation, and ppp(8) fails to open PPPoE connection because it sets
IFF_UP every time before opening PPPoE connection.
PR: kern/69133
Patch by: mdodd
Approved by: wpaul, julian (mentor)
MFC after: 1 week
BPFD_LOCK() when removing a descriptor from an interface descriptor
list. Hold both over the operation, and do a better job at
maintaining the invariant that you can't find partially connected
descriptors on an active interface descriptor list.
This appears to close a race that resulted in the kernel performing
a NULL pointer dereference when BPF sessions are detached during
heavy network activity on SMP systems.
RELENG_5 candidate.
to use queue(3) list macros rather than hand-crafted lists. While
here, move to doubly linked lists to eliminate iterating lists in
order to remove entries. This change simplifies and clarifies the
list logic in the BPF descriptor code as a first step towards revising
the locking strategy.
RELENG_5 candidate.
Reviewed by: fenner
(disabled) defid_gen members from u_long to u_int32_t so that alignment
requirements don't cause the structure to become larger than struct fid
on LP64 platforms. This fixes NFS exports of msdos filesystems on at
least amd64.
PR: 71173
Fix a problem in previous: we can't blindly assume that we have
wincnt entries available at the offset the file has been found. If the dos
directory entry is not preceded by appropriate number of long name
entries (happens e.g. when the filesystem is corrupted, or when
the filename complies to DOS rules and doesn't use any long name entry),
we would overwrite random directory entries.
There are still some problems, the whole thing has to be revisited and solved
right.
Submitted by: Xin LI
Fix a panic that occurred when trying to traverse a corrupt msdosfs
filesystem. With this particular corruption, the code in pcbmap()
would compute an offset into an array that was way out of bounds,
so check the bounds before trying to access and return an error if
the offset would be out of bounds.
Submitted by: Xin LI
The reference counts are there to block detach until the sleepers in
read/write/ioctl have gotten out, not to prevent the open device from
going away. Restore the old behavior so that we have a chance to wake
up sleepers when the usb device goes away, so they can properly return
EIO back to the caller when this happens.
Otherwise, we have a guarnateed panic waiting to happen when a device
detaches with an active read channel.
This should be merged to 5 asap.
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.
This is a MT5 candidate.
Reviewed by: marcel
Submitted by: sos (in part)
to avoid ABI changes. It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called. It is
intentended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.
Due to space constraints ifi_epoch is a time_t rather then a struct
timeval. SNMP would prefer higher precision, but this unlikely to be
useful in practice.
the alignment and boundary constraints are being respected, which
fixes the reported ATA problems with SiI chips.
I consider the busdma implementation worrisome nonetheless. Not
only is there too much MI code duplicated in MD files, there's a
lot of questionable code. I smell a wholesale, cross-platform
overhaul coming...
MT5 candidate.
It can be switched back once 5.3 is tested and released. Also turn on
PREEMPTION as many of the stability problems with it have been fixed.
MT5: 3 days.
would turn off all fans when initializing a zone. However, the HP Omnibook
500 generates a notify saying the zone needs to be re-evaluated whenever
its fan is switched on or off. This produced an infinite loop. Also, note
that running _SCP can generate the same notify.
Since we need to make sure old fan references are turned off when getting
new ones, run acpi_tz_monitor() first. This will turn off any unneeded
fans. Then, check for new settings. After that, run acpi_tz_monitor()
again to turn on/off any fans referenced by the new settings.
Tested by: brooks
returned and then infer the state from calls to _ON/_OFF. This works
around a problem in systems that don't correctly report the state (i.e.
the HP Omnibook 500 reports "on" for its fan always after it has been
turned on once).
field.
Replace three instances of longhaired initialization va_filerev fields.
Added XXX comment wondering why we don't use random bits instead of
uptime of the system for this purpose.
I was unable to test this as the PAE kernel crashed with a "cannot copy
LDT" before coming up. When this gets a bit more testing, I'll fix the PAE
conf file to allow isp devices.
PR: 59728
loss links, and 1 second appeared to be too small for high latency links.
If we will receive more complaints, we should make this parameter configurable.
PR: kern/69536
Approved by: archie, julian (mentor)
MFC after: 3 days
happens when a proc exits, but needs to inform the user that this has
happened.. This also means we can remove the check for detached from
proc and sig f_detach functions as this is doing in kqueue now...
MFC after: 5 days
and you botch a call to nmount(2).
This is because there is an INVARIANTS check that asserts that
opt->len must be zero if opt->val is not NULL. The problem is that
the code does not actually follow this invariant if there is an
error while processing mount options.
Fix the code to honor the INVARIANT.
Silence on: fs@
sectorsize in order to avoid a lot of checks around various divisions etc.
Enforce the sectorsize being > 0 with a KASSERT on successful open.
Fix scsi_cd.c to return 2k sectors when no media inserted.
the flags field will be improperly initialized resulting in inconsistent
operation (sometimes with Giant, sometimes without, et al).
RELENG_5 candidate.
state test as well as set, or we risk a race between a socket wakeup
and registering for select() or poll() on the socket. This does
increase the cost of the poll operation, but can probably be optimized
some in the future.
This appears to correct poll() "wedges" experienced with X11 on SMP
systems with highly interactive applications, and might affect a plethora
of other select() driven applications.
RELENG_5 candidate.
Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru>
Debugged with help of: dwhite
cd9660_readdir() to return the address of the file's first data block as
the inode number instead of the address of the directory entry, but
neglected to update cd9660_vget_internal() for the new inode numbering
scheme.
Since the NFS server calls VFS_VGET (cd9660_vget()) with inode numbers
returned through VOP_READDIR (cd9660_readdir()) when servicing a READDIRPLUS
request, these two interfaces must agree on the numbering scheme; failure to
do so caused panics and/or bogus information about the entries to be returned
to clients using READDIRPLUS (Solaris, FreeBSD w/ mount -o rdirplus).
PR: 63446
to RS232 bridges, such as the one found in the DeLorme Earthmate USB GPS
receiver (which is the only device currently supported by this driver).
While other USB to serial drivers in the tree rely heavily on ucom, this
one is self-contained. The reason for that is that ucom assumes that
the bridge uses bulk pipes for I/O, while the Cypress parts actually
register as human interface devices and use HID reports for configuration
and I/O.
The driver is not entirely complete: there is no support yet for flow
control, and output doesn't seem to work, though I don't know if that is
because of a bug in the code, or simply because the Earthmate is a read-
only device.
but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week
write and zero-fill faults to run without holding Giant. It is still
possible to disable Giant-free operation by setting debug.mpsafevm to 0 in
loader.conf.
This essentially merges revs 1.143-1.1445 of sys/pci/if_rl.c.
This now marks the interrupt MPSAFE along with making the mutex non-recursive.
Looked over by: bms
In places where we have long delays that doesn't depend on too accurate
timing, use ata_udelay() instead of DELAY() so we dont uselessly spin
the CPU if not nessesary;
It makes no sense in a PAE environment and is impossible to handle correctly.
This case is also never used right now. This should make the iir(4) driver
ready for PAE.
number generator, which was re-seeded via a timeout. Now centralized
randomness/entropy is used, we can garbage collect the timeout and
re-seeding code (which was largely a no-op).
Discussed with: itojun, suz, JINMEI Tatuya < jinmei at isl dot rdc dot toshiba dot co dot jp >
FULL_PREEMPTION is defined. Add a runtime warning to ULE if PREEMPTION is
enabled (code inspired by the PREEMPTION warning in kern_switch.c). This
is a possible MT5 candidate.
returning incompletely initialized processes. This problem was
eliminated by kern_proc.c:1.215, which causes pfind() not to
return processes in the PRS_NEW state.
unless ACPI_DEBUG is defined. Users don't typically care about errant
breakpoint instructions. The HP Pavilion 7915 has this in its PCI0
_INI method for rev 0x6040000 of the RSDT.
Removed support for Intel 82541ER
Added fix for 82547 which corrects an issue with Jumbo frames larger than 10k.
Added fix for vlan tagged frames not being properly bridged.
Corrected TBI workaround.
Corrected incorrect LED operation issues
Submitted by: tackerman (Tony Ackerman)
MFC after: 2 weeks
have been in place all the time the mtx_assert in the ALTQ code just
discovered the shortcoming.
PR: i386/71195
Tested by: Bettan (PR originator), myself
MFC after: 5 days
increasing it. Add code to ifconfig to use this size to find the
sockaddr_dl after the struct if_data in the routing message. This
allows struct if_data to grow (up to 255 bytes) without breaking
ifconfig.
Submitted by: peter
should only affect current resources, it seems best to wait until all
configuration is done before disabling it. If this fixes any problems, it
is a MT5 candidate.
ENOENT when there's no PNP ID for this device node, or ENXIO when there
is one, but it doesn't match.
In the nonPNP case (as most Alpha systems appear to be), we were
treating the error return as an error, when it should be have ignored
it. Version 1.9 properly ignored it, but the attach re-write of 1.10
introduced this logic error.
Pointy Hat to: phk (for breaking it then asking me to fix it :-)
Sponsored by: The Voices in Bill Paul's Head, LLC
update tick count for userland in thread_userret. This change
also removes a "no upcall owned" panic because fuword() schedules
an upcall under heavily loaded, and code assumes there is no upcall
can occur.
Reported and Tested by: Peter Holm <peter@holm.cc>
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.
MFC after: 3 days
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.
MFC after: 2 days
remaining consumers to have the count passed as an option. This is
i4b, pc98/wdc, and coda.
Bump configvers.h from 500013 to 600000.
Remove heuristics that tried to parse "device ed5" as 5 units of the ed
device. This broke things like the snd_emu10k1 device, which required
quotes to make it parse right. The no-longer-needed quotes have been
removed from NOTES, GENERIC etc. eg, I've removed the quotes from:
device snd_maestro
device "snd_maestro3"
device snd_mss
I believe everything will still compile and work after this.
the old one is. Hence we need to evaluate the old one for _FDI since it
has a valid ACPI_HANDLE ivar. This is a minimal fix. Make a note that a
more complete one is to make fdc support the ACPI_HANDLE ivar for its
children.
This and the previous change are MT5 candidates.
time the interface counters were zeroed, currently the time if_attach()
was called. It is indentended to be a valid value for RFC2233's
ifCounterDiscontinuityTime and to make it easier for applications to
verify that the interface they find at a given index is the one that was
there last time they looked.
An if_epoch "compatability" macro has not been created as ifi_epoch has
never been a member of struct ifnet.
Approved by: andre, bms, wollman
so that it becomes happy and no longer panics the system
upon getting the very first packet to transmit.
Reported and tested by: Igor Timkin <ivt@gamma.ru>
Reviewed by: rwatson
MFC after: 5 days
syscall can interrupt other thread's syscall in sleepq_catch_signals().
Current, all callers know thread_suspend_check may suspend thread
itself, so we need't to check return_instead for normal suspension
flags (no P_SINGLE_EXIT set).
Tested by: deischen
Reported by: Maarten L. Hekkelman <m.hekkelman@cmbi.kun.nl>
do not set the virtual address to the bus address when the bus
doesn't have either of the PCI_RF_DENSE or PCI_RF_BWX flags set.
The TGA driver uses the virtual address to access the registers,
which on some machines can cause a memory management fault. Map
the bus address as K0SEG virtual memory instead. Note that with
some hardware combinations involving the TGA2 adapter this change
merely results that the memory management fault is replaced by a
machine check.
this in my tree for a while and in its disabled state there are no
issues. It isn't enabled yet because some drivers (in acpi) have side
effects in their probe routines that need to be resolved in some
manner before this can be turned on. The consensus at the last
developer's summit was to provide a static method for each driver
class that will return characteristics of the driver, one of which is
if can be reprobed idempotently.
address I've lost, that move the location information to the atttach
routine as well. While one could use devinfo to get this data, that
is difficult and error prone and subject to races for short lived
devices.
Would make a good MT5 candidate.
nor are they 3D accelarators as the description would like us to
believe. Since the TGA2 adapter has a VGA mode (unlike the TGA adapter),
one can use the VGA driver instead.
This fixes GENERIC kernels on alpha with TGA2 adapters.
page zeroing thread before it has been created. It was possible for
calls to free() very early in the boot process to panic here because
the sleep queues were not yet initialised. Specifically, sysinit_add()
running at SI_SUB_KLD would trigger this if the array of pointers
became big enough to require uma_large_alloc() allocations.
Submitted by: peter
format modules, which are currently only used on the amd64 platform.
This initial implementation just parses enough of the module to
allow it to extract dependencies and load all the bits into the
right place in memory, so the kernel must still do the full relocation
and linking. The details of the loaded sections are passed to the
kernel by supplying a copy of the ELF section header table as module
metadata with the MODINFOMD_SHDR tag.
better relocation support for the amd64 and i386 platforms. This
should not result in any change in functionality, but moves a step
towards supporting the relocatable object file modules on amd64.
The same hack/trick as load_elf*.c uses is used here to simultaneously
support both elf32 and elf64 on amd64 and i386.
operation using NET_NEEDS_GIANT(). This will result in a boot-time
restoration of Giant-enabled network operation, or run-time warning on
dynamic load (applicable only to the Netgraph component). Additional
components will likely need to be marked with this in the future.
will cause the network stack to operate without the Giant lock by
default. This change has the potential to improve performance by
increasing parallelism and decreasing latency in network processing.
Due to the potential exposure of existing or new bugs, the following
compatibility functionality is maintained:
- It is still possible to disable Giant-free operation by setting
debug.mpsafenet to 0 in loader.conf.
- Add "options NET_WITH_GIANT", which will restore the default value of
debug.mpsafenet to 0, and is intended for use on systems compiled with
known unsafe components, or where a more conservative configuration is
desired.
- Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits
kernel components to declare dependence on Giant over the network
stack. If the declaration is made by a preloaded module or a compiled
in component, the disposition of debug.mpsafenet will be set to 0 and
a warning concerning performance degraded operation printed to the
console. If it is declared by a loadable kernel module after boot, a
warning is displayed but the disposition cannot be changed. This is
implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which
is intended for the processing of configuration choices after tunables
are read in and the console is available to generate errors, but
before much else gets going.
This compatibility behavior will go away when we've finished the last
of the locking work and are confident that operation is correct.
This way of operation is more robust than the "AI" used
before.
Add flags to mbr accessible from make.conf as BOOT_MBR_FLAGS.
Only one flag is defined now, "allow using packet mode", which
is 0x80 in accord with the rest of i386 boot code. The "packet"
flag is on by default.
PR: i386/70241
Submitted by: Valentin Nechayev <netch <@> netch.kiev.ua> (inital version)
Discussed with: jhb (by Valentin Nechayev)
Tested on: bochs (with EDD turned on or off by patching the BIOS), PCs
the flag, fall back to the old INT13/AH=02 function if that fails.
This way of operation is less likely to fail with modern BIOSes and
large disks of strange geometries.
PR: i386/70241
Submitted by: Valentin Nechayev <netch <@> netch.kiev.ua> (inital version)
Discussed with: jhb (by Valentin Nechayev)
Tested on: bochs (with EDD turned on or off by patching the BIOS), PCs
need of sched_lock in some places. Also in thread_userret, remove
spare thread allocation code, it is already done in thread_user_enter.
Reviewed by: julian
preemption and/or the rev 1.79 kern_switch.c change that was backed out.
The thread was being assigned to a runq without adding in the load, which
would cause the counter to hit -1.
o Remove PSM_SYNCERR_THRESHOLD1. This value specified how many sync
errors were required before the mouse is re-initialised.
Re-initialisation is now done after (packetsize * 2) sync errors as
things aren't likely to improve after that.
o Reset lastinputerror when re-initialisation occurs. We don't want
to continue to drop data after re-initialisation.
o Count the number of failed packets independently of the syncerrors
statistic. syncerrors is useful for recovering sync within a single
packet. pkterrors allows us to detect when the mouse changes its
packet mode due to some external event (e.g. KVM switch).
o Reinitialize the mouse if we see more than psmpkterrthresh errors
during the validation period. The validation period begins as soon
as a sync error is detected and continues until psmerrsecs/msecs
time has elapsed. The defaults for these two values force a reset
if we see two packet errors in a 2 second period. This allows rapid
detection of packet framing errors caused by the mouse changing packet
modes.
o Export psmpkterrthresh as a sysctl
o Export psmloglevel as a sysctl.
o Enable more debugging code to be enabled at runtime via psmloglevel.
o Simplify verbose conditioned loging by using a VLOG macro.
o Add several comments describing the sync recovery algorithm of
this driver.
Large Portions by: Brian Somers <brian@Awfulhak.org>
Inspired and Frustrated by: Belkin KVMs
Reviewed by: njl, philip
pollfd's to avoid calling malloc() on small numbers of fd's. Because
smalltype's members have type char, its address might be misaligned
for a struct pollfd. Change the array of char to an array of struct
pollfd.
PR: kern/58214
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
Reviewed by: bde (a long time ago)
MFC after: 3 days
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.
The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.
This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...
Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)
diffs against #ifdef'd version of IPSEC, use "struct thread *p"
rather than "struct proc *p", fix some white space, and make some
already inconsistent white space inconsiste differently.
its users.
netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.
Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.
PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days
to point to a local IP address; and the packet was sourced from this host
we fill in the m_pkthdr.rcvif with a pointer to the loopback interface.
Before the function ifunit("lo0") was used to obtain the ifp. However
this is sub-optimal from a performance point of view and might be dangerous
if the loopback interface has been renamed. Use the global variable 'loif'
instead which always points to the loopback interface.
Submitted by: brooks
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.
If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.
Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
Add missing untimeout that would get lost in handling of some
error situations, and caused what looked like random timeouts
afterwards when the timeout fired.
- Use copyinstr() to read cs_win, cs_dos, cs_local strings from the
mount argument structure instead of reading through user-space pointers(!).
- When mounting a filesystem, or updating an existing mount, only try to
update the iconv handles from the information in the mount argument
structure if the structure itself has the MSDOSFSMNT_KICONV flag set.
- Attempt to handle failure of update_mp() in the MNT_UPDATE case.
UNIX domain socket garbage collection implementation, as that risks
holding the mutex over potentially sleeping operations (as well as
introducing some nasty lock order issues, etc). unp_gc() will hold
the lock long enough to do necessary deferal checks and set that it's
running, but then release it until it needs to reset the gc state.
RELENG_5 candidate.
Discussed with: alfred
The C code assumes that the carry bit is always kept from the previous
operation. However, the pointer indexing requires another add operation.
Thus, the carry bit from the first operation is tromped over by the
"addl" operation that ends up following it, so the "adcl" that follows
that has no effect because the carry bit is cleared before it.
The result is checksum failure on received packets.
The larger issue is that there isn't any other way of preventing the compiler
inserting arbitrary instructions between different __asm statements (and
that the commit message in revision 1.13 of in_cksum.h is wrong on
this point). From
http://developer.apple.com/documentation/DeveloperTools/gcc-3.3/gcc/Extended-Asm.html
---8<---8<---8<---
You can't expect a sequence of volatile asm instructions to remain
perfectly consecutive. If you want consecutive output, use a single
asm. Also, GCC will perform some optimizations across a volatile
asm instruction; GCC does not "forget everything" when it encounters
a volatile asm instruction the way some other compilers do.
---8<---8<---8<---
Also, this change also makes the ASM code much easier to read.
PR: 69257
Submitted by: Mike Bristow <mike@urgle.com>, Qing Li <qing.li@bluecoat.com>
prevent leakage of Giant. With INVARIANTS, this results in an
assertion failure following execution of the RPC. Without INVARIANTS,
it could result in problems if the NFS server is killed causing nfsd
to return to user space holding Giant.
Feet provided by: brueffer
On a system with huge number of pipes, M_NOWAIT failes almost always,
because of memory fragmentation.
My fix is different than the patch proposed by Pawel Malachowski,
because in FreeBSD 5.x we cannot sleep while holding dummynet mutex
(in 4.x there is no such lock).
My fix is also ugly, but there is no easy way to prepare nice and clean fix.
PR: kern/46557
Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru>
Reviewed by: mlaier
of the MCOUNT_ENTER, MCOUNT_EXIT and MCOUNT_DECL defines. Also make
sure there's a prototype of _MCOUNT_DECL(). This allows us to build
a kernel. There are still unresolved symbols, so linking fails.
_mcount() stub when profiling is enabled. Emit this code sequence
for assembly routines as welli (MCOUNT definition in <machine/asm.h>.
We do not pass the GOT entry however as the 4th argument, because it's
not used. The _mcount() stub calls __mcount(), which does the actual
work. Define _MCOUNT_DECL to define __mcount. We do not have an
implementation of mcount(), so we define MCOUNT as empty, but have a
weak alias to _mcount() in _mcount.S.
Note that the _mcount() stub in the kernel is slightly different from
the stub in userland. This is because we do not have to worry about
nested routines in the kernel.
by default. As such, mark if_dc as IFF_NEEDSGIANT until such
time as appropriate locking review and testing can take place,
and the locking can be enabled by default.
RELENG_5 candidate.
send routine. In IPv6 UDP, the thread will be passed to suser(), which
asserts that if a thread is used for a super user check, it be
curthread. Many of these protocol entry points probably need to
accept credentials instead of threads.
MT5 candidate.
Noticed/tested by: kuriyama
directly. This removes a few more users of the stackgap and also marks
the syscalls using these wrappers MP safe where appropriate.
Tested on: i386 with linux acroread5
Compiled on: i386, alpha LINT
date: 2004/08/22 14:48:55; author: rwatson; state: Exp; lines: +0 -2
Don't need to assert Giant in fw_output(), only in the firewire start
routine.
Approved by: re (scottl)
buffers with kqueue filters is no longer required: the kqueue framework
will guarantee that the mutex is held on entering the filter, either
due to a call from the socket code already holding the mutex, or by
explicitly acquiring it. This removes the last of the conditional
socket locking.
Set the DMA SGL length correctly if the DMA request must be chained because
it is too large to fit in one SGL.
This should fix this driver for some Dell Precision systems.
RELENG_5 candidate.
PR: kern/66479
Submitted by: HITOSHI Osada <qfh02545@nifty.com>
We were obtaining different spin mutexes (which disable interrupts after
aquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.
This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to aquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.
Obtained from: dwhite, alc
Reviewed by: jhb
the geometry code to grab a mutex that prohibits any driver on the
stack below it from sleeping, it's not safe to allow anything in
the top half of isp to sleep (excepting the thread that Fibre Channel
instances use to re-scan loops/fabrics).
hold its own values, pass them up to the parent (acpi0) and merge/uniq them
on the way. After the namespace evaluation, acpi will reserve these
resources and manage them via rman before bus_generic_probe() and
bus_generic_attach(). This is necessary because some systems specify
conflicting resources in separate sysresource objects. It's also cleaner
in that the interface between sysresource and acpi is now merely the parent's
resource list. This code handles the following cases:
1. Unique resource: add it to the parent via bus_set_resource().
2. New wholly contained in old: discard new.
3. New tail overlaps old head: grow old head downward.
AND/OR
4. New head overlaps old tail: grow old tail upward.
Tested by: Pawel Worach <sajd_at_telia.com>
Tested by: Radek Kozlowski <radek_at_raadradd.com>
MFC after: 5 days
with VmWare 4.x. At least with VmWare version 4.5.2, i386 version of
atomic_cmpset_int() is about 30 times slower than non-i386 version. It
makes this delta a good 5.3 MFC candidate, since otherwise it will
mislead users who run FreeBSD under modern VmWare otherwise.
valid; otherwise a caller could trick us into changing any 32-bit word
in kernel memory to LINUX_SOL_SOCKET (0x00000001) if its previous value
is SOL_SOCKET (0x0000ffff).
MFC after: 3 days
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files. Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.
This functionality has also been removed from NetBSD and OpenBSD.
Submitted by: George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by: suz, keiichi at kame.net, core at kame.net
deal with 24-bit addresses. While the two other attachments, namely
isa and cbus, do it properly, the PCI attachment was passing
BUS_SPACE_MAXADDR instead of BUS_SPACE_MAXADDR_24BIT. This bug
became apparent with the new contigmalloc() code.
This fixes the problem reported with lnc(4) interfaces inside VMWare,
and should theoritically also fix any user of a PCI lnc(4) card. It
is a RELENG_5 MFC candidate.
Tested by: Florian Le Goff <madflo@beertech.org>
position that is 64-bit aligned and makes sure that the valid and
dirty fields are also 64-bit aligned. This means that if PAGE_SIZE
is 32K, the size of the vm_page structure is only increased by 8
bytes instead of 16 bytes. More importantly, the vm_page structure
is either 120 or 128 bytes on ia64. These are "interesting" sizes.
before returning. Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.
This may be a MFC candidate for 5.3.
Suggested by: phk
allocation. Notably, in this case, the driver tries to allocate several
pieces of memory and then fails if the pieces allocated after the first
do not come after it physically, and within a specific range (8MB I
believe). Of course, this could just as easily fail for any number of
reasons, but it almost always fails now that contiguous allocations start
at the end of possible specified memory locations rather than the beginning.
Allocate all the possibly-needed memory up front, even though it's a waste,
to get around this. The least bogus solution would be to take the physical
address from the first allocation and create a new tag that specified that
further allocations must follow it within that 8MB window, then use that
when allocating new channels, but that's left for anyone else that really
feels like doing it.
Tested by: Erwin Lansing <erwin@lansing.dk>
Previously the early drop was disabled unconditionally for ALTQ-enabled
kernels.
This should give some benefit for the normal gateway + LAN-server case with
a busy LAN leg and an ALTQ managed uplink.
Reviewed and style help from: cperciva, pjd
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.
Suggested by: phk
The whole problem seems to be size. Which is odd, because it is said
that size doesn't matter. Anyway... Add -Os to strategic places in the
makefile to have the final loader be as mall as possible. This seems
to be enough to make it work. For now... I think something is more
fundamentally wrong; or something more fundamental is wrong. Potato,
potaato.
be manipulated by prison root. In 4.x prison root can not manipulate
system flags, regardless of the security level. This behavior
should remain consistent to avoid any surprises which could lead
to security problems for system administrators which give out
privileged access to jails.
This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL
to 0. This will prevent prison root from being able to manipulate
system flags on files.
This may be a MFC candidate for RELENG_5.
Discussed with: cperciva
Reviewed by: rwatson
Approved by: bmilekic (mentor)
PR: kern/70298
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain. The symptom of this occuring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.
Foot shot: green
The binutils 2.15 assembler now automaticly and non-optionally adds
the .eh_frame section for unwind information. This section appears
to wreck havoc to the final boot code. Fix this by using a special
linker script that discards the .eh_frame sections, but is otherwise
identical to the linker internal script used for -N.
Compiler used: gcc 3.3.5
Verified with: binutils 2.14 & binutils 2.15 (stock and in-tree)
Tested with: /boot/loader & /boot/netboot
the tunable or sysctl 'net.route.netisr_maxqlen'. Default the maximum
depth to 256 rather than IFQ_MAXLEN due to the downsides of dropping
routing messages.
MT5 candidate.
Discussed with: mdodd, mlaier, Vincent Jardin <jardin at 6wind.com>
rule only in place of all rules match. This is similar to how ipfw(8) works.
Provide a sysctl, mac_bsdextended_firstmatch_enabled, to enable this
feature.
Reviewed by: re (jhb)
Aprroved by: re (jhb)
manipulating a vnode, e.g., calling vput(). This reduces contention for
Giant during many copy-on-write faults, resulting in some additional
speedup on SMPs.
Note: debug_mpsafevm must be enabled for this optimization to take effect.
as well, even if device is in complete state.
I observe 40% of speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:
Request no. Used components
1 c0+c1+c2+c3
2 c0+c1+c2+c3
3 c0+c1+c2+c3
With the new feature:
Request no. Used components
1 c0+c1+c2+c3
2 (c1^c2^c3^c4)+c1+c2+c3
3 c0+(c0^c2^c3^c4)+c2+c3
4 c0+c1+(c0^c1^c3^c4)+c3
5 c0+c1+c2+(c0^c1^c2^c4)
6 c0+c1+c2+c3
[...]
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
This may be a MFC candidate for RELENG_5.
Reviewed by: rwatson
Approved by: bmilekic (mentor)
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
and already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd
UDP/IP header, make sure that space is also allocated for the link
layer header. If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer. This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
to check aperture size, avoiding hangs. Maintain the rest of the bits when
setting/unsetting ATTBASE. This essentially matches Linux's AGP driver as well.
PR: kern/70037
Submitted by: Mark Tinguely <tinguely at casselton dot net>
Obtained from: NetBSD
in the shutdown_final state if the RB_NOSYNC flag is set.
The specific motivation in this case is that a system panic in an
interrupt context results in a call to module_shutdown(), which
calls g_modevent(), which calls g_malloc(..., M_WAITOK), which
results in a second panic. While g_modevent() could be fixed to
not call malloc() for MOD_SHUTDOWN events (which it doesn't handle
in any case), it is probably also a good idea to entirely skip the
execution of the module shutdown handlers after a panic.
This may be a MFC candidate for RELENG_5.
shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.
This is a MFC candidate for RELENG_5.
MFC after: 3 days
systems that have overlapping regions specified in their sysresource
objects. This patch fixes ATA DMA and acpi_timer allocation for such
sysctems. It should eventually be moved to resource_list_add() if it is
a valid generalized approach. The minimal approach for 5.3 is:
"Loop through all current resources to see if the new one overlaps
any existing ones. If so, the old one always takes precedence and
the new one is adjusted (or rejected). We check for three cases:
1. Tail of new resource overlaps head of old resource: truncate the
new resource so it is contiguous with the start of the old.
2. New resource wholly contained within the old resource: error.
3. Head of new resource overlaps tail of old resource: truncate the
new resource so it is contiguous, following the old."
Tested by: Radek Kozlowski <radek_at_raadradd.com>
Discussed with: imp
MFC after: 4 days
of 0x3f2-0x3f5,0x3f7 the ports are not 7 bytes apart. This should fix
floppy probing on such systems. (We handle the case of adjusting for
a start of 0x3f2 -> 0x3f0 separately, although that code should still be
checked if there are still floppy problems for others.)
Tested by: Sarunas Vancevicius <vsarunas_at_eircom.net>
MFC after: 3 days
sockets are connection-oriented for the purposes of kqueue
registration. Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (i.e., NFS). Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.
Apologies to: jmg
Discussed with: imp, scottl
Centralize the fdctl_wr() function by adding the offset in
the resource to the softc structure.
Bugfix: Read the drive-change signal from the correct place:
same place as the ctl register.
Remove the cdevsw{} related code and implement a GEOM class.
Ditch the state-engine and park a thread on each controller
to service the queue.
Make the interrupt FAST & MPSAFE since it is just a simple
wakeup(9) call.
Rely on a per controller mutex to protect the bioqueues.
Grab GEOMs topology lock when we have to and Giant when
ISADMA needs it. Since all access to the hardware is
isolated in the per controller thread, the rest of the
driver is lock & Giant free.
Create a per-drive queue where requests are parked while
the motor spins up. When the motor is running the requests
are purged to the per controller queue. This allows
requests to other drives to be serviced during spin-up.
Only setup the motor-off timeout when we finish the last
request on the queue and cancel it when a new request
arrives. This fixes the bug in the old code where the motor
turned off while we were still retrying a request.
Make the "drive-change" work reliably. Probe the drive on
first opens. Probe with a recal and a seek to cyl=1 to
reset the drive change line and check again to see if we
have a media.
When we see the media disappear we destroy the geom provider,
create a new one, and flag that autodetection should happen
next time we see a media (unless a specific format is configured).
Add sysctl tunables for a lot of drive related parameters.
If you spend a lot of time waiting for floppies you can
grab the i82078 pdf from Intels web-page and try tuning
these.
Add sysctl debug.fdc.debugflags which will enable various
kinds of debugging printfs.
Add central definitions of our well known floppy formats.
Simplify datastructures for autoselection of format and
call the code at the right times.
Bugfix: Remove at least one piece of code which would have
made 2.88M floppies not work.
Use implied seeks on enhanced controllers.
Use multisector transfers on all controllers. Increase
ISADMA bounce buffers accordingly.
Fall back to single sector when retrying. Reset retry count
on every successful transaction.
Sort functions in a more sensible order and generally tidy
up a fair bit here and there.
Assorted related fixes and adjustments in userland utilities.
WORKAROUNDS:
Do allow r/w opens of r/o media but refuse actual write
operations. This is necessary until the p4::phk_bufwork
branch gets integrated (This problem relates to remounting
not reopening devices, see sys/*/*/${fs}_vfsops.c for details).
Keep PC98's private copy of the old floppy driver compiling
and presumably working (see below).
TODO (planned)
Move probing of drives until after interrupts/timeouts work
(like for ATA/SCSI drives).
TODO (unplanned)
This driver should be made to work on PC98 as well.
Test on YE-DATA PCMCIA floppy drive.
Fix 2.88M media.
This is a MT5 candidate (depends on the bioq_takefirst() addition).
is an effective band-aid for at least some of the scheduler corruption seen
recently. The real fix will involve protecting threads while they are
inconsistent, and will come later.
Submitted by: julian
requires a recompile of netgraph users.
Also change the size of a field in the bluetooth code
that was waiting for the next change that needed recompiles so
it could piggyback its way in.
Submitted by: jdp, maksim
MFC after: 2 days
changes to the ATA driver cause a kernel crash, no fault of the ATA
code. Work is in progress to add the necessary feature to the sparc64
kernel and this commit will be backed out when it is complete. This
bandaid is being put in mostly in the interests of getting the first
release snapshot done and out the door.
Tested on: Ultra-10 exhibiting the insta-panic.
MFC: Real Soon
If the bioq is empty, NULL is returned. Otherwise the front element
is removed and returned.
This can simplify locking in many drivers from:
lock()
bp = bioq_first(bq);
if (bp == NULL) {
unlock()
return
}
bioq_remove(bp, bq)
unlock
to:
lock()
bp = bioq_takefirst(bq);
unlock()
if (bp == NULL)
return;
Since pmap_enter() calls pmap_invalidate_page(), which needs interrupts
enabled in the SMP case, we defer the disable to right before saving the
register context. This has been incorrect for about a year but caused no
real problems because the identity page never actually replaces a previously
mapped page and suspend/resume on SMP systems has been uncommon.
Tested by: sos
MFC after: 3 days
the ipfw KLD.
For IPFIREWALL_FORWARD this does not have any side effects. If the module
has it but not the kernel it just doesn't do anything.
For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT
compiled in too. However this is the least disturbing behaviour. The user
can just recompile either module or the kernel to match the other one. The
access to the machine is not denied if ipfw refuses to load.
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
Specifically, if a thread is awakened by something other than a signal
while checking for signals before going to sleep, clear TDF_SINTR in
sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in
that edge case during an interruptible sleep. Also, fix
sleepq_check_signals() to properly handle the condition if TDF_SINTR is
clear rather than requiring the callers of the sleepq API to notice
this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
sleepq_broadcast() by creating an explicit submask for sleepq types.
Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
than sleepq_catch_signals(). Note that it is the caller's responsibility
to ensure that sleepq_catch_signals() is called if and only if this flag
is passed to the preceeding sleepq_add(). Note that this also removes a
sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures
that for an interruptible sleep, TDF_SINTR is always set when
TD_ON_SLEEPQ() is true.
has only been partly initialized via newfs(8) so that it applies to both
UFS1 and UFS2.
Submitted by: "Xin LI" delphij at frontfree dot net
MFC: maybe?
lock is not held.
Rather than annotating that the lock is released after calls to
unp_detach() with a comment, annotate with an assertion.
Assert that the UNIX domain socket subsystem lock is not held when
unp_externalize() and unp_internalize() are called.
This provides greater context for the locking and allows us to avoid
locking the pcbinfo structure if not binding operations will take
place (i.e., already bound, connected, and no expliti sendto()
address).
drive is known to the configuration check also if it already has a geom.
Without this check several needless geoms are created and valid
configuration data was overwritten.
This change obsoletes the need for a separate geom to taste an
offered provider and the consumer doesn't need to be opened with the
exclusive bit set.
the driver to issue a bus reset more quickly than intended. We want to
*wait* if we find another SCB that could be the cause of this timeout,
not proceed to a bus reset.
Noticed by: kan
callers. These ioctls attempted to enable and disable the ACPI
interpreter at runtime. In practice, it is not possible to boot with
ACPI and then disable it on many systems and trying to do so can cause
crashes, interrupt storms, etc. Binary compatibility with userland is
retained.
MFC after: 2 days
ACPI_DEBUG case. Without this, use of allocated memory is unaligned and
causes a trap on ia64. Intel may fix this differently in a subsequent
release but this is adequate for now.
Submitted by: marcel
MFC after: 2 days
amd64 agp option here in order to let the pc98 kernel build
complete. This doesn't seem right, since there probably aren't
plans to build a pc98 amd64 box; however, it's not clear to me
how to get config to generate an opt_agp.h without an option
defined.
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.
However there are many changes how ipfw is and its add-on's are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files
Add netinet/ip_fw_pfil.c.
conf/options
Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile
Add ip_fw_pfil.c.
net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.
netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c
(Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.
netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.
netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)
netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.
netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.
Approved by: re (scottl)
also generates a notify. Since we held the lock over this call, the
notify never got to run and the battery status read never returned.
Document this also.
Tested by: Maxim Maximov <mcsi_at_mcsi.pp.ru>
Approved by: re (scottl)
data packet is received from the mouse. In the case of many KVM's,
this avoids a bug in their mouse emulation that sends back incorrect
sync when you explicitly request a data packet from the mouse. Without
this change, you must force the driver into stock PS/2 mode or be flooded
with a never ending stream of "out of sync" messages on these KVMs.
Approved by: re
The ISA probe uses an identify routine to probe all slot locations from
1 to 14 that do not conflict with other allocated resources. This required
making aic7770.c part of the driver core when compiled as a module.
aic7xxx.c:
aic79xx.c:
aic_osm_lib.c:
Use aic_scb_timer_start() consistently to start the watchdog timer.
This removes a few places that verbatum copied the code in
aic_scb_timer_start().
During recovery processing, allow commands to still be queued to
the controller. The only requirement we have is that our recovery
command be queued first - something the code already guaranteed.
The only other change required to make this work is to prevent
timers from being started for these newly queued commands.
Approved by: re
- Add some commented out NICs from i386 GENERIC. Most of them look like they
would work but I'm not sure if they are endian-clean and can't test. There
was a report that sk(4) works on sparc64 but it doesn't look like it would
because it doesn't use busdma.
- Improve some of the descriptions of sparc64 specific devices.
There's no functional change, i.e. no added or deleted uncommented devices or
options, in this commit.
- Chase the split of pcm(4). This unbreaks LINT compiles.
- sc(4) basically works and a lot of its options should be supported.
- Add the creator and ofw_console drivers.
- vinum(4) should work, at least its module was turned on for sparc64 a while
ago.
- Don't build sio(4). Its EBus front-end was removed a while ago and the ISA
one hardly works. Use uart(4) instead, it's not perfect yet but works much
better.
only required to support probing of the Adaptec 284X VLB SCSI controller
which becomes visible in EISA space if you perform these writes. 284X
probing is moving to an ISA attachment.
parent rather than track resources locally. The original code
was incomplete in that it would only honor requests for resources
that already exist in its resource list. This prevented many ISA
identify routines from allocating temporary resources. Passing
the requests up to legacy's parent losing no functionality and
allows these requests to succeed.
Reviewed by: imp, jhb
Approved by: RE
have been rush hour...
While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on
amd64. Consequently, it's unsafe to use the option in pcb.h. We now
unconditionally have the ia32 specific registers in the PCB.
This commit is untested.
This was tested with a Netgear WG311v2 802.11b/g PCI card. Things
that were fixed:
- This chip has two memory mapped regions, one at PCIR_BAR(0) and the
other at PCIR_BAR(1). This is a little different from the other
chips I've seen with two PCI shared memory regions, since they tend
to have the second BAR ad PCIR_BAR(2). if_ndis_pci.c tests explicitly
for PCIR_BAR(2). This has been changed to simply fill in ndis_res_mem
first and ndis_res_altmem second, if a second shared memory range
exists. Given that NDIS drivers seem to scan for BARs in ascending
order, I think this should be ok.
- Fixed the code that tries to process firmware images that have been
loaded as .ko files. To save a step, I was setting up the address
mapping in ndis_open_file(), but ndis_map_file() flags pre-existing
mappings as an error (to avoid duplicate mappings). Changed this so
that the mapping is now donw in ndis_map_file() as expected.
- Made the typedef for 'driver_entry' explicitly include __stdcall
to silence gcc warning in ndis_load_driver().
NOTE: the Texas Instruments ACX111 driver needs firmware. With my
card, there were 3 .bin files shipped with the driver. You must
either put these files in /compat/ndis or convert them with
ndiscvt -f and kldload them so the driver can use them. Without
the firmware image, the NIC won't work.
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs
This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.
Approved by: re (scottl)
Submitted by: Xin LI <delphij@frontfree.net>
prodstr may be NULL when fetched. For the default device description,
guard against this and return the numeric IDs instead when this
happens. For the matching routines, and consider NULL to not match
those entries that aren't NULL w/o calling strcmp.
Early patches by: Anders Hanssen
and can lead to two threads being granted exclusive access. Check that no one
has the same lock in exclusive mode before proceeding to acquire it.
The LK_WANT_EXCL and LK_WANT_UPGRADE bits act as mini-locks and can block
other threads. Normally this is not a problem since the mini locks are
upgraded to full locks and the release of the locks will unblock the other
threads. However if a thread reset the bits without obtaining a full lock
other threads are not awoken. Add missing wakeups for these cases.
PR: kern/69964
Submitted by: Stephan Uphoff <ups at tree dot com>
Very good catch by: Stephan Uphoff <ups at tree dot com>
address of the dirhash, rather than the first sizeof(struct dirhash
*) bytes of the structure (which, thankfully, seem to be constant).
Submitted by: Ted Unangst <tedu@zeitbombe.org>
MFC after: 2 weeks
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if building with the COMPAT_LINUX32 option.
- make minimal changes to the i386 linprocfs_docpuinfo() function to support
amd64. We return a fake CPU family of 6 for now.
with the COMPAT_LINUX32 option. This is largely based on the i386 MD Linux
emulations bits, but also builds on the 32-bit FreeBSD and generic IA-32
binary emulation work.
Some of this is still a little rough around the edges, and will need to be
revisited before 32-bit and 64-bit Linux emulation support can coexist in
the same kernel.
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if compiling with COMPAT_LINUX32. This will need to be revisited before
32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.
This should be a no-op on i386 and Alpha.
"debug.mpsafevm" results in (almost) Giant-free execution of zero-fill
page faults. (Giant is held only briefly, just long enough to determine
if there is a vnode backing the faulting address.)
Also, condition the acquisition and release of Giant around calls to
pmap_remove() on "debug.mpsafevm".
The effect on performance is significant. On my dual Opteron, I see a
3.6% reduction in "buildworld" time.
- Use atomic operations to update several counters in vm_fault().
before dereferencing sotounpcb() and checking its value, as so_pcb
is protected by protocol locking, not subsystem locking. This
prevents races during close() by one thread and use of ths socket
in another.
unp_bind() now assert the UNP lock, and uipc_bind() now acquires
the lock around calls to unp_bind().
wait for system wires to disappear, do so (much more trivially) by
instead only checking for system wires of user maps and not kernel maps.
Alternative by: tor
Reviewed by: alc
- pipespace is now able to resize non-empty pipes; this allows
for many more resizing opportunities
- Backing is no longer pre-allocated for the reverse direction
of pipes. This direction is rarely (if ever) used, so this cuts the
amount of map space allocated to a pipe in half.
- Pipe growth is now much more dynamic; a pipe will now grow when
the total amount of data it contains and the size of the write are
larger than the size of pipe. Previously, only individual writes greater
than the size of the pipe would cause growth.
- In low memory situations, pipes will now shrink during both read
and write operations, where possible. Once the memory shortage
ends, the growth code will cause these pipes to grow back to an appropriate
size.
- If the full PIPE_SIZE allocation fails when a new pipe is created, the
allocation will be retried with SMALL_PIPE_SIZE. This helps to deal
with the situation of a fragmented map after a low memory period has
ended.
- Minor documentation + code changes to support the above.
In total, these changes increase the total number of pipes that
can be allocated simultaneously, drastically reducing the chances that
pipe allocation will fail.
Performance appears unchanged due to dynamic resizing.
for EBus, ISA and PCI, by compiling ofw_isa.c and ofw_pci_if.m unconditio-
nally. The correct way is to rewrite OF_decode_addr() in ofw_machdep.c in
a bus-neutral way. That's certainly possible but we unfortunately didn't
make it for FreeBSD 5.3.
Approved by: tmm
contained "sanity" checks that could be violated if another CPU modified
the pmap between the emulation trap and locking the pmap in
pmap_emulate_reference(). As a result, the pte could be inconsistent
with the access that caused the emulation trap. In such cases,
pmap_emulate_reference() now flushes the current CPU's TLB entry and
returns.
- Make pmap_changebit() an inline function, reducing object code size.
Don't count busy buffers before the initial call to sync() and
don't skip the initial sync() if no busy buffers were called.
Always call sync() at least once if syncing is requested. This
defers the "Syncing disks, buffers remaining..." message until
after the initial sync() call and the first count of busy
buffers. This backs out changes in kern_shutdown 1.162.
Print a different message when there are no busy buffers after the
initial sync(), which is now the expected situation.
Print an additional message when syncing has completed successfully
in the unusual situation where the work of syncing was done by
boot().
Uppercase one message to make it consistent with all of the other
kernel shutdown messages.
Discussed with: bde (in a much earlier form, prior to 1.162)
Reviewed by: njl (in an earlier form)
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond. A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog. If the callout fails to reset the timer for ten
seconds, the watchdog will fire. The sysctl allows you to select which
CPU will run the watchdog.
A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog. On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.
This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.
On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.
of PS_STRINGS. This is a no-op at present, but it will be needed when
running 32-bit Linux binaries on amd64 to ensure PS_STRINGS is in
addressable memory.
Without this, the device cannot detect the end of ethernet packets
whose size is a multiple of the USB packat size.
PR: kern/70474
Submitted by: Andrew Thompson <andy@fud.org.nz>
MFC after: 1 week
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.
Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).
Reviewed by: green, rwatson (both earlier versions)
attempt to IPI other cpus when entering the debugger in order to stop
them while in the debugger. The default remains to issue the stop;
however, that can result in a hang if another cpu has interrupts disabled
and is spinning, since the IPI won't be received and the KDB will wait
indefinitely. We probably need to add a timeout, but this is a useful
stopgap in the mean time.
Reviewed by: marcel
and that can be used as an identify function for all kinds of busses on a
certain platform. Expect for sparc64 these are only stubs right now. [1]
- For sparc64, add code to its uart_cpu_identify() for registering the on-
board ISA UARTs and their resources based on information obtained from
Open Firmware.
It would be better if this would be done in the OFW ISA code. However, due
to the common FreeBSD ISA code and PNP-IDs not always being present in the
properties of the ISA nodes there seems to be no good way to implement that.
Therefore special casing UARTs as the sole really relevant ISA devices on
sparc64 seemed reasonable. [2]
Approved by: marcel
Discussed with: marcel [1], tmm [2]
Tested by: make universe
without Open Firmware directly instead of using OF_getetheraddr(). This is
a bit painful though, as the MAC address is contained in the NA field of
the VPD of the EBus bridge, which is is another function of the same chip.
To make it worse, the VPD of the EBus bridge can't be accessed via the PCI
capability pointer but has to be digged out from the Boot PROM and has a
non-standard format.
The PCI VPD struct and macros used here should be part of the FreeBSD PCI
code nevertheless.
Approved by: tmm
Based on: NetBSD
Tested with: Sun X1032A (hme(4)-isp(4)-combo card) on alpha and i386
o reprobe children when a new driver is added to uhub
o fix the usbd_probe_and_attach to set the ivars to a malloc'd area, as well
as freeing the ivars on child destruction.
o Don't delete children that don't attach. Evidentally, the need to do this
is a common misconception.
o minor formatting foo that may violate style(9) at the moment, but keeps the
diffs against my p4 tree smaller.
This does not solve the ugen gobbling things up problem, but the fixes
I have for that expose bugs in other parts of the tree...
variable. If set to "true" OF_getetheraddr() will now return the unique
MAC address stored in the "local-mac-address" property of the device's
OFW node if present and the host address/system default MAC address if
the node doesn't doesn't have such a property. If set to "false" the
host address will be returned for all devices like before this change.
This brings the behaviour of device drivers for NICs with OFW support/
FCode, i.e. dc(4) for on-board DM9102A on Sun machines, gem(4) and hme(4),
regarding "local-mac-address?" in line with NetBSD and Solaris.
The man pages of the respective drivers will be updated separately to
reflect this change.
- Remove OF_getetheraddr2() which was used as a stopgap in dc(4). Its
functionality is now part of OF_getetheraddr().
threads consuming the result of pfind() will not need to check for a NULL
credential pointer or other signs of an incompletely created process.
However, this also means that pfind() cannot be used to test for the
existence or find such a process. Annotate pfind() to indicate that this
is the case. A review of curent consumers seems to indicate that this is
not a problem for any of them. This closes a number of race conditions
that could result in NULL pointer dereferences and related failure modes.
Other related races continue to exist, especially during iteration of the
allproc list without due caution.
Discussed with: tjr, green
have already done this, so I have styled the patch on their work:
1) introduce a ip_newid() static inline function that checks
the sysctl and then decides if it should return a sequential
or random IP ID.
2) named the sysctl net.inet.ip.random_id
3) IPv6 flow IDs and fragment IDs are now always random.
Flow IDs and frag IDs are significantly less common in the
IPv6 world (ie. rarely generated per-packet), so there should
be smaller performance concerns.
The sysctl defaults to 0 (sequential IP IDs).
Reviewed by: andre, silby, mlaier, ume
Based on: NetBSD
MFC after: 2 months
connect to, re-check that the local UNIX domain socket hasn't been
closed while we slept, and if so, return EINVAL. This affects the
system running both with and without Giant over the network stack,
and recent ULE changes appear to cause it to trigger more
frequently than previously under load. While here, improve catching
of possibly closed UNIX domain sockets in one or two additional
circumstances. I have a much larger set of related changes in
Perforce, but they require more testing before they can be merged.
One debugging printf is left in place to indicate when such a race
takes place: this is typically triggered by a buggy application
that simultaenously connect()'s and close()'s a UNIX domain socket
file descriptor. I'll remove this at some point in the future, but
am interested in seeing how frequently this is reported. In the
case of Martin's reported problem, it appears to be a result of a
non-thread safe syslog() implementation in the C library, which
does not synchronize access to its logging file descriptor.
Reported by: mbr
the interface as IFF_NEEDSGIANT so if_start is run holding Giant.
Note: mutexes are initialized in the softc for this driver, but the
locking appears inadequate to allow Giant-free operation.
the interface as IFF_NEEDSGIANT so if_start is run holding Giant.
Note: this driver does declare and occasionally reference mutexes,
but I believe not nearly enough to provide safety.
code was adjusting twice for the instruction pointer indicating
the *next* instruction to execute. The aic79xx driver had a similar
bug, but was fixed some time ago.
check whether p_ucred is NULL or not in pfs_getattr() before
dereferencing the credential, and return ENOENT if there wasn't one.
This is a symptom of a larger problem, wherein pfind() can return
references to incompletely initialized processes, and we instead ought
to not return them, or check the process state before acting on the
process.
Reported by: kris
Discussed with: tjr, others
algorithm built into the map entry splay tree. This replaces the
first_free hint in struct vm_map with two fields in vm_map_entry:
adj_free, the amount of free space following a map entry, and
max_free, the maximum amount of free space in the entry's subtree.
These fields make it possible to find a first-fit free region of a
given size in one pass down the tree, so O(log n) amortized using
splay trees.
This significantly reduces the overhead in vm_map_findspace() for
applications that mmap() many hundreds or thousands of regions, and
has a negligible slowdown (0.1%) on buildworld. See, for example, the
discussion of a micro-benchmark titled "Some mmap observations
compared to Linux 2.6/OpenBSD" on -hackers in late October 2003.
OpenBSD adopted this approach in March 2002, and NetBSD added it in
November 2003, both with Red-Black trees.
Submitted by: Mark W. Krentel
* Serialize access to the sysctl routines and the notify handler
* Assert that the sx lock is held in any functions they call.
* Note that recursively calling to re-enable the hotkeys is sub-optimal.
* Remove the interrupt wrapper that locked Giant and call the handler
directly. Mark the handler as MPSAFE.
* Don't attempt to detect if a handler is installed. Leave that to the
bus_alloc_resource() function.
* Serialize operations in acpi_video_bind_outputs(), acpi_video_detach(),
acpi_video_notify_handler(), acpi_video_power_profile(), and the sysctls.
The main goal is to protect the shared device list and prevent conflicting
settings.
* Add assertions that the sx lock is held in the leaf functions.
* Restructure the event handling path. acpi_tz_thread() now calls
acpi_tz_timeout() any time an event occurs. acpi_tz_timeout() checks
the flags and calls acpi_tz_power_profile(), acpi_tz_establish(), and
acpi_tz_monitor() as appropriate. Notifies only do a wakeup and let
acpi_tz_thread() do the actual work. This path is cleaner and allows
locking since the call path is now always a D.A.G.
* Add the acpi_tz_signal() function to set flags and wake the thread.
* Remove the tz_tmp_updating flag since calls are serialized by
acpi_tz_thread().
* Remove Giant locking.
* Serialize acpi_pwr_switch_consumer() and acpi_pwr_wake_enable().
* Make acpi_pwr_switch_consumer() have a single exit point.
* Add assertions to the leaf functions they call.
* Fix a memory leak in acpi_pwr_deregister_consumer(). However, it is
currently ifdefed out so this code was unused.
* Serialize access to acpi_pci_link_config(), acpi_pci_find_prt(),
acpi_pci_link_route(), and acpi_pci_link_resume().
* Add lock assertions to all functions called by them.
* Serialize notifying the user in acpi_lid_notify_status_changed(). This
way multiple lid events occur in order.
* Add an initialization pass to get the lid status at boot-time. This
pass does not notify any apps but gets the initial status.
* Use the common serialization macros instead of rolling our own.
* Increase the coverage of the lock in EcSpaceHandler() to cover the entire
loop to avoid dropping the lock when reading more than one byte.
* Serialize ops in acpi_cmbat_notify_handler(), acpi_cmbat_ioctl(),
acpi_cmbat_init_battery(), and acpi_cmbat_get_battinfo().
* Get the softc directly in acpi_cmbat_get_total_battinfo() rather than
build an array of them.
* Don't queue a _BIF query after receiving a notify. Since we clear the
timespec, a _BIF query will be done in the context of the next caller.
* Add asserts to leaf functions that operate on shared data.
* Remove the bst/bif updating flags now that we hold the lock over the
full query.
* Explain various comments in more detail.
* Serialize acpi_battery_get_battdesc(), acpi_battery_register(), and
acpi_battery_remove().
* Assert that the sx lock is held in acpi_batteries_init().
* Remove check for device_get_softc() returning NULL.
* Serialize notification of acline changes in acpi_acad_get_status().
* Remove the initializing flag. With the locking, we don't need to
push off requests for the acline before initialization is done.
* Don't check device_get_softc(), it can't return NULL.
* Serialize calls to acpi_alloc_resource(), acpi_release_resource(),
acpi_Enable(), acpi_Disable(), and acpi_debug_sysctl().
* Acquire the ACPI mutex in acpi_register_ioctl(), acpi_deregister_ioctl(),
and acpiioctl().
* Acquire the mutex while disabling subsequent requests to enter a
sleep state in acpi_SetSleepState().
* Be sure to re-enable sleep requests and don't run resume methods when
the current request fails.
* Don't check if sleep requests are disabled in the ACPIIO_SETSLPSTATE
ioctl. acpi_SetSleepState() does this for us.
* Remove the acquisition of Giant from the struct cdevsw.
* Remove the ACPI_USE_THREADS option.
* Add and comment our locking primitives. The mutex primitives use a
a static mutex and the serialization ones use a static sx lock. A global
acpi_mutex is used for access to global resources (i.e., writes to the
SMI_CMD register.)
* Remove 4.x compat defines.
Since the only thing truly unique about a prison is it's ID, I figured
this would be the most granular way of handling this.
This commit makes the following changes:
- Adds tokenizing and parsing for the ``jail'' command line option
to the ipfw(8) userspace utility.
- Append the ipfw opcode list with O_JAIL.
- While Iam here, add a comment informing others that if they
want to add additional opcodes, they should append them to the end
of the list to avoid ABI breakage.
- Add ``fw_prid'' to the ipfw ucred cache structure.
- When initializing ucred cache, if the process is jailed,
set fw_prid to the prison ID, otherwise set it to -1.
- Update man page to reflect these changes.
This change was a strong motivator behind the ucred caching
mechanism in ipfw.
A sample usage of this new functionality could be:
ipfw add count ip from any to any jail 2
It should be noted that because ucred based constraints
are only implemented for TCP and UDP packets, the same
applies for jail associations.
Conceptual head nod by: pjd
Reviewed by: rwatson
Approved by: bmilekic (mentor)
to avoid later changes before pmap_enter() and vm_fault_prefault()
has completed.
Simplify deadlock avoidance by not blocking on vm map relookup.
In collaboration with: alc
subset ("compatible", "device_type", "model" and "name") of the standard
properties in drivers for devices on Open Firmware supported busses. The
standard properties "reg", "interrupts" und "address" are not covered by
this interface because they are only of interest in the respective bridge
code. There's a remaining standard property "status" which is unclear how
to support properly but which also isn't used in FreeBSD at present.
This ofw_bus kobj-interface allows to replace the various (ebus_get_node(),
ofw_pci_get_node(), etc.) and partially inconsistent (central_get_type()
vs. sbus_get_device_type(), etc.) existing IVAR ones with a common one.
This in turn allows to simplify and remove code-duplication in drivers for
devices that can hang off of more than one OFW supported bus.
- Convert the sparc64 Central, EBus, FHC, PCI and SBus bus drivers and the
drivers for their children to use the ofw_bus kobj-interface. The IVAR-
interfaces of the Central, EBus and FHC are entirely replaced by this. The
PCI bus driver used its own kobj-interface and now also uses the ofw_bus
one. The IVARs special to the SBus, e.g. for retrieving the burst size,
remain.
Beware: this causes an ABI-breakage for modules of drivers which used the
IVAR-interfaces, i.e. esp(4), hme(4), isp(4) and uart(4), which need to be
recompiled.
The style-inconsistencies introduced in some of the bus drivers will be
fixed by tmm@ in a generic clean-up of the respective drivers later (he
requested to add the changes in the "new" style).
- Convert the powerpc MacIO bus driver and the drivers for its children to
use the ofw_bus kobj-interface. This invloves removing the IVARs related
to the "reg" property which were unused and a leftover from the NetBSD
origini of the code. There's no ABI-breakage caused by this because none
of these driver are currently built as modules.
There are other powerpc bus drivers which can be converted to the ofw_bus
kobj-interface, e.g. the PCI bus driver, which should be done together
with converting powerpc to use the OFW PCI code from sparc64.
- Make the SBus and FHC front-end of zs(4) and the sparc64 eeprom(4) take
advantage of the ofw_bus kobj-interface and simplify them a bit.
Reviewed by: grehan, tmm
Approved by: re (scottl)
Discussed with: tmm
Tested with: Sun AX1105, AXe, Ultra 2, Ultra 60; PPC cross-build on i386
pf_cksum_fixup() was called without last argument from
normalization, also fixup checksum when random-id modifies ip_id.
This would previously lead to incorrect checksums for packets
modified by scrub random-id.
(Originally) Submitted by: yongari
The first one was going to 'dropfrag', which unlocks the IPQ, before the lock
was aquired; The second one doing a unlock and then a 'goto dropfrag' which
led to a double-unlock.
Tripped over by: des
migration. Use this in sched_prio() and sched_switch() to stop us from
migrating threads that are in short term sleeps or are runnable. These
extra migrations were added in the patches to support KSE.
- Only set NEEDRESCHED if the thread we're adding in sched_add() is a
lower priority and is being placed on the current queue.
- Fix some minor whitespace problems.
there is no irq link. Since we now use the stored copy of PRT, not the
one that used to be passed into acpi_pcib_route_interrupt(), we need it in
the list. [1]
Fix a bug in acpi_pci_find_prt() where we weren't checking the bus, thus
choosing the wrong PRT entry to use for routing the link. Also, add a
printf for the case where the PRT entry is not found as this should not
happen.
Tested by: marcel [1]
- Remove kern.geom.mirror.sync_block_size sysctl. It is quite obvious that we
want to use the biggest size possible.
- Do not use UMA zone for sync data allocations. There could be only one
synchronization request per synchronized disk at a time, so allocate memory
for one request on whole synchronization process related to one disk.
Tested by synchronizing one component (out of three) and by synchronizing
two components (out of three) in parallel.
- Remove __RMAN_RESORUCE_VISIBLE again. It's no longer required either
because of the above change or because struct rman is no longer hidden.
Reviewed by: grehan
Tested by: cross-compile on i386
for structures with timers in them. It might be that a timer might fire
even when the associated structure has already been free'd. Having type-
stable storage in this case is beneficial for graceful failure handling and
debugging.
Discussed with: bosko, tegge, rwatson
called "rtentry".
This saves a considerable amount of kernel memory. R_Zmalloc previously
used 256 byte blocks (plus kmalloc overhead) whereas UMA only needs 132
bytes.
Idea from: OpenBSD
incomplete in that the PRT routing was not aware of link programming.
Fix this by doing all routing through the link devices. The new algorithm
for setting up links is:
1. Read _CRS to get current setting. If invalid (not in _PRS), then set
to 0.
2. Attempt to call _DIS on the link. If successful, mark the link as not
routed. Otherwise, assume it still is.
Then when a routing request occurs:
3. Update weights for all IRQs
4. Attempt to route the initial IRQ if valid
5. If that fails, walk through the sorted list, attempting to route IRQs.
6. Configure the trigger/polarity based on _PRS.
Other changes:
* Add acpi_pci_find_prt() to look up the PRT entry for a given device and
acpi_pci_link_route() to select/route the best IRQ for it.
* Remove duplicated code in acpi_pcib_route_interrupt() that picked the
first IRQ from _PRS.
* Remove unneeded arguments from acpi_pcib_resume() and friends.
* Ignore _STA on link devices but report if it seems strange.
* Add a prt_source handle to the PRT structure since the ACPI struct
ACPI_PCI_ROUTING_TABLE uses a fixed-size entry for it. We'll need to
dynamically size this object if we want to use it the same way ACPI-CA
does. Null-terminate the source.
Tested by: Luo Hong <luohong99_at_mails.tsinghua.edu.cn>,
Jeffrey Katcher <jmkatcher_at_yahoo.com>
Info from: jhb, Len Brown (Intel)
and bio_inbed fields to 0. Without this change we can end up with
I/O leakage in some rare situations.
I tested this change by putting failure probability mechanism simlar
to this used in NOP class into g_clone_bio(9) function, so it was
able to return NULL with the given probability.
Discussed with: phk
we update the registers. That way we don't have any dirty registers to
worry about and also know that bsp=bspstore, which makes updating the
RSE related registers predictable.
This is not the end of it. We need more validity checks, but for now
this allows us to complete the gdb testsuite without crashing the
kernel.
if_start routines cannot currently be entered without Giant. When
the kernel is running with debug.mpsafenet != 0, this will defer
if_start execution to a task queue thread holding Giant, which may
introduce additional latency, but avoid incorrect execution.
Suggested by: dfr
full, avoiding the cost of mutex operations if it is. We re-test
once the mutex is acquired to make sure it's still true before doing
the -modify-write part of the read-modify-write. Note that due to
the maximum fifo depth being pretty deep, this is unlikely to improve
harvesting performance yet.
Approved by: markm
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.
Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc
a standard configuration similar to [NO_]ADAPTIVE_MUTEXES. This
feature causes Giant to be included in the set of mutexes adaptively
spun on. It appears to have a positive effect on performance on SMP
across several workloads, including measurements of a 16% improvement
on buildworld, and 30%+ improvement for MySQL using the supersmack
benchmark with Giant over the network stack; a 6% improvement without
Giant on the network stack (as a result of less giant contention).
we may sleep when doing so; check that we didn't race with another thread
allocating storage for the vnode after allocation is made to a local
pointer, and only update the vnode pointer if it's still NULL. Otherwise,
accept that another thread got there first, and release the local storage.
Discussed with: jmg
Implement the protection check required by the pmap_extract_and_hold()
specification.
Remove the acquisition and release of Giant from pmap_extract_and_hold() and
pmap_protect().
Many thanks to Ken Smith for resolving a sparc64-specific initialization
problem in my original patch.
Tested by: kensmith@
bio_driver1 (as all the rest).
This introduced a small memory leak, but it wasn't really critical,
because maximum memory for g_stripe_zone is always set, so after few
requests gstripe was working in "economic" mode.
are resevered, they can be written with anything, but they always read
as zero, we should simulate it in set_regs() as we are reading/writting
real hardware %rflags register.
contributed to the transferable load count. This prevents any potential
problems with sched_pin() being used around calls to setrunqueue().
- Change the sched_add() load balancing algorithm to try to migrate on
wakeup. This attempts to place threads that communicate with each other
on the same CPU.
- Don't clear the idle counts in kseq_transfer(), let the cpus do that when
they call sched_add() from kseq_assign().
- Correct a few out of date comments.
- Make sure the ke_cpu field is correct when we preempt.
- Call kseq_assign() from sched_clock() to catch any assignments that were
done without IPI. Presently all assignments are done with an IPI, but I'm
trying a patch that limits that.
- Don't migrate a thread if it is still runnable in sched_add(). Previously,
this could only happen for KSE threads, but due to changes to
sched_switch() all threads went through this path.
- Remove some code that was added with preemption but is not necessary.
umich copyright is asserting.
Clarify that the copyright I'm asserting is the standard Berkeley
license.
Remove Giant assertions from AARP and DDP input routines.
is here so that we can gather stats on the nature of the recent rash of
hard lockups, and in this particular case panic the machine instead of
letting it deadlock forever.
becauses some syscalls using set_mcontext can sneakily change
parameters and later when those syscalls references parameters,
they will wrongly use register values in mcontext_t.
Approved by: peter
chipsets, based on Linux's via-agp.c. On boot, the system selects which AGP
version to use based on the inserted card. If v2 was chosen, the chipset
needs to be programmed with the v2 registers still. Also included in kern/69953
are changes to make the programming of the v3 registers match linux, but that
will be left out until the need to do so is confirmed (want specs or a tester).
PR: kern/69953
Submitted by: Oleg Sharoiko <os@rsu.ru>
Tested by: Oleg Sharoiko <os@rsu.ru>, Geoff Speicher <geoff@speicher.org>
(full version from PR)
The hardware always gives read access for privilege level 0, which
means that we cannot use the hardware access rights and privilege
level in the PTE to test whether there's a change in protection. So,
we save the original vm_prot_t in the PTE as well.
Add pmap_pte_prot() to set the proper access rights and privilege
level on the PTE given a pmap and the requested protection.
The above allows us to compare the protection in pmap_extract_and_hold()
which was missing. While in pmap_extract_and_hold(), add pmap locking.
While here, clean up most (i.e. all but one) PTE macros we inherited
from alpha. They were either unused, used inconsistently, badly named
or simply weren't beneficial. We save the wired and managed state of
the PTE in distinct (bit) fields.
While in pte.h, s/u_int64_t/uint64_t/g
pmap locking obtained from: alc@
feedback & review by: alc@
* Allow no-fault wiring/unwiring to succeed for consistency;
however, the wired count remains at zero, so it's a special case.
* Fix issues inside vm_map_wire() and vm_map_unwire() where the
exact state of user wiring (one or zero) and system wiring
(zero or more) could be confused; for example, system unwiring
could succeed in removing a user wire, instead of being an
error.
* Require all mappings to be unwired before they are deleted.
When VM space is still wired upon deletion, it will be waited
upon for the following unwire. This makes vslock(9) work
rather than allowing kernel-locked memory to be deleted
out from underneath of its consumer as it would before.
1. Move a comment to its proper place, updating it. (Except for white-
space, this comment had been unchanged since revision 1.1!)
2. Remove spl calls.
fix the obvious bugs, nastier ones reside below the surfac), and having
it commented out here just encourages people to try it.
# I'm not removing it from the base system, yet.
For incoming packets, the packet's source address is checked if it
belongs to a directly connected network. If the network is directly
connected, then the interface the packet came on in is compared to
the interface the network is connected to. When incoming interface
and directly connected interface are not the same, the packet does
not match.
Usage example:
ipfw add deny ip from any to any not antispoof in
Manpage education by: ru
It allows to fix problems when last provider's sector is shared between few
providers.
- Bump version number for CONCAT and STRIPE and add code for backward
compatibility.
- Do not bump version number of MIRROR, as it wasn't officially introduced yet.
Even if someone started to play with it, there is no big deal, because
wrong MD5 sum of metadata will deny those providers.
- Update manual pages.
- Add version history to g_(stripe|concat).h files.
thread, after the bound thread leaves critical region, the thread should
check debug flag may suspend itself by using the command.
2.Schedule upcall after thread is suspended by debugger
3.Wakeup upcall thread after process suspension.
Reviewed by: deischen
understood. This makes room for additional binary compatibility in the
future.
Put fields in the class for the geom's methods and initialize the methods
of a new geom from these fields. This saves some code in all classes.
o Change the motion calculation to result in
a more reasonable speed of motion
This should fix the 'aiming' problems people have reported. It also
mitigates (but doesn't completely solve) the 'stalling' problems at
very low speeds.
Tested by: many subscribers to -current
Approved by: njl
o Catch 'taps' as button presses
o One finger sends button1, two fingers send button3,
three fingers send button2 (double-click)
Tested by: many subscribers to -current
Approved by: njl
o Handle the 'up/down' buttons some touchpads have as
a z-axis (scrollwheel) as recommended by the specs
o Report the buttons as button4 and button5 instead
of button2 and button4, button2 can be emulated by
pressing button1 and button3 simultaneously. This
allows one to use the two extra buttons for other
purposes if one so desires.
Tested by: many subscribers to -current
Approved by: njl
o Clean up whitespace and comments in the
enable_synaptics() probing function
o Only use (and rely on) the extended capability
bits when we are told they actually exist
o Partly ignore the (possibly dated?) part of the
specification about the mode byte so that we
can support 'guest devices' too.
Tested by: many subscribers to -current
Approved by: njl
path. The basic problem is that we cannot set the single stepping flag
directly, because we don't leave the kernel via an interrupt return. So,
we need another way to set the single stepping flag.
The way we do this is by enabling the lower-privilege transfer trap, which
gets raised when we drop the privilege level. However, since we're still
running in kernel space (sec), we're not yet done. We clear the lower-
privilege transfer trap, enable the taken-branch trap and continue exiting
the kernel until we branch into user space.
Given the current code, there's a total of two traps this way before
we can raise SIGTRAP.
after a fork(2) in fork_trampoline(). By moving the epc_syscall_return
label immediately before the call to do_ast() in epc_syscall(), we not
only achieve that but also handle the detour through exception_return
when the frame corresponds to an asynchronous kernel entry. Hence, we
simplified fork_trampoline() as a side-effect.
related to breakpoints and single stepping into SIGTRAP so gdb(1) knows
why the remote target has stopped. In particular, gdb(1) needs to know
if the reason is something of its own doing.
catch leaking into VFS without Giant.
Inch Giant a little lower in several file descriptor operations on
vnodes to cover only VFS operations that need it, rather than file
flag reading, etc.
was being unconditionally dereferenced but was NULL for PIO requests.
Check the request flags for a DMA transaction before dereferencing.
Reported by: ceri
Tested by: Radek Kozlowski <radek -at- raadradd.com>
fcntl() operations, including:
F_DUPFD dup() alias
F_GETFD retrieve close-on-exec flag
F_SETFD set close-on-exec flag
F_GETFL retrieve file descriptor flags
For the remaining fcntl() operations, do acquire Giant, especially
where we call into fo_ioctl() as a result. We're not yet ready to
push Giant into fo_ioctl(). Once we do, this can all become quite a
bit prettier.
calls. Note that the information included is a bit different from the
existing KTR traces generated on powerpc, as I'm primarily interested
in kernel context (thread, syscall #, proc, etc), not the user
arguments to the system call. Some convergence would be useful here.
with it that need to be understood better before they can be resolved.
This takes time and time is already in short supply.
Reported & tested by: glebius@
will prepend the current kernel booting... This prevents a problem of
loading /boot/kernel's modules when a different kernel has no modules,
but you left your module_load="YES" in loader.conf...
Reviewed by: dcs (minus the help part)
something goes wrong while running in "fast" mode, we free all bios and
falling back to "economic" mode. Freeing bios, doesn't mean decrease
bio_children, so bio_inbed couldn't be equal to bio_children and request
was never finished.
Decrease bio_children manually when destroying bios.
Reported by: Sam Lawrance <boris@brooknet.com.au>, simon
message if they are incorrect. Also, remove the hack of allowing the
initial irq setting to not be in _PRS. As before, the old behavior can be
regained by defining ACPI_OLD_PCI_LINK.
structures, allowing in6_pcbnotify() to lock the pcbinfo and each
inpcb that it notifies of ICMPv6 events. This prevents inpcb
assertions from firing when IPv6 generates and delievers event
notifications for inpcbs.
Reported by: kuriyama
Tested by: kuriyama
a result of scheduling an ithread, cut a KTR_INTR trace record so
that it's clear in tracing interrupt activity where and when the
entropy harvesting code is invoked.
callout_reset rather than calling callout_stop. This results in a few
lines of code duplication, but it provides a significant performance
improvement because it avoids recursing on callout_lock.
Requested by: rwatson
or multicast packet, we don't need to acquire the inpcb mutex
unless we are actually using inpcb fields other than the bound port
and address. Since we hold the pcbinfo lock already, these can't
change. Defer acquiring the inpcb mutex until we have a high
chance of a match. This avoids about 120 mutex operations per UDP
broadcast packet received on one of my work systems.
Reviewed by: sam
want a splash screen.
There seems to be some confusion in the syscons code as to the meaning of
the SC_KERNEL_CONSOLE flag. Its absence is sometimes interpreted to mean
"I am not the system console", and sometimes to mean "I am not the only
VGA console" (see the font loading code for an example of the latter).
Someone with better syscons fu than myself should take a closer look.
(that does not compile with !gcc). Moreover we get the benefit for all archs
that have a hand optimized in_cksum_skip().
Submitted by: yongari
Tested by: me (i386, extensivly), pf4freebsd ML (various)
better check for 'adjacent'. The old code assumed that if two resources
were adjacent in the linked list that they were also adjacent range wise.
This is not true when a resource manager has to manage disparate regions.
For example, the current interrupt code on i386/amd64 will instruct
irq_rman to manage two disjoint regions: 0-1 and 3-15 for the non-APIC
case. If IRQs 1 and 3 were allocated and then released, the old code
would coalesce across the 1 to 3 boundary because the resources were
adjacent in the linked list thus adding 2 to the area of resources that
irq_rman managed as a side effect. The fix adds extra checks so that
adjacent unallocated resources are only merged with the resource being
freed if the start and end values of the resources also match up. The
patch also consolidates the checks for adjacent resources being allocated.
following behavior:
* Link devices return invalid status (_STA) values. The results are very
unreliable -- sometimes never present. Just ignore the status and pick
the best configuration from _PRS.
* Link devices return invalid current settings (_CRS). Even after setting
the link value, many systems still return a different setting for _CRS.
When setting an IRQ, don't bother to check _CRS to see if we succeeded.
Note that we still check _CRS before routing and this should be addressed
as well.
Since this is a sensitive area, leave the old behavior accessible via
uncommenting the define for ACPI_OLD_PCI_LINK at the top of the file. Once
this has been thoroughly tested, this option and the code it covers will
be removed.
Thanks to Len Brown at Intel for informing us of these issues as he worked
around them in Linux.
location (for the wake code). It should not be needed since we don't
map other pages at the same location and if there was an old mapping, it
would be restored by a fault. The old code had serious problems, namely
that it was restoring the new page it had just removed (not opage) and
it could only guess at the right protection (since there's no
pmap_extract_protect function). Thanks to Alan Cox for explaining much
of this to me.
Also, remove a commented-out initializecpu() call since it is not needed.
Restoring the cpu context is better than attempting to init from scratch.
Reviewed by: alc (earlier version)
have clear idea on boot2 BSS size and leaves portion of it not zeroed out.
btxcsu.s is in much better position for this job.
Obtained from: DragonflyBSD (with minor adjustments)
Since HME doesn't compensate the checksum for UDP datagram which
can yield to 0x0, UDP transmit checksum offload is disabled by
default. The UDP Transmit checksum offload can be reactivated
by setting special link option link0 with ifconfig(8).
Approved by: jake (mentor)
Reviewed by: tmm
Tested by: Herve Boulouis <amon@sockar.homeip.net>
before grabbing BPF locks to see if there are any entries in order to
avoid the cost of locking if there aren't any. Avoids a mutex lock/
unlock for each packet received if there are no BPF listeners.
consumer and 'bio_pflags' which can be used by provider.
- Remove BIO_FLAG1 and BIO_FLAG2 flags. From now on new fields should be
used for internal flags.
- Update g_bio(9) manual page.
- Update some comments.
- Update GEOM_MIRROR, which was the only one using BIO_FLAGs.
Idea from: phk
Reviewed by: phk
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other. With this change, only
one of the CPUs would do an IPI and spin-wait at a time.
the immediate awakening of proc0 (scheduler kproc, controls swapping
processes in and out). The scheduler process periodically awakens already,
so this will not result in processes not being swapped in, there will just
be more latency in between a thread being made runnable and the scheduler
waking up to swap the affected process back in.
macros and pass the value to the associated _mtx_*() functions to avoid
more curthread dereferences in the function implementations. This provided
a very modest perf improvement in some benchmarks.
Suggested by: rwatson
Tested by: scottl
text/data are covered on APs. This enables the kernel to boot on
a 4 way Intel Itanium-2 platform. This has a secondary effect of
keeping the TRs identical on BP and the APs.
reviewed by: marcel@
a sleep() call waking up in namei(), a later assertion triggers that
Giant is not held. By asserting Giant at the start of namei(), we can
know that if that assertion triggers, Giant is lost during the call to
namei(), and not before.
lock assertions even if IPv6 is compiled into the kernel. Previously,
inclusion of IPv6 and locking assertions would result in a rapid
assertion failure as IPv6 was not properly locking inpcbs.
- In ntoskrnl_var.h, I had defined compat macros for
ntoskrnl_acquire_spinlock() and ntoskrnl_release_spinlock() but
never used them. This is fortunate since they were stale. Fix them
to work properly. (In Windows/x86 KeAcquireSpinLock() is a macro that
calls KefAcquireSpinLock(), which lives in HAL.dll. To imitate this,
ntoskrnl_acquire_spinlock() is just a macro that calls hal_lock(),
which lives in subr_hal.o.)
- Add macros for ntoskrnl_raise_irql() and ntoskrnl_lower_irql() that
call hal_raise_irql() and hal_lower_irql().
- Use these macros in kern_ndis.c, subr_ndis.c and subr_ntoskrnl.c.
- Along the way, I realised subr_ndis.c:ndis_lock() was not calling
hal_lock() correctly (it was using the FASTCALL2() wrapper when
in reality this routine is FASTCALL1()). Using the
ntoskrnl_acquire_spinlock() fixes this. Not sure if this actually
caused any bugs since hal_lock() would have just ignored what
was in %edx, but it was still bogus.
This hides many of the uses of the FASTCALLx() macros which makes the
code a little cleaner. Should not have any effect on generated object
code, other than the one fix in ndis_lock().
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.
Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().
by a transaction performing a driver handled message sequence (an
scb with the MK_MESSAGE flag set).
SCBs that perform host managed messaging must always be
at the head of their per-target selection queue so that
the firmware knows to manually assert ATN if the current
negotiation agreement is packetized. In the past we
guaranteed this by queuing these SCBs separarately in
the execution queue. This exposes the system to potential
command reordering in two cases:
1) Another SCB for the same ITL nexus is queued that does
not have the MK_MESSAGE flag set. This SCB will be
queued to the per-target list which can be serviced
before the MK_MESSAGE scb that preceeded it.
2) If the target cannot accept all of the commands in the
per-target selection queue in one selection, the remainder
is queued to the tail of the selection queues so as to
effect round-robin scheduling. This could allow the
MK_MESSAGE scb to be sent to the target before the
requeued commands.
This commit changes the firmware policy to defer queuing
MK_MESSAGE SCBs into the selection queues until this can
be done without affecting order. This means that the
target's selection queue is either empty, or the last
SCB on the execution queue is also a MK_MESSAGE SCB.
During any wait, the firmware halts the download of new
SCBs so only a single "holding location" is required.
Luckily, MK_MESSAGE SCBs are rare and typically occur only
during CAM's bus probe where only one command is outstanding
at a time. However, during some recovery scenarios, the
reordering *could* occur.
aic79xx.c:
Update ahd_search_qinfifo() and helper routines to
search for pending MK_MESSAGE scbs and properly
restitch the execution queue if either the MK_MESSAGE
SCB is being aborted, or the MK_MESSAGE SCB can be
queued due to the execution queue draining due to
aborts.
Enable LQOBUSFREE status to assert an interrupt.
This should be redundant since a BUSFREE interrupt
should always occur along with an LQOBUSFREE event,
but on the Rev A, this doesn't seem to be guaranteed.
When a PPR request is rejected when a previously
existing packetized agreement is in place, assume
that the target has been reset without our knowledge
and revert to async/narrow transfers. This corrects
two issues: the stale ENATNO setting that was used
to send the PPR is cleared so the firmware is not
confused by a future packetized selection with
ATN asserted but no MK_MESSAGE flag in the SCB and
it speeds up recovery by aborting any pending
packetized transactions that by definition are now
dead.
When re-queueing SCBs after a failed negotiation
attempt, ensure command ordering by freezing the
device queue first.
Traverse the list of pending SCBs rather than the
whole SCB array on the controller when pushing
MK_MESSAGE flag changes out to the controller.
The original code was optimized for the aic7xxx
controllers where there are fewer controller slots
then pending SCBs and the firmware picks SCB
slots. For the U320 controller, the hope is
that we have fewer pending SCBs then the 512
slots on the controller.
Enhance some diagnostics.
Factor out some common code.
aic79xx.h:
Add prototype for new ahd_done_with_status() that is
used to factor out some commone code.
aic79xx.reg:
Add definisions for the pending MK_MESSAGE SCB.
aic79xx.seq:
Defer MK_MESSAGE SCB queing to the execution queue
so as to preserve command ordering. Re-arrange some
of the selection processing code so the above change
had no performance impact on the common code path.
Close a few critical section holes.
When entering a non-packetized phase, manually enable
busfree interrupts, since the controller hardware
does not do this automatically.
aic79xx_inline.h:
Enhance logging for queued SCBs.
aic79xx_osm.c:
Add new a new DDB ahd command, ahd_dump, which
invokes the ahd_dump_card_state() routine on the
unit specified with the ahd_sunit DDB command.
aic79xx_pci.c:
Turn on the BUSFREEREV bug for the Rev B. controller.
This is required to close the busfree during non-packetized
phase hole.
functions. Basically, the ip_next() function was used to get the PPTP and
Skinny headers when tcp_next() should have been used instead. Symptoms of
this included a segfault in natd when trying to process a PPTP or Skinny
packet.
Approved by: des
should be set to VM_PAGE_BITS_ALL before returning, to ensure that
neither vm_pager_get_pages nor vm_fault calls vm_page_zero_invalid
after dev_pager_getpages has returned.
Submitted by: tegge
into single-user mode (as seen on sparc64 and PPC). Problems were due
to a minor oversight in the changes committed in revision 1.25.
Submitted by: grehan
Tested by: gad & yongari
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.
Name of the macro found by: jhb
Reviewed by: jhb
make it fully self-contained.
o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len
including the IP header.
o Computation of the delayed checksum is moved into divert_packet().
Reviewed by: silby
- according to RFC2661 an offset size of 0 is allowed.
- when skipping offset padding do not forget to also skip
the 2 octets of the offset size field.
Reviewed by: archie
Approved by: pjd (mentor)
link[n].latency calculated from user supplied value.
This prevents repeated NGM_PPP_SET_CONFIG/NGM_PPP_GET_CONFIG
from failing because of link[n].conf.latency being out of range.
Reviewed by: archie
Approved by: pjd (mentor)
and setting MSR. This was most evident with the idle proc running
with interrupts disabled and causing a lockup. Switch over to the
i386 style which does things in the right order.
debug assisted by: gallatin, and the invaluable KTR option.
pipelock(), not via a mixture of mutexes and pipelock(). Additionally,
add a few KASSERTS, and change some statements that should have been
KASSERTS into KASSERTS.
As a result of these cleanups, some segments of code have become
significantly shorter and/or easier to read.
unconditionally, stop after the first one (system board) if no EISA hardware
is detected. This fixes a boot hang (i.e. Thinkpad) when ACPI is disabled.
Also, split the probe code into a separate function and do some style cleanup.
Note that the Adaptec 2842 VLB controller probe is broken by this change
and will fail to probe. It should be fixed separately.
in case of a CHECK CONDITION.
- Make this driver return SCSI status information.
- While here, factor out the clearing of the CAM status from every
element of the switch statement to only once before the switch.
This fixes burning CDs with recent cdrecord 2.01 alpha versions and
burners attached to asr(4) controllers but there could have been
other applications and da(4) etc. also affected.
Reviewed by: gibbs, scottl
MFC after: 2 weeks
UFS2 was here. It so happened that UFS2 did not need a seperate
partition type. Keep the definition as a comment for documentation
purposes. If there is a benefit for UFS2 file systems to have a
seperate partition type under GPT, then this definition should be
restored as that was the intention of the definition.
Currently one cannot load the mem.ko module without panicing if mem is
compiled into the kernel and one cannot build a kernel w/o "device mem"
right now either. Thus it is too dangerous to install mem.ko right now
because if one puts 'mem_load="YES"' in /etc/loader.conf they cannot
boot an "old" kernel (at the time that a kernel doesn't have to be built
with "device mem).
pic_eoi_source() into one call. This halves the number of spinlock operations
and indirect function calls in the normal case of handling a normal (ithread)
interrupt. Optimize the atpic and ioapic drivers to use inlines where
appropriate in supporting the intr_execute_handlers() change.
This knocks 900ns, or roughly 1350 cycles, off of the time spent servicing an
interrupt in the common case on my 1.5GHz P4 uniprocessor system. SMP systems
likely won't see as much of a gain due to the ioapic being more efficient than
the atpic. I'll investigate porting this to amd64 soon.
Reviewed by: jhb
skip blocks that are too big by a factor of two or greater. This
avoids some cases of extremely inefficient memory use that can occur
when large (e.g. 64k) blocks on the free list get used when allocating
a 4k chunk of 64-byte fragments. Because fragments have their own
free list, the 60k difference got lost forever every time.
system BIOS to disable legacy device emulation as per the "EHCI
Extended Capability: Pre-OS to OS Handoff Synchronisation" section
of the EHCI spec. BIOSes that implement legacy emulation using SMIs
are supposed to disable the emulation when this procedure is performed.
set gp->softc to NULL and return ENXIO when it is NULL, so GEOM
will not panic or hang, but unload one device on every 'unload'.
This make 'unload' command usable, but it have to be executed
<number of devices> + 1 times.
- Made use of 'pp' variable.
so that they know whether the allocation is supposed to be able to sleep
or not.
* Allow uma_zone constructors and initialation functions to return either
success or error. Almost all of the ones in the tree currently return
success unconditionally, but mbuf is a notable exception: the packet
zone constructor wants to be able to fail if it cannot suballocate an
mbuf cluster, and the mbuf allocators want to be able to fail in general
in a MAC kernel if the MAC mbuf initializer fails. This fixes the
panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
the default.
Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.
now, but it's possible for ndis_reset_nic() to sleep (sometimes the
MiniportReset() method returns NDIS_STATUS_PENDING and we have
to wait for completion). To get around this, execute the ndis_reset_nic()
routine in the NDIS_TASKQUEUE thread.
- Give ndiscvt(8) the ability to process a .SYS file directly into
a .o file so that we don't have to emit big messy char arrays into
the ndis_driver_data.h file. This behavior is currently optional, but
may become the default some day.
- Give ndiscvt(8) the ability to turn arbitrary files into .ko files
so that they can be pre-loaded or kldloaded. (Both this and the
previous change involve using objcopy(1)).
- Give NdisOpenFile() the ability to 'read' files out of kernel memory
that have been kldloaded or pre-loaded, and disallow the use of
the normal vn_open() file opening method during bootstrap (when no
filesystems have been mounted yet). Some people have reported that
kldloading if_ndis.ko works fine when the system is running multiuser
but causes a panic when the modile is pre-loaded by /boot/loader. This
happens with drivers that need to use NdisOpenFile() to access
external files (i.e. firmware images). NdisOpenFile() won't work
during kernel bootstrapping because no filesystems have been mounted.
To get around this, you can now do the following:
o Say you have a firmware file called firmware.img
o Do: ndiscvt -f firmware.img -- this creates firmware.img.ko
o Put the firmware.img.ko in /boot/kernel
o add firmware.img_load="YES" in /boot/loader.conf
o add if_ndis_load="YES" and ndis_load="YES" as well
Now the loader will suck the additional file into memory as a .ko. The
phony .ko has two symbols in it: filename_start and filename_end, which
are generated by objcopy(1). ndis_open_file() will traverse each module
in the module list looking for these symbols and, if it finds them, it'll
use them to generate the file mapping address and length values that
the caller of NdisOpenFile() wants.
As a bonus, this will even work if the file has been statically linked
into the kernel itself, since the "kernel" module is searched too.
(ndiscvt(8) will generate both filename.o and filename.ko for you).
- Modify the mechanism used to provide make-pretend FASTCALL support.
Rather than using inline assembly to yank the first two arguments
out of %ecx and %edx, we now use the __regparm__(3) attribute (and
the __stdcall__ attribute) and use some macro magic to re-order
the arguments and provide dummy arguments as needed so that the
arguments passed in registers end up in the right place. Change
taken from DragonflyBSD version of the NDISulator.
to be particularly correct or optimal, but it seems to be enough
to allow the attachment of USB2 hubs and USB2 devices connected via
USB2 hubs. None of the split transaction support is implemented in
our USB stack, so USB1 peripherals will definitely not work when
connected via USB2 hubs.
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.
The different between the new function and g_mirror_orphan() (which was
used previously) is that syncid is bumped immediately, instead of on
first write, because when consumer was spoiled, it means, that its
provider was opened for writing, so we can't trust that its data
will be valid when it will be connected again.
features. The gmirror(8) utility should be used for control of this class.
There is no manual page yet, but I'm working on it with keramida@.
Many useful tests provided by: simon (thank you!)
Some ideas from: scottl, simon, phk
and refuse initializing filesystems with a wrong version. This will
aid maintenance activites on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the namiedata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.
Reorganize the root filesystem selection code.
those architectures without pmap locking.
- Eliminate the acquisition and release of Giant from vm_map_protect().
(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)
brings ia64 to parity with alpha, amd64, and i386 in this area.)
- Prevent a race in pmap_find_pte(): If pmap_find_pte() sleeps in
uma_zalloc(), another thread could allocate a pte at the same address.
Instead, sleep at a higher level and retry the lookup before retrying
the allocation.
Reviewed and tested by: marcel@
This is really ugly way to do this, but there is no other way for now.
It allows to mount root file system from providers which belong to
those classes.
Approved by: phk
maps. We always acquire the sx lock exclusively here, but we can't
use a mutex because we want to be able to sleep while holding the
lock. This is completely equivalent to what we were doing with the
lockmgr(9) locks before.
Approved by: alc
provider.
- Bump version number.
This allows for a quite interesting trick. One can setup a stripe with
stripe size of 512 bytes and create transparent provider on top of it
with sector size equal to <ndisks> * 512. The result will be something
like RAID3 without parity disk (every access will touch all disks).
submitted version with style cleanups and changes to comments. I also
modified the ioctl interface. This version only has one ioctl (to get
the Synaptics-specific config parameters) since this is the only
information a user might want.
Submitted by: Arne Schwabe <arne -at- rfc2549.org>
- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)
- zone_large_init() stays pretty much the same.
- zone_small_init() will try to stash the slab header in the slab page
being allocated if the amount of calculated wasted space is less
than UMA_MAX_WASTE (for both the UMA_ZONE_REFCNT case and regular
case). If the amount of wasted space is >= UMA_MAX_WASTE, then
UMA_ZONE_OFFPAGE will be set and the slab header will be allocated
separately for better use of space.
- uma_startup() calculates the maximum ipers required in offpage slabs
(so that the offpage slab header zone(s) can be sized accordingly).
The algorithm used to calculate this replaces the old calculation
(which only happened to work coincidentally). We now iterate over
possible object sizes, starting from the smallest one, until we
determine that wastedspace calculated in zone_small_init() might
end up being greater than UMA_MAX_WASTE, at which point we use the
found object size to compute the maximum possible ipers. The
reason this works is because:
- wastedspace versus objectsize is a see-saw function with
local minima all equal to zero and local maxima growing
directly proportioned to objectsize. This implies that
for objects up to or equal a certain objectsize, the see-saw
remains entirely below UMA_MAX_WASTE, so for those objectsizes
it is impossible to ever go OFFPAGE for slab headers.
- ipers (items-per-slab) versus objectsize is an inversely
proportional function which falls off very quickly (very large
for small objectsizes).
- To determine the maximum ipers we'll ever need from OFFPAGE
slab headers we first find the largest objectsize for which
we are guaranteed to not go offpage for and use it to compute
ipers (as though we were offpage). Since the only objectsizes
allowed to go offpage are bigger than the found objectsize,
and since ipers vs objectsize is inversely proportional (and
monotonically decreasing), then we are guaranteed that the
ipers computed is always >= what we will ever need in offpage
slab headers.
- Define UMA_FRITM_SZ and UMA_FRITMREF_SZ to be the actual (possibly
padded) size of each freelist index so that offset calculations are
fixed.
This might fix weird data corruption problems and certainly allows
ARM to now boot to at least single-user (via simulator).
Tested on i386 UP by me.
Tested on sparc64 SMP by fenner.
Tested on ARM simulator to single-user by cognet.
* Some systems have _FDE and child floppy devices, but no _FDI. This seems
to be compatible with the standard. Don't error out if there is no _FDI.
Instead, continue on to the next device. The normal fd probe will take
care of this device.
* Some systems have _FDE but no child devices in AML. For these, add a
second pass that compares the results of _FDE to the presence of devices.
If not present, add the missing device.
* Some BIOS authors didn't read the spec. They use tape drive values for
all fdc(4) devices. Since this isn't grossly incompatible with the
required boolean value, use them. They also define the _FDE items as a
package instead of buffer. Regenerate the buffer from the package if it
is present.
Tested by: tjr, marcel
Add local rootvp variables as needed.
Remove checks for miniroot's in the swappartition. We never did that
and most of the filesystems could never be used for that, but it had
still been copy&pasted all over the place.
vm/vm_object.c revision 1.88) and vm_object_sync() (originating in
vm/vm_map.c revision 1.36): When descending a chain of backing objects,
both use the wrong object's backing offset. Consequently, both may
operate on the wrong pages.
Quoting Matt, "This could be responsible for all of the sporatic madvise
oddness that has been reported over the years."
Reviewed by: Matt Dillon
Alice is too lazy to write a server application in PF-independent
manner. Therefore she knocks up the server using PF_INET6 only
and allows the IPv6 socket to accept mapped IPv4 as well. An evil
hacker known on IRC as cheshire_cat has an account in the same
system. He starts a process listening on the same port as used
by Alice's server, but in PF_INET. As a consequence, cheshire_cat
will distract all IPv4 traffic supposed to go to Alice's server.
Such sort of port theft was initially enabled by copying the code that
implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for
the implied case of the same owner for both connections. After this
change, the above scenario will be impossible. In the same setting,
the user who attempts to start his server last will get EADDRINUSE.
Of course, using IPv4 mapped to IPv6 leads to security complications
in the first place, but there is no reason to make it even more unsafe.
This change doesn't apply to KAME since it affects a FreeBSD-specific
part of the code. It doesn't modify the out-of-box behaviour of the
TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.
MFC after: 1 month
this file from userland. Since we export struct ifnet to userland, and
that struct ifnet now contains a struct task, userland needs to know
what struct task looks like.
We need to consider having a pointer to a struct task here instead and
forward declare struct task in the !_KERNEL case.
static symbols. This wasn't a problem with previous GCC releases, but
unit-at-a-time mode of GCC 3.4.2 prevents linker set components from
being emitted at all.
"__FreeBSD_version should only ever increment. It is a historial record
of events in the system. Decrementing it is akin to trying to go back
in time and change history."
Reminded by: kuriyama, scottl
with the FIN bit set for all segments, if a FIN has already been sent before.
The fix will allow the FIN bit to be set for only the last segment, in case
it has to be retransmitted.
Fix another bug that would have caused snd_nxt to be pulled by len if
there was an error from ip_output. snd_nxt should not be touched
during sack retransmissions.
synchronizing IPv6 protocol control blocks and lists. These changes
are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang,
and committed by Jeffrey Hsu. With these locking changes, IPv6 use of
inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking
assertions to be run in the presence of IPv6 compiled into the kernel.
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.
Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().
Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.
This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.
using linker_load_module(). This works OK if NGM_MKPEER message came
from userland and we have process associated with thread. But when
NGM_MKPEER was queued because target node was busy, linker_load_module()
is called from netisr thread leading to panic.
To workaround that we do not load modules by framework, instead ng_socket
loads module (if this is required) before sending NGM_MKPEER.
However, the race condition between return from NgSendMsg() and actual
creation of node still exist and needs to be solved.
PR: kern/62789
Approved by: julian
clients simultaneously. When node is client its mode is configured
with a control message.
sysctl net.graph.nonstandard_pppoe is deprecated but kept for
backward compatibility for some time.
Approved by: julian
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.
Head nodding: jhb, bmilekic
(i.e. with the foreign address being not wildcard) when checking
for possible port theft since such connections cannot be stolen.
The port theft check is FreeBSD-specific and isn't in the KAME tree.
PR: bin/65928 (in the audit trail)
Reviewed by: -net, -hackers (silence)
Tested by: Nick Leuta <skynick at mail.sc.ru>
MFC after: 1 month
an adaptive fashion when adaptive mutexes are enabled. The theory
behind non-adaptive Giant is that Giant will be held for long periods
of time, and therefore spinning waiting on it is wasteful. However,
in MySQL benchmarks which are relatively Giant-free, running Giant
adaptive makes an observable difference on SMP (5% transaction rate
improvement). As such, make adaptive behavior on Giant an option so
it can be more widely benchmarked.
- Push down Giant into shmexit(). (Giant is acquired only if the vmspace
contains shm segments.)
- Eliminate the acquisition of Giant from proc_rwmem().
- Reduce the scope of Giant in exit1(), uncovering the destruction of the
address space.
switch in fork_exit() to before anything else is done (but keep
schedlock for the deadthread check). This means one less
nasty bug if ever in the future whatever might have been called
before the update played with schedlock or critical sections.
Discussed with: tjr
when inpcb is NULL, this is no longer invalid since jlemon added the
tcp_twstart function... this prevents close "failing" w/ EINVAL when it
really was successful...
Reviewed by: jeremy (NetBSD)
Now it is user-controlled through ifconfig(8).
The former ``automagic'' way of operation created more
trouble than good. First, VLAN_MTU consumers other than
vlan(4) had appeared, e.g., ng_vlan(4). Second, there was
no way to disable VLAN_MTU manually if it were causing
trouble, e.g., data corruption.
Dropping the ``automagic'' should be completely invisible
to the user since
a) all the drivers supporting VLAN_MTU
have it enabled by default, and in the first place
b) there is only one driver that can really toggle VLAN_MTU
in the hardware under its control (it's fxp(4), to which
I added VLAN_MTU controls to illustrate the principle.)
the system" resource limit code: When checking if the caller has superuser
privileges, we should be checking the *real* user, not the *effective*
user. (In general, resource limiting is done based on the real user, in
order to avoid resource-exhaustion-by-setuid-program attacks.)
Now that a SUSER_RUID flag to suser_cred exists, use it here to return
this code to its correct behaviour.
Pointed out by: rwatson
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.
The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)
Discussed with: rwatson, scottl
Requested by: jhb
ACPI_DEBUG. This upset the ordering that acpi_probe_order() was meant to
provide, causing devices to attach before the sysresource object. This
debugging feature has been unnecessary for a while so just remove it.
Testing by: marcel
the license from /usr/src/COPYRIGHT. Since cvs annotate shows that
this was written by jasone, julian, jhb, peter, bmilekic and obrien.
cvs log shows that many others may have contributed to this file. As
such, go ahead and use the author of 'FreeBSD Project' for this file.
If this is a problem, please notify me.
# this eliminates the last file in the kernel with an indirect reference
# to /usr/src/COPYRIGHT in the kernel. A few more in userland remain.
vinumdrive geom with an exclusive bit. This should fix the problem
when underlying partitions overlap (i.e. the 'a' partition is at
the same offset as the 'c' partition).
Ideas borrowed from pjd@, quite a bit of testing by
Matthias Schuendehuette <msch@snafu.de>.
Properly wait for not busy and introduce a timeout for devices not
setting busy (as they should).
Leave a printf in there that states how long the wait was, as I'd like
to get an idea of the variations here. The time needed seems also to be
affected by whether a medium is present or not.
FOREACH_SAFE. Remove bad cast of retp and instead use an additional
arg to pass back the number of valid outputs. Use the package convenience
functions for parsing packages.
o Separate out local (ports) scripts that use rc.d, and the old style
startup/shutdown scripts and execute them separately. On startup the
rc.d style scripts are executed first and then the old-style scripts.
On shutdown, exactly the reverse happens.
o The rc.d ports scripts should now behave more like base system scripts.
Scripts ending in .sh will be sourced into the current shell, while the
rest will be executed in a subshell. Previously, all ports scripts,
regardless of the .sh suffix, were executed in a subshell.
o The parent script, /etc/rc.d/localpkg, passes its command line arguments
straight to the rc.d ports scripts. This means they should now honor
faststop and faststart commands as well. Old style scripts, should not see
any differences. They will still get either a start or stop command.
o The initial phrase shown during shutdown has been changed to use
"local packages" instead of "daemon processes" to be more inline with the
phrase used during local package startup. The phrases are also used only for
old-style ports script startup/shutdown, whereas previously they were being
used for both rc.d and old-style scripts. This should make startup/shutdown
output a bit less ugly.
Discussed with: portmgr
Has Reservations: eik
(in particular, bge(4) hasn't supported rxcsum since if_bge.c#1.5)
Clean up some aspects of capabilities usage, i.e. stop using
if_hwassist to see whether we are doing offload now because if_hwassist
is for TCP/IP layer and it is subordinate to if_capenable.
Thanks to: Aled Morris for donating a nice bge(4) NIC to me
Reviewed by: -net, -hackers (silence)
vmspace to the new vmspace in vmspace_exec() is mostly wasted effort. With
one exception, vm_swrss, the copied fields are immediately overwritten.
Instead, initialize these fields to zero in vmspace_alloc(), eliminating a
bcopy() from vmspace_exec() and a bzero() from vmspace_fork().
can't yet be referenced by other threads.
In microbenchmarks, this appears to reduce the cost of
pipe();close();close() on UP by 10%, and SMP by 7%. The vast majority
of the cost of allocating a pipe remains VM magic.
Suggested by: silby
calls further down the stack. If we find the cksum to be okay we pretend
that the hardware did all the work and hence keep the upper layers from
checking again.
Submitted by: Pyun YongHyeon
addend of 0. This isn't correct, and was quite easy to break by
referring to the address of an element within a structure.
However, fixing this exposed the fact that symbol lookups for
local variables were returning the base of the section they
were contained in. This case is detected by comparing the return
value from elf_lookup() to the relocbase+addend value: if it is
lesser, but greater than relocbase, then relocbase+addend is
taken to be the authoritative value.
bug reported by: gallatin
happens because the sio device was never opened and com->tp is
therefore NULL. ttygone can't swallow a NULL, so guard against that
possibility. Other places in this function make similar checks, so I
believe this is correct.
Improve child_detached a little and make it conform better to
style(9). Also, improve comment about what we'll be doing in the
future about driver_added. Soon it will be possible to kldload usb
drivers and have them attach w/o a need to disconnect/reconnect them.
Giant conditional on debug.mpsafenet in the socket soo_stat() routine,
unconditionally in vn_statfile() for VFS, and otherwise don't acquire
Giant. Accept an unlocked read in kqueue_stat(), and cryptof_stat() is
a no-op. Don't acquire Giant in fstat() system call.
Note: in fdescfs, fo_stat() is called while holding Giant due to the VFS
stack sitting on top, and therefore there will still be Giant recursion
in this case.
the data sheets leads me to believe these will just work. Those parts
with the various media readers on them may not have the required
FreeBSD drivers that will attach to the subdevices that will be seen
on some of these parts.
PCI 1515, 1530, 1620, 4520, 6411, 6420, 7410, 7510, 7610
Prompted by: Havard Eidnes
These are from the datasheets downloaded from TI's web site.
They describe the PCI[67]x[12]1 and PCI[67]x20 parts, with and without
the smartcard enabled.
kmem_alloc_pageable(). The difference between these is that an errant
memory access to the zone will be detected sooner with
kmem_alloc_nofault().
The following changes serve to eliminate the following lock-order
reversal reported by witness:
1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931
There is no potential deadlock in this case. However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects. To remedy this, we make the following changes:
- Add a mutex type argument to VM_OBJECT_LOCK_INIT().
- Use the mutex type argument to assign distinct types to special
vm objects such as the kernel object, kmem object, and UMA objects.
- Define a static swap zone object for use by UMA. (Only static
objects are assigned a special mutex type.)
individual file object implementations can optionally acquire Giant if
they require it:
- soo_close(): depends on debug.mpsafenet
- pipe_close(): Giant not acquired
- kqueue_close(): Giant required
- vn_close(): Giant required
- cryptof_close(): Giant required (conservative)
Notes:
Giant is still acquired in close() even when closing MPSAFE objects
due to kqueue requiring Giant in the calling closef() code.
Microbenchmarks indicate that this removal of Giant cuts 3%-3% off
of pipe create/destroy pairs from user space with SMP compiled into
the kernel.
The cryptodev and opencrypto code appears MPSAFE, but I'm unable to
test it extensively and so have left Giant over fo_close(). It can
probably be removed given some testing and review.
thread-local pointer, in practice that thread needs to be curthread. If
we're running with INVARIANTS, generate a warning if not. If we have
KDB compiled in, generate a stack trace. This doesn't fire at all in my
local test environment, but could be irritating if it fires frequently
for someone, so there will be motivation to fix things quickly when it
does.
the caller passes in a td that is curthread, and consistently pass 'td'
into vget(). Remove some bogus logic that passed in td or curthread
conditional on td being non-NULL, which seems redundant in the face of
the earlier assignment of td to curthread if td is NULL.
In devfs_symlink(), cache the passed thread in 'td' so we don't have
to keep retrieving it from the 'ap' structure, and assert that td is
curthread (since we dereference it to get thread-local td_ucred). Use
'td' in preference to curthread for later lockmgr calls, since they are
equal.
and saved link register as per the ABI call sequence. Update code
that uses this (fork_trampoline etc) to use the correct genassym'd
offsets.
This fixes the 'invalid LR' message when backtracing kernel
threads in DDB.
being incomplete, it currently has to know how to drop and pick back
up the vm_object's mutex if it has to sleep and drop the page queue
mutex. The problem with this is that if the page is busy, while we
are sleeping, the page can be freed and object disappear. When trying
to lock m->object, we'd get a stale or NULL pointer and crash.
The object is now cached, but this makes the assumption that
the object is referenced in some manner and will not itself
disappear while it is unlocked. Since this only happens if
the object is locked, I had to remove an assumption earlier in
contigmalloc() that reversed the order of locking the object and
doing vm_page_sleep_if_busy(), not the normal order.
(WITNESS) for code paths that always call uma_zalloc_arg() shortly
after where the check was, because uma_zalloc_arg() already does
a similar check.
No objections from Alfred. Thanks Alfred.
RTF_BLACKHOLE as well.
To quote the submitter:
The uRPF loose-check implementation by the industry vendors, at least on Cisco
and possibly Juniper, will fail the check if the route of the source address
is pointed to Null0 (on Juniper, discard or reject route). What this means is,
even if uRPF Loose-check finds the route, if the route is pointed to blackhole,
uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode
as a pseudo-packet-firewall without using any manual filtering configuration --
one can simply inject a IGP or BGP prefix with next-hop set to a static route
that directs to null/discard facility. This results in uRPF Loose-check failing
on all packets with source addresses that are within the range of the nullroute.
Submitted by: James Jun <james@towardex.com>
work very infrequently, and often results in a compound panic which
confuses debugging; locking/SMP have made the layering violation (and
risks) of this more obvious over time.
Discussed with: green, bde, et al.
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.
requested by: rwatson@
tested on: amd64, i386, ia64
init and fini handlers. Our vm system removes all userland mappings at
exit prior to calling pmap_release. It just so happens that we might
as well reuse the pmap for the next process since the userland slate
has already been wiped clean.
However. There is a functional benefit to this as well. For platforms
that share userland and kernel context in the same pmap, it means that
the kernel portion of a pmap remains valid after the vmspace has been
freed (process exit) and while it is in uma's cache. This is significant
for i386 SMP systems with kernel context borrowing because it avoids
a LOT of IPIs from the pmap_lazyfix() cleanup in the usual case.
Tested on: amd64, i386, sparc64, alpha
Glanced at by: alc
Also introduce a macro to be called by persistent nodes to signal their
persistence during shutdown to hide this mechanism from the node author.
Make node flags have a consistent style in naming.
Document the change.
- Return meaningful return errorcodes.
- Free previously allocated connection in error cases.
In ng_device_rcvdata():
- Return meaningful return errorcodes.
- Detach mbuf from netgraph item, and free the item before
doing any other actions that may return from method.
- Do not call strange malloc() for buffer. [1]
- In case of any error jump to end, where mbuf is freed.
In ng_device_disconnect():
- Return meaningful return errorcodes.
- Free disconnected connection.
style(9) in mentioned above functions:
- Remove '/* NGD_DEBUG */', when only one line is ifdef'ed.
- Remove extra braces to easier reading.
- Add space after comma in function calls.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
2. Sort includes, while here.
3. s/NULL/0/ in NG_SEND_MSG_HOOK(), since ng_ID_t is integer.
PR: kern/41881 (part)
Reviewed by: marks
Approved by: julian (mentor)
we construct the EFI image. It doesn't seem to actually end up
in the EFI image, AFAICT.
o Replace .quad, .long and .short with data8, data4 and data2 resp.
The former are gnuisms.
o Redefine _start_plabel as a data16 with @iplt(_start) as its
value. This is the preferred way to create user PLT entries.
binutils 2.15. The linker now creates a .rela.dyn section for
dynamic relocations, while our script created a .rela section.
Likewise, we copied the .rela section to the EFI image, but not
the .rela.dyn section. The fix is to rename .rela to .rela.dyn
in the linker script so that all relocations end up in the same
section again. This we copy into the EFI image.
gcc is using. This fixes devstat consumers (like vmstat, iostat,
systat) so they don't print crazy zillion digit numbers for
disk transfers and bandwidth.
According to gcc, long doubles are 64-bits, rather than 128 bits
like the SVR4 ABI spec wants them to be.. Note that MacOSX also treats
long doubles as 64-bits, and not 128 bits, so we are in good company.
Reviewed by: das
Approved by: grehan
1) data to be sent to the right of snd_recover.
2) send more data then whats in the send buffer.
The fix is to postpone sack retransmit to a subsequent recovery episode
if the current retransmit pointer is beyond snd_recover.
Thanks to Mohan Srinivasan for helping fix the bug.
Submitted by:Daniel Lang
usbdi.c rev. 1.104, author: mycroft
ugen_isoc_rintr() may recycle the xfer immediately. Therefore, we
avoid touching the xfer after calling the callback in
usb_transfer_complete(). From PR 25960.
allocated as "no object" pages. Similar changes were made to the amd64
and i386 pmap last year. The primary reason being that maintaining
a pte object leads to lock order violations. A secondary reason being
that the pte object is redundant, i.e., the page table itself can be
used to lookup page table pages. (Historical note: The pte object
predates our ability to allocate "no object" pages. Thus, the pte
object was a necessary evil.)
- Unconditionally check the vm object lock's status in vm_page_remove().
Previously, this assertion could not be made on Alpha due to its use
of a pte object.
to not get a page fault if he has not defined a dump device.
Panic can often not do a dump as it can hang forever in some cases.
The original PR was for amd64 only. This is a generalised version of
that change.
PR: amd64/67712
Submitted by: wjw@withagen.nl <Willen Jan Withagen>
a fast interrupt handler doing an swi_sched(). This fixed the lockups I
saw on my laptop when using xmms in KDE and on rwatson's MySQL benchmarks
on SMP. This will eventually be removed and/or modified when I figure out
what the root cause is and fix that.
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.
The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less. The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block. If this is not enough, the subsequent passes
page out any unwired memory. To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.
The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed. Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.
There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used. Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month. It is safe to switch at
run-time to see the difference it makes.
A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig(). These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.
When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats. Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well. This invalidates the BUGS section of the
contigmalloc(9) manpage.
specify "us" as the thread not the process/ksegrp/kse.
You can always find the others from the thread but the converse is not true.
Theorotically this would lead to runtime being allocated to the wrong
entity in some cases though it is not clear how often this actually happenned.
(would only affect threaded processes and would probably be pretty benign,
but it WAS a bug..)
Reviewed by: peter
is obviously not run a lot. (but is in some test cases).
This code is not usually run because it covers a case that doesn't
happen a lot (removing a node that has data traversing it).
time now to break with the past: do not write the PID in the first note.
Rationale:
1. [impact of the breakage] Process IDs in core files serve no immediate
purpose to the debugger itself. They are only useful to relate a core
file to a process. This can provide context to the person looking at
the core file, provided one keeps track of this. Overall, not having
the PID in the core file is only in very rare occasions unfortunate.
2. [reason of the breakage] Having one PRSTATUS note contain the PID,
while all others contain the LWPID of the corresponding kernel thread
creates an irregularity for the debugger that cannot easily be worked
around. This is caused by libthread_db correlating user thread IDs to
kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs.
Update comments accordingly.
does not display the symptom). Evidently the ifpi2 controller needs to be
massaged more than it was.
Note that this does not close the PR since it was filed against 4.9.
MFC: 5 days
PR: kern/68756
Submitted by: Ari Suutari <ari.suutari@syncrontech.com>
that get certain types of control messages (ping6 and rtsol are
examples). This gets the new code closer to working:
1) Collect control mbufs for processing in the controlp ==
NULL case, so that they can be freed by externalize.
2) Loop over the list of control mbufs, as the externalize
function may not know how to deal with chains.
3) In the case where there is no externalize function,
remember to add the control mbuf to the controlp list so
that it will be returned.
4) After adding stuff to the controlp list, walk to the
end of the list of stuff that was added, incase we added
a chain.
This code can be further improved, but this is enough to get most
things working again.
Reviewed by: rwatson
NO_ADAPTIVE_MUTEXES. This option has been enabled by default on amd64 for
quite some time, and has been extensively tested on i386 and sparc64. It
shows measurable performance gains in many circumstances, and few negative
effects. It would be nice in t he future if adaptive mutexes actually went
to sleep after a certain amount of spinning, but that will require quite a
bit more testing.
This fixes checksum for some drivers with partial H/W ckcsum offloads.
Reported by: Simon 'corecode' Schubert, Devon H. O'Dell, hmp
Reviewed by: Pyun YongHyeon
valid.
Implement the protection check required by the pmap_extract_and_hold()
specification. (This enables the elimination of Giant from that function.)
earlier in unp_connect() so that vp->v_socket can't change between
our copying its value to a local variable and later use of that
variable. This may have been responsible for a panic during
shutdown that I experienced where simultaneous closing of a listen
socket by rpcbind and a new connection being made to rpcbind by
mountd.
pmap_remove_page(). The reason being that pmap_pte_quick() requires
the page queues lock, which is already held, rather than Giant.
- Assert that the page queues lock is held in pmap_remove_page() and
pmap_remove_pte().
values from either user land or from the kernel. Use them for
[gs]etsockopt and to clean up some calls to [gs]etsockopt in the
Linux emulation code that uses the stackgap.
for the SYN|ACK packet and then letting in6_pcbconnect set the
flowlabel later. Arange for the syncache/syncookie code to set and
recall the flow label so that the flowlabel used for the SYN|ACK
is consistent. This is done by using some of the cookie (when tcp
cookies are enabeled) and by stashing the flowlabel in syncache.
Tested and Discovered by: Orla McGann <orly@cnri.dit.ie>
Approved by: ume, silby
MFC after: 1 month
MFC:
Fix by dhartmei@
change pf_route() loop detection: introduce a counter (number of times
a packet is routed already) in the mbuf tag, allow at most four times.
Fixes some legitimate cases broken by the previous change.
Reviewed by: dhartmei
icmp_error() packets. While here retire PACKET_TAG_PF_GENERATED (which
served the same purpose) and use M_SKIP_FIREWALL in pf as well. This should
speed up things a bit as we get rid of the tag allocations.
Discussed with: juli
using M_PROTO6 and possibly shooting someone's foot, as well as allowing the
firewall to be used in multiple passes, or with a packet classifier frontend,
that may need to explicitly allow a certain packet. Presently this is handled
in the ipfw_chk code as before, though I have run with it moved to upper
layers, and possibly it should apply to ipfilter and pf as well, though this
has not been investigated.
Discussed with: luigi, rwatson
since they are only accessed by curthread and thus do not need any
locking.
- Move pr_addr and pr_ticks out of struct uprof (which is per-process)
and directly into struct thread as td_profil_addr and td_profil_ticks
as these variables are really per-thread. (They are used to defer an
addupc_intr() that was too "hard" until ast()).
define NULL. This means we cannot use NULL in the definition of CMSG_NXTHDR.
So replace NULL with 0.
PR: kern/60309
Submitted by: Jeff King <peff-freebsd@peff.net>
left around after the PCI probe, acpi_video stopped attaching because while
it was an acpi child device, it really is a PCI device. Fix this by making
it a PCI child.
* Remove non-handle ivars accesses since child busses only implement
acpi_get_handle().
* Access the acpi softc directly through the devclass instead of through
the implied parent.
* Clean up a potential panic on unload by freeing the sysctl context before
storing NULL in the OID.
Found by: marks
- `sound'
The generic sound driver, always required.
- `snd_*'
Device-dependent drivers, named after the sound module names.
Configure accordingly to your hardware.
In addition, rename the `snd_pcm' module to `sound' in order to sync
with the driver names.
Suggested by: cg
accurately represents the intention of the 'single' label element in
Biba and MLS labels. It also approximates the use of 'effective' in
traditional UNIX credentials, and avoids confusion with 'singlelabel'
in the context of file systems.
Inspired by: trhodes
future:
rename ttyopen() -> tty_open() and ttyclose() -> tty_close().
We need the ttyopen() and ttyclose() for the new generic cdevsw
functions for tty devices in order to have consistent naming.
rev. 1.67, author: mycroft
Fix a byte order error.
rev. 1.68, author: mycroft
Adjust some silliness that was causing us to do extra work for
"frame list rollover" interrupts, which we pretty much ignore.
Obtained from: NetBSD
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.
hints-based probe to fdc_hints_probe().
Also:
* Fix some resource leaks when attach fails.
* Remove the FDC_ATTACHED flag. It was supposed to prevent multiple
unloads but this is not necessary.
allows a bus to re-enumerate its child handles and optionally replace
them with new children, arranged to the bus's liking. (The current device
space is flat with all devices immediately under acpi0). Add comments
for each interface.
for unknown events.
A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
systems defined interfaces in thread_db.h use prgregset_t but not
prgregset_t * to be a output parameter, this is the only way to maintain
source code compatible with them.
line as the announcement. Someone should probably update the "buffers
remaining" message since we now no longer should have any buffers remaining
at that point.
Until now the function has just returned zero for any event, but
that is downright wrong for MOD_UNLOAD and not very useful for any
future events we add where it may be crucial to be able to tell
if the event was unhandled or successful.
Change the function to return as follows:
MOD_LOAD -> 0
MOD_UNLOAD -> EBUSY
anything else -> EOPNOTSUPP
when the battery is fully charged. That breaks some of the arithmetic in
calculating the remaining capacity (ends up with more than 100%).
This commit makes sure the max is 100.
Approved by: njl
check to ensure that the caller is not prison root.
The intention is to fix file descriptor creation so that
prison root can not use the last remaining file descriptors.
This privilege should be reserved for non-jailed root users.
Approved by: bmilekic (mentor)
reduces the size of the pv_entry structure a small but significant amount.
This is implemented a little differently because it isn't so cheap to get
the physical address of the page tabke page on amd64.. instead of it
being directly accessible from the top level page directory, it is now
two additional tree levels down. However.. In almost all cases, we
recently had the physical address if the page table page a short while
before we needed it, but it slipped through our fingers. This patch
saves it for when we do need it. Also, for the one case where we do not
have the ptp paddr, we are always running in curproc context and so we
can do a vtopte-like trick. I've implemented vtopde() for this purpose.
There is still a CYA entry in pmap_unuse_pt() that needs to be removed. I
think it can be removed now but I forgot to test with it gone.
sched_add() rather than just doing it in sched_wakeup(). The old
ithread preemption code used to set NEEDRESCHED unconditionally if it
didn't preempt which masked this bug in SCHED_4BSD.
Noticed by: jake
Reported by: kensmith, marcel
Add a MOD_QUIESCE event for modules. This should return error (EBUSY)
of the module is in use.
MOD_UNLOAD should now only fail if it is impossible (as opposed to
inconvenient) to unload the module. Valid reasons are memory references
into the module which cannot be tracked down and eliminated.
When kldunloading, we abandon if MOD_UNLOAD fails, and if -force is
not given, MOD_QUIESCE failing will also prevent the unload.
For backwards compatibility, we treat EOPNOTSUPP from MOD_QUIESCE as
success.
Document that modules should return EOPNOTSUPP for unknown events.
needed so that sysresource objects are created first to reserve all regions,
then other devices can allocate from them. Otherwise, acpi_timer (the only
ACPI device with an identify routine), would allocate its resources from
the nexus, causing the later sysresource reserve to fail.
Debugging by: Taku YAMAMOTO, Andrea Campi
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
thread current user thread is running on. Add tm_dflags into
kse_thr_mailbox, the flags is written by debugger, it tells
UTS and kernel what should be done when the process is being
debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.
TMDF_SSTEP is used to tell kernel to turn on single stepping,
or turn off if it is not set.
TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
whenever possible, to UTS, it means do not run the user thread
until debugger clears it, this behaviour is necessary because
gdb wants to resume only one thread when the thread's pc is
at a breakpoint, and thread needs to go forward, in order to
avoid other threads sneak pass the breakpoints, it needs to remove
breakpoint, only wants one thread to go. Also, add km_lwp to
kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
switch time when process is not being debugged, so when process
is attached, debugger can map kernel thread to user thread.
2. Add p_xthread to proc strcuture and td_xsig to thread structure.
p_xthread is used by a thread when it wants to report event
to debugger, every thread can set the pointer, especially, when
it is used in ptracestop, it is the last thread reporting event
will win the race. Every thread has a td_xsig to exchange signal
with debugger, thread uses TDF_XSIG flag to indicate it is reporting
signal to debugger, if the flag is not cleared, thread will keep
retrying until it is cleared by debugger, p_xthread may be
used by debugger to indicate CURRENT thread. The p_xstat is still
in proc structure to keep wait() to work, in future, we may
just use td_xsig.
3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
a thread. When process stops, debugger can set the flag for
thread, thread will check the flag in thread_suspend_check,
enters a loop, unless it is cleared by debugger, process is
detached or process is existing. The flag is also checked in
ptracestop, so debugger can temporarily suspend a thread even
if the thread wants to exchange signal.
4. Current, in ptrace, we always resume all threads, but if a thread
has already a TDF_DBSUSPEND flag set by debugger, it won't run.
Encouraged by: marcel, julian, deischen
1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel
thread current user thread is running on. Add tm_dflags into
kse_thr_mailbox, the flags is written by debugger, it tells
UTS and kernel what should be done when the process is being
debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER.
TMDF_SSTEP is used to tell kernel to turn on single stepping,
or turn off if it is not set.
TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall
whenever possible, to UTS, it means do not run the user thread
until debugger clears it, this behaviour is necessary because
gdb wants to resume only one thread when the thread's pc is
at a breakpoint, and thread needs to go forward, in order to
avoid other threads sneak pass the breakpoints, it needs to remove
breakpoint, only wants one thread to go. Also, add km_lwp to
kse_mailbox, the lwp id is copied to kse_thr_mailbox at context
switch time when process is not being debugged, so when process
is attached, debugger can map kernel thread to user thread.
2. Add p_xthread to proc strcuture and td_xsig to thread structure.
p_xthread is used by a thread when it wants to report event
to debugger, every thread can set the pointer, especially, when
it is used in ptracestop, it is the last thread reporting event
will win the race. Every thread has a td_xsig to exchange signal
with debugger, thread uses TDF_XSIG flag to indicate it is reporting
signal to debugger, if the flag is not cleared, thread will keep
retrying until it is cleared by debugger, p_xthread may be
used by debugger to indicate CURRENT thread. The p_xstat is still
in proc structure to keep wait() to work, in future, we may
just use td_xsig.
3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend
a thread. When process stops, debugger can set the flag for
thread, thread will check the flag in thread_suspend_check,
enters a loop, unless it is cleared by debugger, process is
detached or process is existing. The flag is also checked in
ptracestop, so debugger can temporarily suspend a thread even
if the thread wants to exchange signal.
4. Current, in ptrace, we always resume all threads, but if a thread
has already a TDF_DBSUSPEND flag set by debugger, it won't run.
Encouraged by: marcel, julian, deischen
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)
Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().
- ddb -> db for low-level trapcode
- implement makectx. I think it only matters that the stack is setup
correctly.
- bring over ddb_trap_glue and rename to db_trap_glue
a better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these function will be quite different
from so_setsockopt.
Approved by: alfred
* Add an fdtype ivar. This will be the equivalent of fd->type.
* Move enabling the FIFO to the end of attach.
* Unify reset code into fdc_initial_reset().
* Add fdc_write_ivar().
* Update isa and pccard attachments accordingly.
* Set the flags unconditionally in probe since they may be overridden by
other probe routines. Both before and now, we're depending on probe
being called a final time on the winning driver so the flags we get are
the ones we intended.
* Use the bus accessor macros instead of defining our own.
* Remove duplicate assigns of fd->type.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.
will be used heavily in debugging KSE threads. This breaks libpthread
on IA64, but because libpthread was not in 5.2.1 release, I would like
to change it so we needn't to introduce another syscall.
tracing process to obtain information about the LWP that caused the
traced process to stop. Debuggers can use this information to select
the thread currently running on the LWP as the current thread.
The request has been made compatible with NetBSD for as much as
possible. This implementation differs from NetBSD in the following
ways:
1. The data argument is allowed to be smaller than the size of the
ptrace_lwpinfo structure known to the kernel, but not 0. This
is opposite to what NetBSD allows. The reason for this is that
we can extend the structure without affecting older binaries.
2. On NetBSD the tracing process is to set the pl_lwpid field to
the Id of the LWP it wants information of. We don't do that.
Our ptrace interface allows passing the LWP Id instead of the
PID. The tracing process is to set the PID to the LWP Id it
wants information of.
3. When the PID is actually the PID of the tracing process, this
request returns the information about the LWP that caused the
process to stop. This was the whole purpose of the request in
the first place.
When the traced process has exited, this request will return the
LWP Id 0, indicating that the process state is not the result of
an event specific to a LWP.
more generic, but that didn't actually happen. Since the feature to
switch backends (and historically this means from DDB to GDB) is
important, make sure people can do just that until such the generic
mechanism actually sees the light of day.
Suggested by: rwatson@
changing the backend from outside the KDB frontend. For example from
within a backend. Rewrite kdb_sysctl_current to make use of this
function as well.
in soreceive() after removing an MT_SONAME mbuf from the head of the
socket buffer.
When processing MT_CONTROL mbufs in soreceive(), first remove all of
the MT_CONTROL mbufs from the head of the socket buffer to a local
mbuf chain, then feed them into dom_externalize() as a set, which
both avoids thrashing the socket buffer lock when handling multiple
control mbufs, and also avoids races with other threads acting on
the socket buffer when the socket buffer mutex is released to enter
the externalize code. Existing races that might occur if the protocol
externalize method blocked during processing have also been closed.
Now that we synchronize socket buffer and stack state following
modifications to the socket buffer, turn the manual synchronization
that previously followed control mbuf processing with a set of
assertions. This can eventually be removed.
The soreceive() code is now substantially more MPSAFE.
the head of the mbuf chains in a socket buffer, re-synchronizes the
cache pointers used to optimize socket buffer appends. This will be
used by soreceive() before dropping socket buffer mutexes to make sure
a consistent version of the socket buffer is visible to other threads.
While here, update copyright to account for substantial rewrite of much
socket code required for fine-grained locking.
locking on 'nextrecord' and concerns regarding potentially inconsistent
or stale use of socket buffer or stack fields if they aren't carefully
synchronized whenever the socket buffer mutex is released. Document
that the high-level sblock() prevents races against other readers on
the socket.
Also document the 'type' logic as to how soreceive() guarantees that
it will only return one of normal data or inline out-of-band data.
This also fixes the (runtime) breakage introduced in the previous
commit that was the result of a botched merge. This hasn't even
been compile-tested...
a problem that could also be fixed differently without reverting previous
attempts to fix DELAY while the debugger is active (rev 1.204). The bug
was that the i8254 implements a countdown timer, while for (k)db_active
a countup timer was implemented. This resulted in premature termination
and consequently the breakage of DELAY. The fix (relative to rev 1.211)
is to implement a countdown timer for the kdb_active case. As such the
ability to step clock initialization is preserved and DELAY does what is
expected of it.
Blushed: bde :-)
Submitted by: bde
name in the debug.kdb.current sysctl. All other dereferences are
properly guarded, but this one was overlooked.
Reported by: Morten Rodal (morten at rodal dot no)
o Sources that are shared between kernel and userland and that may
contain references to DDB or any of its functions may need to
know this.
o Userland tools may include <machine/gdb_machdep.h> from now on.
Think kernel debugger...
o The kernel core file now contains the TID of the kernel thread
that made the dump.
db_elf.c, db_kld.c: The new KDB backend supports both at the same time.
db_sysctl.c: The functionality has been moved to sys/kern/subr_kdb.c.
db_trap.c: The DDB entry point has been moved to sys/ddb/db_main.c.
o Rename WITNESS_DDB to WITNESS_KDB. In the new world order KDB is the
acronym to use for debugging related code. The DDB option is used
to enable the DDB debugger backend only.
o Likewise, rename DDB_TRACE to KDB_TRACE, rename DDB_UNATTENDED to
KDB_UNATTENDED and rename SC_HISTORY_DDBKEY to SC_HISTORY_KDBKEY.
o Remove DDB_NOKLDSYM. The new DDB backend supports pre-linker symbol
lookups as well as KLD symbol lookups at the same time.
o Remove GDB_REMOTE_CHAT. The GDB protocol hacks to allow this are
FreeBSD specific. At the same time, the GDB protocol has packets
for console output.
actually work.
Make the PCI and PCCARD attachments provide a bus_get_resource_list()
method so that resource listing for PCCARD works. PCCARD does not
have a bus_get_resource_list() method (yet), so I faked up the
resource list management in if_ndis_pccard.c, and added
bus_get_resource_list() methods to both if_ndis_pccard.c and if_ndis_pci.c.
The one in the PCI attechment just hands off to the PCI bus code.
The difference is transparent to the NDIS resource handler code.
Fixed ndis_open_file() so that opening files which live on NFS
filesystems work: pass an actual ucred structure to VOP_GETATTR()
(NFS explodes if the ucred structure is NOCRED).
Make NdisMMapIoSpace() handle mapping of PCMCIA attribute memory
resources correctly.
Turn subr_ndis.c:my_strcasecmp() into ndis_strcasecmp() and export
it so that if_ndis_pccard.c can use it, and junk the other copy
of my_strcasecmp() from if_ndis_pccard.c.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.
The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.
With this change, ia64 has support for breakpoints.
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Remove implementation of Debugger().
o Check kdb_active instead of db_active.
o Call kdb_trap() according to the new world order.
o ksym_start and ksym_end changed type to vm_offset_t.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_enter() instead of breakpoint().
o Remove implementation of Debugger().
o Call kdb_trap() according to the new world order.
unwinder:
o s/db_active/kdb_active/g
o Various s/ddb/kdb/g
o Add support for unwinding from the PCB as well as the trapframe.
Abuse a spare field in the special register set to flag whether
the PCB was actually constructed from a trapframe so that we can
make the necessary adjustments.
md_var.h:
o Add RSE convenience macros.
o Add ia64_bsp_adjust() to add or subtract from BSP while taking
NaT collections into account.
o Make debugging code conditional upon KDB instead of DDB.
o Declare ksym_start and ksym_end as extern and initialize them.
This was previously and bogusly handled by DDB itself.
o Call kdb_enter() instead of Debugger().
o Remove implementation of Debugger().
o Make debugging support conditional upon KDB instead of DDB.
o Remove implementation of Debugger().
o Don't make setjump() and longjump() conditional upon DDB.
o s/ddb_on_nmi/kdb_on_nmi/g
o Call kdb_reenter() when kdb_active is non-zero. Call kdb_trap()
otherwise.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_trap() according to the new world order.
o Kill the NO_SIO option completely.
o Respect the boot_gdb environment variable.
o Don't make debug specific kernel options conditional.
o Remove implementation of Debugger().
it's in the way even more. Basicly: remove all alpha specific console
support from gfb(4), sio(4) and syscons(4). Rewrite the alpha console
initialization to be identical to all other platforms. In a nutshell:
call cninit().
The platform specific code now only sets or clears RB_SERIAL and thus
automaticly causes the right console to be selected.
sio.c:
o Replace the remote GDB hacks and use the GDB debug port interface
instead.
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_alt_break() instead of db_alt_break().
o Call kdb_enter() instead of breakpoint().
o Remove the ugly compatibility of using the console as the debug
port.
debugger is not active. The fixes breakages of DELAY() when
running in the debugger, because not calling getit() when the
debugger is active yields a DELAY that doesn't.
o s/ddb_on_nmi/kdb_on_nmi/g
o Rename sysctl machdep.ddb_on_nmi to machdep.kdb_on_nmi
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_reenter() when kdb_active is non-zero.
o Call kdb_trap() to enter the debugger when not already active.
o Update comments accordingly.
o Remove misplaced prototype of kdb_trap().
associated with a PR_ADDR protocol, make sure to update the m_nextpkt
pointer of the new head mbuf on the chain to point to the next record.
Otherwise, when we release the socket buffer mutex, the socket buffer
mbuf chain may be in an inconsistent state.
o Make debugging code conditional upon KDB instead of DDB.
o s/WITNESS_DDB/WITNESS_KDB/g
o s/witness_ddb/witness_kdb/g
o Rename the debug.witness_ddb sysctl to debug.witness_kdb.
o Call kdb_backtrace() instead of backtrace().
o Call kdb_enter() instead Debugger().
o Assert kdb_active instead of db_active.
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Call kdb_backtrace() instead of db_print_backtrace() or backtrace().
kern_mutex.c:
o Replace checks for db_active with checks for kdb_active and make
them unconditional.
kern_shutdown.c:
o s/DDB_UNATTENDED/KDB_UNATTENDED/g
o s/DDB_TRACE/KDB_TRACE/g
o Save the TID of the thread doing the kernel dump so the debugger
knows which thread to select as the current when debugging the
kernel core file.
o Clear kdb_active instead of db_active and do so unconditionally.
o Remove backtrace() implementation.
kern_synch.c:
o Call kdb_reenter() instead of db_error().
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Remove local (static) variable in_debugger. Use kdb_active instead.
o Call kdb_enter() instead of breakpoint().
o Call kdb_alt_break() instead of db_alt_break().
o Make debugging code conditional upon KDB instead of DDB.
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_alt_break() instead of db_alt_break().
o Call kdb_enter() instead of breakpoint().
o Call kdb_enter() instead of Debugger().
o Don't make such calls conditional upon KDB instead of DDB because
they're already conditional upon EN_DEBUG.
o Use kdb_alt_break() to handle the alternate break sequence instead
of handcoding it here.
o Remove GDB kluges to make this driver work with the pre-KDB remote
GDB code.
o Call kdb_enter() instead of Debugger().
Note that with this commit the dcons(4) driver cannot be used for
remote debugging anymore. This driver has to use the new GDB debug
port interface instead. Such has not been done yet.
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.
in particular not without removing the options they replace or in the
proper location in this file. The purpose of this commit is to make it
possible to commit changes in parts without causing massive build
breakages. At least, that's the intend. I have no idea if it actually
works out as I hope...
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.
Add copyiniov() which copies a struct iovec array in from userland into
a malloc'ed struct iovec. Caller frees.
Change uiofromiov() to malloc the uio (caller frees) and name it
copyinuio() which is more appropriate.
Add cloneuio() which returns a malloc'ed copy. Caller frees.
Use them throughout.
assigning a pointer to the list and then dereferencing the pointer as a
second step. When the first spin lock is acquired, curthread is not in
a critical section so it may be preempted and would end up using another
CPUs lock list instead of its own.
When this code was in witness_lock() this sequence was safe as curthread
was in a critical section already since witness_lock() is called after the
lock is acquired.
Tested by: Daniel Lang dl at leo.org
In this mode you can setup even very small stripe size and you can be
sure that only one I/O request will be send to every disks in stripe.
It consumes some more memory, but if allocation fails, it will fall
back to "ECONOMIC" mode.
It is about 10 times faster for small stripe size than "ECONOMIC" mode
and other RAID0 implementations. It is even recommended to use this
mode and small stripe size, so our requests are always splitted.
One can still use "ECONOMIC" mode by setting kern.geom.stripe.fast to 0.
It is also possible to setup maximum memory which "FAST" mode can consume,
by setting kern.geom.stripe.maxmem from /boot/loader.conf.
one go before returning. This avoids calling uiomove() while holding
allproc_lock.
Don't adjust uio->uio_offset manually, uiomove() does that for us.
Don't drop allproc_lock before calling panic().
Suggested by: alfred
so setfault would return correctly when a page fault was invalid
(e.g. a syscall with a bad parameter).
This caused an endless DSI loop, seen when running sendmail which
does a setlogin() call with a NULL pointer.
- introduce KTR_SYSC tracing. expose the syscallnames[] array to
make the tracing more readable.
- Avoid an additional lock acquire/release when leaving xl_intr(), by
changing xl_start*() to xl_start*_locked(), and calling the appropriate
routine by chip revision (as the DMA descriptors are different).
- Simplify the appropriate routines now that they are called with the
lock held.
This should save a significant amount of CPU cycles spent on servicing
each interrupt for both UP and SMP whilst remaining MPSAFE.
Tested by: rwatson
- Add *_locked() entry points as needed to avoid unnecessary lock thrashing.
- Use these entry points wisely.
- Only acquire the lock once when servicing an interrupt.
- Check 'suspended' on interrupt to avoid racing detach.
- Correct a mis-spelled comment.
- Don't take the lock in vr_reset() to avoid lock thrashing in attach.
- Comment this.
Reviewed by: -net (silence)
- Avoid unnecessary re-acquisition elsewhere by adding *_locked()
entry points as needed.
- Correct locking for the DEVICE_POLLING case.
- Hold the driver lock for the entire duration of interrupt servicing,
to avoid unneeded, expensive re-acquisition; use *_locked() entry
points as needed.
Reviewed by: -net (silence)
bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO
- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.
Pointed out by: dwmalone
takes an argument to specify if it should preempt or not. Don't preempt
when sched_add_internal() is called from kseq_idled() or kseq_assign()
as in those cases we are about to call mi_switch() anyways. Also, doing
so during the first context switch on an AP leads to a NULL pointer deref
because curthread is NULL.
- Reenable preemption for ULE.
Submitted by: Taku YAMAMOTO taku at tackymt.homeip.net
has outlined which break numbers are software interrupts, debugger
breakpoints and ABI specific breaks. We mostly treated all break
numbers we didn't care about as debugger breakpoints.
When we orphan/wither a provider, an attached geom+consumer could
end up being withered as a result and it may be in front of us in
the normal object scanning order so we need to do multi-pass. On
the other hand, there may be withering stuff we can't get rid off
(yet), so we need to keep track of both the existence of withering
stuff and if there is more we can do at this time.
BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to
This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:
bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"
or even setting the variables manually from the OK prompt.
work on a G5 (no BAT registers) or on PearPC (dBAT3 used for mapping
the framebuffer and BATs not re-inited on OpenFirmware calls).
It also hid a number of bugs.
jumping to the kernel. Another bug exposed by removing the
1:1 BAT mapping. Sparc64 doesn't do this either.
Compile tested on: panther (sparc64). Code built, but not used, on sparc64.
of the 256Mb 1:1 BAT mapping exposed this as copying into memory that
hadn't been claimed from OpenFirmware.
compiled-tested on: panther (sparc64). Code built, but not used, on sparc64
step in making this driver more attachment neutral. Others plan on
adding acpi front ends.
Still need to cleanup the MI part of the driver because it isn't as
bus independent as it could be.
This should allow us to more easily break out the acpi and 'legacy pc'
front ends as well (so only the bus front end would touch rtc, for
example).
This isn't a great separation, since isa dma routines are still called
from the MI code, but it is a start.
- In subr_ndis.c:ndis_allocate_sharemem(), create the busdma tags
used for shared memory allocations with a lowaddr of 0x3E7FFFFF.
This forces the buffers to be mapped to physical/bus addresses within
the first 1GB of physical memory. It seems that at least one card
(Linksys Instant Wireless PCI V2.7) depends on this behavior. I
don't know if this is a hardware restriction, or if the NDIS
driver for this card is truncating the addresses itself, but using
physical/bus addresses beyong the 1GB limit causes initialization
failures.
- Create am NDIS_INITIALIZED() macro in if_ndisvar.h and use it in
if_ndis.c to test whether the device has been initialized rather
than checking for the presence of the IFF_UP flag in if_flags.
While debugging the previous problem, I noticed that bringing
up the device would always produce failures from ndis_setmulti().
It turns out that the following steps now occur during device
initialization:
- IFF_UP flag is set in if_flags
- ifp->if_ioctl() called with SIOCSIFADDR (which we don't handle)
- ifp->if_ioctl() called with SIOCADDMULTI
- ifp->if_ioctl() called with SIOCADDMULTI (again)
- ifp->if_ioctl() called with SIOCADDMULTI (yet again)
- ifp->if_ioctl() called with SIOCSIFFLAGS
Setting the receive filter and multicast filters can only be done
when the underlying NDIS driver has been initialized, which is done
by ifp->if_init(). However, we don't call ifp->if_init() until
ifp->if_ioctl() is called with SIOCSIFFLAGS and IFF_UP has been
set. It appears that now, the network stack tries to add multicast
addresses to interface's filter before those steps occur. Normally,
ndis_setmulti() would trap this condition by checking for the IFF_UP
flag, but the network code has in fact set this flag already, so
ndis_setmulti() is fooled into thinking the interface has been
initialized when it really hasn't.
It turns out this is usually harmless because the ifp->if_init()
routine (in this case ndis_init()) will set up the multicast
filter when it initializes the hardware anyway, and the underlying
routines (ndis_get_info()/ndis_set_info()) know that the driver/NIC
haven't been initialized yet, but you end up spurious error messages
on the console all the time.
Something tells me this new behavior isn't really correct. I think
the intention was to fix it so that ifp->if_init() is only called
once when we ifconfig an interface up, but the end result seems a
little bogus: the change of the IFF_UP flag should be propagated
down to the driver before calling any other ioctl() that might actually
require the hardware to be up and running.
When avoiding the zeroing of "bogus_page" when it appears in a buf,
be sure to advance the pointers into the data for successive pages.
The bug caused file corruption when read(2)ing from a "hole" in a
file where a previous page of the read block had already been faulted
in: fsx tripped up on this pretty quickly. The particular access
pattern is probably pretty unusual, so other applications probably
wouldn't have had problems, but you'd never know.
Reviewed By: alc@
{ip,udp,tcp} header and return a void * pointing to the payload (i.e. the
first byte past the end of the header and any required padding). Use them
consistently throughout libalias to a) reduce code duplication, b) improve
code legibility, c) get rid of a bunch of alignment warnings.
a short pointer. The previous implementation seems to be in a gray zone
of the C standard, and GCC generates incorrect code for it at -O2 or
higher on some platforms.
Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.
Signal a VFS event when NFS transitions from up to down and vice
versa.
Add a placeholder vfs_sysctl where we will put status reporting
shortly.
Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
(but keep it conditional on __ISO_C_VISIBLE >= 1999.
Why? Our out /usr/src/contrib assumes it, and more than a few ports have
an autoconf that looks for __va_copy because it is available on glibc.
It is critical that we use it on PowerPC. It generally isn't a problem
for i386 and its ilk because those platforms can get away with cheating
the C standard, using a plain assignment.
hangs due to recent preemption changes. This change appears to remove
the panic that I was running into, but at the cost of increasing
ithread scheduling latency, and as such is a temporary band-aid until
jhb has a chance to resolve the ule<->preemption interaction that is
the source of the problem. If it doesn't fix the problem for others--
sorry!
so that last_work_seen has a reasonable value at the transition
to the SYNCER_SHUTTING_DOWN state, even if net_worklist_len happened
to be zero at the time.
Initialize last_work_seen to zero as a safety measure in case the
syncer never ran in the SYNCER_RUNNING state.
Tested by: phk
device is open. This allows certain old and rather special dual
floppy controllers to work on both channels, as long as you only
have one open at a time.
When two drivers share an ISA DMA channel, they both call isa_dmainit()
and the second call fails if DIAGNOSTIC is on.
If isa_dmainit() was already called successfully, just return silently.
This only works if both drivers agree on the bounce buffer size,
but since sharing DMA is usually only possible on very special
hardware and then typically only for devices of the same type (which
would have multiple instances of the same device driver), this is
not a problem in practice.
belong in the respective drivers. I've not removed ALL of them, as a
few still haven't moved. I've just removed the ones that aren't used.
# these can be removed from amd64, but I'm having issues getting to
# sledge at the moment for a build.
named link, foo_link or link_foo to lnk, foo_lnk or lnk_foo, fixing
signed / unsigned comparisons, and shoving unused function arguments
under the carpet.
I was hoping WARNS?=6 might reveal more serious problems, and perhaps
the source of the -O2 breakage, but found no smoking gun.
- Eliminate the use of a recursive mutex.
- Mark the driver INTR_MPSAFE.
This work is incomplete and will be refined in a future commit.
- Most notably, _locked() variants of entry points need to be introduced.
- The mii upcall/downcall may still be racy.
- Add a stubbed-out guard against racing rl_detach() for the time being.
Tested on: UP, debug.mpsafenet && !debug.mpsafenet
Reviewed by: silence on -net
Use C99 types. Use ANSI function definitions. Sort prototypes.
Split long lines correctly. Punctuate/wordsmith comments.
Use device_printf()/if_printf() where possible.
Reviewed by: -net (silence)
- Eliminate the use of a recursive mutex.
- Mark the driver as INTR_MPSAFE.
- Split the default media choice code out into xl_choose_media() to
avoid making poor assumptions about the state of the lock during attach.
- The miibus upcall/downcall paths may still be racy.
Change to commented-out locking assertions there for now.
- Tested with nfsclient, routed, ssh, ntp, dhclient and quagga bgpd.
- This needs SMP test coverage. I do not have such resources.
Tested on: UP, !debug.mpsafenet && debug.mpsafenet
Hardware: 3C905B-TX (0x905510b7)
Speed up the syncer when shutting down by sleeping for a shorter
period of time instead of cranking up rushjob and using the
normal one second sleep.
Skip empty worklist slots when shutting down to avoid lengthy
intervals of inactivity.
Give I/O more time to complete between steps by not speeding the
syncer quite as much.
Terminate the syncer after one full pass through the worklist
plus one second with the worklist containing nothing but syncer
vnodes.
Print an indication of shutdown progress to the console.
Add a sysctl, vfs.worklist_len, to allow the size of the syncer worklist
to be monitored.
- Use device_printf() during device probe/attach.
- Move if_xname initialization to before xl_reset() is called.
- Use if_printf() at all other times after struct ifnet has been
initialized.
with sleepable locks held from further up in the network stack, and
attempts to allocate memory to hold multicast group membership information
with M_WAITOK.
This panic was triggered specifically when an exiting routing daemon
process closes its raw sockets after joining multicast groups on them.
While we're here, comment some possible locking badness.
PR: kern/48560
this more accurately reflects what the underlying hardware of most
acpi machines that don't have children pci busses.
We still need a better way to get this information from acpi/hardware.
dereferenced directly. Toss an ifdef around it for the moment and
allow this to compile. This likely means that priority packets aren't
queued to the special high priority queue. The maintainer of this
should look into the problem.
This is likely fallout from the netgraph migration to using a more
generic meta tag from the mbug recently.
Fixes: pc98 tinerbox
and WITNESS is not built, then force all M_WAITOK allocations to
M_NOWAIT behavior (transparently). This is to be used temporarily
if wierd deadlocks are reported because we still have code paths
that perform M_WAITOK allocations with lock(s) held, which can
lead to deadlock. If WITNESS is compiled, then the sysctl is ignored
and we ask witness to tell us wether we have locks held, converting
to M_NOWAIT behavior only if it tells us that we do.
Note this removes the previous mbuf.h inclusion as well (only needed
by last revision), and cleans up unneeded [artificial] comparisons
to just the mbuf zones. The problem described above has nothing to
do with previous mbuf wait behavior; it is a general problem.
zones, and do it by direct comparison of uma_zone_t instead of strcmp.
The mbuf subsystem used to provide M_TRYWAIT/M_DONTWAIT semantics, but
this is mostly no longer the case. M_WAITOK has taken over the spot
M_TRYWAIT used to have, and for mbuf things, still may return NULL if
the code path is incorrectly holding a mutex going into mbuf allocation
functions.
The M_WAITOK/M_NOWAIT semantics are absolute; though it may deadlock
the system to try to malloc or uma_zalloc something with a mutex held
and M_WAITOK specified, it is absolutely required to not return NULL
and will result in instability and/or security breaches otherwise.
There is still room to add the WITNESS_WARN() to all cases so that
we are notified of the possibility of deadlocks, but it cannot change
the value of the "badness" variable and allow allocation to actually
fail except for the specialized cases which used to be M_TRYWAIT.
functionality by setting to a non-zero value. This is an integer, but
is treated as a boolean by the code, so clamp it to a boolean value
when set so as to avoid unnecessary bridge reinitialization if it's
changed to another value.
PR: kern/61174
Requested by: Bruce Cran
around in the vnodes surroundings when we allocate a block.
Assign a blocksize when we create a vnode, and yell a warning (and ignore it)
if we got the wrong size.
Please email all such warnings to me.
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.
Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.
Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.
ffs_mount -> bdevvp -> getnewvnode(..., mp = NULL, ...) ->
insmntqueue(vp, mp = NULL) -> KASSERT -> panic
Make getnewvnode() only call insmntqueue() if the mountpoint parameter
is not NULL.
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.
Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:
MNT_ILOCK(mp);
loop:
for (vp = TAILQ_FIRST(&mp...);
(vp = nvp) != NULL;
nvp = TAILQ_NEXT(vp,...)) {
if (vp->v_mount != mp)
goto loop;
MNT_IUNLOCK(mp);
...
MNT_ILOCK(mp);
}
MNT_IUNLOCK(mp);
The code which takes vnodes off a mountpoint looks like this:
MNT_ILOCK(vp->v_mount);
...
TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
...
MNT_IUNLOCK(vp->v_mount);
...
vp->v_mount = something;
(Take a moment and try to spot the locking error before you read on.)
On a SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously get to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.
Fix:
Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.
Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.
Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.
to dup_sockaddr() was renamed to sodupsockaddr(), the argument was
changed from '1' to 'M_WAITOK', which changed the semantics. This
resulted in a WITNESS warning about a potential sleep while holding the
NFS server mutex. Now this will no longer happen, restoring a possible
bug present in the original code (setting RC_NAM even though the malloc
to copy the addres may fail). bde observes that the flag names here
should probably not be the same as the malloc flags for name space
reasons.
Bumped into by: kuriyama
honor the alignment and boundary constraints in the dma tag when loading
buffers. Previously, these constraints were only honored when allocating
memory via bus_dmamem_alloc(). Now, bus_dmamap_load() will automatically
use bounce buffers when needed.
Also add a set of sysctls to monitor the global busdma stats. These are:
hw.busdma.free_bpages
hw.busdma.reserved_bpages
hw.busdma.active_bpages
hw.busdma.total_bpages
hw.busdma.total_bounced
hw.busdma.total_deferred
to failing -- that is, allocations via malloc(M_WAITOK) that are required
to never fail -- if WITNESS is not defined. While everyone should be
running WITNESS, in any case, zone "Mbuf" allocations are really the only
ones that should be screwed with by this hack.
This hack is crashing people, and would continue to do so with or without
WITNESS. Things shouldn't be allocating with M_WAITOK with locks held,
but it's not okay just to always remove M_WAITOK when !WITNESS.
Reported by: Bernd Walter <ticso@cicely5.cicely.de>
FAT32 filesystems to be mounted, subject to some fairly serious limitations.
This works by extending the internal pseudo-inode-numbers generated from
the file's starting cluster number to 64-bits, then creating a table
mapping these into arbitrary 32-bit inode numbers, which can fit in
struct dirent's d_fileno and struct vattr's va_fileid fields. The mappings
do not persist across unmounts or reboots, so it's not possible to export
these filesystems through NFS. The mapping table may grow to be rather
large, and may grow large enough to exhaust kernel memory on filesystems
with millions of files.
Don't enable this option unless you understand the consequences.
- Remove recursive locking situations. Remove the MTX_RECURSE bit.
- Take the lock for any routine which is not called from within if_vr.c
itself; this includes entry points called by newbus, ifnet, callout,
ifmedia, and polling subsystems.
- Remove spl references from the code added to miibus callbacks in rev 1.60.
- Add the INTR_MPSAFE bit.
- Tidy up some assignments; locks are not needed for taking the address
of something at a known offset, for example.
- Tested on the machine this was committed from.
Tested on: UP only, !debug.mpsafenet && debug.mpsafenet
Reviewed by: rwatson
introduced a KSE_CAN_MIGRATE() invocation with one argument
missing (class). Either this is a genuine forget or it crept
in from JHB's repo where he may have modified it. If it's
the latter then it may require more attention. For now fix
the make depend.
Put some braces around the busy-wait loop in vr_rxeoc() to make the
no-op semicolon more obvious.
No functional changes.
Running on the machine I am committing from without problems.
Reviewed by: jmallett
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.
This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.
Approved by: scottl (with his re@ hat)
This class is used for detecting volume labels on file systems:
UFS, MSDOSFS (FAT12, FAT16, FAT32) and ISO9660.
It also provide native labelization (there is no need for file system).
g_label_ufs.c is based on geom_vol_ffs from Gordon Tetlow.
g_label_msdos.c and g_label_iso9660.c are probably hacks, I just found
where volume labels are stored and I use those offsets here,
but with this class it should be easy to do it as it should be done by
someone who know how.
Implementing volume labels detection for other file systems also should
be trivial.
New providers are created in those directories:
/dev/ufs/ (UFS1, UFS2)
/dev/msdosfs/ (FAT12, FAT16, FAT32)
/dev/iso9660/ (ISO9660)
/dev/label/ (native labels, configured with glabel(8))
Manual page cleanups and some comments inside were submitted by
Simon L. Nielsen, who was, as always, very helpful. Thanks!
header pointer and then casting it to the ecdt pointer. This fixes the
-O2 build. I'm unsure what changed recently to reveal this error since
this code has been unchanged for months.
switch to. If a non-NULL thread pointer is passed in, then the CPU will
switch to that thread directly rather than calling choosethread() to pick
a thread to choose to.
- Make sched_switch() aware of idle threads and know to do
TD_SET_CAN_RUN() instead of sticking them on the run queue rather than
requiring all callers of mi_switch() to know to do this if they can be
called from an idlethread.
- Move constants for arguments to mi_switch() and thread_single() out of
the middle of the function prototypes and up above into their own
section.
following drivers: bfe(4), em(4), fxp(4), lnc(4), tun(4), de(4) rl(4),
sis(4) and xl(4)
More patches are pending on: http://peoples.freebsd.org/~mlaier/ Please take
a look and tell me if "your" driver is missing, so I can fix this.
Tested-by: many
No-objection: -current, -net
lock and sched_lock so they can be read with either lock held. Document
the locking as well. The one remaining bogosity is that pr_addr and
pr_ticks should be per-thread but profiling of multithreaded apps is
currently undefined.
stats structure and a vmspace as this should always be true rather
than checking the always true condition in an if statement.
- Remove never-false check: if ((ru = &pstats->p_ru) != NULL)
- Remove pstats variable that is only used once and inline its one use
instead.
Just use p2->p_uarea directly instead.
- Remove an old and mostly bogus assertion regarding p2->p_sigacts.
- Use RANGEOF macro ala fork1() to clean up bzero/bcopy of p_stats.
pointer to the corresponding struct thread to the thread ID (lwpid_t)
assigned to that thread. The primary reason for this change is that
libthr now internally uses the same ID as the debugger and the kernel
when referencing to a kernel thread. This allows us to implement the
support for debugging without additional translations and/or mappings.
To preserve the ABI, the 1:1 threading syscalls, including the umtx
locking API have not been changed to work on a lwpid_t. Instead the
1:1 threading syscalls operate on long and the umtx locking API has
not been changed except for the contested bit. Previously this was
the least significant bit. Now it's the most significant bit. Since
the contested bit should not be tested by userland, this change is
not expected to be visible. Just to be sure, UMTX_CONTESTED has been
removed from <sys/umtx.h>.
Reviewed by: mtm@
ABI preservation tested on: i386, ia64
faster and iterate to over its work list a few times in an attempt
to empty the work list before the syncer terminates. This leaves
fewer dirty blocks to be written at the "syncing disks" stage and
keeps the the "giving up on N buffers" problem from being triggered
by the presence of a large soft updates work list at system shutdown
time. The downside is that the syncer takes noticeably longer to
terminate.
Tested by: "Arjan van Leeuwen" <avleeuwen AT piwebs DOT com>
Approved by: mckusick
Fix this problem by separating out the SACK and the newreno cases. Also, check
if we are in FASTRECOVERY for the sack case and if so, turn off dupacks.
Fix an issue where the congestion window was not being incremented by ssthresh.
Thanks to Mohan Srinivasan for finding this problem.
rev. 1.68, author: mycroft
Ignore a port error that happens to come in at the same time as a
connect status change. Some root hubs seem to report both.
Obtained from: NetBSD
bus interfaces. These interfaces use the FTDI chipset with different
Vendor and Product IDs.
Add two additional baud rate enumerations. The vehicle bus interfaces
use a baud rate of 2000000. Also add 3000000 as it is the other FTDI
baud divisor special case.
I've commited a slightly different patch from that provided in the PR as
I changed the matching code a bit yesterday.
Submitted by: Mike Durian <durian at shadetreesoftware.com>
PR: kern/67357
resource_head types. Also add a way to set start and end so fewer
things need to reach into struct resource.
Pointy hat to: imp for breaking the build on so many platforms.
operations when the refcount doesn't protect the opens and closes. Fix
this, and don't actually let a time out happen: now ugen(4) devices do
not get freed out from under the programs with them open.
instead of a mutex so we do not unblock it in msleep(). If we do this,
another event could occur, resetting the status register since reads
reset it. While I'm here, remove the backoff approach. Instead, sleep
in 10 ms chunks for up to the configured timeout using either DELAY (if
we aren't booted yet) or tsleep.
Help from: dillon
Tested by: Andrew Thompson andy AT fud.org.nz
Unify the code to disable GPEs with the enable code. Shutdown is handled
the same way. ACPI now does all wake/sleep prep for child devices so
now they no longer need to call external functions in the suspend/resume
path. Add the flags to non-ACPI busses (i.e., pci).
This brings us into line with the standard, which requires power resources
be enabled when wake is enabled for a given device. Move the dereferencing
code into its own function, +acpi_pwr_dereference_resource().
code that was never really used. Print a message when disabling ACPI via
a quirk. Allow the user to override the blacklist decision by setting
hint.acpi.0.disabled="0". Add missing AcpiTerminate() calls; they are
needed to clean up if bailing out after AcpiInitializeSubsystem().
devremoved events. This reduces the races around these events. We
now include the pnp info in both. This lets one do more interesting
thigns with devd on device insertion.
Submitted by: Bernd Walter
PCI native addressing. That means that if the HW says that using "real"
addresses instead of the hardwired legacy compat ones is allowed, we will
use them.
namespace. This is to allow decoupling of attachments from ACPI where they
need some functionality when ACPI is present but do not want to require ACPI
to always be loaded.
pv entries per 1GB of user virtual memory. (eg: if we had 1GB file was
mmaped into 30 processes, that would theoretically reduce the KVA demand by
30MB for pv entries. In reality though, we limit pv entries so we don't
have that many at once.)
We used to store the vm_page_t for the page table page. But we recently
had the pa of the ptp, or can calculate it fairly quickly. If we wanted
to avoid the shift/mask operation in pmap_pde(), we could recover the
pa but that means we have to store it for a while.
This does not measurably change performance.
Suggested by: alc
Tested by: alc
creation of the sysctl tree for the turnstile profiling stats until a
SI_SUB_LOCK sysinit. Doing it in init_turnstiles() is too early as it is
called before mi_startup().
hash tables used in the sleep queue and turnstile code. Each option adds
a sysctl tree under debug containing the maximum depth of any bucket in
the hash table as well as a separate node for each bucket (or chain)
containing the current depth and maximum depth for that bucket.
vm objects shadowing source in vm_object_shadow(). This closes a race where
vm_object_collapse() could be called with a partially uninitialized object
argument causing symptoms that looked like hardware problems, e.g. signal 6,
10, 11 or a /bin/sh busy-waiting for a nonexistant child process.
the queue has been removed from the global taskqueue_queues list. This
removes the need for the draining queue hack.
- Allow taskqueue_run() to be called with the taskqueue mutex held. It
can still be called without the lock for API compatiblity. In that case
it will acquire the lock internally.
- Don't lock the individual queue mutex in taskqueue_find() until after the
strcmp as the global queues mutex is sufficient for the strcmp.
- Simplify taskqueue_thread_loop() now that it can hold the lock across
taskqueue_run().
Submitted by: bde (mostly)
and bump all of the taskqueue swi's to 6. This gives callouts higher
priority than taskqueue tasks and gives all taskqueue tasks the same
priority.
Discussed with: bde
than a a stack-limited list. This removes the artifical limit on s/g list
size.
cvs: ----------------------------------------------------------------------
Add Intel Pro100Lan56 card.
Also integrate changes from Carlos Velasco. Only attch if we're a
network device (to filter out the serial devices). Also, increment
vpmatch if we match to conform to the pccard match function api.
Use bus space rather than direct inb/outb. Minor style changes while
I'm here. Extremely preliminary support for siliconix ethernet cards
(but more work is required).
The hack for setting the bus has been moved down into the cbb driver.
I've been running without this hack in my tree for so long I had
forgotten that I'd removed it :-). Please let me know if this causes
difficulty for your laptop.
starting value. This is more pedantically correct (since the handle
isn't always identical to the start of the resource) and also doesn't
access the innards of struct resource direct (which I forbid in my
tree). We need to do this for all resource types, not just ioport.
Reviewed by: njl
Now, when trying to mount file system in read-only mode it tries to
opened a device for writting to be able to update to read-write mode
latter. Ehh.
Discussed with: phk
so_gencnt, numopensockets, and the per-socket field so_gencnt. Annotate
this this might be better done with atomic operations.
Annotate what accept_mtx protects.
to warn about attempts to sleep in the I/O path. This change pushes the
definition and use of 'mymutex' behind #ifdef WITNESS to avoid the cost
in non-debugging cases. This results in a clear .22% performance win for
512 byte and 1k I/O tests on my SMP test box. Not much, but every bit
counts.
associated with performing a wakeup on the socket buffer:
- When performing an sbappend*() followed by a so[rw]wakeup(), explicitly
acquire the socket buffer lock and use the _locked() variants of both
calls. Note that the _locked() sowakeup() versions unlock the mutex on
return. This is done in uipc_send(), divert_packet(), mroute
socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().
- When the socket buffer lock is dropped before a sowakeup(), remove the
explicit unlock and use the _locked() sowakeup() variant. This is done
in soisdisconnecting(), soisdisconnected() when setting the can't send/
receive flags and dropping data, and in uipc_rcvd() which adjusting
back-pressure on the sockets.
For UNIX domain sockets running mpsafe with a contention-intensive SMP
mysql benchmark, this results in a 1.6% query rate improvement due to
reduce mutex costs.
The overhead of unconditionally allocating TIDs (and likewise,
unconditionally deallocating them), is amortized across multiple
thread creations by the way UMA makes it possible to have type-stable
storage.
Previously the cost was kept down by having threads created as part
of a fork operation use the process' PID as the TID. While this had
some nice properties, it also introduced complexity in the way TIDs
were allocated. Most importantly, by using the type-stable storage
that UMA gives us this was also unnecessary.
This change affects how core dumps are created and in particular how
the PRSTATUS notes are dumped. Since we don't have a thread with a
TID equalling the PID, we now need a different way to preserve the
old and previous behavior. We do this by having the given thread (i.e.
the thread passed to the core dump code in td) dump it's state first
and fill in pr_pid with the actual PID. All other threads will have
pr_pid contain their TIDs. The upshot of all this is that the debugger
will now likely select the right LWP (=TID) as the initial thread.
Credits to: julian@ for spotting how we can utilize UMA.
Thanks to: all who provided julian@ with test results.
uhid.c (1.61), author: jdolecek
add support for USB_GET_DEVICEINFO and USB_GET_STRING_DESC ioctls,
with same meaning as for ugen(4)
usbdi_util.h (1.29), usb_quirks.c (1.50), uhid.c (1.62),
ugen.c (1.68), usb_subr.c (1.114) author: mycroft
Yes, some devices return incorrect lengths in their string
descriptors. Rather than losing, do what Windows does: just
request the maximum size, and allow a shorter response. Obsoletes
the need for UQ_NO_STRINGS, and therefore these "quirks" are removed.
usb_subr.c (1.116), author: mycroft
In the "seemed like a good idea until I found the fatal flaw"
department... Attempting to read a maximum-size string descriptor
causes my kue device to go completely apeshit. So, go back to the
original method, but allow the device to return a shorter string than
it claimed.
Obtained from: NetBSD
copies.
No current line disciplines have a dynamically changing hotchar, and
expecting to receive anything sensible during a change in ldisc is
insane so no locking of the hotchar field is necessary.
ohci.c (1.147), author: mycroft
Failure to properly mask off UE_DIR_IN from the endpoint address
was causing OHCI_ED_FORMAT_ISO and EHCI_QH_HRECL to get set
spuriously, causing rather interesting lossage.
Suddenly I get MUCH better performance with ehci...
ohci.c (1.148), author: mycroft
Adjust a couple of comments to make it clear WTF is going on.
Obtained from: NetBSD
we have to revert to TTYDISC which we know will successfully open
rather than try the previous ldisc which might also fail to open.
Do not let ldisc implementations muck about with ->t_line, and remove
code which checks for reopens, it should never happen.
Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation
initialize it in their open routines. Reset to zero when we enter
TTYDISC. ("no" should really be -1 since zero could be a valid
hotchar for certain old european mainframe protocols.)
Now that the devs files are marked before-depend, we can remvoe them
from a few places they were explicitly mentioned (along with
BEFORE_DEPEND).
Noticed by: bde
Reduce the need for hard coded *devs in various makefiles by declaring
them before-depend.
Other bugs in the handling of *devs remain, but this is the start of
the cleanup. These will be address in future commits.
Cleanup Motivator: bde
ehci.c (1.55), ehcireg.h (1.16); author: mycroft
Set the data toggle correctly, and use EHCI_QTD_DTC. This fixes
problems with my ALi-based drive enclosure (it works now, rather
than failing to attach). Also seems to work with a GL811-based
enclosure and an ASUS enclosure with a CD-RW, on both Intel and
NEC controllers.
Note: The ALi enclosure is currently very SLOW, due to some issue
with taking too long to notice that the QTD is complete. This
requires more investigation.
ehci.c (1.56); author: mycroft
Failure to properly mask off UE_DIR_IN from the endpoint address
was causing OHCI_ED_FORMAT_ISO and EHCI_QH_HRECL to get set
spuriously, causing rather interesting lossage.
Suddenly I get MUCH better performance with ehci...
ehci.c (1.58); author: mycroft
Fix a stupid bug in ehci_check_intr() that caused use to try to
complete a transaction that was still running. Now ehci can
handle multiple devices being active at once.
ehci.c (1.59); author: enami
As the ehci_idone() now uses the variable `epipe'
unconditionally, always declare it (in other words, make this
file compile w/o EHCI_DEBUG).
ehci.c (1.60); author: mycroft
Remove comment about the data toggle being borked.
ehci.c (1.61); author: mycroft
Update comment.
ehci.c (1.62); author: mycroft
Adjust a couple of comments to make it clear WTF is going on.
ehci.c (1.63); author: mycroft
Fix an error in a debug printf().
ehci.c (1.64), ehcireg.h (1.17); author: mycroft
Further cleanup of toggle handling. Now that we use EHCI_QH_DTC,
we don't need to fiddle with the TOGGLE bit in the overlay
descriptor, so minimize how much we fuss with it.
Obtained from: NetBSD
Thanks to Sam for importing tags in a way that allowed this to be done.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
Also allow the sr and ar drivers to create netgraph versions of their modules.
Document the change to the ksocket node.
when not propogated on fork (due to minherit(2)). Consistency checks
otherwise fail when the vm_map is freed and it appears to have not been
emptied completely, causing an INVARIANTS panic in vm_map_zdtor().
PR: kern/68017
Submitted by: Mark W. Krentel <krentel@dreamscape.com>
Reviewed by: alc
because UFS uses fixed-size directory blocks. When using this code with
other file systems, such as HFS+, the value of auio.uio_resid will need
to be taken into account.
local to a function. Remove a couple of blank lines in variable
declarations.
In one case, explicitly test against NULL rather than using a pointer
as a boolean directly.
older API to list attributes on a file (zero-length attribute name)
to function. extattr_list_*() are now the only available APIs to
use when listing attributes.
only allow this to be further processed when bridging is active on
that interface, but also if the current packet has a VLAN tag and
VLANs are active on our interface. This gives the VLAN layers a
chance to also consider the packet (and perhaps drop it instead of the
main dispatcher).
This fixes a situation where bridging was only active on VLAN
interfaces but ether_demux() called on behalf of the main interface
had already thrown the packet away.
MFC after: 4 weeks
are currently all bad BIOS revisions that will never be able to support
ACPI. They were derived by examining which BIOS's are blacklisted by other
operating systems. Other types of quirks will be possible here as well.
network interfaces. This global mutex will protect all ifnet labels.
Acquire the mutex across various MAC activities on interfaces, such
as security checks, propagating interface labels to mbufs generated
from the interface, retrieving and setting the interface label.
Introduce mpo_copy_ifnet_label MAC policy entry point to copy the
value of an interface label from one label to another. Use this
to avoid performing a label externalize while holding mac_ifnet_mtx;
copy the label to a temporary ifnet label and then externalize that.
Implement mpo_copy_ifnet_label for various MAC policies that
implement interface labeling using generic label copying routines.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
locking in tcp_input() for TCP packets with urgent data pointers to
hold the socket buffer lock across testing and updating oobmark
from just protecting sb_state.
Update socket locking annotations
Ultra2 users may want to set OFWCONS_POLL_HZ to a value of '20'.
I have left default value at '4' as higher values can consume a more
than is acceptable amount of CPU, and we don't have a consensus yet
what is an optimal value.
Submitted by: Pyun YongHyeon <yongari@kt-is.co.kr>
not active GEOM providers, it will result in a kernel panic.
If the GEOM provider or disk goes away before the volume
configuration data gets written to the disk, it will result
in another kernel panic.
o Make sure that the drives specified for volume creation
are active GEOM providers.
o When writing out volume configuration data to associated drives,
make sure that the GEOM provider is active, otherwise continue
to the next drive in the volume.
Approved by: le, bmilekic (mentor)
Giant if debug.mpsafenet=0, as any points that require synchronization
in the SMPng world also required it in the Giant-world:
- inpcb locks (including IPv6)
- inpcbinfo locks (including IPv6)
- dummynet subsystem lock
- ipfw2 subsystem lock
- Assert the mutex in NG_IDHASH_FIND() since the mutex is required to
safely walk the node lists in the ng_ID_hash table.
- Acquire the ng_nodelist_mtx when walking ng_allnodes or ng_allhooks
to generate state dump output from the netgraph sysctls.
the socket buffer having its limits adjusted. sbreserve() now acquires
the lock before calling sbreserve_locked(). In soreserve(), acquire
socket buffer locks across read-modify-writes of socket buffer fields,
and calls into sbreserve/sbrelease; make sure to acquire in keeping
with the socket buffer lock order. In tcp_mss(), acquire the socket
buffer lock in the calling context so that we have atomic read-modify
-write on buffer sizes.
smp_rendezvous() to ensure we run on the BSP. This reverts rev 1.128.
Add a comment indicating that MI code should be the one that runs all
shutdown functions on the BSP with the APs halted. This should work
around problems in power off while waiting for the MI code to be improved.
waiting for the socket to connect and use msleep() on the socket
mute rather than tsleep(). Acquire socket buffer mutexes around
read-modify-write of socket buffer flags.
The general UMA lock is a recursion-allowed lock because
there is a code path where, while we're still configured
to use startup_alloc() for backend page allocations, we
may end up in uma_reclaim() which calls zone_foreach(zone_drain),
which grabs uma_mtx, only to later call into startup_alloc()
because while freeing we needed to allocate a bucket. Since
startup_alloc() also takes uma_mtx, we need to be able to
recurse on it.
This exact explanation also added as comment above mtx_init().
Trace showing recursion reported by: Peter Holm <peter-at-holm.cc>
originated on RELENG_4 and was ported to -CURRENT.
The scoreboarding code was obtained from OpenBSD, and many
of the remaining changes were inspired by OpenBSD, but not
taken directly from there.
You can enable/disable sack using net.inet.tcp.do_sack. You can
also limit the number of sack holes that all senders can have in
the scoreboard with net.inet.tcp.sackhole_limit.
Reviewed by: gnn
Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
This should fix problems with older SMP systems that only have ISA/EISA
IRQs when routing virgin PCI interrupts as well as on other boxes whose
MADT does not have any interrupt override entries for ISA IRQs that are
used to route PCI interrupts even in APIC mode.
actually used. For most ACPI devices this means deferring the call
until bus_alloc_resource().
- Add a function acpi_config_intr() to call BUS_CONFIG_INTR() for an
ACPI IRQ resource using the trigger mode and polarity information
stored in the ACPI resource object.
- Add a function acpi_lookup_irq_resource() to lookup the ACPI IRQ
resource that corresponds to a specified rid and new-bus resource.
- Have the ACPI PCI bridge driver call BUS_CONFIG_INTR() on interrupts
that it routes through link devices.
- Remove needactivate variable from acpi_alloc_resource() by changing the
function not modify the flags variable but just mask off RF_ACTIVE when
calling rman_reserve_resource().
Reviewed by: njl (1, an earlier version)
- Allow ioapic_set_{nmi,smi,extint}() to be called multiple times on the
same pin so long as the pin's mode is the same as the mode being
requested.
- Add a notion of bus type for the interrupt associated with interrupt pin.
This is needed so that we can force all EISA interrupts to be active high
in the forthcoming ioapic_config_intr().
- Fix a bug for EISA systems that didn't remap IRQs. This would have broken
EISA systems that tried to disable mixed mode for IRQ 0.
case of NFS mounted swap, so do not try to dereference it.
While we're here, brucify the printf() call which happens when we
time out on acquisition of vm_page_queue_mtx.
PR: kern/67898
Submitted by: bde (style)
device associated with any PCI devices that are enumerated in the ACPI
tree when adding children to an ACPI PCI bus and remove the duplicate
ACPI-only device_t and replace the device_t associated with the handle with
the ACPI and PCI aware device_t.
Several changes:
* Implement read for ulpt.
* If the device is not opened for reading, occasionally drain any
data the printer might have (but don't hammer the printer with reads).
* Lower the buffer size to one page.
The driver seems to work with more printers now.
Obtained from: NetBSD
depending on namespace pollution in <sys/vnode.h> for the definition
of mutex interfaces used in SOCKBUF_*LOCK().
Sorted includes.
Removed unused includes.
the SS_NBIO flag from the parent socket to the child socket during an
accept() operation.
The file descriptor O_NONBLOCK flag would have been propagated already
by the fflag assignment, and therefore would have been inconsistent
with the underlying socket's so_state member.
This makes accept() more closely adhere to the API contract we effectively
outline in the manual page. Note also that Linux continues to differ here;
O_NONBLOCK is not propagated. The other BSDs do propagate the flag, as
does Solaris. The Single UNIX Specification does not offer specific
advice on this issue.
PR: kern/45733
Requested by: Jayanth Vijayaraghavan
Reviewed by: rwatson
%di will already point to the character after the nul char when the
'repnz scasb' terminates.
Submitted by: Tom Cosgrove tom dot cosgrove at arches-consulting dot com
- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather then simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather then stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visiable change. :-)
- Since most interfaces do not need the new functionality, an family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.
Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net
Only the first link0..link$NLINKS hooks would be utilized, whereas
the link hooks may be connected sparsely.
Add a counter variable so that the link hook array is only traversed
while there is still work to do, but that it continues up to the end
if it has to.
* block packets that fail to create state table entries
* only allow non-fragmented packets to influence whether or not a logged
packet is the same as the one logged before.
* correct the ICMP packet checksum fixing up when processing ICMP errors for NAT
* implement a maximum for the number of entries in the NAT table (NAT_TABLE_MAX
and ipf_nattable_max)
* frsynclist() wasn't paying attention to all the places where interface
names are, like it should.
* fix comparing ICMP packets with established TCP state where only 8 bytes
of header are returned in the ICMP error.
MFC after: 1 week
* Obtain/release schedlock around calls to calcru.
* Sort switch cases which do not cascade per style(9).
* Sort local variables per style(9).
* Remove "superfluous" whitespace.
* Cleanup handling of NULL uap->tp in clock_getres(). It would probably
be better to return EFAULT like clock_gettime() does by passing the
pointer to copyout(), but I presume it was written to not fail on
purpose in the original code. I'll defer to -standards on this one.
Reported by: bde
This is not really used by the process but it's confusing to some
status readers to see zombie processes the "runnin" threads.
Pointed out by: Don Lewis <truckman@FreeBSD.org>
the GEOM topology.
There are still issues with not detaching from cam correctly such that
upon a device detach there's an invalid pointer dereference from the
later call to cam_rescan().
that are on a CISS bus to be exported up to CAM and made available as normal
devices. This will typically add one or two buses to CAM, which will be
numbered starting at 32 to allow room for CISS proxy buses. Also, the CISS
firmware usually hides disk devices, but these can also be exposed as 'pass'
devices if you set the hw.ciss.expose_hidden_physical tunable.
Sponsored by: Tape Laboratories, Inc.
MFC After: 3 days
where it is known to detect a problem but the problem is not very easy
to fix. The warning became very common recently after a call to calcru()
was added to fill_kinfo_thread().
Another (much older) cause of "negative times" (actually non-monotonic
times) was fixed in rev.1.237 of kern_exit.c.
Print separate messages for non-monotonic and negative times.
from exit1(). sched_exit() must be called unconditionally from exit1().
It was called almost unconditionally because the only exits on system
shutdown if at all.
(2) Removed the comment that presumed to know what sched_exit() does.
sched_exit() does different things for the ULE case. The call became
essential when it started doing load average stuff, but its caller
should not know that.
(3) Didn't fix bugs caused by bitrot in the condition. The condition was
last correct in rev.1.208 when it was in wait1(). There p was spelled
curthread->td_proc and was for the waiting parent; now p is for the
exiting child. The condition was to avoid lowering init's priority.
It should be in sched_exit() itself. Lowering of priorities is broken
in other ways in at least the 4BSD scheduler, and doing it for init
causes less noticeable problems than doing it for for shells.
Noticed by: julian (1)
least the pci device unloadable
- Use ttymalloc() rather than a plain malloc to allocate the
rp->rp_tty ttys. This is now required due to the recent locking
changes to ttys and prevents a panic due to locking an unitialized
t_mtx.
- Allow the pci driver to be unloaded. This involved moving
the call rp_releaseresource() to the end of rp_pcireleaseresource(),
since rp_pcireleaseresource() uses ctlp->dev, which is freed
by rp_releaseresource().
- Allow the generic part of the driver to be unattached by providing
a hook to cancel timeouts.
Glanced at by: obrien
- sowakeup() now asserts the socket buffer lock on entry. Move
the call to KNOTE higher in sowakeup() so that it is made with
the socket buffer lock held for consistency with other calls.
Release the socket buffer lock prior to calling into pgsigio(),
so_upcall(), or aio_swake(). Locking for this event management
will need revisiting in the future, but this model avoids lock
order reversals when upcalls into other subsystems result in
socket/socket buffer operations. Assert that the socket buffer
lock is not held at the end of the function.
- Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now
have _locked versions which assert the socket buffer lock on
entry. If a wakeup is required by sb_notify(), invoke
sowakeup(); otherwise, unconditionally release the socket buffer
lock. This results in the socket buffer lock being released
whether a wakeup is required or not.
- Break out socantsendmore() into socantsendmore_locked() that
asserts the socket buffer lock. socantsendmore()
unconditionally locks the socket buffer before calling
socantsendmore_locked(). Note that both functions return with
the socket buffer unlocked as socantsendmore_locked() calls
sowwakeup_locked() which has the same properties. Assert that
the socket buffer is unlocked on return.
- Break out socantrcvmore() into socantrcvmore_locked() that
asserts the socket buffer lock. socantrcvmore() unconditionally
locks the socket buffer before calling socantrcvmore_locked().
Note that both functions return with the socket buffer unlocked
as socantrcvmore_locked() calls sorwakeup_locked() which has
similar properties. Assert that the socket buffer is unlocked
on return.
- Break out sbrelease() into a sbrelease_locked() that asserts the
socket buffer lock. sbrelease() unconditionally locks the
socket buffer before calling sbrelease_locked().
sbrelease_locked() now invokes sbflush_locked() instead of
sbflush().
- Assert the socket buffer lock in socket buffer sanity check
functions sblastrecordchk(), sblastmbufchk().
- Assert the socket buffer lock in SBLINKRECORD().
- Break out various sbappend() functions into sbappend_locked()
(and variations on that name) that assert the socket buffer
lock. The !_locked() variations unconditionally lock the socket
buffer before calling their _locked counterparts. Internally,
make sure to call _locked() support routines, etc, if already
holding the socket buffer lock.
- Break out sbinsertoob() into sbinsertoob_locked() that asserts
the socket buffer lock. sbinsertoob() unconditionally locks the
socket buffer before calling sbinsertoob_locked().
- Break out sbflush() into sbflush_locked() that asserts the
socket buffer lock. sbflush() unconditionally locks the socket
buffer before calling sbflush_locked(). Update panic strings
for new function names.
- Break out sbdrop() into sbdrop_locked() that asserts the socket
buffer lock. sbdrop() unconditionally locks the socket buffer
before calling sbdrop_locked().
- Break out sbdroprecord() into sbdroprecord_locked() that asserts
the socket buffer lock. sbdroprecord() unconditionally locks
the socket buffer before calling sbdroprecord_locked().
- sofree() now calls socantsendmore_locked() and re-acquires the
socket buffer lock on return. It also now calls
sbrelease_locked().
- sorflush() now calls socantrcvmore_locked() and re-acquires the
socket buffer lock on return. Clean up/mess up other behavior
in sorflush() relating to the temporary stack copy of the socket
buffer used with dom_dispose by more properly initializing the
temporary copy, and selectively bzeroing/copying more carefully
to prevent WITNESS from getting confused by improperly
initialized mutexes. Annotate why that's necessary, or at
least, needed.
- soisconnected() now calls sbdrop_locked() before unlocking the
socket buffer to avoid locking overhead.
Some parts of this change were:
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
ld: locore.o: non-pic code with imm relocation against dynamic
symbol `__gp'
With binutils 2.15, ld(1) defines the implicit/automatic symbol __gp
as a dynamic symbol and thus will now complain when used in a non-PIC
fashion (the immediate relocation used to set the GP register). Resolve
this by defining __gp in the linker script. Make sure __gp is aligned
on a 16-byte boundary.
Note: the 0x200000 magic offset is due to having a 22-bit GP-relative
relocation. The GOT will be accessed with negative offsets from GP.
that it is a series of alphabetically-ordered #fidef's, from Bruce Evans.
Define two new thread-related values in kproc_info, from Cyrille Lefevre.
Also remove a few values from kproc_info that were not needed, and change
around a few comments, from me. Changes are combined into a single commit
simply because it is a hassle to make sure that alignments and sizes are
not changed on any platform when modifying kproc_info.
socket from its accept queue when aborting it during a new inbound
connection. Update spx_input() to acquire the accept lock, assert
the condition of the socket on its parent queue, and approriately
disconnect it from the queue before calling soabort() on it.
Tweak things so that ng_fec has a chance of working with things
other than ethernet. Use ifp->if_output of the underlying interfaces
and use IF_HANDOFF() rather than depending on ether_output() and
ether_output_frame() explicitly. Also, don't insist that underlying
devices be IFM_ETHER when checking their link states in the link
monitor code.
With these changes, I was able to create a two channel bundle
consisting of one ethernet interface and one 802.11 wireless
device (via ndis). Note that this only works because both devices
use the same if_output vector: ng_fec will not let you bundle
devices with different output vectors together (it really doesn't
make sense to do that).
underlying interfaces rather than using ac_netgraph in struct arpcom.
The latter is meant only for use by ng_ether, and using it breaks
interoperability with the rest of netgraph.
socket lock over pulling so_options and so_linger out of the socket
structure in order to retrieve a consistent snapshot. This may be
overkill if user space doesn't require a consistent snapshot.
resolved by socket locking: in particular, that we test the connection
state at the socket layer without locking, request that the protocol
begin listening, and then set the listen state on the socket
non-atomically, resulting in a non-atomic cross-layer test-and-set.
lots of errors. Blind substitution of "dev_t foo" by "struct cdev *foo"
in comments usually just created an English syntax error (e.g.,
"struct cdev *changes"), but here it did less than that since the dev_t
is a user dev_t.
in npxsetregs() too. npxsetregs() must overwrite the previous state, and
it is never paired with an npxgetregs() that would defuse the previous
state (since npxgetregs() would have fninit'ed the state, leaving nothing
to do).
PR: 68058 (this should complete the fix)
Tested by: Simon Barner <barner@in.tum.de>
to vm_map_find() that is less likely to be outside of addressable memory
for 32-bit processes: just past the end of the largest possible heap.
This is the same hint that mmap() uses.
ki_childutime, and ki_emul. Also uses the timevaladd() routine to
correct the calculation of ki_childtime. That will correct the value
returned when ki_childtime.tv_usec > 1,000,000.
This also implements a new KERN_PROC_GID option for kvm_getprocs().
(there will be a similar update to lib/libkvm/kvm_proc.c)
Submitted by: Cyrille Lefevre
which just mark areas which are empty due to issues with the alignment
of already-existing fields. This defines several unrelated variables
in one shot, because most of the work for updating kinfo_proc is making
sure the sizeof(struct kinfo_proc) remains the same across all hardware
platforms, and that no space is wasted on any platform due to alignment
issues with the new variables.
Submitted by: some by Cyrille Lefevre, some by me
unnecessary because cpu_setregs() and/or npxinit() always sets CR0_TS
during system initialization, and CR0_TS is set in the next statement
(fpstate_drop()) if necessary after system initialization. Setting
it unnecessarily was less than a pessimization since it broke the
invariant that the npx can be used without an npxdna() trap if
fpucurthread is non-null. The broken invariant became harmful when I
added an fnclex to npxdrop().
Removed setting of CR0_MP in exec_setregs(). This was similarly
unnecessary but was harmless.
Updated comments (mainly by removing them). Things are simpler now
that we have cpu_setregs() and don't support a math emulator or pretend
to support not having either a math emulator or an npx.
Removed the ifdef for avoiding setting CR0_NE in the !SMP case in
cpu_setregs(). npx_probe() should reverse the setting if it wants to
force IRQ13 exception handling for testing.
lock state. Convert tsleep() into msleep() with socket buffer mutex
as argument. Hold socket buffer lock over sbunlock() to protect sleep
lock state.
Assert socket buffer lock in sbwait() to protect the socket buffer
wait state. Convert tsleep() into msleep() with socket buffer mutex
as argument.
Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK()
in order to call into these functions with the lock, as well as to
start protecting other socket buffer use in their implementation. Drop
the socket buffer mutexes around calls into the protocol layer, around
potentially blocking operations, for copying to/from user space, and
VM operations relating to zero-copy. Assert the socket buffer mutex
strategically after code sections or at the beginning of loops. In
some cases, modify return code to ensure locks are properly dropped.
Convert the potentially blocking allocation of storage for the remote
address in soreceive() into a non-blocking allocation; we may wish to
move the allocation earlier so that it can block prior to acquisition
of the socket buffer lock.
Drop some spl use.
NOTE: Some races exist in the current structuring of sosend() and
soreceive(). This commit only merges basic socket locking in this
code; follow-up commits will close additional races. As merged,
these changes are not sufficient to run without Giant safely.
Reviewed by: juli, tjr
ip_ctloutput(), as it may need to perform blocking memory allocations.
This also improves consistency with locking relative to other points
that call into ip_ctloutput().
Bumped into by: Grover Lines <grover@ceribus.net>
output to permanently (not ephemerally) go to the console. It is also
sent to any other console specified by TIOCCONS as normal.
While I'm here, document the kern.log_console_output sysctl.
NULL ifc.ifc_buf pointer, to determine the expected buffer size.
The submitted fix only takes account of interfaces with an AF_INET
address configured. This could no doubt be improved.
PR: kern/45753
Submitted by: Jacques Garrigue (with cleanups)
was received on a broadcast address on the input path. Under certain
circumstances this could result in a panic, notably for locally-generated
packets which do not have m_pkthdr.rcvif set.
This is a similar situation to that which is solved by
src/sys/netinet/ip_icmp.c rev 1.66.
PR: kern/52935
be suspended in thread_suspend_check, after they are resumed, all
threads will call thread_single, but only one can be success,
others should retry and will exit in thread_suspend_check.
wasn't actually clean, it was saving the xmm registers as left over by the
bios. fninit() doesn't clear those.
In fpudna(), instead of doing a fninit() and forgetting to load the initial
mxcsr, do a full fxrstor(&fpu_cleanstate). Otherwise we hand over whatever
random values are left in the xmm registers by the last user.
I'm not certain of whether this is excessive paranoia or not, but there was
an outright bug in neglecting to set the mxcsr value that caused awk to
SIGFPE in some case. Especially for Tim Robbins. :-)
i386 probably should do something about the mxcsr setings too.
Found by: tjr
encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer
when registering trace records. 'ipov' is specific to IPv4, and
will therefore be uninitialized.
[This fandango is only necessary in the first place because of our
host-byte-order IP field pessimization.]
PR: kern/60856
Submitted by: Galois Zheng
rwatson_netperf:
Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.
Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.
Simplify the logic in sodisconnect() since we no longer need spls.
NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.
unless the segment really contains the last of the data for the stream.
PR: kern/34619
Obtained from: OpenBSD (tcp_output.c rev 1.47)
Noticed by: Joseph Ishac
Reviewed by: George Neville-Neil
frstor can trap despite it being a control instruction, since it bogusly
checks for pending exceptions in the state that it is overwriting.
This used to be a non-problem because frstor was always paired with a
previous fnsave, and fnsave does an implicit fninit so any pending
exceptions only remain live in the saved state. Now frstor is sometimes
paired with npxdrop() and we must do a little more than just forget
that the npx was used in npxdrop() to avoid a trap later. This is a
non-problem in the FXSR case because fxrstor doesn't do the bogus check.
FXSR is part of SSE, and npxdrop() is only in FreeBSD-5.x, so this bug
only affected old machines running FreeBSD-5.x.
PR: 68058
- Lock down low hanging fruit use of sb_flags with socket buffer
lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with
socket buffer lock.
- Annotate situations in which we unlock the socket lock and then
grab the receive socket buffer lock, which are currently actually
the same lock. Depending on how we want to play our cards, we
may want to coallesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in
soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to
follow.
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.
Explained by: jhb
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
- prevent an endless loop with route-to lo0, fixes PR 3736 (dhartmei@)
- The rule_number parameter for pf_get_pool() needs to be 32 bits, not 8 -
this fixes corruption of the address pools with large rulesets.
(mcbride@, pb@)
Reviewed-by: dhartmei
timeout values in the CAM CCBs. Divide by 1000 to get values in seconds
which are what ata(4) timeouts internally use.
This does lose granularity, though, and small values can now round down
to zero. It's probably worth making all ata(4) timeouts in terms of
hz/ticks/milliseconds/something.
Version 3.5 brings:
- Atomic commits of ruleset changes (reduce the chance of ending up in an
inconsistent state).
- A 30% reduction in the size of state table entries.
- Source-tracking (limit number of clients and states per client).
- Sticky-address (the flexibility of round-robin with the benefits of
source-hash).
- Significant improvements to interface handling.
- and many more ...
- Remove pflog and pfsync modules. Things will change in such a fashion
that there will be one module with pf+pflog that can be loaded into
GENERIC without problems (which is what most people want). pfsync is no
longer possible as a module.
- Add multicast address for in-kernel multicast pfsync protocol. Protocol
glue will follow once the import is done.
- Add one more mbuf tag
placates gcc which seems to like to complain about -1 being assigned to
an unsigned value. It is well defined and intended, but since signess bugs
are being hunted just change to 0xffffffff.
o Mask the lower 8 bits, not the lower 4 bits for the ai_capabilities word.
All 8 bits are defined and the 0xf was almost certainly a typo.
o Define APM_UNKNOWN to 0xff for emulation layer.
do not pick up the first local ip address for the source
ip address, return ENETUNREACH instead.
Submitted by: Gleb Smirnoff
Reviewed by: -current (silence)
mode tunnel, take the per-route MTU into account, *if* and *only if* it
is non-zero (as found in struct rt_metrics/rt_metrics_lite).
PR: kern/42727
Obtained from: NetBSD (ip_input.c rev 1.151)
fixes the problem of UDP sockets getting wedged in a connected state (and
bound to their destination) under heavy load.
Temporary bind/connect should probably be deleted in future
as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].
Notes:
- INP_LOCK() is already held in udp_output(). The connection is in effect
happening at a layer lower than the socket layer, therefore in theory
socket locking should not be needed.
- Inlining the in_pcbdisconnect() operation buys us nothing (in the case
of the current state of the code), as laddr is not part of the
inpcb hash or the udbinfo hash. Therefore there should be no need
to rehash after restoring laddr in the error case (this was a
concern of the original author of the patch).
PR: kern/41765
Requested by: gnn
Submitted by: Jinmei Tatuya (with cleanups)
Tested by: spray(8)
so_timeo Used as a sleep/wakeup address, no locking.
sb_* Almost all socket buffer fields locked with
sockbuf lock for the oskcet buffer.
so_cred Static after socket creation.
using rawcb_mtx. Hold this mutex while modifying or iterating over
the control list; this means that the mutex is held over calls into
socket delivery code, which no longer causes a lock order reversal as
the routing socket code uses a netisr to avoid recursing socket ->
routing -> socket.
Note: Locking of IPsec consumers of rawcb_list is not included in this
commit.
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
other modules to explode. eg: snd_ich->snd_pcm and umass->usb.
The problem was that I was using the unified base address of the module
instead of finding the start address of the section in question.
1. Remove a race whereby contigmalloc() would deadlock against the
running processes in the system if they kept reinstantiating
the memory on the active and inactive page queues that it was
trying to flush out. The process doing the contigmalloc() would
sit in "swwrt" forever and the swap pager would be going at full
force, but never get anywhere. Instead of doing it until the
queues are empty, launder for as many iterations as there are
pages in the queue.
2. Do all laundering to swap synchronously; previously, the vnode
laundering was synchronous and the swap laundering not.
3. Increase the number of launder-or-allocate passes to three, from
two, while failing without bothering to do all the laundering on
the third pass if allocation was not possible. This effectively
gives exactly two chances to launder enough contiguous memory,
helpful with high memory churn where a lot of memory from one pass
to the next (and during a single laundering loop) becomes dirtied
again.
I can now reliably hot-plug hardware requiring a 256KB contigmalloc()
without having the kldload/cbb ithread sit around failing to make
progress, while running a busy X session. Previously, it took killing
X to get contigmalloc() to get further (that is, quiescing the system),
and even then contigmalloc() returned failure.
live in src/lib/libc/gmon/gmon.c. glibc puts these prototypes in the same
header, so put them here for the sake of consistency.
PR: bin/4459
Reviewed by: bde
success and a proper errno value on failure. This makes it
consistent with cv_timedwait(), and paves the way for the
introduction of functions such as sema_timedwait_sig() which can
fail in multiple ways.
Bump __FreeBSD_version and add a note to UPDATING.
Approved by: scottl (ips driver), arch
flags relating to several aspects of socket functionality. This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties. The
following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts). In the future, we may
wish to coallesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
allocated ressouces should be ultimately freed in gv_destroy_geom()
(when unloading the module and not earlier), but I need to look at this
more closely.
I added bounds checking to the patch and cg improved
the formular.
Submitted by: Andriy Gapon <avg@icyb.net.ua>
PR: kern/65485
Approved by: cg
Reviewed by: imp, rwatson, le
I've had this sitting in my tree for a long time and I can't seem to
find who sent it to me in the first place, apologies to whoever is
missing out on a Contributed by: line here.
I belive it works as it should.
In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.
Note: Check new features w/ LINT *and* w/ LINT minus the new feature.
Found-by: rwatson
allocation was passed up to nexus. Now, we probe sysresource objects and
manage the resources they describe in a local rman pool. This helps
devices which attach/detach varying resources (like the _CST object) and
module loads/unloads. The allocation/release routines now check to see if
the resource is described in a child sysresource object and if so,
allocate from the local rman. Sysresource objects add their resources to
the pool and reserve them upon boot. This means sysresources need to be
probed before other ACPI devices.
Changes include:
* Add ordering to the child device probe. The current order is: system
resource objects, embedded controllers, then everything else.
* Make acpi_MatchHid take a handle instead of a device_t arg.
* Replace acpi_{get,set}_resource with the generic equivalents.
- Define type rlim_t and struct timeval. This makes autoconf
happier. (PR: 62388)
- Add RLIMIT_AS, which is an alias for our RLIMIT_VMEM.
- structs orlimit and loadavg, as well as macros CP*, should
only appear if __BSD_VISIBLE.
- Use underscored versions of int32_t and fixpt_t in case
<sys/types.h> is not included.
- Document areas of non-conformance.
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.
__FreeBSD_version bump will follow.
Tested-by: (i386)LINT
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.
in a TAILQ. Re-arrange some of the ecb elements so that they can stay
stable through alloc/free cycles while the rest get bzero'd.
- Use the tag_id from the ecb rather than fro the ccb. The latter is only
for target mode.
- Honor the ccb flags for tag_action when deciding whether to do a tagged
or untagged transaction.
- Re-arrange autosense completion so that it works correctly in failure
cases.
- Turn on the PI_TAG_ABLE flag so that CAM will send us tagged transactions.
This enables tagged queueing in the driver.
SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or
manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local
copy while holding the socket lock, then release the socket lock
to externalize to userspace.
order definition for witness. Send lock before receive lock, and
socket locks after accept but before select:
filedesc -> accept -> so_snd -> so_rcv -> sellck
All routing locks after send lock:
so_rcv -> radix node head
All protocol locks before socket locks:
unp -> so_snd
udp -> udpinp -> so_snd
tcp -> tcpinp -> so_snd
before calling sotryfree().
-- Body of earlier bulk commit this belonged with --
Log:
Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.
- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.
Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.
Reviewed by: tegge@
rig a PREPEND macro for ALTQ as the POLL/DEQUEUE semantic is very bad in
terms of locking. We make this a full functional queue to allow "bulk
dequeue" which will further reduce the locking overhead (for non-altq
enabled devices). Drivers will access this via the following macros, which
will show up in <net/if_var.h> once we expose ALTQ to the build:
IFQ_DRV_DEQUEUE(ifq, m) - takes a mbuf off the queue (driver queue first)
IFQ_DRV_PREPEND(ifq, m) - pushes a mbuf back to the driver queue
IFQ_DRV_PURGE(ifq) - drops all packets in both queues
IFQ_DRV_IS_EMPTY(ifq) - checks for pending mbufs in either queue
One has to make sure that the first three are protected by a driver mutex.
At the moment most network drivers still require Giant, so this is not an
issue. Even those that have thier own mutex usually hold it in if_start and
the like, so this requirement is almost always satisfied.
This evolved from a discussion with Andrew Gallatin.
protect fields in the socket buffer. Add accessor macros to use the
mutex (SOCKBUF_*()). Initialize the mutex in soalloc(), and destroy
it in sodealloc(). Add addition, add SOCK_*() access macros which
will protect most remaining fields in the socket; for the time being,
use the receive socket buffer mutex to implement socket level locking
to reduce memory overhead.
Submitted by: sam
Sponosored by: FreeBSD Foundation
Obtained from: BSD/OS
that the command succeeded. Sheesh! This makes CDROMs no longer cause an
instant panic at boot. Thanks to Jake Burkholder for providing a remote
test setup.
Also make device resets work, thanks to another typo.
pass any traffic. Unfortunately this means no full-duplex link with auto-
negotiation on hme(4) using DP83840A PHYs again.
I really thought I had tested this also on a Netra t1 100...
- add locking
- disable ALTQ3_COMPAT by default (do not remove the code to keep the diff
towards KAME small)
- put some more code under ALTQ3 conditional compilation as it should be
- account for if_xname
- some more minor compile fixes
As people started wondering:
The strange path layout "altq/altq" is there to avoid "-Isys/contrib" and
make it "-Isys/contrib/altq" instead, as we will need at least <altq/altq.h>
and <altq/if_altq.h> for kernel compilation.
The "freebsd4_..." in the privious commit is just the best tag name in the
KAME tree I could find to classify this in order to track its history. It
does *not* mean that this will go to 4-STABLE or anything of that kind.
HEAD at this point). This will not exactly live in a vendor branch, but have
the vendor backing to make it easier to exchange diffs.
This will be followed by a diff which takes most of the .c files off the
vendor branch in order to:
- add locking
- disable ALTQ3_COMPAT code (which is outdated and "un-lockable")
There is work in progress to refine the configuration API. Import this "as
is" now to have more exposure time before 5-STABLE.
This is only the import, it will be some more days until you will actually
be able to compile ALTQ support into your kernel so don't hold your breath.
HEADUPs will be posted on current@ and net@ before this is actually enabled.
No-objection: re(scottl), core(rwatson)
ruleset, the pcb is looked up once per ipfw_chk() activation.
This is done by extracting the required information out of the PCB
and caching it to the ipfw_chk() stack. This should greatly reduce
PCB looking contention and speed up the processing of UID/GID based
firewall rules (especially with large UID/GID rulesets).
Some very basic benchmarks were taken which compares the number
of in_pcblookup_hash(9) activations to the number of firewall
rules containing UID/GID based contraints before and after this patch.
The results can be viewed here:
o http://people.freebsd.org/~csjp/ip_fw_pcb.png
Reviewed by: andre, luigi, rwatson
Approved by: bmilekic (mentor)
driver tries to submit the same request repeatedly, on finding the
controller cmd queue to be full.
Submitted by:ps, vkashyap
Reviewed by:re
Approved by:re
ifp->if_output() basedd on debug.mpsafenet. That way once bpfwrite()
can be called without Giant, it will acquire Giant (if desired) before
entering the network stack.
the need to synchronize access to the structure. I believe this
should fit into the stack under the necessary circumstances, but
if not we can either add synchronization or use a thread-local
malloc for the duration.
versions of various routers seen:
- Introduce igmp_mtx.
- Protect global variable 'router_info_head' and list fields
in struct router_info with this mutex, as well as
igmp_timers_are_running.
- find_rti() asserts that the caller acquires igmp_mtx.
- Annotate a failure to check the return value of
MALLOC(..., M_NOWAIT).
is the actual name here) on EBus and which are PCF8584 (on systems having
a boot-bus controller the i2c are said to not be a PCF8584). Similar to the
SUNW,envctrl devices, onboard slaves for monitoring fans, temperatures and
such hang off of these i2c devices. But there's also stuff like EEPROMs
housing the hostid of the system and the boards usally have a connector to
add custom slave devices (on CP1500 there's actually a second PCF8584 with
its own I2C bus for these).
This driver already works fine but I'm not yet sure if access to the slave
devices on CP1400/CP1500 marked as "reserved for factory use" in the docs
should be blocked (most likely these are the voltage controllers wich aren't
meant to be controller by software and even not by the firmware). Once the
issues with polled mode are fixed in the common pcf(4) part in pcf.c, this
front-end should probably honour the poll-mode property of the i2c devices.
Tested on Ultra AXe and CP1500 (Netra t1 100).
OK'ed by: joerg, nsouch
- Use "envctrl" as the name when registering this module rather than "pcf";
we can't have "pcf" as the name for all pcf(4) front-ends or we would get
conflicts.
OK'ed by: joerg
- s,pcf_,pcf_isa, to better reflect the purpose of this front-end and to
avoid conflicts.
- Don't use this front-end for attaching to EBus, declaring it as an EBus
driver was a cut&paste accident according to joerg.
OK'ed by: joerg, nsouch
global and allocated variables. This strategy is derived from work
originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam
Leffler:
- Add unp_mtx, a global mutex which will protect all UNIX domain socket
related variables, structures, etc.
- Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
- Acquire unp_mtx on entering most UNIX domain socket code,
drop/re-acquire around calls into VFS, and release it on return.
- Avoid performing sodupsockaddr() while holding the mutex, so in general
move to allocating storage before acquiring the mutex to copy the data.
- Make a stack copy of the xucred rather than copying out while holding
unp_mtx. Copy the peer credential out after releasing the mutex.
- Add additional assertions of vnode locks following VOP_CREATE().
A few notes:
- Use of an sx lock for the file list mutex may cause problems with regard
to unp_mtx when garbage collection passed file descriptors.
- The locking in unp_pcblist() for sysctl monitoring is correct subject to
the unpcb zone not returning memory for reuse by other subsystems
(consistent with similar existing concerns).
- Sam's version of this change, as with the BSD/OS version, made use of
both a global lock and per-unpcb locks. However, in practice, the
global lock covered all accesses, so I have simplified out the unpcb
locks in the interest of getting this merged faster (reducing the
overhead but not sacrificing granularity in most cases). We will want
to explore possibilities for improving lock granularity in this code in
the future.
Submitted by: sam
Sponsored by: FreeBSD Foundatiuon
Obtained from: BSD/OS 5 snapshot provided by BSDi
present and thus that the PnPBIOS probe should be skipped instead of
having ACPI zero out the PnPBIOStable pointer.
- Make the PnPBIOStable pointer static to i386/i386/bios.c now that that is
the only place it is used.
its primary use is for the FEPS/FAS366 SCSI found in Sun Ultra 1e and 2
machines. Once the pci front-end is ported, this driver can replace the
amd(4) driver.
The code as-is is fairly stable. I've disabled tagged-queueing until I can
figure out a corruption bug related to it. I'm importing it now so that
people with these machines can (finally) stop netbooting and report bugs
before 5.3.
as otherwise the junk it contains may cause uhub_explore to give
up without ever trying to restart the port. This fixes the following
errors I was seeing with a VIA UHCI controller:
uhub0: port error, restarting port 1
uhub0: port error, giving up port 1
(time grows downward)
thread 1 thread 2
------------|------------
dec ref_cnt |
| dec ref_cnt <-- ref_cnt now zero
cmpset |
free all |
return |
|
alloc again,|
reuse prev |
ref_cnt |
| cmpset, read
| already freed
| ref_cnt
------------|------------
This should fix that by performing only a single
atomic test-and-set that will serve to decrement
the ref_cnt, only if it hasn't changed since the
earlier read, otherwise it'll loop and re-read.
This forces ordering of decrements so that truly
the thread which did the LAST decrement is the
one that frees.
This is how atomic-instruction-based refcnting
should probably be handled.
Submitted by: Julian Elischer
internal reference counters, UMA_ZONE_NOFREE. This way, those slabs
(with their ref counts) will be effectively type-stable, then using
a trick like this on the refcount is no longer dangerous:
MEXT_REM_REF(m);
if (atomic_cmpset_int(m->m_ext.ref_cnt, 0, 1)) {
if (m->m_ext.ext_type == EXT_PACKET) {
uma_zfree(zone_pack, m);
return;
} else if (m->m_ext.ext_type == EXT_CLUSTER) {
uma_zfree(zone_clust, m->m_ext.ext_buf);
m->m_ext.ext_buf = NULL;
} else {
(*(m->m_ext.ext_free))(m->m_ext.ext_buf,
m->m_ext.ext_args);
if (m->m_ext.ext_type != EXT_EXTREF)
free(m->m_ext.ref_cnt, M_MBUF);
}
}
uma_zfree(zone_mbuf, m);
Previously, a second thread hitting the above cmpset might
actually read the refcnt AFTER it has already been freed. A very
rare occurance. Now we'll know that it won't be freed, though.
Spotted by: julian, pjd
all of the interface between the driver and the bus. This will enable
us to stop special casing eisa bus attachments in modules and treat them
like we treat all other busses.
In the longer run, we need to eliminate much (all?) of these interfaces
and switch to using the standard bus_alloc_resource(), but that's not
done right now.
# I've not updated the modules to include eisa, etc, just yet
Tested on: Compaq Proliant 3000/333 purchased for eisa work
mode. The 5704 apparently has some s00p3r s33kr1t registers for setting
the advertisement of pause frame ability (i.e flow control) when in
autoneg mode. If we don't set these registers correctly, we may not
be able to negotiate a proper link with some switches. (Symptom is that
the NIC reports the link as up (PCS synched) but no traffic can be
exchanged.)
PR: kern/67598
Add two new functions: ttyref() and ttyrel(). ttymalloc() creates a struct
tty with a reference count of one. when ttyrel sees the count go to zero,
struct tty is freed.
Hold references for open ttys and for ttys which are controlling terminal
for sessions.
Until drivers start using ttyrel(), this commit will make no difference.
different cards that matched vendor/id, but weren't wi cards. This is
because the vendor foolishly didn't have unique product ids. Symbol
has a serial card that would otherwise match the wi driver, for
example...
Taken from a patch for xe posted by: Carlos Velasco
recursive entering of the socket code from the routing code:
- Modify rt_dispatch() to bundle up the sockaddr family, if any,
associated with a pending mbuf to dispatch to routing sockets, in
an m_tag on the mbuf.
- Allocate NETISR_ROUTE for use by routing sockets.
- Introduce rtsintrq, an ifqueue to be used by the netisr, and
introduce rts_input(), a function to unbundle the tagged sockaddr
and inject the mbuf and address into raw_input(), which previously
occurred in rt_dispatch().
- Introduce rts_init() to initialize rtsintrq, its mutex, and
register the netisr. Perform this at the same point in system
initialization as setup of the domains.
This change introduces asynchrony between the generation of a
pending routing socket message and delivery to sockets for use
by userspace. It avoids socket->routing->rtsock->socket use and
helps to avoid lock order reversals between the routing code and
socket code (in particular, raw socket control blocks), as route
locks are held over calls to rt_dispatch().
Reviewed by: "George V.Neville-Neil" <gnn@neville-neil.com>
Conceptual head nod by: sam
pmap_extract() already does it.
In pmap_enter(), opa has already been masked so don't do it again.
Wrap a long line (recent transgression).
Use trunc_page() in pmap_mapdev() instead of anding with PG_FRAME, since
that is what we really meant.
Submitted by: alc (first item)
- export the rest of the cpu features (and amd's features).
- turn on EFER_NXE, depending on the NX amd feature bit
- reorg the identcpu stuff a bit in order to stop treating the
amd features as second class features (since it is now a primary feature
bit set) and make it easier to export.
lives in the top 12 'available' bits. atop() in the PHYS_TO_VM_PAGE()
macro only masks off the lower bits (by accident) and the upper bits
in the 64 bit ptes turn into "interesting" index values.
pmap_remove() would be called with a huge range and we'd stride across
it in only 2MB chunks. This would manifest as massive cpu time and a
largely unresponsive system during hard swap. Instead, check the higher
page directories which means we can run pmap_remove() in just a few
hundred loop iterations instead of millions since we can process
address space in chunks of 512GB and 1GB as well as 2MB.
Eternal thanks to: tmm
when I reordered events in accept1() to allocate a file descriptor
earlier, I didn't properly update use of goto on exit to unwind for
cases where the file descriptor is now held, but wasn't previously.
The result was that, in the event of accept() on a non-blocking socket,
or in the event of a socket error, a file descriptor would be leaked.
This ended up being non-fatal in many cases, as the file descriptor
would be properly GC'd on process exit, so only showed up for processes
that do a lot of non-blocking accept() calls, and also live for a long
time (such as qmail).
This change updates the use of goto targets to do additional unwinding.
Eyes provided by: Brian Feldman <green@freebsd.org>
Feet, hands provided by: Stefan Ehmann <shoesoft@gmx.net>,
Dimitry Andric <dimitry@andric.com>
Arjan van Leeuwen <avleeuwen@piwebs.com>
which doesn't support ACPI power states. Return AE_NOT_FOUND for these
cases and don't print the warning message. Also, print the name of the
handle instead of device when unable to switch states. The device is often
not attached at this point and so its name is NULL, which doesn't help
debugging.
is generic to any threading system. This commit does not link this
file to the build yet, nor does it remove these functions from their
current location in kern_thread.c. (that commit coming up after further review)
Dividing by 0 in order to check for irq13/exception16 delivery apparently
always causes an irq13 even if we have configured for exception16 (by
setting CR0_NE). This was expected, but the timing of the irq13 was
unexpected. Without CR0_NE, the irq13 is delivered synchronously at
least on my test machine, but with CR0_NE it is delivered a little
later (about 250 nsec) in PIC mode and much later (5000-10000 nsec)
in APIC mode. So especially in APIC mode, the irq13 may arrive after
it is supposed to be shut down. It should then be masked, but the
shutdown is incomplete, so the irq goes to a null handler that just
reports it as stray. The fix is to wait a bit after dividing by 0 to
give a good chance of the irq13 being handled by its proper handler.
Removed the hack that was supposed to recover from the incomplete shutdown
of irq13. The shutdown is now even more incomplete, or perhaps just
incomplete in a different way, but the hack now has no effect because
irq13 is edge triggered and handling of edge triggered interrupts is
now optimized by skipping their masking. The hack only worked due
to it accidentally not losing races.
The incomplete shutdown of irq13 still allows unprivileged users to
generate a stray irq13 (except on systems where irq13 is actually used)
by unmasking an npx exception and causing one. The exception gets
handled properly by the exception 16 handler. A spurious irq13 is
delivered asynchronously but is harmless (as in the probe) because it
is almost perfectly not handled by the null interrupt handler.
Perfectly not handling it involves mainly not resetting the npx busy
latch. This prevents further irq13's despite them not being masked in
the [A]PIC.
with an ASUS A7N8X-E motherboard in APIC mode, since storming interrupts
don't repeat immediately. Use DELAY(1) to wait a bit for them to repeat.
This affects all systems. Only delay for the first
(10 * intr_storm_threshold) interrupts (per interrupt handler) so that
this is only a pessimization while warming up. Throttle after calling
the sub-handlers instead of before so that the long delay given by
throttling can be used instead of the DELAY(1) to detect storms after
warming up.
Reduced the throttling period from 1/10 second to 1/hz seconds so that
throttling doesn't destroy performance so much. Interrupts that are
detected as storming are effectively handled by polling at a frequency
of hz Hz. On A7N8X-E's there is another hardware or configuration bug
that makes the throttled frequency closer to 2*hz Hz.
tree, output an empty string instead of "?". This is already what
happened with DEVICE_SYSCTL_LOCATION and DEVICE_SYSCTL_PNPINFO. This
makes the output of "sysctl dev" much nicer (it won't display those
empty sysctls).
Reviewed by: des
"stray irq 9" messages on my Thinkpad. It may also help with general
reboot consistency although the recent hang on reboot was solved by
acpi_cpu.c rev 1.39.
after. Unify the paths for all Cx states. Remove cpu_idle_busy and
instead do the little profiling we need before re-enabling interrupts.
Use 1 quantum as estimate for C1 sleep duration since the timer interrupt
is the main reason we wake.
While here, change the cx_history sysctl to cx_usage and report statistics
for which idle states were used in terms of percent. This seems more
intuitive than counters. Remove the cx_stats structure since it's no
longer used. Update the man page.
Change various types which do not need explicit size.
size_t and size_t *, respectively. Update callers for the new interface.
This is a better fix for overflows that occurred when dumping segments
larger than 2GB to core files.
called ttyldoptim().
Use this function from all the relevant drivers.
I belive no drivers finger linesw[] directly anymore, paving the way for
locking and refcounting.
exactly as done in the cmi driver. I am quite confident this is
safe since I'm runing this for more than two weeks now, on an SMP
box. A few people tested this patch for me successfully as well.
<sys/linedisc.h> (repocopied).
Temporarily use a nested include from <sys/tty.h> to get <sys/linedisc.h>
into relevant source files.
Introduce a set of inline functions named ttyld_...() to invoke
linedisc methods instead of groping around in the linesw array.
class variables in addition to per-device variables. In plain English,
this means that dev.foo0.bar is now called dev.foo.0.bar, and it is
possible to to have dev.foo.bar as well.
- In subr_ndis.c, my_strcasecmp() actually behaved like my_strncasecmp():
we really need it to behave like the former, not the latter. (It was
falsely matching "RadioEnable", which defaults to 1 with "RadioEnableHW"
which the driver creates itself and to 0, because we were using
strlen("RadioEnable") as the length to test. This caused the radio to
always be turned off. :( )
- In if_ndis.c, only set IEEE80211_CHAN_A for channels if we actually
set any IEEE80211_MODE_11A rates. (ieee80211_attach() will "helpfully"
add IEEE80211_MODE_11A to ic_modecaps for you if you initialize any
802.11a channels. This caused "ndis0: 11a rates:" to erroneously be
displayed during driver load.)
- Also in if_ndis.c, when using TESTSETRATE() to add in any missing 802.11b
rates, remember to OR the rates with IEEE80211_RATE_BASIC, otherwise
comparing against existing basic rates won't match. (1, 2, 5.5 and
11Mbps are basic rates, according to the 802.11b spec.) This erroneously
cause 11Mbps to be added to the 11b rate list twice.
double NULL entries signal Witness to stop processing the array of
order entries meaning none of the spin locks are added resulting in
panics on boot.
- Add a missing NULL, NULL terminator to the Slip locks list to keep them
separate from the spin locks.
that m_prepend() is not called with possibility to wait while the
pcb lock is held. What still needs revisiting is whether the
ripcbinfo lock is really required here.
Discussed with: rwatson
relationships:
Sockets: filedesc->accept->sellck
Routing: radix node head->rtentry->ifaddr
UDP: udp->udpinp
TCP: tcp->tcpinp
SLIP: slip_mtx->slip sc_mtx
Drop in a place holder section for UNIX domain sockets. Various
sections to be expanded over the next few days.
sysctls were global (hw.fxp_rnr and hw.fxp_noflow), all of them are
now per-device. Sample output of "sysctl dev.fxp0" with this patch,
with the standard %foo nodes removed :
dev.fxp0.int_delay: 1000
dev.fxp0.bundle_max: 6
dev.fxp0.rnr: 0
dev.fxp0.noflow: 0
little/big endian fashion, so that network drivers can just reference
the standard implementation and don't have to bring their own.
As discussed on arch@.
Obtained from: NetBSD
protect the registers so it was trivially possible for a sync command and
i/o command to fight each other and confuse the controller. Make the
sync fib alloc/release functions inline and remove the somewhat worthless
AAC_SYNC_LOCK_FORCE flag. Thanks to Adil Katchi for helping me to track
this down in RELENG_4.
sched_clock() rather than using callouts. This means we no longer have to
take the load of the callout thread into consideration while balancing and
should make the balancing decisions simpler and more accurate.
Tested on: x86/UP, amd64/SMP
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:
so_qlen so_incqlen so_qstate
so_comp so_incomp so_list
so_head
While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.
While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races and address
lock order concerns. In particular:
- Reorganize the optimistic concurrency behavior in accept1() to
always allocate a file descriptor with falloc() so that if we do
find a socket, we don't have to encounter the "Oh, there wasn't
a socket" race that can occur if falloc() sleeps in the current
code, which broke inbound accept() ordering, not to mention
requiring backing out socket state changes in a way that raced
with the protocol level. We may want to add a lockless read of
the queue state if polling of empty queues proves to be important
to optimize.
- In accept1(), soref() the socket while holding the accept lock
so that the socket cannot be free'd in a race with the protocol
layer. Likewise in netgraph equivilents of the accept1() code.
- In sonewconn(), loop waiting for the queue to be small enough to
insert our new socket once we've committed to inserting it, or
races can occur that cause the incomplete socket queue to
overfill. In the previously implementation, it was sufficient
to simply tested once since calling soabort() didn't release
synchronization permitting another thread to insert a socket as
we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the
caller to remove a socket from the incomplete connection queue
before calling soabort(), which prevents soabort() from having
to walk into the accept socket to release the socket from its
queue, and avoids races when releasing the accept mutex to enter
soabort(), permitting soabort() to avoid lock ordering issues
with the caller.
- Generally cluster accept queue related operations together
throughout these functions in order to facilitate locking.
Annotate new locking in socketvar.h.
It was based on the pty(4) driver which as a tty side an a non-tty side.
Nmdm(4) seems to have inherited two symmetric sides from pty but
unfortunately they are not quite ttys. Running a getty one one
side and tip on the other failed to produce NL->CRNL mapping for
instance.
Rip out the basically bogus cdevsw->{read,write} functions and rely
on ttyread() and ttywrite() which does the same thing.
Use taskqueue_swi_giant to run a task for either side to do what
needs to be done. (Direct calling is not an option as it leads to
recursion.) Trigger the task from the t_oproc and t_stop methods.
Default the ports to not ECHO. Since we neither rate limiting nor
emulation, two ports echoing each other is a really bad idea, which
can only be properly mitigated by rate limiting, rate emulation or
intelligent detection. Rate emulation would be a neat feature.
Ditch the modem-line emulation, if needed for some app, it needs
to be thought much more about how it interacts with the open/close
logic.
APIC interrupt pin based on the settings in the corresponding interrupt
source structure.
- Use ioapic_program_intpin() in place of manual frobbing of the intpin
configuration in ioapic_program_destination() and ioapic_register().
- Use ioapic_program_intpin() to implement suspend/resume support for I/O
APICs.
descriptors out of fdrop_locked() and into vn_closefile(). This
removes all knowledge of vnodes from fdrop_locked(), since the lock
behavior was specific to vnodes. This also removes the specific
requirement for Giant in fdrop_locked(), it's now only required by
code that it calls into.
Add GIANT_REQUIRED to vn_closefile() since VFS requires Giant.
nextpkt within the m_hdr was not being initialized to NULL for
!M_PKTHDR cases. *Maybe* this will fix weird socket buffer
inconsistency panics, but we'll see.
every iteration of aac_startio(). This ensures that a command that is
deferred for lack of resources doesn't immediately get retried in the
aac_startio() loop. This avoids an almost certain livelock.
the socket is on an accept queue of a listen socket. This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different for the locking on the remainder of the
flags in so_state.
It doesn't take 'align' and 'flags' but 'master' instead, which is
a reference to the Master Zone, containing the backing Keg.
Pointed out by: Tim Robbins (tjr)
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).
Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.
Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.
From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.
Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.
Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)
Add two additional pairs of assertions, one at the end of the NFS
server event loop, and one one exit from the NFS daemon, that
assert that if debug.mpsafenet is enabled, Giant is not held, and
that if it is not enabled, Giant will be held. This is intended
to support debugging scenarios where Giant is "leaked" during NFS
processing.
my Elektor card. Note that the hints are necessary to specify the
IO base of the pcf chip. This enables to check the IO base when the
probe routine is called during ISA enumeration.
The interrupt driven code is mixed with polled mode, which is wrong
and produces supposed spurious interrupts at each access. I still have
to work on it.
install nfssvc(). It also updates the argument count, but did so
without setting SYF_MPSAFE, effectively removing the MPSAFE flag even
when syscalls.master indicates it doesn't require Giant. This change
forces the modevent to set MPSAFE as a flag to its internal notion of
an argument coutn.
Note: this duplication of information is a bad thing, but is a more
general problem I'm not currently willing to address.
kernel). No other sys/*.h file requires machine/foo.h to be included
before it. In addition, all the files that include rman.h would need
to include those two anyway. From these two perspectives, it is
traditional to include things like this.
This lets us stop treating sys/rman.h specially in every bus frontend
file.
a vnode. Not bumped into with asserts in the main tree because we
run the NFS server with Giant by default. Discovered by inspection.
Complete annotations of Giant acquisition/release to note that it's
only because of VFS that we acquire Giant in most places in the NFS
server.
long time, i.e., since the cleanup of the VM Page-queues code done two
years ago.
Reviewed by: Alan Cox <alc at freebsd.org>,
Matthew Dillon <dillon at backplane.com>
This is part 2/2 of fixing autonegotiation on hme(4) using DP83840A PHYs.
It appears to also fix the occasional problems to establish a link on
hme(4) using LU6612 PHYs and shouldn't hurt on those using QS6612 PHYs.
Obtained from: NetBSD
properly. This causes the autonegotiation to e.g. never establish a
100baseTX full-duplex link. The solution to this problem is to manually
write the capabilities from the BMSR to the ANAR every time a media
change occurs, even when already in autonegotiation mode.
The NetBSD way of doing this is to set their MIIF_FORCEANEG flag in the
NIC driver. This causes mii_phy_setmedia() to call mii_phy_auto() (which
will set the ANAR according to the BMSR) even when the PHY alread is in
autonegotiation mode. However, while doing the same on FreeBSD (which
involves porting the MIIF_FORCEANEG flag and converting nsphy.c to use
mii_phy_setmedia()) fixes autonegotiation, using mii_phy_setmedia()
causes this driver to no longer work properly in the other modes.
Another drawback of that approach is that this will also force writing
the ANAR on other PHYs whose drivers use mii_phy_setmedia() and which
are used with a NIC whose driver sets MIIF_FORCEANEG (e.g. hme(4) is
known to be used together with 3 different PHYs while only the DP83840A
require this workaround).
So instead of moving to MIIF_FORCEANEG, just call mii_phy_auto() in
nsphy_service() unconditionally when hanging off of a hme(4) and serving
a media change
This is part 1/2 of fixing autonegotiation on hme(4) using DP83840A PHYs.
pipes, since open pipes are linked off a usbd_interface structure
that is free()'d when the configuration index is changed. Attempting
to close or use such pipes later would access freed memory and
usually crash the system.
The only driver that is known to trigger this problem is if_axe,
which is itself at fault, but it is worth detecting the situation
to avoid the obscure crashes that result from this type of easily
made driver mistakes.
behaviour lost in the change from 4.x style netgraph tee nodes.
Alter the tee node to use the new method. Document the behaviour.
Step the ABI version number... old netgraph klds will refuse to load.
Better than just crashing.
Submitted by: Gleb Smirnoff <glebius@cell.sick.ru>
make the key name matching case-insensitive. There are some drivers
and .inf files that have mismatched cases, e.g. the driver will look
for "AdhocBand" whereas the .inf file specifies a registry key to be
created called "AdHocBand." The mismatch is probably a typo that went
undetected (so much for QA), but since Windows seems to be case-insensitive,
we should be too.
In if_ndis.c, initialize rates and channels correctly so that specify
frequences correctly when trying to set channels in the 5Ghz band, and
so that 802.11b rates show up for some a/b/g cards (which otherwise
appear to have no 802.11b modes).
Also, when setting OID_802_11_CONFIGURATION in ndis_80211_setstate(),
provide default values for the beacon interval, ATIM window and dwelltime.
The Atheros "Aries" driver will crash if you try to select ad-hoc mode
and leave the beacon interval set to 0: it blindly uses this value and
does a division by 0 in the interrupt handler, causing an integer
divide trap.
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.
Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.
For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.
Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week
correct. Instead, check it against the possible settings (_PRS) when
the link is probed. This is important when using APIC mode but link
devices still have PIC mode settings. This is also what Linux does.
Additional prodding by: Len Brown len dot brown at intel dot com
based on the destination sleep state. Add a method to restore the old
state on resume. This is needed for the case of suspending to a very low
state disabling a GPE (i.e. S4), resuming, and then suspending to a higher
state (i.e. S3). This case should now keep the proper GPEs enabled.
device can wake the system. For example:
dev.root0.nexus0.acpi0.acpi_lid0.wake: 1
dev.root0.nexus0.acpi0.acpi_button0.wake: 1
dev.root0.nexus0.acpi0.pcib0.wake: 0
dev.root0.nexus0.acpi0.sio0.wake: 0
been developed for use with FreeBSD, version 4.8 and later.
Submitted by: Hema Joyce
Reviewed by: Prafulla Deuskar
Approved by: Prafulla Deuskar
MFC after: 1 week
subsystem lock to avoid tripping over an assertion regarding whether
the lock is held or not. This is likely to be the cause of a panic
tripped over by Andrea Campi.
acpi_wake_init:
Evaluate _PRW and set the GPE type
acpi_wake_set_enable:
Enable or disable a device's GPE.
acpi_wake_sleep_prep:
Perform any last-minute changes to the device to prepare it for
entering the given sleep state.
Also, walk the entire namespace when transitioning to a sleep state,
disabling any GPEs which aren't appropriate for the given state. Transition
acpi_lid and acpi_button to the new API.
This clears the way for non-ACPI-aware devices to wake the system (i.e.
modems) and fixes a problem where systems power up after shutdown when a
GPE is triggered.
In particular, disabling it was likely to break configurations
involving ng_vlan(4) since the latter couldn't control
the parent's VLAN_MTU in the way vlan(4) did.
Pointed out by: ru
does not reliably prevent the triggering of interrupts for all supported
configurations. Thus, the FIFO size probe could cause an interrupt,
which could lead to an interrupt storm in the shared interrupt case.
To prevent this, change ns8250_bus_probe() to use the overflow bit in
the line status register instead of the RX ready bit in the interrupt
identification register to detect whether the FIFO has filled up.
This allows us to clear all bits in the interrupt enable register during
the probe, which should prevent interrupts reliably.
Additionally, the detected FIFO size may be a bit more accurate, because
the overflow bit is only set when the FIFO did actually fill up, while
interrupts would trigger a bit early.
Reviewed and tested on a lot of hardware by: marcel
struct vmspace is freed from cpu_sched_exit() to pmap_release().
This has the advantage of being able to rely on MI code to decide
when a free should occur, instead of having to inspect the reference
count ourselves.
At the same time, turn the per-CPU vmspace pointer into a pmap pointer,
so that pmap_release() can deal with pmaps exclusively.
Reviewed (and embrassing bug spotted) by: jake
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC. It broke a year or two
ago. This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.106 of
i386/exception.s. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .s files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.
This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.
vm86bios.s is included as before, but it is now between the labels for
interrupt handlers again, which seems to be wrong since half of it is
for a non-interrupt handler.
converts miidevs to a .h file, so rename to reflect that.
The usb and pccard versions have also been renamed and will be hooked
into the build system shortly (I've made the conversion in my p4
tree).
process is a non-prison root. The security.jail.allow_raw_sockets
sysctl variable is disabled by default, however if the user enables
raw sockets in prisons, prison-root should not be able to interact
with firewall rule sets.
Approved by: rwatson, bmilekic (mentor)
WRT manipulating capabilities of the parent interface:
- use ioctl(SIOCSIFCAP) to toggle VLAN_MTU (the way that was done
before was just wrong);
- use the right order of conditional clauses to set the MTU fudge
(that is logically independent from toggling VLAN_MTU.)
This splits the driver into a bus-independant backend, plus bus-specific
frontends. The old pcf(4) (i386/ISA) frontend is now in pcf_isa.c, the
frontend in envctrl.c is for sparc64/Ebus2 (Sun device name: SUNW,envctrl
from Sun E450 machines). More frontends are expected to appear in future.
This is not yet ready for public consumption, but it basically works.
Nicolas will bring over his ISA-specific fixes soon.
Reviewed by: nsouch
1. Contrary to the Single Unix Specification our implementation of
munlock(2) when performed on an unwired virtual address range has
returned an error. Correct this. Note, however, that the behavior
of "system" unwiring is unchanged, only "user" unwiring is changed.
If "system" unwiring is performed on an unwired virtual address
range, an error is still returned.
2. Performing an errant "system" unwiring on a virtual address range
that was "user" (i.e., mlock(2)) but not "system" wired would
incorrectly undo the "user" wiring instead of returning an error.
Correct this.
Discussed with: green@
Reviewed by: tegge@
* Add calls to AcpiSetGpeType. We use wake/run as the type for lid and
button switches since wake-only causes Thinkpads to immediately wake on
the second suspend. Note that with wake/run, some systems return both
wake and device-specific notifies so we don't register for system notifies
for lid and button switches.
* Remove the hw.acpi.osi_method tunable since it is not needed.
* Always print unknown notifies for all types.
* Add more cleanup for the EC if it fails to attach.
* Use the GPE handle now that we parse it. This allows GPEs to be defined
in AML GPE blocks.
* Always use ACPI_NOT_ISR since it's ok to acquire a mutex in our thread
which processes queued requests.
in the various pci specifications as readonly. vendor, subvendor,
device and subdevice are required to be loaded in hardware by some
means that isn't the system BIOS or other system software (although
some devices do have ways of accomplishing this). class and subclass
are defined to be read-only in section 6.2.1 (v2.2). Apart from the
status register, which we weren't touching, these are the only
read-only registers I could find in the 2.2 spec.
progif is also defined as being read-only in section 6.2.1. However,
the PCI IDE programming document specifically states that some of the
bits are read/write. Since we may have to restore registers before we
have a driver attached, go ahead and restore this one byte when
transitioning between D3 and D0.
The PCI spec also says that writes to reserved and unimplemented
registers must be completed normally. It makes no statements about
writes to read-only registers, so be as conservative as possible,
while covering the exception to the rule that is documented in a
subpart of the standard.
Requested by: socttl
polarity rather than assuming that level triggered IRQs use active low and
edge triggered IRQs use active high. Both the MultiProcessor 1.4
and ACPI 2.0 Specifications state in their examples that level triggered
EISA IRQs are active low, but in practice they seem to be active high.
Reported by: Nik Azim Azam nskyline_r35 at yahoo dot com
ordinary functions, essentially by backing out half of rev.1.115 of
amd64/exception.S. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .S files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.
This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.
mechanism so that early processing on mbufs can be performed before
a context switch to the NFS server threads. Because of this, if
the socket code is running without Giant, the NFS server also needs
to be able to run the upcall code without relying on the presence on
Giant. This change modifies the NFS server to run using a "giant
code lock" covering operation of the whole subsystem. Work is in
progress to move to data-based locking as part of the NFSv4 server
changes.
Introduce an NFS server subsystem lock, 'nfsd_mtx', and a set of
macros to operate on the lock:
NFSD_LOCK_ASSERT() Assert nfsd_mtx owned by current thread
NFSD_UNLOCK_ASSERT() Assert nfsd_mtx not owned by current thread
NFSD_LOCK_DONTCARE() Advisory: this function doesn't care
NFSD_LOCK() Lock nfsd_mtx
NFSD_UNLOCK() Unlock nfsd_mtx
Constify a number of global variables/structures in the NFS server
code, as they are not modified and contain constants only:
nfsrvv2_procid nfsrv_nfsv3_procid nonidempotent
nfsv2_repstat nfsv2_type nfsrv_nfsv3_procid
nfsrvv2_procid nfsrv_v2errmap nfsv3err_null
nfsv3err_getattr nfsv3err_setattr nfsv3err_lookup
nfsv3err_access nfsv3err_readlink nfsv3err_read
nfsv3err_write nfsv3err_create nfsv3err_mkdir
nfsv3err_symlink nfsv3err_mknod nfsv3err_remove
nfsv3err_rmdir nfsv3err_rename nfsv3err_link
nfsv3err_readdir nfsv3err_readdirplus nfsv3err_fsstat
nfsv3err_fsinfo nfsv3err_pathconf nfsv3err_commit
nfsrv_v3errmap
There are additional structures that should be constified but due
to their being passed into general purpose functions without const
arguments, I have not yet converted.
In general, acquire nfsd_mtx when accessing any of the global NFS
structures, including struct nfssvc_sock, struct nfsd, struct
nfsrv_descript.
Release nfsd_mtx whenever calling into VFS, and acquire Giant for
calls into VFS. Giant is not required for any part of the
operation of the NFS server with the exception of calls into VFS.
Giant will never by acquired in the upcall code path. However, it
may operate entirely covered by Giant, or not. If debug.mpsafenet
is set to 0, the system calls will acquire Giant across all
operations, and the upcall will assert Giant. As such, by default,
this enables locking and allows us to test assertions, but should not
cause any substantial new amount of code to be run without Giant.
Bugs should manifest in the form of lock assertion failures for now.
This approach is similar (but not identical) to modifications to the
BSD/OS NFS server code snapshot provided by BSDi as part of their
SMPng snapshot. The strategy is almost the same (single lock over
the NFS server), but differs in the following ways:
- Our NFS client and server code bases don't overlap, which means
both fewer bugs and easier locking (thanks Peter!). Also means
NFSD_*() as opposed to NFS_*().
- We make broad use of assertions, whereas the BSD/OS code does not.
- Made slightly different choices about how to handle macros building
packets but operating with side effects.
- We acquire Giant only when entering VFS from the NFS server daemon
threads.
- Serious bugs in BSD/OS implementation corrected -- the snapshot we
received was clearly a work in progress.
Based on ideas from: BSDi SMPng Snapshot
Reviewed by: rick@snowhite.cis.uoguelph.ca
Extensive testing by: kris
This change is possible since all the relevant drivers have been
fixed to set if_capenable properly. The field if_capabilities tracks
supported capabilities, which may be disabled administratively.
Inheriting checksum offload support from the parent interface isn't
that easy because the checksumming capabilities of the parent may be
toggled on the fly. Disable the code for now.
Kernel profiling for amd64's (normal and high resolution) should now
compile and work as (un)well as on i386's. It works better than user
profiling because:
- it uses _cyg_profile_func_*() instead of .mcount(), so it doesn't suffer
from gcc misspelling .mcount as mcount.
- it doesn't neglect saving %rax in .mcount().
The SMP case hasn't been tested. The high resolution subcase of this uses
the i8254, and as on i386's, the locking for this is deficient and the
i8254 is too inefficient. The acpi timer is also too inefficient.
- perfmon headers must be avoided until perfmon is supported.
- all call-used registers including return registers must be preserved
by .mcount(), etc., not quite as in profile.h. __cyg_profile_func_*()
don't require this, but they are (mis)implemented as aliases for
.mcount(), etc. so they preserve the registers.
- i386 ifdefs related to perfmon have not been adjusted yet.
amd64 as necessary. This is routine, except:
- the FAKE_MCOUNT($bintr) in doreti was missing the '$'. This gave a
a garbage address made up of padding bytes (with the nop byte 0x90 as
the MSB) instead of the intended address of bintr. This accidentally
worked on i386's because (0x90 << 24) is close enough to bintr, but
it doesn't work on amd64's because (0x90 << 56) is much further away
from bintr.
- the FAKE_MCOUNT($btrap) in calltrap was similarly broken. It hasn't
been needed since FreeBSD-1, so just delete it.
and high resolution profiling of interrupt handlers. The adjustments
are routine once the magic stack offset 13*4 is decoded to be TF_RIP
(there were originally more types of stack frames so using TF_EIP for
one of them wouldn't have been much simpler).
Removed garbage comments attached to some of the FAKE_MCOUNT()s.
that the usual macro for "ret" hides the detail of calling .mexitcount
before returning.
Fixed missing call to .mexitcount in lgdt(). This was missing on
i386's, mainly because lgdt() uses lret[q] insted of ret. This is
very unimportant since lgdt() is not (normally?) called until after
profiling is initialized.
in all USB ethernet drivers. The qdat structure contains a pointer
to the interface's struct ifnet and is used to process incoming
packets, so simultaneous use of two similar devices caused crashes
and confusion.
The if_udav driver appeared in the tree since Daan's PR, so I made
similar changes to that driver too.
PR: kern/59290
Submitted by: Daan Vreeken <Danovitsch@Vitsch.net>
MFNetBSD 1.177; author: toshii
Use the correct wValue to get hub desriptors.
Also, make wValue checks of root hub codes less strict.
MFNetBSD 1.178: author: martin
Interrupt descriptors might become invalid while being processed in
uhci_check_intr - so remember their next pointer before calling it.
Patch provided by Matthew Orgass in PR kern/24542.
Obtained from: NetBSD
KERN_PROC_SESSION option which had been previously defined but
never implemented.
PR: bin/65803 (a very tiny piece of the PR)`
Submitted by: Cyrille Lefevre
return value for BUS_READ_IVAR and thus don't generate the proper NULL
in cases where a device (i.e. on PCI) does not have a handle.
Found by: peadar, tjr
identifiers, to openfirmio.h as OFIOCMAXNAME, so programs can use it
for buffer sizes etc.
Note: Although this is only a rough upper limit to make the code more
robust and to prevent the allocation of ridiculous amounts of memory,
the current limit of one page (8191 + '\0' in openfirm_getstr()) still
appears a bit high. The maximum length of OFW property names is 31.
I didn't find a maximum length for the device identifiers in the OFW
documentation but it certainly is much smaller than 8191, too.
- Enable the OFIOCSET ioctl, i.e. move it out from under #if 0.
- Don't use openfirm_getstr() for the property value in OFIOCSET, there
are also properties whose values aren't strings and it makes sense to
use a different maximum length for property values than OFW_NAME_MAX/
OFIOCMAXNAME. The maximum accepted property value is defined in
openfirmio.h as OFIOCMAXVALUE (currently the maximum size of the value
of the nvramrc property).
- Make OFIOCSET not return EINVAL when OF_setprop() returns a different
length for the written value than it was told to write, this is normal
for the text string values of the properties in the OFW /options node.
Instead, only return EINVAL if OF_setprop() returned -1 (value could
not be written or property could not be created). Add a comment about
the specialty of the OFW /options node.
- Make OFIOCSET return the length of the written value returned by
OF_setprop(), just like OF_getprop() does. Quite useful, at least for
debugging.
Reviewed by: tmm
is being turned off, or else TCP/IP will keep assigning the job to us.
Drivers themselves should consult if_capenable, not if_hwassist--the
latter is for the TCP/IP stack.
being that PHYS_TO_VM_PAGE() returns the wrong vm_page for fictitious
pages but unwiring uses PHYS_TO_VM_PAGE(). The resulting panic
reported an unexpected wired count. Rather than attempting to fix
PHYS_TO_VM_PAGE(), this fix takes advantage of the properties of
fictitious pages. Specifically, fictitious pages will never be
completely unwired. Therefore, we can keep a fictitious page's wired
count forever set to one and thereby avoid the use of
PHYS_TO_VM_PAGE() when we know that we're working with a fictitious
page, just not which one.
In collaboration with: green@, tegge@
PR: kern/29915
and improved some comments). Also, made the documented {f,s}uword()
functions the standard entry points and the undocumented {f,s}uword64()
functions alternative entry points, like {f,s}uword32() for i386's. The
bitrot in the comments was a little larger here -- there are new undocumented
32-bit sub-word functions, not just renaming of 16-bit functions from
documented ones to undocumented ones.
of not clearing the flags for execv() syscall will result that a new
program runs in KSE thread mode without enabling it.
Submitted by: tjr
Modified by: davidxu
fixes was applicable to HEAD, originally it was thought this
should only be done in RELENG_4. Implement IO_INVAL in the vnode
op for writing by marking the buffer as "no cache". This fix
has already been applied to RELENG_4 as Rev. 1.65.2.15 of
ufs/ufs/ufs_readwrite.c.
Reviewed by: alc, tegge
Split the baby. For idepci devices, now both legacy mode bits need
not be set. We can run an idepci in a split mode. However, it only
works better than before, not works. It works better in that when one
device is legacy and the other isn't and disabled, we now operate
correctly.
sos submitted a version of this patch.
subclass, progif and revid. While these are typically read
only fields, they aren't always read-only. progif is writable
for ata devices, for example. It does no harm when they are
read only, and helps when they aren't.
to try to allocate things on my parent can be taken out. It duplicates code.
Also, add comment about why the power state stuff is here (type 2
devices don't participate in the power state save/restore due to
larger Bx issues).
chattiness was left in for debugging, but now that nearly all of the
problems relating to the changes have been fixed, it is only annoying. It
is still available via bootverbose.
Prodded by: jhb
threatened in rev.1.10 of usr.sbin/kgmon/kgmon.c more than 2 years ago.
kgmon has been recovering from the missing initialization for too
long, but the fixup there is ifdefed for i386's and shouldn't be
needed for other arches.
high resolution kernel profiling (options GUPROF. "U" in GUPROF stands
for microseconds resolution, but the resolution is now smaller than 1
nanosecond on multi-GHz machines and the accuracy is heading towards
1 nanosecond too). Arches that support GUPROF must now provide certain
macros for the calibration. GUPROF is now only supported for i386's,
so the absence of the new macros for other arches doesn't break anything
that wasn't already broken. amd64's have uncommitted support for
GUPROF, and sparc64's have support that seems to be complete except
here (there was an #error for non-i386 cases; now there are undefined
macros).
Changed the asms a little:
- declare them as __volatile. They must not be moved, and exporting a
label across asms is technically incorrect, so try harder to stop gcc
moving them.
- don't put the non-clobbered register "bx" in the clobber list. The
clobber lists are still more conservative than necessary.
- drop the non-support for gcc-1. It just gave a better error message,
and this is not useful since compiling with gcc-1 would cause thousands
of worse error messages.
- drop the support for aout.
because VLAN hardware features are enabled in em(4) by default.
Note: Currently vlan(4) has a bug that it consults
if_capabilities, not if_capenable. This will be fixed
after all the network drivers set VLAN bits in
if_capenable properly.
- Connect geom_stripe and geom_nop modules to the build.
- Connect STRIPE and NOP classes to the LINT build.
- Disconnect gconcat(8) from the build.
Supported by: Wheel - Open Technologies - http://www.wheel.pl
is intend to be fast. Just like CONCAT class it provides manual and
auto configuration methods.
Supported by: Wheel - Open Technologies - http://www.wheel.pl
it is very useful for tests. One is able to destroy its provider
forcibly if wants to test how other class handle such events.
One is also able to specify failure probability to check how other
classes handle I/O errors.
Supported by: Wheel - Open Technologies - http://www.wheel.pl
unless it's in the closed or listening state (remote address
== INADDR_ANY).
If a TCP inpcb is in any other state, it's impossible to steal
its local port or use it for port theft. And if there are
both closed/listening and connected TCP inpcbs on the same
localIP:port couple, the call to in_pcblookup_local() will
find the former due to the design of that function.
No objections raised in: -net, -arch
MFC after: 1 month
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.
of kmupetext(). The declaration is misplaced in <machine/profile.h>
since it is not MD and not related to the lowest level of profiling.
It will be moved, but getting it via <sys/gmon.h> already works.
algorithm, supplied by wpaul himself. The lame one has an origin
that's been called into question, so rather than argue about that (one
could make an excellent fair use argument), replace it with better
code since that's what FreeBSD is about.
Submitted by: wpaul[1], Klaus Klein
[1] Bill called this a silly bikeshed. Maybe his is not incorrect.
and cannot handle it going away, add an explicit reference to the kobj
class inside each linker class. Without this, a class with no modules
loaded will sit with an idle refcount of 0. Loading and unloading
a module with it causes a 0->1->0 transition which frees the ops table
and causes subsequent loads using that class to explode. Normally, the
"kernel" module will remain forever loaded and prevent this happening, but
if you have more than one linker class active, only one owns the "kernel".
This finishes making modules work for kldload(8) on amd64.
(nobits) tables to simplify some code. Try and shorten some of the very
wide lines. Somewhere along the way, I think I fixed the memory
corruption that caused panics after going multiuser.
to avoid lock order problems when manipulating the sockets associated
with the fifo.
Minor optimization of a couple of calls to fifo_cleanup() from
fifo_open().
reimplementations of enodev() (for the smbread() and smbwrite()
functions), as well as fixing various errno values to conform to
errno(3).
Bruce also points out that a number of the pointer == NULL tests
are probably nonsense because the respective checks are already
done at upper layers.
(Mostly) submitted by: bde
remove the empty line between the fdc and sio devices. The empty
line suggests that the comment applies to fdc only while it applies
to all following devices and options.
Typo spotted by: ru@
controllers and allows the controller to prefetch 1-2k on certain
PCI memory reads to the host. The spec says this should only be
used for IA32 based systems.
Informed of feature by: John Cagle <first.last@hp.com>
gets the relocation base passed in relocbase, we cannot declare a
local variable with the same name. Assume the argument holds the
same value as the local variable did...
repocopied. Soon there will be additional bus attachments and
specialization for isa, acpi and pccard (and maybe pc98's cbus).
This was approved by nate, joerg and myself. bde dissented on the new
location, but appeared to be OK after some discussion.
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/rwmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.
However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.
hal_raise_irql(void) doesn't take an argument, but it is called with one.
eg: irql = FASTCALL1(hal_raise_irql, DISPATCH_LEVEL);
This is hidden by the macros on i386, but becomes a compile error on amd64
since the arguments are actually checked.
to help the AMD cpus (which have a hardware tlb flush filter). I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either. Oh well, store it in the Attic.
properly using copyin/copyout for more than 5 years? This one did. :-)
Properly encapsulate all user<->kernel data transfers using copy{in,out}.
MFC after: 1 month
NULL name in device_add_child(), explicitly name all of our known
child drivers in order to give them a chance to attach to us.
Otherwise, only the first one present would be probed and attached.
Reviewed by: nsouch
MFC after: 1 month
yet, but building kld's is OK now and they can be loaded by kldload(2).
(but the machine will likely crash soon afterwards, a "minor" problem :-)
Brought to you by: my injured knee (from moving)
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).
fragment to zero the valid parts of a VM_IO buffer.
RE would like this to be part of 4.10-RC3 so this will be MFC-ed immediately.
Reviewed by: alc, tegge
is "void *" (it isn't) or that the default promotion of pid_t is int.
Instead, assume that casting "struct foo *" to "void *" and printing the
result with %p is useful, and that all pid_t's are representable as longs.
Fixed some minor style bugs (mainly spelling errors in comments).
It only supports sa1110 (on simics) right now, but xscale support should come
soon.
Some of the initial work has been provided by :
Stephane Potvin <sepotvin at videotron.ca>
Most of this comes from NetBSD.
Wind River. In the IPv4 output path, one of the tests in ip_output()
checks how many slots are actually available in the interface output
queue before attempting to send a packet. If, for example, we need
to transmit a packet of 32K bytes over an interface with an MTU of
1500, we know it's going to take about 21 fragments to do it. If
there's less than 21 slots left in the output queue, there's no point
in transmitting anything at all: IP does not do retransmission, so
sending only some of the fragments would just be a waste of bandwidth.
(In an extreme case, if you're sending a heavy stream of fragmented
packets, you might find yourself sending nothing by the first fragment
of all your packets.) So if ip_output() notices there's not enough
room in the output queue to send the frame, it just dumps the packet
and returns ENOBUFS to the app.
It turns out ip6_output() lacks this code. Consequently, this caused
the netperf UDPIPV6_STREAM test to produce very poor results with large
write sizes. This commit adds code to check the remaining space in the
output queue and junk fragmented packets if they're too big to be
sent, just like with IPv4. (I can't imagine anyone's running an NFS
server using UDP over IPv6, but if they are, this will likely make them
a lot happier. :)
fills its field (6 characters). In that case the OEMID is not
null-terminated, and the sprintf that was used would copy up to the
next null byte, which could be pretty far away.
registers, so add a register offset array to the softc. We key off the
device ID to determine which set of register offsets. Currently the 8385
host bridge used on amd64 is the only bridge to use the AGP3_VIA_*
register offsets and all other bridges use the AGP_VIA_* offsets. It is
currently unclear if the AGP3_VIA_* offsets are for VIA bridges that
implement AGP 3.0 bridges or just for amd64 bridges.
Submitted by: Kenneth Culver culverk at sweetdreamsracing dot biz
removes a specific thread from a sleep queue. sleepq_resume_thread()
resumes scheduling of a thread that has been previously removed from a
sleep queue.
- sleepq_catch_signals() just removes a thread from the queue it was just
added to when a pending signal is found.
- sleepq_signal() and sleepq_broadcast() remove threads from a queue,
drop the queue lock, and then resume all the previously removed threads.
This doesn't completely fix the sched_lock <-> sleepq chain LOR, but it
makes it a little better as we no longer call setrunnble() with a sleep
queue lock held meaning if setrunnable() tries to wakeup the swapper we
don't try to lock two sleep queue chains at the same time.
proper locking to be checked at runtime.
Remove sb_lock() and sb_unlock() calls from sb_reset_dsp() because the
latter is called from sb_setup() with the lock already held. Add a
call to sb_lockassert().
Surround the call to sb_reset_dsp() in sb16_attach() with sb_lock()
and sb_unlock() calls.
Tested by: Bartek Marcinkiewicz <junior AT p233.if.pwr.wroc.pl>
an option. Note that this option doesn't follow the normal USB_ or
Uxxx_ convention. That's because it is this way in the upstream
provider and I didn't want to change that.
Some of the conditions that caused vm_page_select_cache() to deactivate a
page were wrong. For example, deactivating an unmanaged or wired page is a
nop. Thus, if vm_page_select_cache() had ever encountered an unmanaged or
wired page, it would have looped forever. Now, we assert that the page is
neither unmanaged nor wired.
Allow 500us between pauses in ahd_pause_and_flushwork().
The maximum we will wait is now 500ms.
In the same routine, remove any attempt to clear ENSELO.
Let the firmware do it once the current selection has
completed. This avoids some race conditions having to
do with non-packetized completions and the auto-clearing
of ENSELO on packetized completions.
Also avoid attempts to clear critical sections when
interrups are pending. We are going to loop again
anyway, so clearing critical sections is a waste of
time. It also may not be possible to clear a critical
section if the source of the interrupt was a SEQINT.
aic79xx_pci.c:
Use the Generic 9005 mask when looking for generic 7901B
parts. This allows the driver to attach to 7901B parts
on motherboards using a non-Adaptec subvendor ID.
aic79xx_inline.h:
Test for the SCBRAM_RD_BUG against the bugs
field, not the flags field in the softc.
aic79xx.c:
Cancel pending transactions on devices that
respond with a selection timeout. This decreases
the duration of timeout recovery when a device
disappears.
aic79xx.c:
Don't bother forcing renegotiation on a selection
timeout now that we use the device reset handler
to abort any pending commands on the target.
The device reset handler already takes us down
to async narrow and forces a renegotiation.
In the device reset handlers, only send a
BDR sent async event if the status is not
CAM_SEL_TIMEOUT. This avoids sending this
event in the selection timeout case
aic79xx.c:
Modify the Core timeout handler to verify that another
command has the potential to timeout before passing off
a command timeout as due to some other command. This
safety measure is added in response to a timeout recovery
failure on H2B where it appears that incoming reselection
status was lost during a drive pull test. In that case,
the recovery handler continued to wait for the command
that was active on the bus indefinetly. While the root
cause of the above issue is still being determined seems
a prudent safeguard.
aic79xx_pci.c:
Add a specific probe entry for the Dell OEM 39320(B).
aic79xx.c:
aic79xx.h:
aic79xx.reg:
aic79xx.seq:
Modify the aic79xx firmware to never cross a cacheline or
ADB boundary when DMA'ing completion entries to the host.
In PCI mode, at least in 32/33 configurations, the SCB
DMA engine may lose its place in the data-stream should
the target force a retry on something other than an
8byte aligned boundary. In PCI-X mode, we do this to
avoid split transactions since many chipsets seem to be
unable to format proper split completions to continue
the data transfer.
The above change allows us to drop our completion entries
from 8 bytes to 4. We were using 8 byte entries to ensure
that PCI retries could only occur on an 8byte aligned
boundary. Now that the sequencer guarantees this by splitting
up completions, we can safely drop the size to 4 bytes (2
byte tag, one byte SG_RESID, one byte pad).
Both the split-completion and PCI retry problems only show
up under high tag load when interrupt coalescing is being
especially effective. The switch from a 2byte completion
entry to an 8 byte entry to solve the PCI problem increased
the chance of incurring a split in PCI-X mode when multiple
transactions were completed at once. Dropping the completion
size to 4 bytes also means that we can complete more commands
in a single DMA (128byte FIFO -> 32 commands instead of 16).
aic79xx.c:
Modify the SCSIINT handler to defer clearing
sequencer critical sections to the individual
interrupt handlers. This allows us to
immediately disable any outgoing selections in
the case of an unexpected busfree so we don't
inadvertantly clear ENSELO *after* a new selection
has started. Doing so may cause the sequencer
to miss a successful selection.
In ahd_update_pending_scbs(), only clear ENSELO if
the bus is currently busy and a selection is not
already in progress or the sequencer has yet to
handle a pending selection. While we want to ensure
that the selection for the SCB at the head of the
selection queue is restarted so that any change in
negotiation request can take effect, we can't clobber
pending selection state without confusing the sequencer
into missing a selection.
sequencer interrupt codes. These codes are only
relevant to the code that was last being executed
and that context is cleared when we reset the
program counter. This addresses a race condition
between a sequencer interrupt and any SCSI event
that causes us to restart the sequencer.
o When running the untagged-Q, we must start the
timer for any transaction we queue.
o Give the firmware half a millisecond between
pauses to flush work out. This should give us
around half a second of total delay before flagging
an issue with pausing and flushing controller work.
Only attempt to clear critical sections if there
are no pending interrupts in the pause and flush
loop. If the sequencer has issued an INTSTAT, we
may not be able to step out of the critical section.
o Cancel pending transactions on devices that
respond with a selection timeout. This decreases
the duration of timeout recovery when a device
disappears.
Don't bother forcing renegotiation on a selection
timeout now that we use the device reset handler
to abort any pending commands on the target.
The device reset handler already takes us down
to async narrow and forces a renegotiation.
o In the device reset handlers, only send a
BDR sent async event if the status is not
CAM_SEL_TIMEOUT. This avoids sending this
event in the selection timeout case.
o Modify the Core timeout handler to verify that another
command has the potential to timeout before passing off
a command timeout as due to some other command.
are used.
- Reduce duplication of a couple of macros removing the duplicates from
ich.h.
- Remove unused macros from icu.h as well as locore protection as this
header is no longer included in assembly sources.
- Require the APIC enumerators to explicitly enable mixed mode by calling
ioapic_enable_mixed_mode(). Calling this function tells the apic driver
that the PC-AT 8259A PICs are present and routable through the first I/O
APIC via an ExtINT pin. The mptable enumerator always calls this
function for now. The MADT enumerator only enables mixed mode if the
PC-AT compatability flag is set in the MADT header.
- Allow mixed mode to be enabled or disabled via a 'hw.apic.mixed_mode'
tunable. By default this tunable is set to 1 (true). The kernel option
NO_MIXED_MODE changes the default to 0 to preserve existing behavior, but
adding 'hw.apic.mixed_mode=0' to loader.conf achieves the same effect.
- Only use mixed mode to route IRQ 0 if it is both enabled by the APIC
enumerator and activated by the loader tunable. Note that both
conditions must be true, so if the APIC enumerator does not enable mixed
mode, then you can't set the tunable to try to override the enumerator.
to map. If the checksum fails, the table is unmapped and a NULL pointer
returned.
- For ACPI version >= 2.0, check the extended checksum of the RSDP.
AcpiOsGetRootPointer() already checks the version 1.0 checksum.
- Remap the full MADT table at the end of madt_probe() so that we verify
its checksum before saying it is really there.
Requested by: njl
the swizzle method for routing PCI interrupts across the bridge. This
fixes problems with motherboards (typically laptops) whose BIOS doesn't
provide a PRT for the AGP bridge even though there is a device entry for
the bridge in the ACPI namespace.
Tested by: Kenneth Culver culverk at sweetdreamsracing dot biz
it back to userspace, so it does not break bind(2) on raw sockets in jails.
Currently some processes, like traceroute(8) construct a routing request
to determine its source address based on the destination. This sockaddr
data is fed directly to bind(2). When bind calls ifa_ifwithaddr(9) to
make sure the address exists on the interface, the comparison will
fail causing bind(2) to return EADDRNOTAVAIL if the data wasnt zero'ed
before initialization.
Approved by: bmilekic (mentor)
"options OFW_NEWPCI").
This is a bit overdue, the new sparc64 OFW PCI code which is
meant to replace the old one is in place for 10 months and
enabled by default in GENERIC for 8 months. FreeBSD 5.2 and
5.2.1 also shipped with the new code enabled by default.
- Some minor clean-up, e.g. remove functions that encapsulated
the #ifdefs for OFW_NEWPCI, remove unused resp. no longer
required includes, etc.
Approved by: tmm, no objections on freebsd-sparc64
It's not quite correct from a posix Point Of view, but it is a lot better
than what was there before. This will be revisited later
when we decide what form our priority extensions will take. Posix doesn't
specify how a system scope thread can change its priority so you need to
add non-standard extensions to be able to do it..
For now make this slightly non standard to allow it to be done.
Submitted by: Dan Eischen originally, changed by myself.
there's not dependencies on pccard symboles, such a dependency is not
necessary. This means that drivers that have multiple attachments can
not drag bogus devices into the kernel at load time.
We can't (yet) do this with pci and isa. Drivers written for them
actually do seem to have symbols that depend on these busses'
implementation code.
ndis not touched until other things can be tested.
the kernel. We can guarantee this by resetting the FP status register.
This masks all FP traps. The reason we did get FP traps was that we
didn't reset the FP status register in all cases.
Make sure to reset the FP status register in syscall(). This is one of
the places where it was forgotten.
While on the subject, reset the FP status register only when we trapped
from user space.
Previously, mlockall(2) usage would leak MAP_FUTUREWIRE of the process's
vmspace::vm_map and subsequent processes would wire all of their memory.
Coupled with a wired-page leak in vm_fault_unwire(), this would run the
system out of free pages and cause programs to randomly SIGBUS when
faulting in new pages.
(Note that this is not the fix for the latter part; pages are still
leaked when a wired area is unmapped in some cases.)
Reviewed by: alc
PR kern/62930
of IP options.
net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified.
net.inet.ip.process_options=1 Process all IP options (default).
net.inet.ip.process_options=2 Reject all packets with IP options with ICMP
filter prohibited message.
This sysctl affects packets destined for the local host as well as those
only transiting through the host (routing).
IP options do not have any legitimate purpose anymore and are only used
to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
stacks.
Reviewed by: sam (mentor)
devices it cannot attach to. This gets rid of extraneous but harmless
device_probe_and_attach() errors. While I'm here, make the device
description more useful. The !acpi case for cpu is handled by legacy0.
and Rx frames up to 8191 octets, so it is perfectly capable of supporting
vlan(4)-style VLAN natively.
Thus, make it support VLAN `oversize' frames.
Reviewed by: tmm
that the OHCI driver uses. Broken OHCI devices (like the controller
in my laptop, apparently) like to set this bit at times. Research
through google shows that this problem has shown up on other systems
as well.
As the scheduling overrun handler doesn't actually do anything, and
the only effect is console spamming, disabling the interrupt seems
to be the right thing to do. (And it is also what linux 2.6 does.)
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.
Reviewed by: tegge@
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.
Reviewed by: marcel, bde, jhb
o The ieee80211_media_status() function updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket.
Approved by: sam
2. Note that ct device uses ctau name as driver name (due to name conflict
with ct driver) and also mark it as a driver inside the CVS tree.
MFC after: 10 days
segment, remove the groping around in the Option ROM segments, remove the
bogus tests for bcopy vs. copyout. There really is no reason for a
management app to know these things other than to create l33t info tables
for the user.
a significant functional change, but it further cleans up the code and
brings it closer to being portable. Thanks to Don Bowman for helping to
test this.
and type for printing info about the device that didn't probe from child, not
parent.
This fixes a panic on systems where not yet supported devices hang off of the
nexus, e.g. on E450.
Reported by: joerg
host-PCI bridge device and find a valid $PIR.
- Make pci_pir_parse() private to pci_pir.c and have pir0's attach routine
call it instead of having legacy_pcib_attach() call it.
- Implement suspend/resume support for the $PIR by giving pir0 a resume
method that calls the BIOS to reroute each link that was already routed
before the machine was suspended.
- Dump the state of the routed flag in the links display code.
- If a link's IRQ is set by a tunable, then force that link to be re-routed
the first time it is used.
- Move the 'Found $PIR' message under bootverbose as the pir0 description
line lists the number of entries already. The pir0 line also only shows
up if we are actually using the $PIR which is a bonus.
- Use BUS_CONFIG_INTR() to ensure that any IRQs used by a PCI link are
set to level/low trigger/polarity.
active low polarity when using the PIC interrupt model. This should fix
broken SCI interrupts on machines when not using the APIC where the BIOS
doesn't program the ELCR to level trigger for the ACPI SCI.
Requested by: njl
polarity for a specified IRQ. The intr_config_intr() function wraps
this pic method hiding the IRQ to interrupt source lookup.
- Add a config_intr() method to the atpic(4) driver that reconfigures
the interrupt using the ELCR if possible and returns an error otherwise.
- Add a config_intr() method to the apic(4) driver that just logs any
requests that would change the existing programming under bootverbose.
Currently, the only changes the apic(4) driver receives are due to bugs
in the acpi(4) driver and its handling of link devices, hence the reason
for such requests currently being ignored.
- Have the nexus(4) driver on i386 implement the bus_config_intr() function
by calling intr_config_intr().
and intr_polarity enums for passing around interrupt trigger modes and
polarity rather than using the magic numbers 0 for level/low and 1 for
edge/high.
- Convert the mptable parsing code to use the new ELCR wrapper code rather
than reading the ELCR directly. Also, use the ELCR settings to control
both the trigger and polarity of EISA IRQs instead of just the trigger
mode.
- Rework the MADT's handling of the ACPI SCI again:
- If no override entry for the SCI exists at all, use level/low trigger
instead of the default edge/high used for ISA IRQs.
- For the ACPI SCI, use level/low values for conforming trigger and
polarity rather than the edge/high values we use for all other ISA
IRQs.
- Rework the tunables available to override the MADT. The
hw.acpi.force_sci_lo tunable is no longer supported. Instead, there
are now two tunables that can independently override the trigger mode
and/or polarity of the SCI. The hw.acpi.sci.trigger tunable can be
set to either "edge" or "level", and the hw.acpi.sci.polarity tunable
can be set to either "high" or "low". To simulate hw.acpi.force_sci_lo,
set hw.acpi.sci.trigger to "level" and hw.acpi.sci.polarity to "low".
If you are having problems with ACPI either causing an interrupt storm
or not working at all (e.g., the power button doesn't turn invoke a
shutdown -p now), you can try tweaking these two tunables to find the
combination that works.
IRQ is edge triggered or level triggered. For ISA interrupts, we assume
that edge triggered interrupts are always active high and that level
triggered interrupts are always active low.
- Don't disable an edge triggered interrupt in the PIC. This avoids
outb instructions to the actual PIC for traditional ISA IRQs such as
IRQ 1, 6, 14, and 15. (Fast interrupts such as IRQs 0 and 8 don't mask
their source, so this doesn't change anything for them.)
- For MCA systems we assume that all interrupts are level triggered and
thus need masking. Otherwise, we probe the ELCR. If it exists we trust
what it tells us regarding which interrupts are level triggered. If it
does not exist, we assume that IRQs 0, 1, 2, and 8 are edge triggered
and that all other IRQs are level triggered and need masking.
- Instruct the ELCR mini-driver to restore its saved state during resume.
register controlled the trigger mode and polarity of EISA interrupts.
However, it appears that most (all?) PCI systems use the ELCR to manage
the trigger mode and polarity of ISA interrupts as well since ISA IRQs used
to route PCI interrupts need to be level triggered with active low
polarity. We check to see if the ELCR exists by sanity checking the value
we get back ensuring that IRQS 0 (8254), 1 (atkbd), 2 (the link from the
slave PIC), and 8 (RTC) are all clear indicating edge trigger and active
high polarity.
This mini-driver will be used by the atpic driver to manage the trigger and
polarity of ISA IRQs. Also, the mptable parsing code will use this mini
driver rather than examining the ELCR directly.
not as a pending interrupt status, but as a matter of status quo.
Consequently, when there's no data to be transmitted the condition
is not cleared and uart_intr() is stuck in an infinite loop trying
to clear the UART_IPEND_TXIDLE status.
The z8530_bus_ipend() function is changed to return idle only once
after having sent any data.
The root cause for this problem is that we cannot use the interrupt
status bits of the SCC itself. The register that holds the interrupt
status can only be accessed by channel A and holds the status for
both channels. Using the interrupt status register would complicate
the driver because we need to synchronize access to the SCC between
the channels.
Elementary testing: marius
labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid
the need for socket layer locking in the lower level network paths
where inpcb locks are already frequently held where needed. In
particular:
- Use the inpcb for label instead of socket in raw_append().
- Use the inpcb for label instead of socket in tcp_output().
- Use the inpcb for label instead of socket in tcp_respond().
- Use the inpcb for label instead of socket in tcp_twrespond().
- Use the inpcb for label instead of socket in syncache_respond().
While here, modify tcp_respond() to avoid assigning NULL to a stack
variable and centralize assertions about the inpcb when inp is
assigned.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
lookup for the label tag fails, return NULL rather than something close
to NULL. This scenario occurs if mbuf header labeling is optional and
a policy requiring labeling is loaded, resulting in some mbufs having
labels and others not. Previously, 0x14 would be returned because the
NULL from m_tag_find() was not treated specially.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
returns okay when HW probe fails. This happens when comconsole flag is
set but VGA console is used instead.
Back out requested by: bde (He will be looking at other solutions from scratch)
test the label pointer for NULL before testing the label slot for
permitted values. When loading mac_test dynamically with conditional
mbuf labels, the label pointer may be NULL if the mbuf was
instantiated while labels were not required on mbufs by any policy.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
synchronization protecting against dynamic load and unload of MAC
policies, and instead simply blocks load and unload. In a static
configuration, this allows you to avoid the synchronization costs
associated with introducing dynamicism.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
interrupt source.
- Only do an outb() to the PIC to clear a bit in imen if the bit is set.
- Add a NUM_ISA_IRQS macro to replace uglier
'sizeof(array) / sizeof(member)' expressions along with a CTASSERT() to
ensure that the macro is correct.
than using legacy_pcib_attach(). The MP Table drivers don't use the $PIR,
and the legacy_pcib_attach() function probes and parses the $PIR in
addition to adding the pci bus child device.
o New function ip_findroute() to reduce code duplication for the
route lookup cases. (luigi)
o Store ip_len in host byte order on the stack instead of using
it via indirection from the mbuf. This allows to defer the host
byte conversion to a later point and makes a quicker fallback to
normal ip_input() processing. (luigi)
o Check if route is dampned with RTF_REJECT flag and drop packet
already here when ARP is unable to resolve destination address.
An ICMP unreachable is sent to inform the sender.
o Check if interface output queue is full and drop packet already
here. No ICMP notification is sent because signalling source quench
is depreciated.
o Check if media_state is down (used for ethernet type interfaces)
and drop the packet already here. An ICMP unreachable is sent to
inform the sender.
o Do not account sent packets to the interface address counters. They
are only for packets with that 'ia' as source address.
o Update and clarify some comments.
Submitted by: luigi (most of it)
o Extend the if_data structure with an ifi_link_state field and
provide the corresponding defines for the valid states.
o The mii_linkchg() callback updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket
in addition to the kqueue KNOTE.
o If vlans are configured on a physical interface notify and update
all vlan pseudo devices as well with the vlan_link_state() callback.
No objections by: sam, wpaul, ru, bms
Brucification by: bde
the falling edge of a media state change.
This is in preparation for media state change notification to the
routing socket.
No objections by: sam, wpaul, ru, bms
Brucification by: bde
o Fix and improve comments and references,
o Add PFIL_HOOKS, UFS_ACL and UFS_DIRHASH,
o Switch from SCHED_4BSD to SCHED_ULE,
o Remove SCSI_DELAY (there's no SCSI support),
- change pf_get_pool() argument rule_number type from u_int32_t
to u_int8_t, fixes corruption of address pools with large
rulesets (mcbride@)
- prevent endless loops with route-to (dhartmei@)
- limit option length to 2 octets max (frantzen@)
Obtained from: OpenBSD
Approved by: mlaier(mentor), bms(mentor)
uncommitted):
Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg
(the type of tag to claim) and push it out of ip_var.h into mbuf.h
alongside all of the other macros that work ok mbuf's and tag's.
parameter).
Keep using it only in the i386 NOTES for now. It is fairly MI, but it
doesn't use bus-space and has a couple of i386 i/o instructions in pci
intitialization.
intent was to make sure that message structs allocated off of the stack were
4-byte aligned. However, the macros as defined did absolutely nothing.
And since I2O forces you to manually copy messages down to the hardware, there
really is no point of enforced alignment anyways.
rename the control device from rasr%d to asr%d. This starts us down the
path of divorcing ourselves from a very bogus design in the management
apps. Since the apps are open source now, they will likely be updated
and fixed before 5.3.
Adjust staticness and a variable name for the split of cy.c into cy.c and
cy_isa.c. Use the new header required for the split to avoid repeating
declarations in cy_pci.c.
Put @includes in a better spot. Fix many cases of 2 space indents and spaces
between a function name and the parens. Use KASSERT instead of a home-rolled
ASSERT. Remove some undeeded caddr casts.
- Define option FORCECONSPEED to force the serial console to
be CONSPEED. I've run into a lot of boards in which
the detect for prior speed doesn't work and ends up with
broken console since it is at the wrong speed.
- If a serial port is marked as a console, but console=vidconsole
and if the serial ports doesn't exist it will be probed and
attached at a 8250 chip. Then writes to that will freeze the
system.
- Add an option flags 0x400000 to mark this as a potential
comconsole in-case the one flaged with 0x10 does not exist
in the system.
This makes it easier to deploy on systems with one or two serial ports.
Obtained from: IronPort
- Use the dh_inserted member of the dispatch header in the Windows
timer structure to indicate that the timer has been "inserted into
the timer queue" (i.e. armed via timeout()). Use this as the value
to return to the caller in KeCancelTimer(). Previously, I was using
callout_pending(), but you can't use that with timeout()/untimeout()
without creating a potential race condition.
- Make ntoskrnl_init_timer() just a wrapper around ntoskrnl_init_timer_ex()
(reduces some code duplication).
- Drop Giant when entering if_ndis.c:ndis_tick() and
subr_ntorkrnl.c:ntoskrnl_timercall(). At the moment, I'm forced to
use system callwheel via timeout()/untimeout() to handle timers rather
than the callout API (struct callout is too big to fit inside the
Windows struct KTIMER, so I'm kind of hosed). Unfortunately, all
the callouts in the callwhere are not marked as MPSAFE, so when
one of them fires, it implicitly acquires Giant before invoking the
callback routine (and releases it when it returns). I don't need to
hold Giant, but there's no way to stop the callout code from acquiring
it as long as I'm using timeout()/untimeout(), so for now we cheat
by just dropping Giant right away (and re-acquiring it right before
the routine returns so keep the callout code happy). At some point,
I will need to solve this better, but for now this should be a suitable
workaround.
- Remove second license, the first was not that different and should be
fine.
- Add nexus_attach(), and do not perform its task in nexus_probe() any
more.
- Remove nexus_write_ivar(), since it was quite pointless.
- Remove superfluous devinfo members.
- Clean up some comments, minor style issues and prototypes.
as dependent on binutils features/quirks as the current one. This one
loads plain .o files without having to mess with shared object mode.
This happens to be essential on amd64, because binutils hasn't implemented
all the quirks/features that we need for producing the hack non-PIC shared
objects. As it turned out, .o format isn't all that inconvenient after
all. It looks like the ability to use the same .o files for linking
directly into a static kernel or loading as a module might be worth it.
It is still very much a work-in-progress, but it is almost usable. Other
changes are still needed in order to use it though, these have not been
committed yet. There is still a memory corruption/overrun bug somewhere.
For example, test modules load and work, but the machine explodes a few
minutes later in vm_forkproc() or the like. Notable missing things
include kldxref support, and loader(8) support. I wanted to figure out
a working baseline set of code first.
things which compare /etc/fstab entries to results from
getfsstat(). The real way to fix this is to make 'ufs2'
a recognized filesystem (for real, no beating around the
bush).
This should fix things like 'umount -a -t ufs' now.
Appologies for the previous breakage.
boot0sio.s was repo-copied to boot0.S.
- Rename boot0ext.s to boot0ext.S, to stay consistent
with other preprocessed asm files around here, and
for better portability.
Repocopied by: joe
condition where kse_wakeup() doesn't yet see them in (interruptible)
sleep queues. Also add an upcall check to sleepqueue_catch_signals()
suggested by jhb.
This commit should fix recent mysql hangs.
Reviewed by: jhb, davidxu
Mysql'd by: Robin P. Blanchard <robin.blanchard at gactr uga edu>
switch to using C99-style comments everywhere in preprocessed
assembler. The reason is that lines starting with the regexp
'^[[:space:]]#' are treated as preprocessing directives, and
while it seems to work now with GCC, it's not necessarily has
to work. Use C99 comments `//' for the trailing comments to
save whitespace.
bridges, the EBus bridge has resource ranges it claims exclusively to
map its children into in its BARs. Hence, we need to allocate these
completely and manage them for the children, instead of just passing
allocations through to the PCI layer as we did before.
While being there, split ebus_probe(), which did also contain code
normally belonging into the attach method, into ebus_probe() and
ebus_attach(), and perform some minor cleanups.
correct interrupt source.
- Cache a pointer to the i8254_intsrc's pending method to avoid several
pointer indirections in i8254_get_timecount().
Reported by: bde
Merge boot0.s and boot0sio.s into boot0_512.s controlled by "#ifdef SIO".
Add Makefile magic to generate boot0.s and boot0sio.s from boot0_512.s.
The compile boot0 and boot0sio have unchanged MD5 checksums.
channels. This also work when PCI native mode has been selected
(patch for /sys/dev/pci/pci.c needed for that) since pci_get_progif
uses the saved value for progif, not the one stored after we may have
changed from legacy mode to native PCI mode.
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.
Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.
The patch being committed is not identical to the patch
in the PR. The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely. This change has also been
presented and addressed on the freebsd-hackers mailing
list.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800
libufs, which only works for Charlie root.
This change reverts the introduction of libufs and moves the
check into the kernel. Since the f_fstypename is the same
for both ufs and ufs2, we check fs_magic for presence of
ufs2 and copy "ufs2" explicitly instead.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
possible while maintaining compatibility with the widest range of TCP stacks.
The algorithm is as follows:
---
For connections in the ESTABLISHED state, only resets with
sequence numbers exactly matching last_ack_sent will cause a reset,
all other segments will be silently dropped.
For connections in all other states, a reset anywhere in the window
will cause the connection to be reset. All other segments will be
silently dropped.
---
The necessity of accepting all in-window resets was discovered
by jayanth and jlemon, both of whom have seen TCP stacks that
will respond to FIN-ACK packets with resets not meeting the
strict last_ack_sent check.
Idea by: Darren Reed
Reviewed by: truckman, jlemon, others(?)
1) In pci.c, we need to check the child device's state, not the parent
device's state.
2) In acpi_pci.c, we have to run the power state change after the acpi
method when the old_state is > new state, not the other way around.
Submitted by: Dmitry Remesov
PR: 65694
1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.
There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.
The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.
2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.
=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable
net/if_atmsubr.c
rt_check() cleanup
net/if_ethersubr.c
rt_check() cleanup, arpcom untangling
net/if_fddisubr.c
rt_check() cleanup, arpcom untangling
net/if_iso88025subr.c
rt_check() cleanup
netatalk/aarp.c
arpcom untangling, remove a block of duplicated code
netatalk/at_extern.h
arpcom untangling
netinet/if_ether.c
rt_check() cleanup (change arpresolve)
netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)
- Fix some comments; remove numerous superfluous or outdated ones.
- Correctly pass on the requesting device when handing requests up
to the parent bus.
- Use the complete device name, including unit number, to build the
IOMMU instance name.
- Inline a function that was only used once, and was trivial.
consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.
In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'
- use ifp instead if &ac->ac_if in a couple of nd6* calls;
this removes a useless dependency.
- use IFP2AC(ifp) instead of an extra variable to point to the struct arpcom;
this does not remove the nesting dependency between arpcom and ifnet but
makes it more evident.
caller to vm_page_grab(). Although this gives VM_ALLOC_ZERO a
different meaning for vm_page_grab() than for vm_page_alloc(), I feel
such change is necessary to accomplish other goals. Specifically, I
want to make the PG_ZERO flag immutable between the time it is
allocated by vm_page_alloc() and freed by vm_page_free() or
vm_page_free_zero() to avoid locking overheads. Once we gave up on
the ability to automatically recognize a zeroed page upon entry to
vm_page_free(), the ability to mutate the PG_ZERO flag became useless.
Instead, I would like to say that "Once a page becomes valid, its
PG_ZERO flag must be ignored."
added an arbitrary delay to our readings, causing us to use the ACPI-safe
read method when not necessary. Submitted by: bde
Old:
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks BAD min = 3, max = 19, width = 16
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks BAD min = 3, max = 19, width = 16
ACPI timer looks GOOD min = 3, max = 5, width = 2
ACPI timer looks GOOD min = 3, max = 4, width = 1
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
New:
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
ACPI timer looks GOOD min = 3, max = 4, width = 1
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Also, reduce unnecesary overhead in ACPI-fast by remove the barrier for
reads. The timer in the ACPI-fast case is known to increase monotonically
so there is no need to serialize access to it.
misplaced forward declarations of structs). This also reduces namespace
pollution (the misplaced declarations were declared in the !_KERNEL case
when they are not used).
would actually map the file with read access enabled. According to
http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
an error. Similarly, an madvise(..., MADV_WILLNEED) would enable read
access on a virtual address range that was PROT_NONE.
The solution implemented herein is (1) to pass a vm_prot_t to
vm_map_pmap_enter() describing the allowed access and (2) to make
vm_map_pmap_enter() responsible for understanding the limitations of
pmap_enter_quick().
Submitted by: "Mark W. Krentel" <krentel@dreamscape.com>
PR: kern/64573
from tcp_hostcache would have overridden a (now) lower MTU of
an interface or route that changed since first PMTU discovery.
The bug would have caused TCP to redo the PMTU discovery when
not strictly necessary.
Make a comment about already pre-initialized default values
more clear.
Reviewed by: sam
state. Apparently it happens when both devices try to disconnect RFCOMM
multiplexor channel at the same time.
The scenario is as follows:
- local device initiates RFCOMM connection to the remote device. This
creates both RFCOMM multiplexor channel and data channel;
- remote device terminates RFCOMM data channel (inactivity timeout);
- local device acknowledges RFCOMM data channel termination. Because
there is no more active data channels and local device has initiated
connection it terminates RFCOMM multiplexor channel;
- remote device does not acknowledges RFCOMM multiplexor channel
termination. Instead it sends its own request to terminate RFCOMM
multiplexor channel. Even though local device acknowledges RFCOMM
multiplexor channel termination the remote device still keeps
L2CAP connection open.
Because of hanging RFCOMM multiplexor channel subsequent RFCOMM
connections between local and remote devices will fail.
Reported by: Johann Hugo <jhugo@icomtek.csir.co.za>
modules is a very nice way to produce hard-to-find panics. Who would look for
a bug in a Makefile anyway?
Has anyone seen the pointy hat? :-o
Approved by: njl (mentor)
ip_id again. ip_id is already set to the ip_id of the encapsulated packet.
Make a comment about mbuf allocation failures more realistic.
Reviewed by: sobomax
resource pre-allocation. The problem is that the BARs of the EBus bridges
contain the ranges for the resources for the EBus devices beyond the bridge.
So when the EBus code tries to allocate the resource for an EBus device
it's already allocated by the PCI code.
To be removed again as soon as we have a proper solution in the EBus Code.
Reviewed by: tmm
Approved by: marcel (mentor)
source address of a packet exists in the routing table. The
default route is ignored because it would match everything and
render the check pointless.
This option is very useful for routers with a complete view of
the Internet (BGP) in the routing table to reject packets with
spoofed or unrouteable source addresses.
Example:
ipfw add 1000 deny ip from any to any not versrcreach
also known in Cisco-speak as:
ip verify unicast source reachable-via any
Reviewed by: luigi
secondary bus is 0, we program the primary bus, the secondary bus and
the suborindate bus. This isn't ideal, since we start at parent_bus +
1 and store this in a static.
Ideally, we'd walk the tree and assign bus numbers. However, that's
harder to accomplish without some help from the bus layer which we're
not planning on doing that until 6.
This fixes my CardBus problems on my Sony PCG-Z1WA, and might fix the
Dells that have had problems.
sched_ule, in January 2004. Looking at this, "pagezero" is (one of) the
culprit(s). We had no provision for processes with P_NOLOAD set. With
pagezero not running at PRI_ITHD, kseq_load_{add,rem} count pagezero as
another-normal-process, thus the "expected-plus-one" load reported in
the above thread.
Submitted by: Nikos Ntarmos <ntarmos@ceid.upatras.gr>
gadgets (hotkeys, lcd, ...) on Asus laptops. I aim to closely track the
acpi4asus project which implements these features in the Linux kernel.
If this breaks your laptop, please let me know how it does it :-)
Approved by: njl (mentor)
there are a lot of other dependencies that preclude the kernel from
working). Instead, have a more generic note that isa should not be
removed. This should be less confusing for users.
(I hope.)
My original instinct to make ndis_return_packet() asynchronous was correct.
Making ndis_rxeof() submit packets to the stack asynchronously fixes
one recursive spinlock acquisition, but it's also possible for it to
happen via the ndis_txeof() path too. So:
- In if_ndis.c, revert ndis_rxeof() to its old behavior (and don't bother
putting ndis_rxeof_serial() back since we don't need it anymore).
- In kern_ndis.c, make ndis_return_packet() submit the call to the
MiniportReturnPacket() function to the "ndis swi" thread so that
it always happens in another context no matter who calls it.
workaround was for hardware where the clock was not latched, not for
hardware that was too slow. Also, make variable names more specific for ddb
printing.
While I would have prefered to have a solution that didn't move
knowledge of this into the pci layer. However, this is literally the
only exception that's listed in the PCI standard to the usual way of
decoding BARs. atapci devices in legacy mode now ignore the first 4
bars and hard code the values to the legacy ide values (well, for each
of the controllers that are in legacy mode). The 5th bar is handled
normally.
Remove the zero bar handling. zero bars should be ignored at all
other times, and since we handle that specially, we don't need the
older workaround.
what the ACPI-safe workaround is intended to fix. Requested by phk.
Set the bushandle and tag when attaching the timer, don't do it each time
in read_counter(). Pointed out by bde.
Move test_counter() to the end. Staticize acpi_timer_reg.
Clearly comment the assumptions on the structure of keys (addresses)
and masks, and introduce a macro, LEN(p), to extract the size of these
objects instead of using *(u_char *)p which might be confusing.
Comment the confusion in the types used to pass around pointers
to keys and masks, as a reminder to fix that at some point.
Add a few comments on what some functions do.
Comment a probably inefficient (but still correct) section of code
in rn_walktree_from()
The object code generated after this commit is the same as before.
At some point we should also change same variable identifiers such
as "t, tt, ttt" to fancier names such as "root, left, right" (just
in case someone wants to understand the code!), replace misspelling
of NULL as 0, remove 'register' declarations that make little sense
these days.
like "the foo(4) manual page" to "foo(4)". Uniformized the remaining
instances of "manual page" and "manpage" to "man page". Uniformized
some nearby sentence breaks. Reformatted the whole paragraph containing
these changes only for DUMMYNET.
of you with other cards, please do review and test the drivers for
MP-safety and disable Giant in the interrupt routines when you are
sure of proper functionality.
wireless ever since I added the new spinlock code. Previously, I added
a special ndis_rxeof_serial() function to insure that when we receive
a packet, we never end up calling the MiniportReturnPacket() routine
until after the receive handler has finished. I set things up so that
ndis_rxeof_serial() would only be used for serialized miniports since
they depend on this property. Well, it turns out deserialized miniports
depend on a similar property: you can't let MiniportReturnPacket() be
called from the same context as the receive handler at all. The 2100B
driver happens to use a single spinlock for all of its synchronization,
and it tries to acquire it both while in MiniportHandleInterrupt() and
in MiniportReturnPacket(), so if we call MiniportReturnPacket() from
the MiniportHandleInterrupt() context, we will end up trying to acquire
the spinlock recursively, which you can't do.
To fix this, I made the ndis_rxeof_serial() handler the default. An
alternate solution would be to make ndis_return_packet() submit
the call to MiniportReturnPacket() to the NDIS task queue thread.
I may do that in the future, after I've tested things a bit more.
supported. Symptoms of this bug included unnecessary use of ACPI-safe
and a dmesg that has deltas of about 2^24:
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 16777206, width = 16777204
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 2, max = 16777210, width = 16777208
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
ACPI timer looks GOOD min = 4, max = 5, width = 1
ACPI timer looks BAD min = 2, max = 7, width = 5
ACPI timer looks BAD min = 4, max = 16777189, width = 16777185
To fix this:
* Use a 32 bit timecounter mask when the timer is 32 bits.
* In test_counter(), use the acpi_TimerDelta function which handles 24/32
bit timers and wraparound.
Miscellaneous fixes:
* Use C99 initializers for timecounter struct.
* Use u_int and uint32_t where appropriate instead of unsigned.
* Remove whitespace-only lines
* Remove the old PIIX4 PCI workaround. The timecounter testing code has
been in use for long enough to prove it's functional.
globally available. acpi_TimerDelta() subtracts two readings from the
ACPI PM timer and returns the difference. It properly distinguishes between
24-bit and 32-bit timers and handles wraparound.
2. Document that this means that kernel modules must be rebuilt.
3. While I'm here, fix my sorting error in callout.h
Requested by: many [1], scottl [2], bde [3]
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.
Slightly simplify (and document) the logic for memory allocation
in rt_setgate().
The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.
implementation taken directly from OpenBSD.
I've resisted committing this for quite some time because of concern over
TIME_WAIT recycling breakage (sequential allocation ensures that there is a
long time before ports are recycled), but recent testing has shown me that
my fears were unwarranted.
TIME_WAIT recycling cases I was able to generate with http testing tools.
In short, as the old algorithm relied on ticks to create the time offset
component of an ISN, two connections with the exact same host, port pair
that were generated between timer ticks would have the exact same sequence
number. As a result, the second connection would fail to pass the TIME_WAIT
check on the server side, and the SYN would never be acknowledged.
I've "fixed" this by adding random positive increments to the time component
between clock ticks so that ISNs will *always* be increasing, no matter how
quickly the port is recycled.
Except in such contrived benchmarking situations, this problem should never
come up in normal usage... until networks get faster.
No MFC planned, 4.x is missing other optimizations that are needed to even
create the situation in which such quick port recycling will occur.
a NULL crsbuf pointer. This shouldn't happen if it returns AE_OK. We'll
figure out why this is happening later.
Submitted by: Bruno Ducrot <ducrot@poupinou.org>
routine since the error will be reported back to the user buffer.
This will quiet down the bootverbose case when using an ACU which
does brute force discovery of the physical and logical devices.
the same process as the current thread it makes absolutely
no sense to lock the parent process through the pointer in
said thread.
Submitted by: pho (with minor correction)
Pointy Hat To: mtm
this patch were submitted by Maurycy Pawlowski-Wieronski. In addition
to Maurycy's change, break out softc tear down from ppp_clone_destroy()
into ppp_destroy() rather than performing a convoluted series of
extraction casts and indirections during tear down at mod unload.
Submitted by: Maurycy Pawlowski-Wieronski <maurycy@fouk.org>
of the struct, so that a placeholder for it (or unportable C99
initializers) are not needed for entries that don't use it. Use a C99
initializer for the 1 entry that uses it. Removed 91 placeholders.
This also restores API compatibility with NetBSD and RELENG_4 for most
entries.
+ remove useless wrappers around bcmp(), bcopy(), bzero().
The code assumes that bcmp() returns 0 if the size is 0, but
this is true for both the libc and the libkern versions.
+ nuke Bcmp, Bzero, Bcopy from radix.h now that nobody uses them anymore.
Removed the requirement for a particular subvendor/subproduct in
rev.1.26 (VScom PCI-800L card). While the BARs, etc., may depend on
the sub-ids, this is not known to be so, and I think it is better to
guess that they don't. The decision to check sub-id checks in this
file is apparently random; for VScom cards they were checked in 3 of
8 cases.
Reviewed by: timeout by committer (joerg) after 6 months
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h
Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).
Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)
Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:
#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet
(note that the right hand side would likely be a pointer rather than
the base address of an array.)
+ replace 0 with NULL where appropriate (not complete)
+ remove register declaration while there
+ add argument names to function prototypes to have a better idea of
what they are used for
+ add 'const' qualifiers in 3 places
Nehemiah chip, but the work is all done in hardware.
There are three opportunities to add other entropy; the Data
Buffer, the Cipher's IV and the Cipher's key. A future commit
will exploit these opportunities.
+ remove a partly incorrect comment that i introduced in the last commit;
+ deal with the correct part of the above comment by cleaning up the
updates of 'info' -- rti_addrs needd not to be updated,
rti_info[RTAX_IFP] can be set once outside the loop.
While at it, correct a few misspelling of NULL as 0, but there are
way too many in this file, and i did not want to clutter the
important part of this commit.
Logical volumes on these devices show up as LUNs behind another
controller (also known as proxy controller). In order to issue
firmware commands for a volume on a proxy controller, they must be
targeted at the address of the proxy controller it is attached to,
not the Host/PCI controller.
A proxy controller is defined as a device listed in the INQUIRY
PHYSICAL LUNS command who's L2 and L3 SCSI addresses are zero. The
corresponding address returned defines which "bus" the controller
lives on and we use this to create a virtual CAM bus.
A logical volume's addresses first byte defines the logical drive
number. The second byte defines the bus that it is attached to
which corresponds to the BUS of the proxy controller's found or the
Host/PCI controller.
Change event notification to be handled in its own kernel thread.
This is needed since some events may require the driver to sleep
on some operations and this cannot be done during interrupt context.
With this change, it is now possible to create and destroy logical
volumes from FreeBSD, but it requires a native application to
construct the proper firmware commands which is not publicly
available.
Special thanks to John Cagle @ HP for providing remote access to
all the hardware and beating on the storage engineers at HP to
answer my questions.
Specifically, we used to enable the source after locking sched_lock
and just before we had already decided to do a context switch.
This meant that an ithread could never process more than one interrupt
per context switch. Enabling earlier in the loop before sched_lock is
acquired allows an ithread to handle multiple interrupts per context
switch if interrupts fire very rapidly. For the case of heavy interrupt
load this can reduce the number of context switches (and thus overhead)
as well as reduce interrupt latency.
- Now that we can handle multiple interrupts per context switch, add simple
interrupt storm protection to threaded interrupts. If X number of
consecutive interrupts are triggered before the itherad voluntarily
yields to another thread, then the interrupt thread will sleep with the
associated interrupt source disabled (masked) for 1/10th of a second.
The default value of X is 500, but it can be tweaked via the tunable/
sysctl hw.intr_storm_threshold. If an interrupt storm is detected, then
a message is output to the kernel console on the first occurrence per
interrupt thread. Interrupt storm protection can be disabled completely
by setting this value to 0. There is no scientific reasoning for the
1/10th of a second or 500 interrupts values, so they may require tweaking
at some point in the future.
Tested by: rwatson (an earlier version w/o the storm protection)
Tested by: mux (reportedly made a machine with two PCI interrupts
storming usable rather than hard locked)
Reviewed by: imp
different BIOSs use the same exact settings to mean two very different and
incompatible things for the SCI. Thus, if the SCI is remapped to a PCI
interrupt, we now trust the trigger/polarity that the MADT provides by
default. However, the SCI can be forced to level/lo as 1.10 did by setting
the tunable "hw.acpi.force_sci_lo" to a non-zero value from the loader.
Thus, if rev 1.10 caused an interrupt storm, it should nwo fix your
machine. If rev 1.10 fixed an interrupt storm on your machine, you
probably need to set the aforementioned tunable in /boot/loader.conf to
prevent the interrupt storm.
The more general problem of getting the SCI's trigger/polarity programmed
"correctly" (for some value of correctly meaning several workarounds for
broken BIOSs and inconsistent "implementations" of the ACPI standard) is
going to require more work, but this band-aid should improve the current
situation somewhat.
Requested by: njl
uiomove(9) is not properly locked. So, return to NEEDGIANT
mode. Later, when uiomove is finely locked, I'll revisit.
While I'm here, provide some temporary debugging output to
help catch blocking startups.
unconditionally initialize the mbuf header even if cluster allocation
failed, which could result in a NULL pointer dereference in low-memory
conditions.
PR: kern/65548
Submitted by: Stephan Uphoff <ups@tree.com>
the TAILQ_FOREACH() form.
Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.
No functional changes. Even the compiler output should be unmodified
by this change.
of an interface. No functional change.
On passing, comment an useless invocation of TAILQ_INIT(&ifp->if_addrhead)
which could probably be removed in the interest of clarity.
of an interface. No functional change.
On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed
ntoskrnl_unlocl_dpc().
- hal_raise_irql(), hal_lower_irql() and hal_irql() didn't work right
on SMP (priority inheritance makes things... interesting). For now,
use only two states: DISPATCH_LEVEL (PI_REALTIME) and PASSIVE_LEVEL
(everything else). Tested on a dual PIII box.
- Use ndis_thsuspend() in ndis_sleep() instead of tsleep(). (I added
ndis_thsuspend() and ndis_thresume() to replace kthread_suspend()
and kthread_resume(); the former will preserve a thread's priority
when it wakes up, the latter will not.)
- Change use of tsleep() in ndis_stop_thread() to prevent priority
change on wakeup.
if the link-level address has been initialized already.
The majority of modern drivers never does this and works fine, which
makes me think that the check is totally unnecessary and a residue
of cut&paste from other drivers.
This change is done to simplify locking because now almost none of the
drivers uses this field. The exceptions are "ct" "ctau" and "cx"
where i am not sure if i can remove that part.
This avoids presenting invalid data to the client's applications
when the file is modified, and then extended within the window of
the resolution of the modifcation timestamp.
Reviewed By: iedowse
PR: kern/64091
because they bogusly check for defined(INTR_MPSAFE) -- something which
never was a #define. Correct the definitions.
This make INTR_TYPE_AV finally get used instead of the lower-priority
INTR_TYPE_TTY, so it's quite possible some improvement will be had
on sound driver performance. It would also make all the drivers
marked INTR_MPSAFE actually run without Giant (which does seem to
work for me), but:
INTR_MPSAFE HAS BEEN REMOVED FROM EVERY SOUND DRIVER!
It needs to be re-added on a case-by-case basis since there is no one
who will vouch for which sound drivers, if any, willy actually operate
correctly without Giant, since there hasn't been testing because of
this bug disabling INTR_MPSAFE.
Found by: "Yuriy Tsibizov" <Yuriy.Tsibizov@gfk.ru>
attempting to duplicate Windows spinlocks. Windows spinlocks differ
from FreeBSD spinlocks in the way they block preemption. FreeBSD
spinlocks use critical_enter(), which masks off _all_ interrupts.
This prevents any other threads from being scheduled, but it also
prevents ISRs from running. In Windows, preemption is achieved by
raising the processor IRQL to DISPATCH_LEVEL, which prevents other
threads from preempting you, but does _not_ prevent device ISRs
from running. (This is essentially what Solaris calls dispatcher
locks.) The Windows spinlock itself (kspin_lock) is just an integer
value which is atomically set when you acquire the lock and atomically
cleared when you release it.
FreeBSD doesn't have IRQ levels, so we have to cheat a little by
using thread priorities: normal thread priority is PASSIVE_LEVEL,
lowest interrupt thread priority is DISPATCH_LEVEL, highest thread
priority is DEVICE_LEVEL (PI_REALTIME) and critical_enter() is
HIGH_LEVEL. In practice, only PASSIVE_LEVEL and DISPATCH_LEVEL
matter to us. The immediate benefit of all this is that I no
longer have to rely on a mutex pool.
Now, I'm sure many people will be seized by the urge to criticize
me for doing an end run around our own spinlock implementation, but
it makes more sense to do it this way. Well, it does to me anyway.
Overview of the changes:
- Properly implement hal_lock(), hal_unlock(), hal_irql(),
hal_raise_irql() and hal_lower_irql() so that they more closely
resemble their Windows counterparts. The IRQL is determined by
thread priority.
- Make ntoskrnl_lock_dpc() and ntoskrnl_unlock_dpc() do what they do
in Windows, which is to atomically set/clear the lock value. These
routines are designed to be called from DISPATCH_LEVEL, and are
actually half of the work involved in acquiring/releasing spinlocks.
- Add FASTCALL1(), FASTCALL2() and FASTCALL3() macros/wrappers
that allow us to call a _fastcall function in spite of the fact
that our version of gcc doesn't support __attribute__((__fastcall__))
yet. The macros take 1, 2 or 3 arguments, respectively. We need
to call hal_lock(), hal_unlock() etc... ourselves, but can't really
invoke the function directly. I could have just made the underlying
functions native routines and put _fastcall wrappers around them for
the benefit of Windows binaries, but that would create needless bloat.
- Remove ndis_mtxpool and all references to it. We don't need it
anymore.
- Re-implement the NdisSpinLock routines so that they use hal_lock()
and friends like they do in Windows.
- Use the new spinlock methods for handling lookaside lists and
linked list updates in place of the mutex locks that were there
before.
- Remove mutex locking from ndis_isr() and ndis_intrhand() since they're
already called with ndis_intrmtx held in if_ndis.c.
- Put ndis_destroy_lock() code under explicit #ifdef notdef/#endif.
It turns out there are some drivers which stupidly free the memory
in which their spinlocks reside before calling ndis_destroy_lock()
on them (touch-after-free bug). The ADMtek wireless driver
is guilty of this faux pas. (Why this doesn't clobber Windows I
have no idea.)
- Make NdisDprAcquireSpinLock() and NdisDprReleaseSpinLock() into
real functions instead of aliasing them to NdisAcaquireSpinLock()
and NdisReleaseSpinLock(). The Dpr routines use
KeAcquireSpinLockAtDpcLevel() level and KeReleaseSpinLockFromDpcLevel(),
which acquires the lock without twiddling the IRQL.
- In ndis_linksts_done(), do _not_ call ndis_80211_getstate(). Some
drivers may call the status/status done callbacks as the result of
setting an OID: ndis_80211_getstate() gets OIDs, which means we
might cause the driver to recursively access some of its internal
structures unexpectedly. The ndis_ticktask() routine will call
ndis_80211_getstate() for us eventually anyway.
- Fix the channel setting code a little in ndis_80211_setstate(),
and initialize the channel to IEEE80211_CHAN_ANYC. (The Microsoft
spec says you're not supposed to twiddle the channel in BSS mode;
I may need to enforce this later.) This fixes the problems I was
having with the ADMtek adm8211 driver: we were setting the channel
to a non-standard default, which would cause it to fail to associate
in BSS mode.
- Use hal_raise_irql() to raise our IRQL to DISPATCH_LEVEL when
calling certain miniport routines, per the Microsoft documentation.
I think that's everything. Hopefully, other than fixing the ADMtek
driver, there should be no apparent change in behavior.
* In the resume path, give up after waiting for a while
for WAK_STS to be set. Some BIOSs never set it.
* Allow access to the field if it is within the region size rounded
up to a multiple of the access byte width. This overcomes "off-by-one"
programming errors in the AML often found in Toshiba laptops.
in favour of rtalloc_ign(), which is what would end up being called
anyways.
There are 25 more instances of rtalloc() in net*/ and
about 10 instances of rtalloc_ign()
change the video output but use a separate device with a DSSX method
and a HID of "TOS6201" instead. We use a pseudo-driver to get the handle
for this object and pass it to the acpi_toshiba driver.
This is untested but seems to match the Linux Toshiba driver.
same problems as their Hurricane 575* bretheren in that one could set
the memory mapped port, but that has no effect. Add a quirk for this.
# I'll have to see if I can dig up documentation on these parts to see
# if there's someway software can know this other than a table...
the sense that any write to them reads back as a 0. This presents a
problem to our resource allocation scheme. If we encounter such vars,
the code now treats them as special, allowing any allocation against
them to succeed. I've not seen anything in the standard to clearify
what host software should do when it encounters these sorts of BARs.
Also cleaned up some output while I'm here and add commmented out
bootverbose lines until I'm ready to reduce the verbosity of boot
messages.
This gets a number of south bridges and ata controllers made mostly by
VIA, AMD and nVidia working again. Thanks to Soren Schmidt for his
help in coming up with this patch.
the space occupied by a struct sockaddr when passed through a
routing socket.
Use it to replace the macro ROUNDUP(int), that does the same but
is redefined by every file which uses it, courtesy of
the School of Cut'n'Paste Programming(TM).
(partial) userland changes to follow.
controllers (PDC203** PDC206**).
This also adds preliminary support for the Promise SX4/SX4000 but *only*
as a "normal" Promise ATA controller (ATA RAID's are supported though
but only RAID0, RAID1 and RAID0+1).
This cuts off yet another 5-8% of the command overhead on promise controllers,
making them the fastest we have ever had support for.
Work is now continuing to add support for this in ATA RAID, to accellerate
ATA RAID quite a bit on these controllers, and especially the SX4/SX4000
series as they have quite a few tricks in there..
This commit also adds a few fixes to the SATA code needed for proper support.
Alignment for pccards should also be treated in a similar way that
we tread it for cardbus cards.
Remove bogus debugs while I'm here.
# This is also necessary to make the CIS reading work.
Submitted by: Carlos Velasco
(1) Align to 64k for the CIS. Some cards don't like it when we aren't
aligned to a 64k boundary. I can't find anything in the standard
that requires this, but I have 1/2 dozen cards that won't work at
all unless I enable this.
(2) Sleep 1s before scanning the CIS. This may be a nop, but has little
harm.
(3) The CIS can be up to 4k in some weird, odd-ball edge cases. Since we
have limiters for when that's not the case, it does no harm to increase
it to 4k.
#1 was submitted, in a different form, by Carlos Velasco.
a LOR against sleepq. Fix the comment, and fix ptracestop() to pick up
sched_lock after stop() rather than before.
Reported by: Scott Sipe <cscotts@mindspring.com>
Reviewed by: rwatson, jhb
I'm not sure this is completely correct but at least this
is consistent with the accounting of incoming broadcasts.
PR: kern/65273
Submitted by: David J Duchscher <daved@tamu.edu>
FreeBSD, we can have a negative available space value, but the
corresponding fields in the NFS protocol are unsigned. So
trnucate the value to 0 if it's negative, so that the client
doesn't receive absurdly high values.
Tested by: cognet
Fix by dhartmei@ and mcbride@
1.433
Properly m_copyback() modified TCP sequence number after demodulation
1.432
Fix icmp checksum when sequence number modlation is being used.
Also fix a daddr vs saddr cut-n-paste error in ICMP error handling.
Fixes PR 3724
Obtained from: OpenBSD
Reviewed by: dhartmei
Approved by: rwatson
Fix by dhartmei@ and mcbride@
1.433
Properly m_copyback() modified TCP sequence number after demodulation
1.432
Fix icmp checksum when sequence number modlation is being used.
Also fix a daddr vs saddr cut-n-paste error in ICMP error handling.
Fixes PR 3724
This includes a modified form of some code from Thomas Moestl (tmm@)
to properly clean up the UMA zone and the "nfsnodehashtbl" hash
table.
Reviewed By: iedowse
PR: 16299
we get the resource allocation stuff hammered out.
Fix and off by one error that caused unnecessary filtering of valid
BARs for only 4 bytes than ICH3 and other PCI IDE controllers have.
Andrew Gallatin submitted this, although it doesn't solve the problems
ICH3 controllers have with the new code, it does restore the former
resource list on the probe line.
move its declaration to the machine-dependent header file on those
machines that use it. In principle, only i386 should have it.
Alpha and AMD64 should use their direct virtual-to-physical mapping.
- Remove pmap_kenter_temporary() from ia64. It is unused. Approved
by: marcel@
The VIA Nehemias is so obviously specific to i386 that it should not
be compiled on non-i386 platforms. The obviousness is in the fact that
all functions in nehemias.c are purely i386 inline assembly, guarded
by #ifdef __i386__
device in D0 to D0, that's a no-op, however the messages seem to be
confusing some people. Eventually, these messages will be parked
behind a if (bootverbose).
# I don't think this will fix any real bugs...
Xircom had an unfortunate habit of re-using PCMCIA IDs for quite different
cards - the xe driver knows about this and uses the first byte of 'extra'
PCMCIA ID info to identify cards with ambiguous IDs.
Reviewed by: imp (mentor)
can more easily be used INSTEAD OF the hard-working Yarrow.
The only hardware source used at this point is the one inside
the VIA C3 Nehemiah (Stepping 3 and above) CPU. More sources will
be added in due course. Contributions welcome!
o Save and restore bars for suspend/resume as well as for D3->D0
transitions.
o preallocate resources that the PCI devices use to avoid resource
conflicts
o lazy allocation of resources not allocated by the BIOS.
o set unattached drivers to state D3. Set power state to D0
before probe/attach. Right now there's two special cases
for this (display and memory devices) that need work in other
areas of the tree.
Please report any bugs to me.
Reference objects changed from ACPI_TYPE_ANY to ACPI_TYPE_LOCAL_REFERENCE
in Oct. 2002, this may help systems where switching the cooler on failed.
We support both types for now until this sorts out.
some machines to enable wake events for more devices although I haven't
seen a system yet that uses this form. Also, introduce acpi_GetReference()
which retrieves an object reference from various types.
address discovery and caching (similar to inet ARP). Use a single
global mutex, aarptab_mtx, to protect the table. Remove spl/spx.
Tested by: Bob Bishop <rb@gid.co.uk>
the reduction of the pager map's size by 8M bytes. In other words, eight
megabytes of largely wasted KVA are returned to the kernel map for use
elsewhere.
side effect of that change caused headers to not be sent if a 0 byte
file was passed to sendfile. This change fixes that behavior, allowing
sendfile to send out the headers even with a 0 byte file again.
Noticed by: Dirk Engling
equal to the process ID) is still present when we dump a core. It
already may have been destroyed. In that case we would end up
dereferencing a NULL pointer, so specifically test for that as well.
Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
Remove spurious whitespace, add indent protection, fix punctuation,
remove initialization of static variables to zero, put wakeup_ctr
and wakeup_needed in the correct order. (reported by bde)
This doesn't fix all the style bugs I introduced, but the remaining
style bugs make it easier for me to understand what I did here.
are enumerated in the ACPI device tree. In addition to the normal PCI
powerstate functionality, the ACPI _PSx methods are executed and ACPI
PowerResources are switched on and off via the acpi_pwr_switch_consumer()
function.
Glanced at by: imp, njl
before calling BUS_GET_RESOURCE_LIST(). Previously, the list head would
only be initialized if BUS_GET_RESOURCE_LIST() succeeded; it needs to
be initialized unconditionally so that the list cleanup code won't
trip over potential stack garbage.
we convert ip_len into a network byte order; in_delayed_cksum() still
expects it in host byte order.
The symtom was the ``in_cksum_skip: out of data by %d'' complaints
from the kernel.
To add to the previous commit log. These fixes make tcpdump(1) happy
by not complaining about UDP/TCP checksum being bad for looped back
IP multicast when multicast router is deactivated.
Reported by: Vsevolod Lobko
a callout, and use the new callout_drain API to make sure that a callout
has finished before we deallocate memory it is using.
PR: kern/64121
Discussed with: gallatin
license, per letter dated July 22, 1999 and irc message from Robert
Watson saying that clause 3 can be removed from those files with an
NAI copyright that also have only a University of California
copyrights.
Approved by: core, rwatson
callout_stop(), except that if the callout being stopped is currently
in progress, it blocks attempts to reset the callout and waits until the
callout is completed before it returns.
This makes it possible to clean up callout-using code safely, e.g.,
without potentially freeing memory which is still being used by a callout.
Reviewed by: mux, gallatin, rwatson, jhb
count is protected by the mutex that protects the condition, so the count
does not require any extra locking or atomic operations. It serves as an
optimization to avoid calling into the sleepqueue code at all if there are
no waiters.
Note that the count can get temporarily out of sync when threads sleeping
on a condition variable time out or are aborted. However, it doesn't hurt
to call the sleepqueue code for either a signal or a broadcast when there
are no waiters, and the count is never out of sync in the opposite
direction unless we have more than INT_MAX sleeping threads.
to awaken all waiters when a contested mutex is released instead of just
the highest priority waiter. If the various threads are awakened in
sequence then each thread may acquire and release the lock in question
without contention resulting in fewer expensive unlock and lock
operations. This old behavior of waking just the highest priority is
still used if this option is specified. Making the algorithm conditional
on a kernel option will allows us to benchmark both cases later and
determine which one should be used by default.
Requested by: tanimura-san
more consistent with other APIs. sleepq and cv's use signal/broadcast, and
msleep uses wakeup_one/wakeup. Prior to this turnstiles were using a
signal/wakeup mixture.
to implement this mistake.
Fixed some nearby style bugs (initialization in declaration, misformatting
of this initialization, missing blank line after the declaration, and
comparision of the non-boolean result of the initialization with 0 using
"!". In KNF, "!" is not even used to compare booleans with 0).
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
The markup should have been removed when the unused retval parameter
was removed.
- don't comment on what routine suser() checks do. Removed nearby
excessive vertical whitespace.
While here, begin fixing dependencies of <sys/mount.h> on normal namespace
pollution (__BSD_VISIBLE) by not using u_int in the prototype for nmount(2),
although it is used in the man page.
While there, begin cleaning up another set of prototypes:
- use u_int in the prototype for the kernel part of nmount().
- consistently don't use parameter names in prototypes in the
"exported vnode operations" set of prototypes, although style(9) says to
use names in the kernel.
status registers for error conditions and updating statistics
when there are cycles left (inspired by the nge(4) driver).
- Removed the TX list counter and the producer/consumer gap; it's
enough to just ensure we don't reuse the last (free) descriptor,
as the chip may not have read its next pointer yet. If we reuse
it, the TX may stall under a heavy TX load with polling enabled.
- Dropped code to recharge the watchdog timer, it's pointless; the
watchdog routine will re-init the chip and both RX and TX lists.
For now, preserve the gif_called functionality to limit the nesting
level because uncontrolled nesting can easily cause the kernel stack
exhaustion. Rumors are it should be shot to allow people to easily
shoot themselves in the foot, but I have ran out of cartridges. ;)
case we should wait for the resetdone handler to be called before
returning.
- When providing resources via ndis_query_resources(), uses the
computed rsclen when using bcopy() to copy out the resource data
rather than the caller-supplied buffer length.
- Avoid using ndis_reset_nic() in if_ndis.c unless we really need
to reset the NIC because of a problem.
- Allow interrupts to be fielded during ndis_attach(), at least
as far as allowing ndis_isr() and ndis_intrhand() to run.
- Use ndis_80211_rates_ex when probing for supported rates. Technically,
this isn't supposed to work since, although Microsoft added the extended
rate structure with the NDIS 5.1 update, the spec still says that
the OID_802_11_SUPPORTED_RATES OID uses ndis_80211_rates. In spite of
this, it appears some drivers use it anyway.
- When adding in our guessed rates, check to see if they already exist
so that we avoid any duplicates.
- Add a printf() to ndis_open_file() that alerts the user when a
driver attempts to open a file under /compat/ndis.
With these changes, I can get the driver for the SMC 2802W 54g PCI
card to load and run. This board uses a Prism54G chip. Note that in
order for this driver to work, you must place the supplied smc2802w.arm
firmware image under /compat/ndis. (The firmware is not resident on
the device.)
Note that this should also allow the 3Com 3CRWE154G72 card to work
as well; as far as I can tell, these cards also use a Prism54G chip.
without the (defunct) isa compatibility shims. The new-bus-specific
parts are very similar to the ones for the pci probe and attach.
This was held up too long waiting for a repo copy to src/sys/dev/cy,
so I decided to fix the files in their old place. This gives easier
to read and merge diffs anyway.
The "count" line in src/sys/conf/files won't be changed until after
the repo copy, so old kernel configs that specify a count need not be
(and must not be) changed until then. The count is just ignored in
the driver. One unfinished detail is dynamic allocation of arrays
with <count> and (<count> * 32) entries, and iteration over the arrays.
This is now kludged with a fixed count of 10 (up to 10 cards with up
to 32 ports each).
Prodded by: imp
Submitted by: mostly by imp
Approved by: imp
common attach function so that the lock gets initialized in all cases.
This fixes breakage of the initialization of the lock in the pci case
in rev.1.135 (between the releases of 5.1 and 5.2). The lock is only
used in the SMP case, so this bug was not always fatal.
vm_mmap_vnode function, where we can safely check for a special /dev/zero
case. Rev. 1.180 has reordered checks and introduced a regression.
Submitted by: alc
Was broken by: kan
and used by uart(4) for the channel conflicted with the port offset for
the Z8530. The Z8530 has the channels reversed (i.e. channel B is at
offset 0 and channel A is at offset 4). Assign the port offsets in the
right order so that uart(4) will properly attach to the channels.
Submitted by: Marius Strobl <marius@alchemy.franken.de>
It was fixed by moving problemetic checks, as well as checks that
doesn't need locking before locks are acquired.
Submitted by: Ryan Sommers <ryans@gamersimpact.com>
In co-operation with: cperciva, maxim, mlaier, sam
Tested by: submitter (previous patch), me (current patch)
Reviewed by: cperciva, mlaier (previous patch), sam (current patch)
Approved by: sam
Dedicated to: enough!
SCHED_INTERACT_MAX was used where SCHED_SLP_RUN_MAX was needed. This was
causing the interactivity scaler to lose history at a more dramatic rate
than intended.
scenario into #ifdef DEBUG. This makes my cluster with Belkin
KVM switch completely usable, even if the KVM switch and mouse
get a bit confused sometimes.
Without this, when the mouse gets confused, all sorts of crud
gets spammed all over the screen. With this, the mouse may appear
dead for a second or three, but it recovers silently.
phandle_t. Since both are typedefed to unsigned int, this is more
or less cosmetic.
- Fix the code that determines whether a creator instance was used
for firmware output (and should not be blanked on initialization).
Since r1.2 of dev/fb/creator.c, this consisted comparing a handle of
an instance of a package with a handle of the package itself.
Use the test from r1.1, which utilizes OF_instance_to_package().
Submitted by: Marius Strobl <marius@alchemy.franken.de>
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.
+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future
"...If "keyboard" is the selected input-device and "screen" the
output-device (both via /options) but the keyboard is unplugged,
OF automatically switches to ttya for the console, it even prints
a line telling so on "screen". Solaris respects this behaviour and
uses ttya as the console in this case and people probably expect
FreeBSD to do the same (it's also very handy to temporarily switch
consoles)..."
"...I changed the comparison of the console device with "ttya" ||
"ttyb" to "tty" because on AXe boards all 4 onboard UARTs end in
SUB-D connectors (ttya and ttyb being 16550 and ttyc and ttyd a
SAB82532) and there's no Sun keyboard connector (but PS/2). If one
plugs a serial card in a box there also can be more than just ttya
and ttyb available for a console..."
Submitted by: Marius Strobl <marius@alchemy.franken.de>
Has no doubt that the change is correct: marcel
"... uart_cpu_sparc64.c currently only looks at /options if ttyX is
the selected console. However, there's one case where it should
additionally look at /chosen. If "keyboard" is the selected input-
device and "screen" the output-device (both via /options) but the
keyboard is unplugged, OF automatically switches to ttya for the
console. It even prints a line telling so on "screen". Solaris
respects this behaviour and uses ttya as the console in this case
and people probably expect FreeBSD to do the same (it's also very
handy to temporarily switch consoles)..."
Submitted by: Marius Strobl <marius@alchemy.franken.de>
Has no doubt the change is correct: marcel
WARNS=6. I don't change the WARNS level in the Makefile because I
didn't tested this on other archs.
The fs.h fix was suggested by: marcel
Reviewed by: md5(1)
of UARTs. We already did this in uart_cpu_getdev().
While here, also check the compat name for "su" or "su16550".
Both changes submitted by: Marius Strobl <marius@alchemy.franken.de>
Does not doubt the correctness of the second change: marcel
it belongs. Change the implementation to match those of rfs() and
rgs() for consistency and irrespective of whether the original was
more correct or not (technically speaking).
in the process. This is required for proper debugging of corefiles
created by 1:1 or M:N threaded processes. Add an XXX comment where
we should actually call a function that dumps MD specific notes.
An example of a MD specific note is the NT_PRXFPREG note for SSE
registers.
Since BFD creates non-annotated pseudo-sections for the first PRSTATUS
and FPREGSET notes (non-annotated in the sense that the name of the
section does not contain the pid/tid), make sure those sections describe
the initial thread of the process (i.e. the thread which tid equals the
pid). This is not strictly necessary, but makes sure that tools that use
the non-annotated section names will not change behaviour due to this
change.
The practical upshot of this all is that one can see the threads in
the debugger when looking at a corefile. For 1:1 threading this means
that *all* threads are visible.
NFSv3. It's likely that modifying the attributes will affect the
file's accessibility. This version of the patch is one suggested
by Ian Dowse after reviewing my original attempt in the PR
Reviewed By: iedowse
PR: kern/44336
MFC after: 3 days
is twofold:
1. When a 1:1 or M:N threaded process dumps core, we need to put the
register state of each of its kernel threads in the core file.
This can only be done by differentiating the pid field in the
respective note. For this we need the tid.
2. When thread support is present for remote debugging the kernel
with gdb(1), threads need to be identified by an integer due to
limitations in the remote protocol. This requires having a tid.
To minimize the impact of having thread IDs, threads that are created
as part of a fork (i.e. the initial thread in a process) will inherit
the process ID (i.e. tid=pid). Subsequent threads will have IDs larger
than PID_MAX to avoid interference with the pid allocation algorithm.
The assignment of tids is handled by thread_new_tid().
The thread ID allocation algorithm has been written with 3 assumptions
in mind:
1. IDs need to be created as fast a possible,
2. Reuse of IDs may happen instantaneously,
3. Someone else will write a better algorithm.
Add in missing case for i845G in the attach routine. I'll MFC this
with the rest of the change after the 4.10 codefreeze lifts.
Reviewed By: Doug Rabson
Under polling(4), we counted non-existent output packets and wasted
CPU cycles, corrected. (PR kern/64975.)
The fix in revision 1.71 to correct resetting of the watchdog timer
was wrong.
In rl(4), the TX list does not have a gap between the consumer and
producer, so the "empty TX list" test was wrong, corrected.
Also, resetting the timer to five each time we know there is still
some TX work to do was a bad idea -- under polling(4), if the chip
goes out to lunch, this results in the watchdog routine to _never_
be called. Instead, let the timer downgrade to zero and fire the
watchdog, then reset it to five when it is zero AND there is some
TX work left. (Most other network drivers need this fix too.)
MFC after: 3 days
Moved the RX ring resyncing code to ste_rxeoc(), and only run it
if we were asked to POLL_AND_CHECK_STATUS, under DEVICE_POLLING.
(This significantly reduces the CPU load.)
Improved the RX ring resyncing code by re-checking if the head
is still empty before doing resyncing. This mostly affects the
DEVICE_POLLING mode, where we run this code periodically. We
could start checking with an empty head (well, an empty ring
even), and after doing a few iterations, the chip might write
a few entries, including the head, and we would bogusly consider
this case as requiring resyncing. On a test box, this reduced
the number of resyncs done by a factor of 10.
In ste_txeof(sc), only reset the watchdog timer to zero when
the TX list is completely empty.
Converted ste_tx_prev_idx to a pointer -- faster.
Removed some bitrot.
refcnt on the node but left it in the node table. This allows the node table
to hold the results of scanned ap's but for ibss scans left nodes w/o any
driver-private state setup and/or a bad refcnt (when the nodes were timed
out they were prematurely discarded). Now we treat nodes identified for ap
scanning as before but force nodes discovered when scanning for ibss neighbors
to have complete/proper state and hold the refcnt on the node. Any other
nodes created because of these frames are discarded directly (need to optimize
this case to eliminate various work that's immediately discarded).
o remove IEEE80211_C_RCVMGT capability
o on transmit craft new nodes as needed using new ieee80211_find_txnode routine
o add ieee80211_find_txnode routine to lookup a node by mac address and
if not present create one when operating in ibss/ahdemo mode; new nodes
are dup'd from bss and the driver is told to treat the node as if a new
association has been created so driver-private state (e.g. rate control
handling) is setup
Obtained from: netbsd (basic idea)
blindy copying the node contents; this turns out to be a bad idea as we
add more state in the node for things like WPA
o track node allocation failures in ieee80211_dup_bss instead of the callers
Obtained from: madwifi
on architectures that need to call cninit() before the machine is
ready to support mutexes (required by make_dev()).
- Remove make_dev() call from scinit() when flags indicate
unit is the system console, rely on sc_attach_unit() to
handle it.
- When trying to access current screen's status (scr_stat
structure) use the static one provided for the initial
system console if no dev_t is available.
- When calling make_dev() in sc_attach_unit() catch special
case of system's initial console and set up dev_t structure
to include pointer to console's scr_stat struct.
Reviewed by: marcel
Tested by: marcel, grehan (ppc), others on current@
Approved by: rwatson (mentor)
- ptrace_single_step() is no longer called with the proc lock held, so
don't try to unlock it and then relock it.
- Push Giant down into proc_rwmem() instead of forcing all the consumers
(including Alpha breakpoint support) to explicitly wrap calls to
proc_rwmem() with Giant.
Tested by: kensmith
not quite well by me - if kern.ps_argsopen was set to 0, users weren't
permitted to see arguments of even own processes.
But kern.ps_argsopen is going away, so just remove this check and leave
security checks for p_cansee() function.
1. This check if wrong, because it is true by default
(kern.ps_argsopen is 1 by default) (p_cansee() is not even checked).
2. Sysctl kern.ps_argsopen is going away.
completely understand], md_takeroot() runs before md_preloaded(),
rendering both useless.
As a fix, move the body (effectively one line!) of md_takeroot()
into md_preloaded(), and get rid of the stuff that has become useless.
Bug and fix reported 10 days ago on -current, no reply.
the driver's RX ring head may fall behind the chip, causing the
stuck traffic, disordered packets, etc. Work around this by
adopting the technique of resyncing RX head used in dc(4) and
xl(4) drivers, but do it in a slightly different place to reduce
the number of resyncs needed.
Also, set the NIC's RX polling period to a more meaningful value,
to stop overloading the PCI bus (this also reduces the number of
resyncs by a factor of 3 or more in a long run; the actual number
is very dependent on a nature of the traffic).
Maintain the statistics counter as the hw.ste_rxsyncs sysctl.
In cooperation with: Vsevolod Lobko
OK'ed by: ambrisko
MFC after: 5 days
they all do and handle that without alarming the user. Also pull in a bit
of defensive code from OpenBSD that triggers when a card is recognised but
not properly classified as either Genesis or Yukon. Not that I could ever
have needed this. :-)
Obtained from: OpenBSD/NetBSD (partially)
MFC after: 2 weeks
not ~1, but the call has been switched over to bus_alloc_resource_any()
which has the same effect.
Submitted by: Suleiman Souhlal <refugee@segfaulted.com>
declaration. Observe that initialization in declaration is
frequently incompatible with locking, not just a bad idea
due to style(9).
Submitted by: bde
and consume that interface in portalfs and fifofs instead. In the
new world order, unp_connect2() assumes that the unpcb mutex is
held, whereas uipc_connect2() validates that the passed sockets are
UNIX domain sockets, then grabs the mutex.
NB: the portalfs and fifofs code gets down and dirty with UNIX domain
sockets. Maybe this is a bad thing.
read-only. Need to enable "legacy support", by poking
into pci config space. (comment from the patch)
Submited by: Autrijus Tang <autrijus@autrijus.org>
Approved by: tanimura (mentor)
No functional change, the previous ste_encap() was correct WRT
long mbuf chains; this just reduces code duplication.
MFC after: 3 days
Prodded by: ambrisko
#ifdefs in order to loop it back to OpenBSD after the next import. There are
a some implicit asserts involved which might be better spelled out
explicitly (af == AF_INET ...)
Approved by: bms(mentor)
is necessary because some IBMs use recursive methods (pointed out by
Robert Moore from Intel). The latter was a typo on my part. It was disabled
by default when it should have been enabled.
stuff was here (NFS) was fixed by Alfred in November. The only remaining
consumer of the stub functions was umapfs, which is horribly horribly
broken. It has missed out on about the last 5 years worth of maintenence
that was done on nullfs (from which umapfs is derived). It needs major
work to bring it up to date with the vnode locking protocol. umapfs really
needs to find a caretaker to bring it into the 21st century.
Functions GC'ed:
vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
- Add tun_mtx to tun_softc. Annotate what is (and isn't) locked by it.
- Lock down tun_flags, tun_pid.
- In the output path, cache the value of tun_flags so it's consistent
when processing a particular packet rather than re-reading the field.
- In general, use unlocked reads for debugging.
- Annotate a couple of places where additional unlocked reads may be
possible.
- Annotate that tun_pid is used as a bug in tunopen().
if_tun is now largely MPSAFE, although questions remain about some of
the cdevsw fields and how they are synchronized.
other pseudo-interfaces, break out tear-down of a softc into a
separate tun_destroy() function, and invoke that from the module
unloader. Hold tunmtx across manipulations of the global softc list.
resource_var.h.
In kern_ndis.c:ndis_convert_res(), fill in the cprd_flags and
cprd_sharedisp fields as best we can.
In if_ndis.c:ndis_setmulti(), don't bother updating the multicast
filter if our multicast address list is empty.
Add some missing updates to ndis_var.h and ntoskrnl_var.h that I
forgot to check in when I added the KeDpc stuff.
with a larger kernel stack. Remove inclusion of opt_kstack_pages.h now
that it's unused.
Note: If anyone's toes got stepped on by me doing this let me know
privately please.
Approved by: rwatson (mentor)
was not present in what I originally tested when checking to see if
the kernel built/ran with the -O2 change. Recent instability in
sparc64 kernel was tracked to this. A reproducible kernel stack
traceback followed by hard hang during the call to msleep() at the
point the kernel waits 15 seconds for the SCSI bus to settle crept in
to recent kernel builds and it seems to go away with this patch.
Noticed by: kris
Approved by: rwatson (mentor)
Previously, Giant would be grabbed at entry to the IP local delivery code
when debug.mpsafenet was set to true, as that implied Giant wouldn't be
grabbed in the driver path. Now, we will use this primitive to
conditionally grab Giant in the event the entire network stack isn't
running MPSAFE (debug.mpsafenet == 0).
instead of treating it as an unimplemented syscall. This appears to make
StarOffice 7.0 Linux binaries work according to submitter; also tested
with nvidia driver by submitter.
Submitted by: Matthias Schuendehuette
- Fix binat for incoming connections when a netblock (not just a single
address) is used for source in the binat rule. closes PR 3535, reported by
Karl O.Pinc. ok henning@, cedric@
- Fix a problem related to empty anchor rulesets, which could cause a kernel
panic.
Approved by: bms(mentor)
- Fix binat for incoming connections when a netblock (not just a single
address) is used for source in the binat rule. closes PR 3535, reported by
Karl O.Pinc. ok henning@, cedric@
- Fix a problem related to empty anchor rulesets, which could cause a kernel
panic.
Approved by: bms(mentor)
are supposed to continue firing as long as there is work to do, not
stop after the first invocation.
This is damage control after a patch that has been committed prematurely.
Tested by: kris
instead of ephemeral mappings using pmap_qenter() by the writer. The
writer is still, however, responsible for wiring the pages, just not
mapping them. Consequently, the allocation of KVA for the direct case is
unnecessary. Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.
Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such i386. Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
* all
- s/__FUNCTION__/__func__/.
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
- Compatibility for RELENG_4 and DragonFly.
* firewire
- Timestamp just before queuing.
- Retry bus probe if it fails.
- Use device_printf() for debug message.
- Invalidiate CROM while update.
- Don't process minimum/invalid CROM.
* sbp
- Add ORB_SHORTAGE flag.
- Add sbp.tags tunable.
- Revive doorbell support. It's not enabled by default.
objects rather than synchronization objects. When a sync object is
signaled, only the first thread waiting on it is woken up, and then
it's automatically reset to the not-signaled state. When a
notification object is signaled, all threads waiting on it will
be woken up, and it remains in the signaled state until someone
resets it manually. We want the latter behavior for NDIS events.
- In kern_ndis.c:ndis_convert_res(), we have to create a temporary
copy of the list returned by BUS_GET_RESOURCE_LIST(). When the PCI
bus code probes resources for a given device, it enters them into
a singly linked list, head first. The result is that traversing
this list gives you the resources in reverse order. This means when
we create the Windows resource list, it will be in reverse order too.
Unfortunately, this can hose drivers for devices with multiple I/O
ranges of the same type, like, say, two memory mapped I/O regions (one
for registers, one to map the NVRAM/bootrom/whatever). Some drivers
test the range size to figure out which region is which, but others
just assume that the resources will be listed in ascending order from
lowest numbered BAR to highest. Reversing the order means such drivers
will choose the wrong resource as their I/O register range.
Since we can't traverse the resource SLIST backwards, we have to
make a temporary copy of the list in the right order and then build
the Windows resource list from that. I suppose we could just fix
the PCI bus code to use a TAILQ instead, but then I'd have to track
down all the consumers of the BUS_GET_RESOURCE_LIST() and fix them
too.
Compute the payload checksum for a locally originated IP multicast where
God intended, in ip_mloopback(), rather than doing it in ip_output() and
only when multicast router is active. This is more correct as we do not
fool ip_input() that the packet has the correct payload checksum when in
fact it does not (when multicast router is inactive). This is also more
efficient if we don't join the multicast group we send to, thus allowing
the hardware to checksum the payload.
which pulls a job off a thread work queue (assuming it hasn't run yet).
This is needed for KeRemoveQueueDpc().
- In subr_ntoskrnl.c, implement KeInsertQueueDpc() and KeRemoveQueueDpc(),
to go with KeInitializeDpc() to round out the API. Also change the
KeTimer implementation to use this API instead of the private
timer callout scheduler. Functionality of the timer API remains
unchanged, but we get a couple new Windows kernel API routines and
more closely imitate the way thing works in Windows. (As of yet
I haven't encountered any drivers that use KeInsertQueueDpc() or
KeRemoveQueueDpc(), but it doesn't hurt to have them.)
Report the %ecx bits in cpuid function 1. This is a hack.
When reporting AMD Features, only mask off the common bits. Otherwise
the SEP bit masks off SYSCALL etc in the report.
variable length, so we should not be trying to copy it into a fixed
length buffer, especially one on the stack. malloc() a buffer of the
right size and return a pointer to that instead.
Fixes a crash I discovered when testing whe a Cisco AP in infrastructure
mode, which returns several information elements that make the
ndis_wlan_bssid_ex structure larger than expected.
Because xfer->send.payload is a pointer to the buffer, '&' shouldn't be there.
Submitted by: John Weisgerber <weisgerberj@gsilumonics.com>
PR: misc/64623
(1 << 24) - 2 instead of 1 << 24, which it was obviously intended to
be). This fixes SBus isp(4)s on sparc64 machines.
Report and testing: Marius Strobl <marius@alchemy.franken.de>
iommu_dvma_vallocseg(), which I botched in r1.32. This bug could
cause an endless loop when a map was loaded and DVMA was scarce,
or that map had a stringent alignment or boundary.
Report and additional testing: Marius Strobl <marius@alchemy.franken.de>
in cpu_fork(). This prevents the stack tracer from running past the
end of the stack (only the pc is checked in that case), which became
fatal when db_print_backtrace() was introduced and called outside
of ddb.
Additional testing: kris
caused hangs on SMP systems under load. My theory was that an interrupted
thread was migrating and returning to PAL on a different CPU and that that
caused the hangs. To prevent this, I used the recently added sched_pin()
API to pin the interrupted thread to the CPU that received the interrupt
across ithread_schedule() to prevent migration. This seems to have fixed
the hangs based on tests by several folks on the alpha@ list.
Tested by: wilko, tisco, several others on alpha@
(NIC would claim to establish a link with an ad-hoc net but it couldn't
send/receive packets). It turns out that every time the checkforhang
handler was called by ndis_ticktask(), the driver would generate a new
media connect event. The NDIS spec says the checkforhang handler is
called "approximately every 2 seconds" but using exactly 2 seconds seems
too fast. Using 3 seconds makes it happy again, so we'll go with that
for now.
extra entry for if_ndis_pci.c that depends on cardbus, just to cover
all the bases. (I don't think you can have cardbus without PCI, but
just in case...)
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
interfaces during encapsulation. Add a wonking comment on how we need
some sort of drain/reference count mechanism to keep gre references
alive while in use and simultaneous destroy.
This commit does not lockdown softc data, which follows in a future
commit.
- Add gif_mtx, which protects globals.
- Hold gif_mtx around manipulation of gif_softc_list.
- Abstract gif destruction code into gif_destroy(), which tears down
a softc after it's been removed from the global list by either module
unload or clone destroy.
- Lock gif_called, even though we know gif_called is broken with reentrant
network processing.
- Document an event ordering problem in gif_set_tunnel() that will need
to be fixed.
gif_softc fields not locked down in this commit.
processing with gif interfaces, to a global variable named "gif_called".
Add an annotation that this approach will not work with a reentrant
network stack, and that we should instead use packet tags to detect
excessive recursive processing.
implementation could be characterized as a hybrid of the amd64 and i386
implementations. Specifically, the direct virtual-to-physical mapping is
used if possible and sf_buf_alloc() is used if the direct map cannot.
to other files in netatalk:
Log:
Since I have my hands all over netatalk adding locking and restructuring
it, cinch the file's style closer to style(9) with regard to parenthesis:
s/( /(/g
s/ )/)/g
s/return(/return (/g
s/return 0/return (0)/
s/return 1/return (1)/
when it associates with a net. Because FreeBSD's kstack size is only
2 pages by default, this blows the stack and causes a double fault.
To deal with this, we now create all our kthreads with 8 stack pages.
Also, we now run all timer callouts in the ndis swi thread (since
they would otherwise run in the clock ithread, whose stack is too
small). It happens that the alloca() in this case was occuring within
the interrupt handler, which was already running in the ndis swi
thread, but I want to deal with the callouts too just to be extra
safe.
NOTE: this will only work if you update vm_machdep.c with the change
I just committed. If you don't include this fix, setting the number
of stack pages with kthread_create() has essentially no effect.
with more than the normal amount of stack pages, however the stack
pointer always wound up being initialized using KSTACK_PAGES. It
should be using td->td_kstack_pages instead. This means that although
the vm subsystem would give you all the stack pages you asked for,
%esp would always be initialized as if you had just 2 pages, and
the rest would go to waste.
I wanted to use the 'give me more stack pages' feature of kthread_create()
because the Intel 2200BG NDIS driver does an alloca() of about 5000 bytes,
which wrecks the stack with the default 2 page size, and I was baffled
that no matter how much code I shoved into thread contexts with
allegedly larger stacks, the thing would still crash unless I changed
KSTACK_PAGES.
Note: this bug is present in _ALL_ arches at this point. Peter has
promised to merge this fix into all of them.
instead of bus_alloc_resource_any() to restore source compatibility
with 5.2-REL and 5.2.1-REL systems. bus_alloc_resource_any() doesn't
really do anything besides hide some of bus_alloc_resource()'s arguments
from us, and in my opinion this isn't worth breaking backwards
compatibility for people who want to use the NDISulator code on 5.2.x.
activation (i.e., applications are using libpthread). This is because
SCHED_ULE sometimes puts P_SA processes into ksq_next unnecessarily.
Which doesn't give fair amount of CPU time to processes which are
using scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.
Further work will no doubt be done by jeffr at a later date.
Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp>
Reviewed by: rwatson, freebsd-current@
distinguish between debugger inserted breakpoints and fixed
breakpoints. While here, make sure the break instruction never
ends up in the last slot of a bundle by forcing it to be an
M-unit instruction. This makes it easier for use to skip over
it.
are actually layered on top of the KeTimer API in subr_ntoskrnl.c, just
as it is in Windows. This reduces code duplication and more closely
imitates the way things are done in Windows.
- Modify ndis_encode_parm() to deal with the case where we have
a registry key expressed as a hex value ("0x1") which is being
read via NdisReadConfiguration() as an int. Previously, we tried
to decode things like "0x1" with strtol() using a base of 10, which
would always yield 0. This is what was causing problems with the
Intel 2200BG Centrino 802.11g driver: the .inf file that comes
with it has a key called RadioEnable with a value of 0x1. We
incorrectly decoded this value to '0' when it was queried, hence
the driver thought we wanted the radio turned off.
- In if_ndis.c, most drivers don't accept NDIS_80211_AUTHMODE_AUTO,
but NDIS_80211_AUTHMODE_SHARED may not be right in some cases,
so for now always use NDIS_80211_AUTHMODE_OPEN.
NOTE: There is still one problem with the Intel 2200BG driver: it
happens that the kernel stack in Windows is larger than the kernel
stack in FreeBSD. The 2200BG driver sometimes eats up more than 2
pages of stack space, which can lead to a double fault panic.
For the moment, I got things to work by adding the following to
my kernel config file:
options KSTACK_PAGES=8
I'm pretty sure 8 is too big; I just picked this value out of a hat
as a test, and it happened to work, so I left it. 4 pages might be
enough. Unfortunately, I don't think you can dynamically give a
thread a larger stack, so I'm not sure how to handle this short of
putting a note in the man page about it and dealing with the flood
of mail from people who never read man pages.
only done minimal testing on one of these cards and the firmware folks
have been extremely uncooperative in answering my qeustions about them, so
hopefully they will work ok for everyone.
This completes the effort to handle dependent functions, which are used
in some machines for irq link resources. Also, clean up some nearby
comments while I'm at it.
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.
With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)
Compile-tested on: i386
This change has not been tested.
This change was triggered by a gcc(1) warning on ia64 at -O2. The
variable v was not used after being computed, which resulted in enough
dead code elimination (DCE) to confuse the compiler and emit a bogus
warning about the use of the variable i without prior definition. The
variable i is the loop variable.
Submitted by: des
Responsibility: marcel
for uart(4) to figure out which device to use as console. Use this file
to define hw.uart.console instead so that we don't have to put it in
the default loader.conf, which makes it hard to override.
to select a serial console and debug port (resp). On ia64 these replace
the use of hints completely and take precedence over hints on alpha,
amd64 and i386. On sparc64 these variables are not yet recognised.
The reasons for introducing these variables are:
1. Hints have side-effects. They reserve the unit number for use by
isa or acpi devices and therefore cannot be used to select a pci
device. Also, the use of a unit number to select a device prior
to bus enumeration is nonsense. The new variables have no side-
effects and are not based on unit numbers.
2. Hints don't have the expression power to allow the sysadmin to
select UARTs that are not legacy PC devices and need the support
of compile-time constants to give the sysadmin some level of
flexibility.
The hw.uart.console and hw.uart.dbgport variables specify a list of
attributes. An attribute is a tag-value pair, seperated by a colon.
Attributes are seperated by a comma. Where possible, tags are the
same as those in /etc/remote (only br and pa in practice). Details
can be found in the manpage (not part of this commit).
Not tested on: amd64, pc98
from ddp_usrreq.c. Functions moved are:
at_pcballoc()
at_pcbconnect()
at_pcbdetach()
at_pcbdisconnect()
at_pcbsetaddr()
at_sockaddr()
Also moved are ddp_ports and ddpcb, global variables associated with DDP
pcbs. This makes PCB implementation more parallel to inet, inet6, and
ipx.
device, the device is probed multiple times (so each device is
detected N times after unloading/loading the module N-1 times).
The real fix is (quote Doug and Warner):
> : In an ideal world, there should be some kind of BUS_UNIDENTIFY method
> : which a driver could use to delete the devices it created in
> : BUS_IDENTIFY.
>
> Or the bus would have a driver deleted routine that got called and it
> would remove all instances of the devclass attached to it.
Reviewed by: Doug Rabson & Warner Losh
to mmap it PROT_EXEC. This also depends on the architecture, as some
architextures (e.g. i386) do not distinguish between read and exec pages
Inspired by: http://linux.bkbits.net:8080/linux-2.4/cset@1.1267.1.85
Reviewed by: alc
mappings required by mdstart_swap(). On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md. Even if the ephemeral mapping is not
cached, this implementation is still faster. On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.
On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"
before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)
after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)
after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)
malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
I've added -fno-strict-aliasing for now so we can ease into this.
I wanted to shoot for -O3, but the inlining caused problems due to GCC's
size heuristics; so also add -frename-registers, which is one of the things
-O3 would have given us.
entry size and the ELF version. Also, avoid a potential integer
overflow when determining whether the ELF header fits entirely
within the first page.
Reviewed by: jdp
A panic when attempting to execute an ELF binary with a bogus program
header table entry size was
Reported by: Christer Öberg <christer.oberg@texonet.com>
the tap driver, even with Giant over the cdev operation vector, due to
a non-atomic test-and-set of the si_drv1 field in the dev_t. This bug
exists with Giant under high memory pressure, as malloc() may sleep
in tapcreate(), but is less likely to occur. The resolution will
probably be to cover si_drv1 using the global tapmtx since no softc is
available, but I need to think about this problem more generally
across a range of drivers using si_drv1 in combination with SI_CHEAPCLONE
to defer expensive allocation to open().
Correct what appears to be a bug in the original if_tap implementation,
in which tapopen() will panic if a tap device instance is opened more
than once due to an incorrect assertion -- only triggered if INVARIANTS
is compiled in (i.e., when built into a kernel). Return EBUSY instead.
Expand mtx_lock() coverage using tp->tap_mtx to include tp->ether_addr.
use sf_buf_free() instead of sf_buf_mext() to consolidate all actions
that require the page queues lock in one critical section. While I'm
here remove unnecessary splvm() and splx() calls.
clip/destroy the dB value contained in the wi(4)'s receive frames,
it doesn't match with the flag set in the radiotap header
(unperturbed dB versus dBm).
Also set HOOK_HACK to true (remove the related #ifdef's) as we have the
hooks in the kernel this was missed during the merge from the port.
Noticed by: Amir S. (for the HOOK_HACK part)
Approved by: bms(mentor)
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.
Tested on: i386
Compiled on: i386, alpha, amd64
Without this fix it is possible to cheat policies like:
- sysctl security.bsd.see_other_[gu]ids=0,
- mac_seeotheruids(4),
- jail(2)
and get full processes list with their arguments.
This problem exists from revision 1.62 of kern_proc.c when it was
introduced.
Reviewed by: nectar, rwatson.
as the process that opens tun_softc can exit before the file
descriptor is closed.
Taiwan experience provided by: keichii
Crashing breakers provided by: Chia-liang Kao <clkao@clkao.org>
(tap_pid, tap_flags). if_tap should now be entirely MPSAFE.
Committed from: Bamboo house by ocean in Taiwan
Tropical paradise provided by: Chia-liang Kao <clkao@clkao.org>
group block locked. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause panic due to recursive lock attempt.
PR: 64206
Reviewed by: mckusick
Tested by: pjd
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.
might be enqueued on a sleep queue but not be asleep when the timeout fires
if it is blocked on a lock trying to check for pending signals before going
to sleep. In the case of fixing up the TDF_TIMEOUT race, however, the
thread must be marked asleep.
Reported by: kan (the bogus one)
mini-layer. I don't have time to bing it forward into the GEOM world, and
no one else has stepped forward to claim it. It'll be in the Attic for safe
keeping for now.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.
While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.
set it to avoid the need for a bunch of code that tests whether or
not the lock member is set to REQ_WIRED in order to determine which
length member should be used.
Fix another bug in the oldlen return value code.
Fix a potential wired memory leak if a sysctl handler uses
sysctl_wire_old_buffer() and returns an EAGAIN error to trigger
a retry.
If vslock() returns ENOMEM, sysctl_wire_old_buffer() should set
wiredlen to zero and return zero (success) so that the handler will
operate according to sysctl(3):
The size of the buffer is given by the location specified by
oldlenp before the call, and that location gives the amount
of data copied after a successful call and after a call that
returns with the error code ENOMEM.
The handler will return an ENOMEM error because the zero length
buffer will overflow.
ptrace_set_pc(), and cpu_ptrace() so that those functions are free to
acquire Giant, sleep, etc. We already do a PHOLD/PRELE around them so
that it is safe to sleep inside of these routines if necessary. This
allows ptrace() to be marked MP safe again as it no longer triggers lock
order reversals on Alpha.
Tested by: wilko
snprintf() and vsnprintf() in FreeBSD kernel land).
This is needed by the Intel Centrino 2200BG driver. Unfortunately, this
driver still doesn't work right with Project Evil even with this tweak,
but I'm unable to diagnose the problem since I don't have access to a
sample card.
This adds support for cardbus ATA/SATA controllers. I get roughly the
same transfer speeds as on true PCI controllers. Nice to be able to add
a couble of "real" disks to a laptop :)
add tapmtx, which protects globale variables.
Notes:
- The EBUSY check in MOD_UNLOAD may be subject to a race. Moving the
event handler unregister inside the mutex grab may prevent that race.
- Locking of global variables safely is now possible because tapclones
is only modified when the module is loading or unloading, thanks to
phk's recent chang to clone_setup().
- softc locking to follow.
255; USB keychains exist that use 256 as the number of heads. This
check has also been removed in Darwin (along with most of the other
head/sector sanity checks).
Only cy, bs and wd in the tree still use it. I have a replacement for
cy that I need to test on ISA and PCI cards. bs and wd are pc98 only
drivers that appear to no longer be necessary. I'll be removing them
when I hear back from the pc98 people.
this driver is being retired. Remove it from the tree. If someone
wants to update it to the latest APIs and can test the hardware, it
can return to the tree.
COMPAT_PCI api. This API is going away, so this driver is going away
also.
If users are interested in updating this, please contact the author
since he has some preliminary work to move this to newer APIs.
clock precision on i386. This is a NOP change on i386. But this stops
the mount_nfs units from suddenly changing to units of 1/20 of a second
(vs the normal 1/10 of a second) if HZ is increased.
driver uses COMPAT_ISA shims, and those shims are going away.
It can be brought back if someone updates it to the latest APIs, and
moves it to the appropriate place in the tree.
in the two consumers that need it.. processes using AIO and netncp.
Update docs. Say that process_exec is called with Giant, but not to
depend on it. All our consumers can handle it without Giant.
Fix 'broken' ifdefs.
icc does not support profiling yet so remove unfinished code which was
supposed to help.
Submitted by: netchild (original version)
Reviewed by: ru
- no longer serialize on Giant for thread_single*() and family in fork,
exit and exec
- thread_wait() is mpsafe, assert no Giant
- reduce scope of Giant in exit to not cover thread_wait and just do
vm_waitproc().
- assert that thread_single() family are not called with Giant
- remove the DROP/PICKUP_GIANT macros from thread_single() family
- assert that thread_suspend_check() s not called with Giant
- remove manual drop_giant hack in thread_suspend_check since we know it
isn't held.
- remove the DROP/PICKUP_GIANT macros from thread_suspend_check() family
- mark kse_create() mpsafe
all the ancient Intel/VIA/SIS/etc chipsets on amd64 systems. Even the
newer intel stuff won't need this since we use acpi by default and we
don't have all their magic programming information. Just use a generic
"Host to PCI bridge" name if we ever hit this code.
attach/detach time.
Assigning the default behaviour to this particular device is
incorrect, corrupting the video BIOS aperture, and breaking
VESA support in the kernel and XFree86.
Reviewed By: dfr
MFC after: 1 week
PR: kern/62906
we always grab Giant, even if we're actually only polling objects that
don't require giant. Once socket locking is merged, there will be
strong motivation to fix this.
While there, remove (caddr_t) casting of ethernet addresses, which
among other things discards the qualifier. This makes it clear that
atmulticastaddr does not require synchronization.
the last chunk are misaligned relative to a MAXBSIZE byte boundary.
vn_rdwr_inchunks() is used mainly for elf core dumps, and elf sections
are usually perfectly misaligned relative to MAXBSIZE, and chunking
prevents the file system from doing much realigning.
This gives a surprisingly large speedup for core dumps -- from 50 to
13 seconds for a 512MB core dump here. The pessimization was mostly
from an interaction of the misalignment with IO_DIRECT. It increased
the number of i/o's for each chunk by a factor of 5 (3 writes and 2
read-before-writes instead of 1 write).
to build the kernel. It doesn't affect the operation if gcc.
Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.
Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c
1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c
1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c
and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)
Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.
Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.
Reviewed by: -arch
Submitted by: netchild
Intel C/C++ compiler (lang/icc) to build the kernel.
The icc CPUTYPE CFLAGS use icc v7 syntax, icc v8 moans about them, but
doesn't abort. They also produce CPU specific code (new instructions
of the CPU, not only CPU specific scheduling), so if you get coredumps
with signal 4 (SIGILL, illegal instruction) you've used the wrong
CPUTYPE.
Incarnations of this patch survive gcc compiles and my make universe.
I use it on my desktop.
To use it update share/mk, add
/usr/local/intel/compiler70/ia32/bin (icc v7, works)
or
/usr/local/intel_cc_80/bin (icc v8, doesn't work)
to your PATH, make sure you have a new kernel compile directory
(e.g. MYKERNEL_icc) and run
CFLAGS="-O2 -ip" CC=icc make depend
CFLAGS="-O2 -ip" CC=icc make
in it.
Don't compile with -ipo, the build infrastructure uses ld directly to
link the kernel and the modules, but -ipo needs the link step to be
performed with Intel's linker.
Problems with icc v8:
- panic: npx0 cannot be emulated on an SMP system
- UP: first start of /bin/sh results in a FP exception
Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.
Reviewed by: silence on -arch
Submitted by: netchild
Conforming POSIX application should do by disallowing the argv
argument to be NULL.
PR: kern/33738
Submitted by: Marc Olzheim, Serge van den Boom
OK'ed by: nectar
path to an absolute path without a host name. Previously, there was a
nasty POLA violation where a system would PXE boot until you added the
BOOTP option and then it would panic instead.
Reviewed by: tegge, Dirk-Willem van Gulik <dirkx at webweaving.org>
(a previous version)
Submitted by: tegge (getip function)
silences an annoying warning in getblk() when VMIO'ing on a directory
vnode, which can happen when vfs.vmiodirenable is 1.
Bring the warning message in line with reality at the same time.
Submitted by: hmp
were a rather overwhelming task. I soon learned that if you don't know
where you're going to store something, at least try to pile it next to
something slightly related in the hope that a pattern emerges.
Apply the same principle to the ffs/snapshot/softupdates code which have
leaked into specfs: Add yet a buf-quasi-method and call it from the
only two places I can see it can make a difference and implement the
magic in ffs_softdep.c where it belongs.
It's not pretty, but at least it's one less layer violated.
in the non-_KERNEL case. This "fixes" applications that include
this "kernel-only" header and also include <strings.h> (or get
<strings.h> via the default _BSD_VISIBLE pollution in <string.h>.
In C++ there was a fatal error: the declaration specifies C linkage
but the implementation gives C++ linkage. In C there was only a
static/extern mismatch if the headers were included in a certain order
order, and a partially redundant declaration for all include orders;
gcc emits incomplete or wrong diagnostics for these, but only for
compiling with -Wsystem-headers and certain other warning options, so
the problem was usually not seen for C.
Ports breakage reported by: kris
for Windows are deserialized miniports. Such drivers maintain their own
queues and do their own locking. This particular driver is not deserialized
though, and we need special support to handle it correctly.
Typically, in the ndis_rxeof() handler, we pass all incoming packets
directly to (*ifp->if_input)(). This in turn may cause another thread
to run and preempt us, and the packet may actually be processed and
then released before we even exit the ndis_rxeof() routine. The
problem with this is that releasing a packet calls the ndis_return_packet()
function, which hands the packet and its buffers back to the driver.
Calling ndis_return_packet() before ndis_rxeof() returns will screw
up the driver's internal queues since, not being deserialized,
it does no locking.
To avoid this problem, if we detect a serialized driver (by checking
the attribute flags passed to NdisSetAttributesEx(), we use an alternate
ndis_rxeof() handler, ndis_rxeof_serial(), which puts the call to
(*ifp->if_input)() on the NDIS SWI work queue. This guarantees the
packet won't be processed until after ndis_rxeof_serial() returns.
Note that another approach is to always copy the packet data into
another mbuf and just let the driver retain ownership of the ndis_packet
structure (ndis_return_packet() never needs to be called in this
case). I'm not sure which method is faster.
based on the Madison core and targeting the low end of the spectrum.
Its clock frequency is 1Ghz, whereas Madison starts at 1.3Ghz. Since
the CPUID information is the same for Madison and Deerfield, we use
the clock frequency to identify the processor.
Supposedly the Deerfield only uses 62W, which seems to be less than
modern Xeon processors (about 70W) and about half what a Madison would
need.
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.
This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.
The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.
In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.
Unfortunately performance takes a hit.
Add "async" option to give people who know what they are doing the
old behaviour.
only. This is a MAJOR incompatible change for the sparc64 platform,
but will not effect FreeBSD on other architectures.
Reviewed by: imp for UPDATING, freebsd-sparc for the change itself.
This may not be a generally valid configuration, but neither is relying
on the PCI clock to be stable.
The only currently known and supported hardware is the VPN14x1 from
Soekris, and since it has external clock, we fail safe(r) by using
it.
Unfortunately there is no way to probe this reliably.
Retire g_sanity() and corresponding debugflag (0x8)
Retire g_{stall,release}_events().
Under #ifdef DIAGNOSTIC:
Make g_valid_obj() an official function and have it return an an
non-zero integer which indicates the kind of object when found.
Implement G_VALID_{CLASS,GEOM,CONSUMER,PROVIDER}() macros based
on g_valid_obj().
Sprinkle calls to these macros liberally over the infrastructure.
Always check that we do not free a live object.