SOCK_NONBLOCK flags, that allow to save fcntl() calls.
Implement a variation of the socket() syscall which takes a flags
in addition to the type argument.
Approved by: kib (mentor)
MFC after: 1 month
Temporarily use 0 for pid member as the FreeBSD does not cache remote
UNIX domain socket peer pid.
PR: kern/102956
Reviewed by: rwatson
Approved by: kib (mentor)
MFC after: 1 month
required two changes: setting the program and version numbers before
connect and fixing the handling of the Null Rpc case in newnfs_request().
Approved by: kib (mentor)
experimental nfs subsystem from sys/fs/nfs/nfs.h and sys/fs/nfs/nfsproto.h
to sys/fs/nfs/nfsport.h and rename nfsstat to ext_nfsstat. This was done
so that src/usr.bin/nfsstat.c could use it alongside the regular nfs
include files and struct nfsstat.
Approved by: kib (mentor)
before the struct file is fully initialized in vn_open(), in particular,
fp->f_vnode is NULL. Other thread calling file operation before f_vnode
is set results in NULL pointer dereference in devvn_refthread().
Initialize f_vnode before calling d_fdopen() cdevsw method, that might
set file ops too.
Reported and tested by: Chris Timmons <cwt networks cwu edu>
(RELENG_7 version)
MFC after: 3 days
that expect that oldlen is filled with required buffer length even when
supplied buffer is too short and returned error is ENOMEM.
Redo the fix for kern.proc.filedesc, by reverting the req->oldidx when
remaining buffer space is too short for the current kinfo_file structure.
Also, only ignore ENOMEM. We have to convert ENOMEM to no error condition
to keep existing interface for the sysctl, though.
Reported by: ed, Florian Smeets <flo kasimir com>
Tested by: pho
Apart from the 16 virtual terminals, Syscons allocates two device nodes
that should not really be TTYs, even though they are. One of them is
consolectl. In RELENG_7 and before, these device nodes are used in
single user mode. After I simplified input path, we only use this device
node to call ioctl() on (moused, Xorg, vidcontrol).
When you call ioctl() on consolectl, it will behave the same as being
called on the first window.
drops and re-grabs the softc mutex in the middle, resulting in kernel
trap 12. This may happen when a lot of traffic is being hammered on
one bge(4) interface while the system is shutting down.
Reported by: Alexander Sack <pisymbol gmail com>
PR: kern/134548
MFC After: 2 weeks
sysctl requests to avoid wiring too much user memory. Only grab this
lock if the user's old buffer is larger than a page as a tradeoff to
allow more concurrency for common small requests.
- Just use a shared lock on the sysctl tree for user sysctl requests now.
MFC after: 1 week
- Remove vga0 and the disabled uart2/uart3 hints from both platforms.
- Remove hints for ISA adv0, bt0, aha0, aic0, ed0, cs0, sn0, ie0, fe0, and
le0 from i386. All these hints were marked 'disabled' and thus already
did not work "out of the box".
Discussed with: imp
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.
While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.
Additively make ipi_nmi_pending as static.
Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
changes
- eliminate redundant TB defines, other minor cosmetics
Reviewed by: marcel, nwhitehorn
Obtained from: Freescale, Semihalf
chipset-specific code to attach chipset-specific data.
- Use chipset-specific data in the acard and promise chipsets rather than
changing the ivars of ATA PCI devices. ivars are reserved for use by the
parent bus driver and are _not_ available for use by devices directly.
This fixes a panic during sysctl -a with certain Promise controllers with
ACPI enabled.
Reviewed by: mav
Tested by: Magnus Kling (kingfon @ gmail) (on 7)
MFC after: 3 days
error due to copyout failure or short buffer.
The later breaks the usermode iterators of the sysctl results that pack
arbitrary number of variable-sized structures. Iterator expects that
kernel filled exactly oldlen bytes, and tries to interpret half-filled
or garbage structure at the end of the buffer. In particular,
kinfo_getfile(3) segfaulted.
Reported and tested by: pho
MFC after: 3 weeks
fget_unlocked().
- Save old file descriptor tables created on expansion until
the entire descriptor table is freed so that pointers may be
followed without regard for expanders.
- Mark the file zone as NOFREE so we may attempt to reference
potentially freed files.
- Convert several fget_locked() users to fget_unlocked(). This
requires us to manage reference counts explicitly but reduces
locking overhead in the common case.
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.
Reviewed by: marcel, raj
Book-E parts by: raj
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.
Discussed with: emaste (briefly)
MFC after: 1 month
direct dispatch policy for specific protocols (NETISR_USB). We leave
the additional 'flags' argument to netisr_register() for the time being,
even though it is no longer required.
NULL or change it. We initialize it before we set if_ioctl. It can
therefore never be NULL, and most other drivers don't bother with this
sanity check.
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)
The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.
Correct a nearby whitespace error in the i386 pmap_copy().
Crash reported by: jeff@
MFC after: 6 weeks
following changes:
Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect
what this function actually does. Suggested by: tegge
Introduce a new version of vfs_page_set_valid() that does no more than
what the function's name implies. Specifically, it does not update
the page's dirty mask, and thus it does not require the page queues
lock to be held.
Update two of the three callers to the old vfs_page_set_valid() to
call vfs_page_set_validclean() instead because they actually require
the page's dirty mask to be cleared.
Introduce vm_page_set_valid().
Reviewed by: tegge
registers contents.
- Use memory barriers to preserve the order of buffer space operations.
This might be needed if we'll ever use this driver on architectures
where ordering is not guaranteed.
major/minor numbers of the devices.
Pretending that the vnodes are character devices prevents file tree
walkers from descending into the directories opened by current process.
Also, not doing stat on the filedescriptors prevents the recursive entry
into the VFS.
Requested by: kientzle
Discussed with: Jilles Tjoelker <jilles stack nl>
representing too large integer, instead of overflowing and possibly
returning a random but valid vnode.
Noted by: Jilles Tjoelker <jilles stack nl>
MFC after: 3 days
to a non loopback/ppp link types) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route is implicitly added by the L2
code during RTM_RESOLVE of that interface address. This host route is
deleted when that interface is removed.
Reviewed by: kmacy
commit and fix it correctly by removing the _KERNEL check entirely. Now
the kernel always sees the same value of NULL as userland meaning that it
sees __null, 0, or 0L when compiled as C++, and '(void *)0' when compiled
as C.
As an experiment, I changed snp(4) to use a mutex instead of an sx lock.
We can't enable this right now, because Syscons still picks up Giant.
It's nice to already have the framework there.
'(void *)0' for NULL for C++ compilers compiling kernel code. Together this
makes it easier to build kernel modules using C++.
Reviewed by: imp
MFC after: 3 days
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.
suffer from the race condition that motivated revision 1.94. Consequently,
the work-around that was implemented by revision 1.94 is no longer needed.
Moreover, reverting this work-around eliminates the need for
vfs_busy_pages() to acquire the page queues lock when preparing a buffer
for read.
Reviewed by: tegge
directly in cpu_reset() in order to idle the APs before exiting
the kernel and letting the BSP enter the firmware so that processes
like init(8) which still might be running on an AP at that point
don't cause a panic there when it crashes due to the fact it no
longer can be supported by the kernel.
MFC after: 3 days
adapted to MPSAFE cam(4) to a isp(4) specific callout structure.
Thanks to Florian Smeets for providing access to a machine exhibiting
this problem for debugging.
Approved by: mjacob
MFC after: 3 days
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.
Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.
For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.
Approved by: kib (mentor)
designation of the emulated kernel version.
linux_kernver() returns integer value formatted as 'VVVMMMIII' where
VVV - version, MMM - major revision, III - minor revision.
Approved by: kib (mentor)
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.
Pointed out by: bde
Approved by: kib (mentor)
- Just use #error when including <sys/ioctl.h> in the kernel. Code
hasn't used this header for years now and probably doesn't compile
anyway, because of -Werror.
- Get rid of struct ttysize, TIOCGSIZE and TIOCSSIZE. All code nowadays
use both TIOC[GS]SIZE and TIOC[GS]WINSZ. Because we have other popular
systems that don't implement the first, it's of little use to support
interfaces nowadays.
Credential might need to hang around longer than its parent and be used
outside of mnt_explock scope controlling netcred lifetime. Use separate
reference-counted ucred allocated separately instead.
While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up
some unused declarations in new NFS code.
Reported by: John Hickey
PR: kern/133439
Reviewed by: dfr, kib
At first I allowed ioctl_compat.h to be included, but it just returned
an empty file. I had to do this, to keep kdump happy. I really want to
raise a compiler error when including this header, so now it will just
throw an error if you don't set COMPAT_43TTY.
that vnode_pager_input_smlfs() zeroes the page, it should not mark the page
as valid until after the page is zeroed. Otherwise, the page could be
mapped for read access (e.g., by vm_map_pmap_enter()) before the page is
zeroed. Reviewed by: tegge
Eliminate gratuitous clearing of the page's dirty mask by
vnode_pager_input_smlfs(). Instead, assert that the page is clean.
Reviewed by: tegge
Eliminate some blank lines.
Eliminate pointless calls to pmap_clear_modify() and vm_page_undirty() from
vnode_pager_input_old(). The page is not mapped. Therefore, it cannot have
any page table entries that are modified.
Eliminate an incorrect comment from vnode_pager_generic_getpages().
'net.inet.ip.fw.default_to_accept'. The current value can also be queried
via a read-only sysctl of the same name.
Requested by: plosher
MFC after: 1 week
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.
The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg. Struct vprocg is likely to become replaced in the near future with
a new jail management API import.
As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.
Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.
Permit kldload / kldunload operations to be executed only from the default
vimage context.
This change should have no functional impact on nooptions VIMAGE kernel
builds.
Reviewed by: bz
Approved by: julian (mentor)
This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated
thread instead of a Solaris' taskq to avoid doing a blocking memory allocation with the vnode
interlock held.
This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio
thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks
to sync dirty data out to disk. The range locks are already held by a user-level process waiting
on a condition variable that it the process is waiting on a spa_zio thread to signal it on. The
process could not be signalled because the spa_zio thread could not proceed.
The nature of this problem was not apparent due to ZFS locks opting out of witness which meant
that DDB did not know about the locks that were held by ZFS.
Reviewed by: pjd
MFC after: 7 days
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.
Reviewed by: dchagin, kib
Approved by: bz (mentor)
It turns out if we called cfmakeraw() on a TTY with only a rint handler
in place, it could inject data into the TTY, even though it should be
redirected. Always take a look at the hooks before looking at the
termios flags.
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.
linux_times() shall use the value that is exported to user space.
Pointyhat to: dchagin
PR: kern/134251
Approved by: kib (mentor)
MFC after: 2 weeks
The entire world seems to use the non-standard TIOCSCTTY ioctl to make a
TTY a controlling terminal of a session. Even though tcsetsid(3) is also
non-standard, I think it's a lot better to use in our own source code,
mainly because it's similar to tcsetpgrp(), tcgetpgrp() and tcgetsid().
I stole the idea from QNX. They do it the other way around; their
TIOCSCTTY is just a wrapper around tcsetsid(). tcsetsid() then calls
into an IPC framework.
Use the protocol family constants for the domain argument validation.
Return EAFNOSUPPORT in case when the incorrect domain argument
is specified.
Return EPROTONOSUPPORT instead of passing values that are not 0
to the BSD layer.
Suggested by: rwatson
Approved by: kib (mentor)
MFC after: 1 month
sc_rixmap is an inverse map
NB: could eliminate the check for an invalid rate by filling in 0 for
invalid entries but the rate control modules use it to identify
bogus rates so leave it for now
This is necessary for two reasons:
1) In order to avoid collisions with the use of a BIOs flags set by a consumer
or a provider
2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags was
available, so the consumer flags of a BIO had to be misused in order to
support enough flags. The new queue makes it possible to recycle the
GV_BIO_DONE flag into GV_BIO_GROW.
As a consequence, gvinum will now work with any other GEOM class under it or
on top of it.
- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
appear to come from a consumer of a gvinum volume. Use bio_cflags only for
cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags where set without need.
PR: kern/133604
as they do not really relate and to prepare for an additional queue to be
covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
as for putting a new element into the BIO queue.
(i.e. seems to be) already set.
This should reduce console noise due to curvnet recursion reports.
This change has no impact on nooptions VIMAGE builds.
Approved by: julian (mentor)
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.
This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all
vnet instances.
Approved by: julian (mentor)
to module builds. This avoids having to have the module builds walk up
the tree to find the kernel sources. It also allows a kernel + module
build to succeed when a new level of module subdirectories is added without
requiring that the /usr/share/mk/bsd.kmod.mk file on the machine be patched.
MFC after: 1 week
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.
Approved by: kib (mentor)
and hide it inside of atrtc driver. Add new tunable hint.atrtc.0.clock
controlling it. Setting it to 0 disables using RTC clock as stat-/
profclock sources.
Teach i386 and amd64 SMP platforms to emulate stat-/profclocks using i8254
hardclock, when LAPIC and RTC clocks are disabled.
This allows to reduce global interrupt rate of idle system down to about
100 interrupts per core, permitting C3 and deeper C-states provide maximum
CPU power efficiency.
Broadcom BCM43xx chipsets. This driver uses the v3 firmware that
needs to be fetched separately. A port will be committed to create
the bwi firmware module.
The driver matches the following chips: Broadcom BCM4301, BCM4307,
BCM4306, BCM4309, BCM4311, BCM4312, BCM4318, BCM4319
The driver works for 802.11b and 802.11g.
Limitations:
This doesn't support the 802.11a or 802.11n portion of radios.
Some BCM4306 and BCM4309 cards don't work with Channel 1, 2 or 3.
Documenation for this firmware is reverse engineered from
http://bcm.sipsolutions.net/
V4 of the firmware is needed for 11a or 11n support
http://bcm-v4.sipsolutions.net/
Firmware needs to be fetched from a third party, port to be committed
# I've tested this with a BCM4319 mini-pci and a BCM4318 CardBus card, and
# not connected it to the build until the firmware port is committed.
Obtained from: DragonFlyBSD, //depot/projects/vap
Reviewed by: sam@, thompsa@
leading to a bug, when C-state does not decrease on sleep shorter then
declared transition latency. Fixing this deprecates workaround for broken
C-states on some hardware.
By the way, change state selecting logic a bit. Instead of last sleep
time use short-time average of it. Global interrupts rate in system is a
quite random value, to corellate subsequent sleeps so directly.
sleepable context for net80211 driver callbacks. This removes the need for USB
and firmware based drivers to roll their own code to defer the chip programming
for state changes, scan requests, channel changes and mcast/promisc updates.
When a driver callback completes the hardware state is now guaranteed to have
been updated and is in sync with net80211 layer.
This nukes around 1300 lines of code from the wireless device drivers making
them more readable and less race prone.
The net80211 layer has been updated as follows
- all state/channel changes are serialised on the taskqueue.
- ieee80211_new_state() always queues and can now be called from any context
- scanning runs from a single taskq function and executes to completion. driver
callbacks are synchronous so the channel, phy mode and rx filters are
guaranteed to be set in hardware before probe request frames are
transmitted.
Help and contributions from Sam Leffler.
Reviewed by: sam
Reimplement "kernel_pmap" in the standard way.
Eliminate unused variables. (These are mostly variables that were
discarded by the machine-independent layer after FreeBSD 4.x.)
Properly handle a vm_page_alloc() failure in pmap_init().
Eliminate dead or legacy (FreeBSD 4.x) code.
Eliminate unnecessary page queues locking.
Eliminate some excess white space.
Correct the synchronization of pmap_page_exists_quick().
Tested by: gonzo
MAC_BOOLEAN -> MAC_POLICY_BOOLEAN
MAC_BOOLEAN_NOSLEEP -> MAC_POLICY_BOOLEANN_NOSLEEP
MAC_CHECK -> MAC_POLICY_CHECK
MAC_CHECK_NOSLEEP -> MAC_POLICY_CHECK_NOSLEEP
MAC_EXTERNALIZE -> MAC_POLICY_EXTERNALIZE
MAC_GRANT -> MAC_POLICY_GRANT
MAC_GRANT_NOSLEEP -> MAC_POLICY_GRANT_NOSLEEP
MAC_INTERNALIZE -> MAC_POLICY_INTERNALIZE
MAC_PERFORM -> MAC_POLICY_PERFORM_CHECK
MAC_PERFORM_NOSLEEP -> MAC_POLICY_PERFORM_NOSLEEP
This frees up those macro names for use in wrapping calls into the MAC
Framework from the remainder of the kernel.
Obtained from: TrustedBSD Project
Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.
IRQ0 routing on LAPIC-enabled systems.
Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.
On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.
By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.
This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.
Old implemention used Giant to protect the kernel data structures,
but at the same time called malloc(M_WAITOK), that could cause the
calling thread to sleep and lost Giant protection. User-visible
result was the missed wakeup.
New implementation uses one sx lock per futex. The sx protects
the futex structures and allows to sleep while copyin or copyout
are performed.
Unlike linux, we return EINVAL when FUTEX_CMP_REQUEUE operation
is requested and either caller specified futexes are equial or
second futex already exists. This is acceptable since the situation
can only occur from the application error, and glibc falls back to
old FUTEX_WAKE operation when FUTEX_CMP_REQUEUE returns an error.
Approved by: kib (mentor)
MFC after: 1 month
- Generate fake channel interrupts even if channel busy with previous
request to let it finish. Without this, dumping requests were just queued
and never processed.
- Drop pre-dump requests queue on dumping. ATA code, working in dumping
(interruptless) mode, unable to handle long request queue. Actually, to get
coherent dump we anyway should do as few unrelated actions as possible.
Yukon from common multicast handling code. Yukon uses hash-based
multicast filtering(big endian form) but GENESIS uses perfect
multicast filtering as well as hash-based one(little endian form).
Due to the differences of multicast filtering there is no much
sense to have a common code.
o Remove sk_setmulti() and introduce sk_rxfilter_yukon(),
sk_rxfilter_yukon() that handles multicast filtering setup.
o Have sk_rxfilter_{yukon, genesis} handle promiscuous mode and
nuke sk_setpromisc(). This simplifies ioctl handler as well as
giving a chance to check validity of Rx control register of
Yukon.
o Don't reinitialize controller when IFF_ALLMULTI flags is changed.
o Nuke sk_gmchash(), it's not needed anymore.
o Always reconfigure Rx control register whenever a new multicast
filtering condition is changed. This fixes multicast filtering
setup on Yukon.
PR: kern/134051
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.
Submitted by: pluknet (pluknet at gmail dot com)[1]
- Probe supported sleep states from acpi_attach() just once and do not
call AcpiGetSleepTypeData() again. It is redundant because
AcpiEnterSleepStatePrep() does it any way.
- Treat UNKNOWN sleep state as NONE, i.e., "do nothing", and remove obscure
NONE state (ACPI_S_STATES_MAX + 1) to avoid confusions.
- Do not set unsupported sleep states as default button/switch events.
If the default sleep state is not supported, just set it as UNKNOWN/NONE.
- Do not allow sleep state change if the system is not fully up and running.
This should prevent entering S5 state multiple times, which causes strange
behaviours later.
- Make sleep states case-insensitive when they are used with sysctl(8).
For example,
sysctl hw.acpi.lid_switch_state=s1
sysctl hw.acpi.sleep_button_state=none
are now legal and equivalent to the uppercase ones.
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:
1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:
options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet
2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:
INIT_VNET_NET(ifp->if_vnet); becomes
struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];
3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.
4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.
5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.
6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.
Reviewed by: bz, rwatson
Approved by: julian (mentor)
Feature is controlled by hint.ata.X.pm_level tunable:
0 - PM disabled, old behaviour, default.
1 - device is allowed to initiate PM state change, host is passive.
2 - host initiates PARTIAL state transition every time port is idle.
3 - host initiates SLUMBER state transition every time port is idle.
PARTIAL state has up to 100us (50us for me) wakeup latency, but for my
ICH8M saves 0.5W of power per drive. SLUMBER state has up to 10ms (3.5ms
for me) wakeup latency, but saves 0.8W of power.
Modes 2 and 3 are implemented only for AHCI driver now.
Interface power management is incompatible with device presence detection
(host receives no signal from drive, so unable to monitor it), so later is
disabled when PM is used.
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).
Approved by: bz (mentor)
import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
This has the effect that IPv6 multicast traffic won't trigger
an SPI allocation when IPSEC is in use, however, this obviously
needs to stomp on locks, and IN6_LOOKUP_MULTI() is about to go away.
This definitely needs to be revisited before 8.x is branched as
a release branch.
for use by MLDv2.
Add IPv6 SSM socket layer membership vector size constants and
tree bounds.
Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes
straight to the RFC 3678 socket options.
incorrectly output, if the RB-tree enumeration happened to reuse the
same chain for a mode switch: that is, both ALLOW and BLOCK records
were appended for the same group, in the same mbuf packet chain.
This was introduced during an mbuf chain layout bug fix involving
m_getptr(), which obviously cannot count from offset 0 on the
second pass through the RB-tree when serializing the IGMPv3
group records into the pending mbuf chain.
Cut over to KTR_INET for IGMPv3 CTR usage.
respectively, as placeholder for future use of CTR() by
the networking code (MLDv2 will be going in shortly).
Mark KTR_SPARE* as being used in fact by cxgb (picked up
on a 'make universe' run).
topology of nehalem/corei7 based systems.
- Remove the cpu_cores/cpu_logical detection from identcpu.
- Describe the layout of the system in cpu_mp_announce().
Sponsored by: Nokia
and it only optimized out an ipi or mwait in very few cases.
- Skip the adaptive idle code when running on SMT or HTT cores. This
just wastes cpu time that could be used on a busy thread on the same
core.
- Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use
CG_FLAG_THREAD to mean SMT or HTT.
Sponsored by: Nokia
root cpuset of that jail.
Processes inside the jail will still be able to change child sets.
A superuser outside of a jail will still be able to change the jail cpuset
and thus limit the number of cpus available to the jail.
Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman)
PR: kern/134050
Reviewed by: jeff
MFC after: 3 weeks
X-MFC: backout r191596
- Add some missing const.
- Move the size of the window spun by the registers to the softc
as neither using va_mem_size for this nor va_mem_base for the
start of the bus addresses is appropriate.
MFC after: 1 week
This change adds (possibly redundant) early check for invalid
state input parameter (including S0). Handling of S5 request
is reduced to simply calling shutdown_nice(). As a result
control flow of acpi_EnterSleepState is somewhat simplified
and resume/backout half of the function is not executed
for S5 (soft poweroff) request and invalid state requests.
Note: it seems that shutdown_nice may act as nop when initproc
is already initialized (to grab pid of 1), but init process is in
"pre-natal" state.
Tested by: Fabian Keil <fk@fabiankeil.de>
Reviewed by: njl, jkim
Approved by: rpaulo
KAME had explicit checks at one point using it, so just hide it behind
#if 0 for now until we are sure if we can completely dump it or not.
MFC after: 1 month
used for s/w retransmit schemes that want to access this information
w/o the overhead of decoding the raw frame. Note this also allows
drivers to record this information w/o writing the frame when the seq#
is obtained through an out-of-band mechanism (e.g. when a h/w assigned
seq# is reported in a descriptor on tx done notification).
Reviewed by: sephe, avatar
processed by ipfw once - avoid second ipfw_chk() call.
This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and
ip/tcp/udp header parsing.
MFC after: 2 month
controllers may be configured as legacy IDE mode by modifying subclass and
progif without actually changing PCI device IDs. Instead of complicating
code, we always force AHCI mode while probing. Also we restore AHCI mode
while resuming per ATI/AMD register programming/requirement guides.
- Fix SB700/800 "combined" mode. Unlike SB600, this PATA controller can
combine two SATA ports and emulate one PATA channel as primary or secondary
depending on BIOS configuration. When the combined mode is disabled, this
channel disappears and it works just like SB600 PATA controller, however.
- Add more PCI device IDs for SB700/800 and adjust device descriptions.
SB800 shares the same PCI device IDs and added two more SATA IDs.
the nfsv4 Change attribute. There are 2 changes:
1 - The value now changes on metadata changes as well as data
modifications (incremented for IN_CHANGE instead of IN_UPDATE).
2 - It is now saved in spare space in the on-disk i-node so that it
survives a crash.
Since va_filerev is not passed out into user space, the only current
use of va_filerev is in the nfs server, which uses it as the directory
cookie verifier. Since this verifier is only passed back to the server
by a client verbatim and then the server doesn't check it, changing the
semantics should not break anything currently in FreeBSD.
Reviewed by: bde
Approved by: kib (mentor)
this driver to compile and limp along with the new layer. These changes
do not deal with proper locking around access to the HW. This is only
a starting point. I have not tested modem control but tip seems to work
okay and I can send and receive characters which I needed for one of my
-current boxes. I have not tied this driver back up to the build since
I don't want people to think it is ready for prime time. If anyone
else has some cycles to work on this feel free to!
Also add support for a 16 port PCI interface I have at work.
Glanced at by: ed
- Update mxge to use if_transmit(), and the new buf_ring
interfaces, so as to enable multiple transmit queues.
Use of if_transmit() is conditional on IFNET_BUF_RING,
and is enabled by default (as in if_em).
- Record a flow id on receive if receive hashing is active.
I currently only record the rx ring id (0..8) rather than
the 32-bit topelitz hash result, as doing the latter would
require shifting the driver to use a larger rx return ring.
Sponsored by: Myricom, Inc.
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.
Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)
Instead of using the antique insque()/remque() functions from
sys/queue.h, make this code use its own versions. Eventually the code
should just use the regular TAILQ/LIST macros.
(colloquially known as if_addrlist). Currently not acquired around
interface address loops that call out to the routing code due to
potential lock order issues.
MFC after: 3 weeks
initialize / release netgraph related state in iattach() / idetach()
functions called via the vnet module registration / initialization
framework, instead of initialization / cleanups being done in
mod_event handlers.
While here, introduce a crude hack aimed at preventing ng_ether to
autoattach to ng_eiface ifnets, which are also netgraph nodes already.
Reviewed by: bz
Approved by: julian (mentor)
first introduced @ r190909 with a vnet module deregistration
service.
kldunloadable modules, which are currently using vnet_mod_register()
to attach their per-vnet initialization routines to the vnet
initialization framework, should call vnet_mod_deregister() before
acknowledging MOD_UNLOAD requests in their mod_event handlers. Such
changes to the existing code base will follow in subsequent commits.
vnet_mod_deregister() does not check whether departing vnet modules
are registered as prerequisites for another module(s), so it should
be used with care. Currently I'm only aware of vnet modules which
are leafs on module dependency graphs that are kldunloadable.
This change also introduces per-vnet module destructor handler, which
calls vnet's module cleanup function, which (if required) has to be
registered in vnet module's vnet_modinfo_t structure .vmi_idetach
field. Once options VIMAGE becomes operational, the framework will
take care that module's cleanup function become invoked for each
active vnet instance, and that the memory allocated for each instance
gets freed. Currently calls to destructor handlers must always
succeed.
lookup of 'ia' from if_addrhead through most use. Note that we
currently have to drop it prematurely in some cases due to calls out to
the routing and interface code while using 'ia', but this closes many
races. Annotate several potential races that persist after this change.
Move to using M_NOWAIT for allocating new interface addresses due to
lock(s) being held.
MFC after: 3 weeks
This allows users to increase the maximum amount of pseudo-terminals
without changing any source code. Users must increase UT_LINESIZE before
attempting to increase kern.pts_maxdev.
pmap_clear_modify() on a page is pointless if that page is not mapped or
it is only mapped for read access. Instead, assert that the page is not
mapped or not mapped for write access as appropriate.
Eliminate unnecessary clearing of a page's dirty mask. Instead, assert
that the page's dirty mask is clear.
of the implementation of ioctls. This makes the mapping of ioctls to
specific privileges more explicit, and also simplifies the
implementation by reducing the use of FALLTHROUGH handling in switch.
While this is not intended to be a functional change, it does mean
that certain privilege checks are now performed earlier, so EPERM
might be returned in preference to EADDRNOTAVAIL for management
ioctls that could have failed for both reasons.
MFC after: 3 weeks
When memory is not zero'ed by firmware, uninitialized PCB can have bogus
contents, which appear as a saved onfault condition, Altivec context to
restore etc. and lead to corruption/crashes. This commit fixes such issues.
Submitted by: Michal Mazur arg ! semihalf dot com
Tested by: Andreas Tobler andreast-list ! fgznet dot ch
controller in the VIA southbridge functional in the CDS
(Configurable Development System) for MPC85XX.
The embedded USB controllers look operational but the
interrupt steering is still wrong.
that it is easier to lock:
- Handle the unsupported ioctl case at the beginning of in_control(),
handing off to ifp->if_ioctl, rather than looking up interfaces and
addresses unnecessarily in this case.
- Make it an invariant that ifp is always non-NULL when running
in_control()-implemented ioctls, simplifying the code structure.
MFC after: 3 weeks
vmopag" implementation. The vm_page_lookup() code modifies splay tree
of the object pages, and asserts that object lock is taken. First issue
could cause kernel data corruption, and second one instantly panics the
INVARIANTS-enabled kernel.
Take the advantage of the fact that object->memq is ordered by page index,
and iterate over memq to calculate the runs.
While there, make the code slightly more style-compliant by moving
variables declarations to the right place.
Discussed with: jhb, alc
Reviewed by: alc
MFC after: 2 weeks
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages. The boundaries
don't determine whether or not parts of a DMA request bounce. Instead,
they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
pages are always page aligned.
Reviewed by: scottl (amd64)
MFC after: 1 month
(1) Don't manually configure if_output(), ether_ifattach() will do that
for us as part of link-layer setup.
(2) Call if_detach() before stopping nve in order to prevent calls into
the device driver after the driver has started shutting down.
Reviewed by: jhb
MFC after: 2 weeks
interface pointer, but also a reference to it.
Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.
This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.
MFC after: 3 weeks
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.
Reviewed by: bms
MFC after: 3 weeks
if_alloc(), and portions of data structure destruction from if_detach()
to if_free(). These changes leave more of the struct ifnet in a
safe-to-access condition between alloc and attach, and between detach
and free, and focus on attach/detach as stack usage events rather than
data structure initialization.
Affected fields include the linkstate task queue, if_afdata lock,
address lists, kqueue state, and MAC labels. ifq_attach() ifq_detach()
are not moved as ifq_attach() may use a queue length set by the device
driver between if_alloc() and if_attach().
MFC after: 3 weeks
calls if_free(), and remains set if the refcount is elevated. IF_DYING
skips the bit in the if_flags bitmask previously used by IFF_NEEDSGIANT,
so that an MFC can be done without changing which bit is used, as
IFF_NEEDSGIANT is still present in 7.x.
ifnet_byindex_ref() checks for IFF_DYING and returns NULL if it is set,
preventing new references from by acquired by index, preventing
monitoring sysctls from seeing it. Other lookup mechanisms currently
do not check IFF_DYING, but may need to in the future.
MFC after: 3 weeks
operates in the common memory mode and use polling mode to control
the status of operations as I don't have any board with interrupt
line routed yet. I'll add the GPIO interrupt driven mode as soon
as I get one.
logical CPUs in a package. We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0. While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.
MFC after: 1 month
Change the roothub exec functions to take the usb request and data pointers
directly rather than placing them on the parent bus struct.
Submitted by: Hans Petter Selasky
Fix a bug in the USB power daemon code where connection of multiple HUBs in
series would result in incorrect device suspend.
Reported by: Nicolas xxx@wanadoo.fr
Submitted by: Hans Petter Selasky
Use direct reference to parent high-speed HUB instead of indirect, due to
pointer clearing race at detach of parent USB HUB.
Reported by: kientzle
Submitted by: Hans Petter Selasky
PR: usb/133545