support for VMFS_ALIGNED_SPACE, which requests the allocation of an
address range best suited to superpages. The old options TRUE and FALSE
are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
immediate need to update all of vm_map_find(9)'s callers.
While I'm here, correct a misstatement about vm_map_find(9)'s return
values in the man page.
hand, it may cause other threads to sleep since kqueue_scan() may mark
some knotes as influx. This could lead to a deadlock.
Before kqueue_scan() sleeps, wake up the threads that are waiting for the
influx knotes produced by this thread.
Tested by: pho (previous version)
Reviewed by: jmg
MFC after: 2 weeks
closed is a legitimate situation. For instance, a file descriptor with
registered events may be closed in parallel with closing the kqueue.
Properly handle the case instead of asserting that this cannot happen.
Reported and tested by: pho
Reviewed by: jmg
MFC after: 2 weeks
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
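The array layout and the three kinds of entry point described above can be
sketched in a few lines of C. This is an illustrative model only, not the
committed kernel code: every sk_-prefixed name, the table sizes and the
simplified integer arguments are assumptions made for the sketch (the real
routines operate on route and sockaddr structures).

```c
#include <assert.h>
#include <stddef.h>

#define SK_NUM_FIBS 4   /* assumed number of FIBs compiled in */
#define SK_AF_MAX   8   /* assumed number of protocol families */
#define SK_AF_INET  2   /* assumed column index standing in for IPv4 */

/* Stand-in for a per-family FIB head; contents are irrelevant here. */
struct sk_fib_head {
    int fibnum;
    int family;
};

/* Rows are FIB numbers, columns are protocol families. */
static struct sk_fib_head *sk_rt_tables[SK_NUM_FIBS][SK_AF_MAX];

/* Helper for the sketch: register a head in a given row and column. */
static void
sk_install(int fibnum, int family, struct sk_fib_head *head)
{
    assert(fibnum >= 0 && fibnum < SK_NUM_FIBS);
    assert(family >= 0 && family < SK_AF_MAX);
    sk_rt_tables[fibnum][family] = head;
}

/* Legacy-style entry point: unchanged interface, always row 0. */
static struct sk_fib_head *
sk_rtalloc(int family)
{
    if (family < 0 || family >= SK_AF_MAX)
        return (NULL);
    return (sk_rt_tables[0][family]);
}

/* IPv4-only entry point: the extra argument selects the row. */
static struct sk_fib_head *
sk_in_rtalloc(int fibnum)
{
    if (fibnum < 0 || fibnum >= SK_NUM_FIBS)
        return (NULL);
    return (sk_rt_tables[fibnum][SK_AF_INET]);
}

/* Protocol-independent entry point: IPv4 honours the FIB, others use row 0. */
static struct sk_fib_head *
sk_rtalloc_fib(int family, int fibnum)
{
    if (family == SK_AF_INET)
        return (sk_in_rtalloc(fibnum));
    return (sk_rtalloc(family));
}
```

Code that only ever calls the legacy-style function keeps seeing row 0,
which is what keeps unaware callers working in this model.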
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
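The precedence between the default FIB, the socket's FIB and a classifier's
choice (classes 1 to 3 above) can be modelled roughly as follows. The
structure and field names are invented for this sketch and do not correspond
to real kernel data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-packet state; all field names are made up. */
struct sk_packet {
    int from_socket;     /* non-zero for locally generated packets */
    int socket_fib;      /* FIB recorded on the originating socket/PCB */
    int classified;      /* non-zero if a classifier (e.g. ipfw) tagged it */
    int classifier_fib;  /* FIB assigned by the classifier */
};

/* Pick the FIB for a packet following the precedence described above. */
static int
sk_select_fib(const struct sk_packet *p)
{
    int fib = 0;  /* default if nothing has been done to change it */

    assert(p != NULL);
    if (p->from_socket)
        fib = p->socket_fib;      /* inherited from the process or set by option */
    if (p->classified)
        fib = p->classifier_fib;  /* overrides the more default sources */
    return (fib);
}
```

A forwarded packet with no classifier rule therefore stays on FIB 0 in this
model, while a classifier tag wins over whatever the socket carried.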
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has, interestingly enough, built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods thus reached would be called. The methods would have
an argument that gives the FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
syncache that has an invalid SEQ instead of only doing it when we succeed
in mallocing space for the log message.
MFC after: 1 week
Reviewed by: sam, bz
for UPA it should have fulfilled its purpose by now and Fireplane-
and JBus-based machines are way too messy in organization to implement
something equivalent.
- Fix a bunch of style(9) bugs.
Handle cases where dma function pointers may be NULL, and where
the max_iosize can't be derived from a DMA data structure. For
the latter, revert to the prior behaviour of using DFLTPHYS for
the max i/o size when there is no other data.
Reviewed by: marcel
No objection by: sos
aligned on an 8 byte boundary. Prior to rev 1.36 this wasn't a problem
because mbuf clusters tend to be naturally aligned. The switch to using
split buffers with the first buffer being the embedded data area of the
mbuf has broken this assumption, at least on i386, causing a complete
failure of RX functionality. Fix this for now by using a full cluster for
the first RX buffer. A more sophisticated approach could be done with the
old buffer scheme to realign the m_data pointer with m_adj(), but I'm also
not clear on performance benefits of this old scheme or the performance
implications of adding an m_adj() call to every allocation.
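As a rough illustration of the 8 byte alignment requirement, the sketch
below checks an address and computes how many bytes would have to be skipped
to reach the next aligned address. It is generic pointer arithmetic with
made-up names, not code from the driver, and it says nothing about how
m_adj() itself would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_RX_ALIGN 8u  /* alignment required by the text above */

/* Return non-zero if addr sits on an 8 byte boundary. */
static int
sk_is_aligned(uintptr_t addr)
{
    return ((addr & (SK_RX_ALIGN - 1)) == 0);
}

/* Bytes to skip so that addr + pad is 8 byte aligned (0 if already aligned). */
static size_t
sk_align_pad(uintptr_t addr)
{
    size_t pad;

    pad = (size_t)((SK_RX_ALIGN - (addr & (SK_RX_ALIGN - 1))) &
        (SK_RX_ALIGN - 1));
    assert(sk_is_aligned(addr + pad));
    return (pad);
}
```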
and its children in the form:
"parent","child"
so that the head and bottom of an oriented graph can be easily detected and
various forms of diagrams can be built.
The sysctl is called debug.witness.graphs and it is read-only; in order
to get the list of relations, a simple:
#sysctl debug.witness.graphs
will do the trick.
This approach has been chosen in order to easily support things like
the DOT format and such. Soon, a self-explanatory awk script, which
filters the simple information returned by the sysctl and converts it into
a real DOT script, will be committed to the repository among the examples.
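To give an idea of the conversion, here is a small C sketch (not the awk
script mentioned above) that turns one "parent","child" line into a DOT
edge. The function name, the buffer sizes and the assumption that names
contain no embedded double quotes are all choices made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SK_NAME_MAX 64  /* assumed upper bound on a name, including NUL */

/*
 * Convert one line of the form "parent","child" into a DOT edge statement.
 * Returns 0 on success, -1 if the line does not parse or does not fit.
 */
static int
sk_pair_to_dot(const char *line, char *out, size_t outlen)
{
    char parent[SK_NAME_MAX], child[SK_NAME_MAX];
    int n;

    assert(line != NULL && out != NULL);
    if (strchr(line, ',') == NULL)
        return (-1);
    /* %63[^"] reads up to 63 characters that are not a double quote. */
    if (sscanf(line, " \"%63[^\"]\",\"%63[^\"]\"", parent, child) != 2)
        return (-1);
    n = snprintf(out, outlen, "\"%s\" -> \"%s\";", parent, child);
    if (n < 0 || (size_t)n >= outlen)
        return (-1);
    return (0);
}
```

Wrapping the resulting edge statements in "digraph witness { ... }" would
give a file that DOT tools can render; the graph name is arbitrary.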
Discussed with: rwatson
counter-timer timecounter so the associated SYSCTL nodes don't clash on
machines having multiple U2P and U2S bridges as well as establishing a
clear mapping between these bridges and their timecounter device.
- Don't bother setting up a "nice" name for the IOMMU, just use the name
returned by device_get_nameunit(9), too.
- Fix some minor style(9) bugs.
- Use __FBSDID in counter.c
MFC after: 1 week
VSOCK has been added as cache target. Now they process
not only VDIR but also VSOCK.
- fixed a panic caused by the cache being freed incorrectly
by "umount -f"
Submitted by: Masanori OZAWA <ozawa@ongs.co.jp>
MFC after: 1 week
perform various operations on a controller. Specifically, for each mpt(4)
device, create a character device in devfs which accepts ioctl requests for
reading and writing configuration pages and performing RAID actions.
MFC after: 1 week
Reviewed by: scottl
than checking whether audit is enabled globally, instead check whether
the current thread has an audit record. This avoids entering the audit
code to collect argument data if auditing is enabled but the current
system call is not of interest to audit.
MFC after: 1 week
Sponsored by: Apple, Inc.
method:
- If the last of the child cpufreq drivers returns an error while trying to
fetch its list of supported frequencies but an earlier driver found the
requested frequency, don't return an error to the caller.
- If all of the child cpufreq drivers fail and the attempt to match the
frequency based on 'cpu_est_clockrate()' fails, return ENXIO rather than
returning success and returning a frequency of CPUFREQ_VAL_UNKNOWN.
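The two rules above can be modelled as follows. This is a simplified sketch
of the decision logic only: the structure, the function name and the
SK_FREQ_UNKNOWN constant are stand-ins invented here, and the real method
works on cpufreq settings rather than plain integers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_FREQ_UNKNOWN (-1)  /* stand-in for CPUFREQ_VAL_UNKNOWN */

/* Outcome of asking one child driver; hypothetical for this sketch. */
struct sk_child {
    int error;  /* 0 on success, errno value otherwise */
    int freq;   /* matched frequency, or SK_FREQ_UNKNOWN */
};

/*
 * est_freq models the estimate-based fallback (SK_FREQ_UNKNOWN if that
 * failed too). Returns 0 and stores the frequency in *out, or ENXIO.
 */
static int
sk_cpufreq_get(const struct sk_child *kids, size_t n, int est_freq, int *out)
{
    int found = SK_FREQ_UNKNOWN;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        /* A failing child must not discard a match found earlier. */
        if (kids[i].error != 0)
            continue;
        if (found == SK_FREQ_UNKNOWN)
            found = kids[i].freq;
    }
    if (found == SK_FREQ_UNKNOWN)
        found = est_freq;
    if (found == SK_FREQ_UNKNOWN)
        return (ENXIO);  /* never report success with an unknown value */
    *out = found;
    return (0);
}
```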
MFC after: 3 days
PR: kern/121433
Reported by: Eugene Grosbein eugen ! kuzbass dot ru
all cards/modes.
In addition to the intr forcing added with rev. 1.205 adopt the other
places to use the same logic.
We need to exclude a few chips/revisions (5700, 5788) from using the
enhanced version and fall back to the old way as that is the only
method they support.
Tested by: phk
Suggested by: davidch, Broadcom (thanks a lot for the help!)
MFC after: 16 days
- add / remove clients from cxgb_main.c now
- change ifdef TOE_ENABLED to TCP_OFFLOAD_DISABLE
- update copyrights
- fix transmit data mismatch bug caused by not setting SB_NOCOALESCE
on tx sockbuf on passive connections
- fix receive sequence mismatch bug caused by not setting SB_NOCOALESCE
on rx sockbuf on passive connections
- don't sleep without checking SBS_CANTRCVMORE first
- various ddp ordering fixes
Supported by: Chelsio Inc.
ALT_BREAK_TO_DEBUGGER. In addition to "Enter ~ ctrl-B" (to enter the
debugger), there is now "Enter ~ ctrl-P" (force panic) and
"Enter ~ ctrl-R" (request clean reboot, ala ctrl-alt-del on syscons).
We've used variations of this at work. The force panic sequence is
best used with KDB_UNATTENDED for when you just want it to dump and
get on with it.
The reboot request is a safer way of getting into single user than
a power cycle. eg: you've hosed the ability to log in (pam, rtld, etc).
It gives init the reboot signal, which causes an orderly reboot.
I've taken my best guess at what the !x86 and non-sio code changes
should be.
This also makes sio release its spinlock before calling KDB/DDB.
which are also likely to be irrelevant for sun4v (there's no SBus on sun4v
and only some EBus devices). While at it fix some style bugs according to
style.Makefile(5) where appropriate.
MFC after: 3 days
mount fs needing Giant to be held when processing bufobjs.
Use a different subqueue for pending workitems on filesystems requiring
Giant. This simplifies the code notably and also reduces the number of
Giant acquisitions (and the whole processing cost).
Suggested by: jeff
Reviewed by: kib
Tested by: pho
- Limit grabbing the lock to SIOCSIFFLAGS.
- Move ieee80211_start_all() to SIOCSIFFLAGS.
- Remove SIOCSIFMEDIA as it is not useful.
- Limit ether_ioctl to only SIOCGIFADDR. SIOCSIFADDR and SIOCSIFMTU have no
effect as there is no input/output path in the vap parent. The vap code
will handle the reinit on MAC address changes.
- Split off ndis_ioctl_80211 as it was getting too different to wired devices.
This fixes a copyout while locked and a lock recursion.
Reviewed by: sam
to profile outgoing packets for a number of mbuf chain
related parameters,
e.g. number of mbufs, wasted space.
Probably could do with further work later.
Reviewed by: various
10/100 operation and place the mailbox registers at a different offset.
They also do not have an EEPROM, so the MAC address must be read from
NVRAM instead.
MFC after: 1 month
PR: kern/118975
Submitted by: benjsc, Thomas Nyström thn at saeab dot se
Submitted by: sephe (original patch for DragonflyBSD)
o The function is defined unconditionally but depends on SPR_SVR,
which is defined conditionally.
o spr.h defines mfspr() and mtspr(), which is no worse to use.
while holding the socket buffer lock. This leads to an
immediate panic due to recursing on the socket buffer lock. This
bug was introduced in uipc_syscalls.c:1.240, but masked by
another bug until that was fixed in uipc_syscalls.c:1.269.
Note that the current fix isn't perfect, but better than
panicking: normally we guarantee that simultaneous invocations
of a system call to write on a stream socket won't be
interlaced, which is ensured by use of the socket buffer sleep
lock. This is guaranteed for the sendfile headers, but not
trailers. In practice, this is likely not a problem, but
should be fixed.
MFC after: 3 days
Pointy hat to: andre (1.240), cperciva (1.269)
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.
Approved by: imp
total of 6 interrupt resources for scc(4) on macio(4). This
is 3 per channel, of which the 1st of each channel is the
interrupt associated with the SCC. The other 2 are for DMA
operation.
Change scc_bfe_attach() to accept an argument that's the
number of interrupts per channel (ipc) and change each bus
front-end (bfe) to pass that argument through a wrapper
for the device_attach method.
For now, we only allocate the 1st interrupt of each channel
to preserve behaviour.
by the parent for interrupt resources. This corrects parsing of
the interrupts property.
With parsing of the property fixed, add all interrupts to the
resource list. Bump the max. number of interrupts from 5 to 6
as scc(4) attached to macio(4) has 6 interrupts (3 per channel).
Submitted by: Nathan Whitehorn <nathanw@uchicago.edu>
- detect the number of LAWs at run time and initialize accordingly
- introduce decode windows target IDs used in MPC8572
- other minor updates
Obtained from: Freescale, Semihalf
doesn't require parts of the Expansion ROM to be copied around,
for obtaining the MAC address on !OFW platforms.
- Don't unnecessarily cache bus space tag and handle nor RIDs
in the softcs of the front-ends.
- Don't use function calls in initializers.
- Let the SBus front-end depend on sbus(4).
info about all currently mounted file systems. When an address is given
as an argument, prints detailed info about the given mount point.
MFC after: 2 weeks
infrastructure. Its only consumer ever was sio(4) and thus was
unused on sparc64 since removing the last traces of sio(4) in
sparc64 configuration files in favor of uart(4) over three
years ago. If similar functionality is required again it should
be brought back as an MD intr_pending() which works for all
busses by using for example interrupt controller hooks.
when creating the parent bus DMA tag. While at it correct the style
and a nearby comment.
- Take advantage of m_collapse(9) for performance reasons.
MFC after: 2 weeks
PR 122839 is fixed in both em and in igb
Second, the issue on building modules since the static kernel
build changes is now resolved. I was not able to get the fancier
directory hierarchy working, but this works, both em and igb
build as modules now.
Third, there is now support in em for two new NICs, Hartwell
(or 82574) is a low cost PCIE dual port adapter that has MSIX,
for this release it uses 3 vectors only, RX, TX, and LINK. In
the next release I will add a second TX and RX queue. Also, there
is support here for ICH10, the follow-on to ICH9. Both of these are
early releases, general availability will follow soon.
Fourth: On Hartwell and ICH10 we now have IEEE 1588 PTP support,
I have implemented this in a provisional way so that early adopters
may try and comment on the functionality. The IOCTL structure may
change. This feature is off by default, you need to edit the Makefile
and add the EM_TIMESYNC define to get the code.
Enjoy all!!
assumptions about the state of the cooling devices. Instead, switch them
off on init and, only after that, we are in TZ_ACTIVE_NONE.
Submitted by: Andriy Gapon <avg at icyb.net.ua>
Reviewed by: njl
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.
Sponsored by: Nokia
o Add CTASSERTs ensuring that HME_NRXDESC and HME_NTXDESC are set to
legal values.
o Use appropriate maxsize, nsegments and maxsegsize parameters when
creating DMA tags and correct some comments related to them.
o The FreeBSD bus_dmamap_sync(9) supports ored together flags for quite
some time now so collapse calls accordingly.
o Add missing BUS_DMASYNC_PREREAD when syncing the control DMA maps in
hme_rint() and hme_start_locked().
o Keep state of the link state and use it to enable or disable the MAC
in hme_mii_statchg() accordingly as well as to return early from
hme_start_locked() in case the link is down.
o Introduce a sc_flags and use it to replace individual members like
sc_pci.
o Add bus_barrier(9) calls to hme_mac_bitflip(), hme_mii_readreg(),
hme_mii_writereg() and hme_stop() to ensure the respective bit
has been written before we start polling on it and for the right
bits to change.
o Rather than just returning in case hme_mac_bitflip() fails and leaving us
in an undefined state report the problem and move on; chances are
the requested configuration will become active shortly after.
o Don't call hme_start_locked() in hme_init_locked() unconditionally
but only after calls to hme_init_locked() when it's appropriate, i.e.
in hme_watchdog().
o Add a KASSERT which asserts nsegs is valid also to hme_load_txmbuf().
o In hme_load_txmbuf():
- use a maximum of the newly introduced HME_NTXSEGS segments instead
of the incorrect HME_NTXQ, which reflects the maximum TX queue
length, for loading the mbufs and put the DMA segments back onto
the stack instead of the softc as 16 should be ok there.
- use the common errno(2) return values instead of homegrown ones,
- given that hme_load_txmbuf() is allowed to fail resulting in a
packet drop for quite some time now implement the functionality of
hme_txcksum() by means of m_pullup(9), which de-obfuscates the code
and allows to always retrieve the correct length of the IP header, [1]
- also add a KASSERT which asserts nsegs is valid,
- take advantage of m_collapse(9) instead of m_defrag(9) for
performance reasons.
o Don't bother to check whether the interface is running or whether its
queue is empty before calling hme_start_locked() in hme_tint(), the
former will check these anyway.
o In hme_intr() call hme_rint() before hme_tint() as hme_tint() may
take quite a while to return when it calls hme_start_locked().
o Get rid of sc_debug and just check if_flags for IFF_DEBUG directly.
o Add a shadow sc_ifflags so we don't reset the chip when unnecessary.
o Handle IFF_ALLMULTI correctly. [2]
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o Use the maximum of 256 TX descriptors for better performance as using
all of them has no additional static cost rather than using just half
of them.
Reported by: rwatson [2]
Suggested by: yongari [1]
Reviewed by: yongari
MFC after: 1 month
in order to get rid of bus space handle and tag in struct sym_hcb.
- Remove unused members related to bus addresses in struct sym_hcb.
- sym(4) takes care of allocating an instance of struct sym_hcb
itself so don't let newbus allocate it as an unused softc also.
- Add basic MPSAFE locking. This includes changing the sym(4) CCBs
to be allocated up-front instead of on demand as needed. Besides
making these allocations more likely to succeed, this also solves
the problem of calling bus_dmamap_create(9) with the SIM mutex
held.
Reviewed by: scottl
MFC after: 1 month
- Remove superfluous returns in functions returning void.
- In sym_alloc_lcb_tags() return directly instead of jumping
to a label which just returns.
- Fix some spelling in comments.
- Remove trailing whitespace.
exit requires entering the audit code. The result is much the same,
but they mean different things.
MFC after: 3 days
Submitted by: Diego Giagio <dgiagio at gmail dot com>
the method for the (indent == NULL) case (i.e. the kern.geom.conftxt
sysctl). The purpose is to extend the conftxt output with scheme-
specific fields which can be used by libdisk. In particular, have
the schemes dump the xs and xt fields, which contain the backward
compatible values for class type and partition type. This allows
libdisk to work with the legacy slicers as well as with gpart and
helps/promotes migration.
don't send an EOI which works like on amd64/i386 and blocks all
interrupts on the relevant interrupt controller.
o Replace the post_filter and post_inthread hooks registered when
creating the interrupt events with just ic_clear as on sparc64 we
don't need to do any disable->EOI->enable dance to unblock all but
the relevant interrupt while running the filter or handler; just
not clearing the interrupt already has the same effect.
o Merge from amd64/i386:
- Split the intr_table_lock into an sx lock used for most things,
and a spin lock to protect intrcnt_index.
- Add support for binding interrupts to CPUs, including for the
bus_bind_intr(9) interface, an assign_cpu hook and initially
shuffling interrupts around in a round-robin fashion.
Reviewed by: jhb
MFC after: 1 month
for better structure.
Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.
In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.
Parts of the kernel code do however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>.
<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.
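The gap between the two views of time can be illustrated with a small
conversion from a calendar date to a count of seconds. The sketch below is
not the kernel's conversion code: the function names are invented, it
assumes a four-digit year in UTC and the proleptic Gregorian calendar,
ignores leap seconds, and uses a well-known days-from-civil-date
calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Days between 1970-01-01 and the given Gregorian calendar date. */
static int64_t
sk_days_from_civil(int y, int m, int d)
{
    int64_t era, yoe, doy, doe;

    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                        /* count years from March */
    era = (y >= 0 ? y : y - 399) / 400;   /* 400-year cycle number */
    yoe = y - era * 400;                  /* year within the cycle */
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return (era * 146097 + doe - 719468);
}

/* Seconds since 1970-01-01 00:00:00 UTC for a broken-down UTC time. */
static int64_t
sk_utc_to_seconds(int y, int mon, int d, int h, int min, int s)
{
    return (((sk_days_from_civil(y, mon, d) * 24 + h) * 60 + min) * 60 + s);
}
```

An RTC that keeps local time or a two-digit year would need a timezone
offset and a century assumption applied before a conversion like this one.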
Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.
communicate between two parts of this one function. This was causing
problems with shared lookups as each would trash the ino value in the
inode.
- Remove the unused i_ino field from the inode structure.
receiving or transmitting.
With IPv6 raw sockets, read lock rather than write lock the inpcb when
receiving. Unfortunately, IPv6 source address selection appears to
require a write lock on the inpcb for the time being.
MFC after: 3 months
interrupt. So, add a new function pointer, arm_post_filter, which defaults
to NULL, and which will be used as the post_filter arg for
intr_event_create(). Set it properly for the AT91, so that it boots again.
Reported by: hps
Note this includes changes to all drivers and moves some device firmware
loading to use firmware(9) and a separate module (e.g. ral). Also there
no longer are separate wlan_scan* modules; this functionality is now
bundled into the wlan module.
Supported by: Hobnob and Marvell
Reviewed by: many
Obtained from: Atheros (some bits)
A lot of testing has shown that the problem people were seeing was due
to invalid padding after the end of option list option, which was corrected
in tcp_output.c rev. 1.146.
Thanks to: anders@, s3raphi, Matt Reimer
Thanks to: Doug Hardie and Randy Rose, John Mayer, Susan Guzzardi
Special thanks to: dwhite@ and BitGravity
Discussed with: silby
MFC after: 1 day
So if we have channel 0..3 devclass_get_maxunit is 4.
It's never been a problem as devclass_get_device() has
caught a possibly bad input.
Discussed with: scottl
when reading credential data from sockets.
Teach pf to unlock the pcbinfo more quickly once it has acquired an
inpcb lock, as the inpcb lock is sufficient to protect the reference.
Assert locks, rather than read locks or write locks, on inpcbs in
subroutines--this is necessary as the inpcb may be passed down with a
write lock from the protocol, or may be passed down with a read lock
from the firewall lookup routine, and either is sufficient.
MFC after: 3 months
deserves its own internet memes). The trick is to force all available,
unused pins (that being advertised as "speaker") to behave as microphone
pins instead.
Reported / Tested by: Dmitry Kutsenko <kutsenko.truebsd.org>
MFC after: 3 days
we're certain the allocation will entirely succeed. This fixes a leak in a
fairly unlikely case.
Reported by: vijay singh <vijjus at rocketmail dot com>
MFC after: 1 week
noise from sio per unit. sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console. I think I'd rather read the pending
bits from the apics, but I'm not sure it's worth the hassle.
move most offload functionality from NIC to TOE
factor out all socket and inpcb direct access
factor out access to locking in inpcb, pcbinfo, and sockbuf
as the former is becoming deprecated and exhibits some extraneous
Giant-locking. The new callout(9) is declared MPSAFE, so it may
improve concurrency.
Tested by: matteo
Silence from: wpaul
MFC after: 1 month
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock remain exclusive, and all instances of inpcb lock acquisition
are exclusive.
This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.
MFC after: 3 months
Tested by: kris (superset of committed patch)
done by understandable macros.
Fix the bug that prevented the system from responding on interfaces with
link local addresses assigned.
PR: 120958
Submitted by: James Snow <snow at teardrop.org>
MFC after: 2 weeks
have separate configuration spaces so by definition they implement
different PCI domains. Thus change psycho(4) to use PCI domains
instead of reenumerating all PCI busses so they have globally unique
bus numbers and drop support for reenumerating busses in the OFW PCI
code.
According to CVS history reenumeration was also required in order to
get some E450 to boot but given that no other open source kernel
changes the PCI bus numbers assigned by the firmware I believe the
real problem was that the old code used the bus number as the device
number for the PCI busses and unlike most of the other machines the
firmwares of the problematic ones don't use disjoint PCI bus numbers
across the host-PCI-bridges.
MFC after: 1 month
This avoids calling busdma in the request processing path, which caused a dramatic performance degradation.
Allocation has been postponed until we know how many devices we can possibly have on port multipliers, to save some space.
two ticks by counting the number of switches and the load when
sched_clock() is called.
- If the busy metric exceeds a threshold allow the idle thread to spin
waiting for new work for a brief period to avoid using IPIs. This
reduces the cost on the sender and receiver as well as reducing wakeup
latency considerably when it works.
Sponsored by: Nokia
variables and sysctl nodes.
- In reset walk the children of kern_sched_stats and reset the counters
via the oid_arg1 pointer. This allows us to add arbitrary counters to
the tree and still reset them properly.
- Define a set of switch types to be passed with flags to mi_switch().
These types are named SWT_*. These types correspond to SCHED_STATS
counters and are automatically handled in this way.
- Make the new SWT_ types more specific than the older switch stats.
There are now stats for idle switches, remote idle wakeups, remote
preemption, ithreads idling, etc.
- Add switch statistics for ULE's pickcpu algorithm. These stats include
how much migration there is, how often affinity was successful, how
often threads were migrated to the local cpu on wakeup, etc.
Sponsored by: Nokia
the fact that we have a 1:1 mapping by virtue of the BATs.
Eliminate the now unused moea_rkva_alloc(), moea_pa_map() and
moea_pa_unmap() functions.
Pointed out by: grehan.
rev. 1.149 rework.
This saves several percent of CPU time on SMP by using UMA's
internal per-CPU allocation limits instead of our own global variable
updated with atomics each time.
Tested with: Netperf cluster
deals with the usual __opendir2() calls, and the rest with an interface
translator to expose fdopendir(3) functionality. Manual page was obtained from
kib@'s work for *at(2) system calls.
filesystem-specific vnode data to the struct vnode. Provide the
default implementation for the vop_advlock and vop_advlockasync.
Purge the locks on the vnode reclaim by using the lf_purgelocks().
The default implementation is augmented for the nfs and smbfs.
In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync have taken the
unlocked vnode and dereferenced the fs-private inode data, racing with
the vnode reclamation due to forced unmount. Now, the vop_getattr
under the shared vnode lock is used to obtain the inode size, and
later, in the lf_advlockasync, after locking the vnode interlock, the
VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris
Tested by: kris, pho
Discussed with: jeff, dfr
MFC after: 2 weeks
- reorder structure fields (XX_refs) a bit to group fields modified at
the same time together. According to my tests it gives up to a 10%
SMP performance benefit on a real workload due to reduced inter-CPU
cache thrashing.
- change q_flags from long to int as long is not really needed there and
its usage with atomics is disputed by some people.
- move the NGF_WORKQ flag into the separate field q_flags2 as it is
protected by the queue mutex instead of the node writer protection used
by the rest of the flags.
- move the nd_work queue entry to the ng_queue structure, to which it is
more related, and make it STAILQ instead of TAILQ as it is now a classic
FIFO.
- remove the q_node pointer from the ng_queue structure as it is not
really needed.
- reimplement the item queue using STAILQ instead of our own equivalent
implementation. Since the BT subsystem has its own item queues using
ng_item.el_next, update it also.
- change the depth field in ng_item from uintptr_t to u_int. It was made
uintptr_t to keep ABI compatibility.
Reviewed by: julian, emax
Tested with: Netperf cluster
inittodr() and resettodr(). Have nexus double as the clock device,
because it's the firmware that provides RTC services. We could
create a special (pseudo-) device for it, but that wasn't superior
enough to actually do it. Maybe later...
Requested by: phk
so credit its authors with contributions to this file. Remove
prototype copyright notice, although one might be warranted if someone
wanted to claim it badly enough.
Noticed by: Simon Burge.
routines in this file. Remove 'place holder' copyright since the
amount that's actually original is small relative to the length of the
file. The contents of this file appear to have originated at DECWRL
by way of NetBSD.
Noticed by: Simon Burge
o Implement IPI_PREEMPT,
o Set td_lock for the thread being switched out,
o For ULE & SMP, loop while td_lock points to blocked_lock for
the thread being switched in,
o Enable ULE by default in GENERIC and SKI,
clearing MSI enable bit for MSI capable hardware resulted in Tx
problems. MSI enable bit is set only when MSI is requested from
user.
Tested by: remko
(i.e. fixed delivery) to SAPIC_DELMODE_LOWPRI. While the commit
log doesn't mention the change in behaviour, it is believed to be
deliberate. In the last 5.5 years this hasn't been a problem. Nor
do I think did it make any difference, but who knows. However, I
do know that it breaks SMP support for Montecito-based machines.
Switch back to fixed-CPU delivery so that SMP works again. This
gives me some time to look more closely at the problem, as well
as make sure the I-cache validation as it's implemented currently
is sufficient in SMP configurations...
mips32r2 and mips64r2 (and close relatives) processors. There
presently is support for ADMtek ADM5120, a mips 4Kc in a malta board,
the RB533 routerboard (based on IDT RC32434) and some preliminary
support for sibyte/broadcom designs. Other hardware support will be
forthcoming.
This port boots multiuser under gxemul emulating the malta board and
also bootstraps on the hardware whose support is forthcoming...
Oleksandr Tymoshenko, Wojciech Koszek, Warner Losh, Olivier Houchard,
Randall Stewart and others that have contributed to the mips2 and/or
mips2-jnpr perforce branches. Juniper contributed a generic mips port
late in the life cycle of the mips2 branch. Warner Losh merged the
mips2 and Juniper code bases, and the others listed above have worked
for the past several months to get to multiuser.
In addition, the mips2 work owes a debt to the trailblazing efforts of
the original mips branch in perforce done by Juli Mallett.
merged juniper and mips2 code base. This represents the work of
Juniper Engineers, plus Oleksandr Tymoshenko, Wojciech Koszek, Warner
Losh, Olivier Houchard, Randall Stewart and others that have
contributed to the mips2 and/or mips2-jnpr perforce branches.
The original code from KAME did not take care of address
aliases or multiple ip addresses that have the same
prefix.
Reviewed by: rwatson, gnn, sam, kmacy, julian
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
was disallowed. For example,
route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2
The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
Multiple default routes can also be inserted. Here is the netstat
output:
default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0
When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,
route delete default
would fail and trigger the following error message:
"route: writing to routing socket: No such process"
"delete net default: not in table"
On the other hand,
route delete default 10.2.5.2
would be successful: "delete net default: gateway 10.2.5.2"
One does not have to specify a gateway if there is only a single
route for a particular destination.
I need to perform more testing on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.
Reviewed by: robert, sam, gnn, julian, kmacy
public namespace for WITNESS as they are only used internally so just
move them in the private namespace for the subsystem (with all related
supporting definitions).
Make clock_if.m and subr_rtc.c standard on i386
Add hints for "atrtc" driver, for non-PnP, non-ACPI systems.
NB: Make sure to install GENERIC.hints into /boot/device.hints in these!
Nuke MD inittodr(), resettodr() functions.
Don't attach to PNP0B00 in the "attimer" dummy driver any more, and remove
comments that no longer apply for that reason.
Add new "atrtc" device driver, which handles IBM PC AT Real Time
Clock compatible devices using subr_rtc and clock_if.
This driver is not entirely clean: other code still fondles the
hardware to get a statclock interrupt on non-ACPI timer systems.
Wrap some overly long lines.
After it has settled in -current, this will be ported to amd64.
Technically this is MFC'able, but I fail to see a good reason.
under bootverbose.
Struct ct is used for setting/reading real time clocks and I'm about
to Do Things to some of those, so a bit of preemptive debugging is
in order.
Remove a pointless __inline.
the only difference is that lockmgr*() functions now accept the
LK_NOWITNESS flag, which skips order checking for that particular call.
- Remove a useless stub in witness_checkorder() (the above check
never allows it to be reached) and allow witness_upgrade() to accept
non-try operations too.
- Fix speaker issues with Dell Vostro 1500 (GPIO0)
Tested by: John Wright <jwright.gmail.com>
- Apply ridiculous quirk on Asus A8X series (A8JC, A8M, A8xx, etc). These
different laptop series share similar PCI IDs, hardware codecs, etc.
but work differently. A slight difference in connection type for
widget #26 is used to differentiate it.
Tested by: eric baumbach <embaumbach.gmail.com>
- Apply GPIO0 quirk for ASUS G2K laptop
- Sort ASUS ids accordingly.
Submitted by: jkim
MFC after: 3 days
TX traffic to sit in the send chain until a received packet kick
started the interrupt handler. This would cause extremely slow
performance when used with NFS over UDP.
- Removed untested polling code.
- Updated copyright year in the file header.
- Removed inadvertent ^M's created by DOS text editor.
MFC after: 2 weeks
be handled by chn_abort() and chn_start() alone. This should fix
few issues with single duplex hardware (mostly) or pre virtual record
(RELENG 6) under WINE emulation and possibly others that use
SNDCTL_DSP_SETTRIGGER.
MFC after: 3 days
The problem is that the PM support is part of a much larger WIP here, but due to popular demand I decided to get some of it imported.
Also I forgot to mention:
HW sponsored by: Vitsch Electronics / VEHosting
may be held for the duration of the various dirhash operations which
avoids many complex unlock/lock/revalidate sequences.
- Permit shared locks on lookup. To protect the ip->i_dirhash pointer we
use the vnode interlock in the shared case. Callers holding the
exclusive vnode lock can run without fear of concurrent modification to
i_dirhash.
- Hold an exclusive dirhash lock when creating the dirhash structure for
the first time or when re-creating a dirhash structure which has been
recycled.
Tested by: kris, pho
indexes so directory lookup becomes shared lock safe. In the modifying
cases an exclusive lock is held here so the commit routine may
rely on the state of i_offset.
- Similarly handle i_diroff by fetching at the start and setting only once
the operation is complete. Without the exclusive lock these are only
considered hints.
- Assert that an exclusive lock is held when we're preparing for a commit
routine.
- Honor the lock type request from lookup instead of always using exclusive
locking.
Tested by: pho, kris
I've taken a slightly different approach than is used with the ICH8 controllers
in that each controller is not identified individually (eg USB A, USB B, etc).
Instead I've given the same description to each one even though the device ID
differs. This can easily be changed if desired, or ICH8 (and any others using
that approach) can be made to work as this does.
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.
Reviewed by: jhb
Sponsored by: Nokia
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.
Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().
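As a rough illustration of why the mask matters, the sketch below
extracts a superpage address with both masks. The mask values are
assumptions modelled on amd64 (4KB base pages, 2MB superpages) and the
function names are hypothetical; nothing here is copied from the pmap
source. On x86, bit 12 of a large-page PDE is the PAT bit rather than an
address bit, so the narrower base-page mask lets it leak into the result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed masks for this sketch only (amd64-like layout). */
#define SKETCH_PG_FRAME		UINT64_C(0x000ffffffffff000) /* 4KB frame */
#define SKETCH_PG_PS_FRAME	UINT64_C(0x000fffffffe00000) /* 2MB frame */

/* Correct: keep only the bits that address a 2MB-aligned superpage. */
uint64_t
sketch_superpage_pa(uint64_t pde)
{
	return (pde & SKETCH_PG_PS_FRAME);
}

/* Incorrect for superpages: bits 12..20 are not address bits here. */
uint64_t
sketch_superpage_pa_wrong(uint64_t pde)
{
	return (pde & SKETCH_PG_FRAME);
}
```

With a PDE whose PAT bit (0x1000) is set, the first function returns the
2MB-aligned address while the second returns an address that is off by
one 4KB page.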
Support is working on the Silicon Image SiI3124/3132.
Support is working on some AHCI chips but far from all.
Remember this is WIP, so test reports and (constructive) suggestions are welcome!
received frame under certain conditions. wpaul said the length
0xfff0 is a special value indicating that the hardware is in the
process of copying a packet into host memory. But it seems
there are other cases where the hardware is busy or stuck in a bad
state even if the received frame length is not 0xfff0.
To work around this condition, add a check that verifies that the
received frame length is in the valid range. If the received length is
out of range, reinitialize the hardware to recover from the stuck
condition.
Reported by: Mike Tancsa ( mike AT sentex DOT net )
Tested by: Mike Tancsa
Obtained from: OpenBSD
MFC after: 1 week
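A minimal sketch of the length check described above, not the driver's
actual code. The 0xfff0 marker comes from the log text; the frame-size
bounds, the enum, and the function name are assumptions chosen for
illustration, and treating 0xfff0 as "wait" is likewise assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RX_UNFINISHED	0xfff0u	/* hardware still copying (from log) */
#define SKETCH_MIN_FRAME	60u	/* assumed lower bound, bytes */
#define SKETCH_MAX_FRAME	1518u	/* assumed upper bound, bytes */

enum sketch_rx_action {
	SKETCH_RX_ACCEPT,	/* length plausible, process the frame */
	SKETCH_RX_WAIT,		/* copy in progress, try again later */
	SKETCH_RX_REINIT	/* length out of range, reset the chip */
};

/* Classify a reported receive length into one of the three actions. */
enum sketch_rx_action
sketch_rx_classify(uint16_t len)
{
	if (len == SKETCH_RX_UNFINISHED)
		return (SKETCH_RX_WAIT);
	if (len < SKETCH_MIN_FRAME || len > SKETCH_MAX_FRAME)
		return (SKETCH_RX_REINIT);
	return (SKETCH_RX_ACCEPT);
}
```

The point of the extra range test is that a bogus length other than
0xfff0 no longer slips through as a valid frame.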
no longer needed, but for now we still want to be consistent with other
similar checks in the tree.
- Call ASSERT_VOP_ELOCKED() only when vget() returns 0.
Reviewed by: jeff
o create a private task queue thread that sets up root and current
directories (hooking mountroot event as needed); this is necessary
because task queue threads are parented from proc0 and it does not
have a reference to rootvnode (lost when / mounting moved to init)
o bounce image load + unload requests through the private task q so
we can load images even when the request is made from a thread that
does not have sufficient context (e.g. task q thread)
o add a check in the task q thread to fail requests before root is
mounted (just in case)
Reviewed by: jhb, mlaier, luigi (glance)
MFC after: 1 month
and linux_openat(). Instead just pass AT_FDCWD into linux_common_open()
for the linux_open() case. This prevents passing -1 as a dirfd to
openat() from succeeding which is wrong.
Suggested by: rwatson, kib
Approved by: kib (mentor)
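The user-visible behaviour this preserves can be sketched with the
native openat(2) call: a relative path resolves against the current
directory only when AT_FDCWD is passed, and a dirfd of -1 is simply a
bad descriptor. The helper name below is hypothetical and the snippet
exercises the host's openat(), not the Linux emulation layer itself.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path relative to dirfd.  Returns 0 on success or the
 * errno value reported by openat() on failure.
 */
int
sketch_try_open_relative(int dirfd, const char *path)
{
	int fd;

	fd = openat(dirfd, path, O_RDONLY);
	if (fd == -1)
		return (errno);
	close(fd);
	return (0);
}
```

Calling it with AT_FDCWD and "." is expected to succeed, while passing
-1 with the same relative path is expected to fail with EBADF.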
ICMP unreach, frag needed. Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.
In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu()
to avoid any further conditional maths.
Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.
PR: kern/122338
Tested by: Mark Cammidge mark peralex.com
Reviewed by: silence on net@
MFC after: 2 weeks
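The rule described above, taking the smaller of the route MTU and the
interface MTU, can be sketched as follows. The function name and the
convention that a route MTU of 0 means "not set" are assumptions made
for this example; the IPSEC adjustment is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the MTU to report in an ICMP "fragmentation needed" reply.
 * rt_mtu == 0 is treated as "route carries no MTU" (assumption).
 */
uint32_t
sketch_next_hop_mtu(uint32_t if_mtu, uint32_t rt_mtu)
{
	if (rt_mtu != 0 && rt_mtu < if_mtu)
		return (rt_mtu);
	return (if_mtu);
}
```

A route with MTU 1400 over a 1500-byte interface therefore yields 1400,
which is the case that was broken before.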
so that all implemented variants have proper prototypes. The 8-bit,
16-bit and 64-bit variants are not implemented.
This really fixes the current build breakages caused by type casting
and struct aliasing rules.
commands can be written to /dev/psm%d and status can be read back from it.
- Reflect the change in psm(4) and bump version for ports.
MFC after: 1 week
Because of this we were not getting further interrupts for link state
changes, thus never went into iface UP state and thus could not transmit.
The only way out of this was an incoming packet generating an rx interrupt
and making us call into bge_link_upd.
Up to rev. 1.101, in bge_start_locked, we only returned instantly
if there was 'no link AND nothing queued for tx'. So with a packet queued
for tx, we hit the register scrubbing at the end of bge_start_locked
and were out fine. We simply lost a packet or two but got the interrupts
needed to get into UP state.
With rev. 1.102 this was turned into 'if there is no link OR there is
nothing to send' (correct behaviour) and as long as there is no link
we never hit the register scrubbing and consequently never got the link UP.
What we do now is force an interrupt at the end of bge_ifmedia_upd_locked
so we will call bge_link_upd, clear the link state attention and get
further interrupts.
This helps to get the iface UP on an idle network or at least to get
it UP faster not depending on an rx intr anymore.
In case you could not get a DHCP lease or it took very long,
it was because of this.
It is unknown which chips are affected by this. ASIC rev. 0x2003 was the
most popular trouble candidate.
At least the fiber cards should have been working fine.
Which register to scrub is currently under discussion. The committed
solution was tested and found to work for a lot of setups. It might
not help with MSI.
The reason why we end up in such a situation is entirely unknown.
PR: kern/111804
Tested by: phk, scottl at Y!
MFC after: 14 days
was changed in rev. 1.161 of tcp_var.h. All options are now tested for
sufficient space in the TCP header before being added.
Reported by: Mark Atkinson <atkin901-at-yahoo.com>
Tested by: Mark Atkinson <atkin901-at-yahoo.com>
MFC after: 1 week
bit in order to allow per-bit checks on the options flag, in particular
in the consumers code [1]
- Re-enable the check against TDP_DEADLKTREAT as the anti-waiters
starvation patch allows exclusive waiters to override new shared
requests.
[1] Requested by: pjd, jeff
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet. To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data. Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.
This removes the restriction that at most one shared memory buffer can
be owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load. This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.
Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.
state transitioning flags and of msleep(9) calls.
Use, instead, an algorithm very similar to what sx(9) and rwlock(9)
already do, with direct access to the sleepqueue(9) primitive.
In order to avoid writer starvation, a mechanism very similar to what
rwlock(9) now uses is implemented, with the corresponding per-thread
shared lockmgrs counter.
This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and
lockmgr_args_rw(). These two are like the 2 "normal" versions, but they
both accept a rwlock as interlock. In order to realize this, the general
lockmgr manager function "__lockmgr_args()" has been implemented through
the generic lock layer. It supports all the blocking primitives, but
currently only these 2 mappers live.
The patch drops the support for WITNESS for now, but it will probably be
added soon. Also, there is a small race in the draining code which is
also present in the current CVS stock implementation: if some sharers,
once they wake up, are in the runqueue they can contend for the lock
with the exclusive drainer. This is hard to fix, but the now committed
code mitigates this issue a lot better than the (past) CVS version.
In addition, the KA_HELD and KA_UNHELD assertions have been made mute
because they are dangerous and will soon no longer be supported.
In order to avoid namespace pollution, stack.h is split into two
parts: one which includes only the "struct stack" definition (_stack.h)
and one defining the KPI. In this way, the newly added _lockmgr.h can
just include _stack.h.
The kernel ABI is heavily changed by this commit (the now committed
version of "struct lock" is a lot smaller than the previous one) and
the KPI is broken by the lockmgr_rw() / lockmgr_args_rw() introduction,
so manpages and __FreeBSD_version will be updated accordingly.
Tested by: kris, pho, jeff, danger
Reviewed by: jeff
Sponsored by: Google, Summer of Code program 2007
contigmalloc(9) as a last resort to steal pages from an inactive,
partially-used superpage reservation.
Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
refactor it so that a separate subroutine is responsible for breaking
the selected reservation. This subroutine is also used by
vm_reserv_reclaim_contig().
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describing what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).
Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64
part of detecting the media. Explicitly ensure that we don't send it to
bpf(4) as bpf(4) isn't setup yet. This worked by accident before the bpf
interface stuff was reworked to avoid other races (bpf_peers_present, etc.)
but now it needs an explicit check to avoid a panic.
MFC after: 3 days
PR: kern/120915
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.
Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
spinning when readers hold a lock. This spinning is speculative because,
unlike the write case, we can not test whether the owners are running.
- Add speculative read spinning for readers who are blocked by pending
writers while a read lock is still held. This allows the thread to
spin until the write lock succeeds after which it may spin until the
writer has released the lock. This prevents excessive context switches
when readers and writers both hold the lock for brief periods.
Sponsored by: Nokia
the fdesc_allocvp(). The caller of the fdesc_allocvp() expects that the
returned vnode is not reclaimed. Do lock the vnode exclusive and drop
the lock after.
Reported by: pho
Reviewed by: jeff
fixed pri boost with '1' or any priority less than the current thread's
priority with a value greater than two. Default the boost to
PRI_MIN_TIMESHARE to prevent regular user-space threads from starving
threads in the kernel. This prevents these user-threads from also
being scheduled as if they are high fixed-priority kernel threads.
- Restore the setting of lowpri in tdq_choose(). It has to be either here
or in sched_switch(). I accidentally removed it from both places.
Tested by: kris
do this either. Simply check P_NOLOAD. It'd be nice if this was
in a thread flag so we didn't have an extra cache miss every time we
add and remove a thread from the run-queue.
- Pull all the code to deal with the trampoline stuff into one
centralized place and use it from everywhere.
- Some minor style tidiness
Reviewed by: tinguely
platform, so use the latter in preference to the former. This makes
the fake_preload setup be the same between kb920x_machdep.c and
avila_machdep.c....
and the igb driver static in the kernel. But it also reflects
some other bug fixes in my development stream at Intel.
PR 122373 is also fixed in this code.
- Move callout thread creation from kern_intr.c to kern_timeout.c
- Call callout_tick() on every processor via hardclock_cpu() rather than
inspecting callout internal details in kern_clock.c.
- Remove callout implementation details from callout.h
- Package up all of the global variables into a per-cpu callout structure.
- Start one thread per-cpu. Threads are not strictly bound. They prefer
to execute on the native cpu but may migrate temporarily if interrupts
are starving callout processing.
- Run all callouts by default in the thread for cpu0 to maintain current
ordering and concurrency guarantees. Many consumers may not properly
handle concurrent execution.
- The new callout_reset_on() api allows specifying a particular cpu to
execute the callout on. This may migrate a callout to a new cpu.
callout_reset() schedules on the last assigned cpu while
callout_reset_curcpu() schedules on the current cpu.
Reviewed by: phk
Sponsored by: Nokia
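The CPU selection policy in the last item can be modelled with a toy
structure. Everything below is hypothetical (names, fields, and the
absence of any timer wheel or locking); it only mirrors the rule that a
plain reset stays on the last assigned cpu, starting from cpu0, while
the "_on" and "_curcpu" variants may move the entry.

```c
#include <assert.h>

/* Hypothetical stand-in for a callout; it only tracks its CPU. */
struct sketch_callout {
	int	sc_cpu;		/* cpu whose callout thread runs this entry */
};

/* New entries default to cpu0, matching the default described above. */
void
sketch_callout_init(struct sketch_callout *c)
{
	c->sc_cpu = 0;
}

/* Schedule on an explicit cpu; this may migrate the entry. */
int
sketch_reset_on(struct sketch_callout *c, int cpu)
{
	c->sc_cpu = cpu;
	return (c->sc_cpu);
}

/* Schedule on whichever cpu the entry was last assigned to. */
int
sketch_reset(struct sketch_callout *c)
{
	return (sketch_reset_on(c, c->sc_cpu));
}

/* Schedule on the caller's cpu, passed in explicitly in this model. */
int
sketch_reset_curcpu(struct sketch_callout *c, int curcpu)
{
	return (sketch_reset_on(c, curcpu));
}
```

Each function returns the cpu the entry ends up on, so the migration
behaviour is easy to observe in a test.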
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL. The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.
to trip a bug causing the latter to return a zeroed struct
aac_adapter_info. This causes two issues. One is cosmetic only --
a verbose boot prints information about the controller, and shows all
zero:
aac0: Unknown processor 0MHz, 0MB memory (0MB cache, 0MB execution),
unknown battery platform
The second problem is that the firmware version information is stored
away for aac_rev_check, for userland tools (like aaccli) to query via
the FSACTL_MINIPORT_REV_CHECK and FSACTL_LNX_MINIPORT_REV_CHECK ioctls.
When aaccli encounters this issue it prints
Command Error: <The current AFAAPI.DLL is too old to work with the
current controller software.>
Move the RequestSupplementAdapterInfo call after RequestAdapterInfo,
which seems to fix both problems.
These functions try the specified operation (rlocking and wlocking) and
true is returned if the operation completes, false otherwise.
The KPI is enriched by this commit, so __FreeBSD_version bumping and
manpage updating will happen soon.
Requested by: jeff, kris
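The try semantics described above (return true if the lock was taken,
false otherwise, never sleep) can be sketched with a toy reader/writer
lock built on C11 atomics. This is not the kernel implementation; the
type, the bit layout, and all function names are assumptions for the
example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_WRITER		0x1u	/* set while a writer holds the lock */
#define SKETCH_ONE_READER	0x2u	/* reader count lives above bit 0 */

struct sketch_rwlock {
	atomic_uint	state;
};

void
sketch_rw_init(struct sketch_rwlock *l)
{
	atomic_init(&l->state, 0);
}

/* Take a read lock if no writer holds the lock; never blocks. */
bool
sketch_try_rlock(struct sketch_rwlock *l)
{
	unsigned int old = atomic_load(&l->state);

	while ((old & SKETCH_WRITER) == 0) {
		/* On failure, old is reloaded and the writer bit rechecked. */
		if (atomic_compare_exchange_weak(&l->state, &old,
		    old + SKETCH_ONE_READER))
			return (true);
	}
	return (false);
}

/* Take the write lock only if the lock is completely free. */
bool
sketch_try_wlock(struct sketch_rwlock *l)
{
	unsigned int expected = 0;

	return (atomic_compare_exchange_strong(&l->state, &expected,
	    SKETCH_WRITER));
}

void
sketch_runlock(struct sketch_rwlock *l)
{
	atomic_fetch_sub(&l->state, SKETCH_ONE_READER);
}

void
sketch_wunlock(struct sketch_rwlock *l)
{
	atomic_store(&l->state, 0);
}
```

A caller checks the boolean result and falls back to other work, or to
a blocking acquire, when the try fails.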
abstraction as the RAID and CAM modules, making it nearly impossible
for enough initialization to be done in time for the RAID module to
know whether to attach. On top of this, no reset was being done on
the controller on attach, in violation of the spec. Additionally,
the port enable step was being deferred to the end of the attach
process, long after it should have been done to ensure reliable
operation from the controller. Fix all of these with a few hacks
to force the "attach" and "enable" steps of the core module early
on, and ensure that a reset and port enable also happens early on.
In the future, the driver needs to be refactored to eliminate the
core module abstraction, clean up the reset/enable steps, and
defer event messages until all of the modules are available to
receive them.
openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2),
futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2),
readlinkat(2), renameat(2), symlinkat(2)
syscalls.
Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho
openat() and the related syscalls.
Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho
to protect the v_lock pointer. Removing the interlock acquisition
here allows vn_lock() to proceed without requiring the interlock
at all.
- If the lock mutated while we were sleeping on it the interlock has
been dropped. It is conceivable that the upper layer code was
relying on the interlock and LK_NOWAIT to protect the identity or
state of the vnode while acquiring the lock. In this case return
EBUSY rather than trying the new lock to prevent potential races.
Reviewed by: tegge
Keeping the lockmgr lock valid allows us to switch the v_lock pointer
in snapshot vnodes between the embedded lockmgr lock and snapdata
lock without needing the vnode interlock to protect against races
- Keep unused snapdata structures in a list.
- Add a function to lock the devvp and allocate a snapdata to it or
acquire a new one without races. The old function was safe from
creation races because we set the mount flag when creating snapshots
and thus serializing them. However, it might have been subject to
destroying races.
Reviewed by: tegge
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.
MFC after: 2 weeks
incompatible with existing bindings.
- Try to copyout the setid in cpuset() before migrating the proc to the
setid in case the user has supplied a bad buffer.
- Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
be more descriptive and free cpuset_root to be used as a different
type of symbol.
- Make cpuset_root the cpuset_t set of all cpus in the system. This
should contain the same bitmask as all_cpus presently.
- Add a CPU_CMP() macro to compare two sets.
- Do not check destination hook presence, it will be done by netgraph.
- Use u_int instead of int in some places to simplify type conversions.
- Use NG_SEND_DATA_ONLY() macro instead of selfmade equivalent.
which simply want a reference should use vref(). Callers which want
to check validity need to hold a lock while performing any action
based on that validity. vn_lock() would always release the interlock
before returning making any action synchronous with the validity check
impossible.
SI_SUB_DRIVERS) to avoid loading schemes before all the GEOM
classes have been loaded and initialized. Otherwise we may
end up using mutexes that haven't been initialized (due to
g_retaste() posting an event).
vm_object_reference(). This is intended to get rid of vget()
consumers who don't wish to acquire a lock. This is functionally
the same as calling vref(). vm_object_reference_locked() already
uses vref.
Discussed with: alc
and netgraph in general). This also allows adding queues for an interface
that does not exist yet (you have to provide the bandwidth for the
interface, however).
PR: kern/106400, kern/117827
MFC after: 2 weeks
dropped after the call to lockmgr() so just revert this approach using
something similar to the previous one:
BUF_LOCKWAITERS() just checks if there are waiters (not the actual number
of them) and it is based on the newly introduced lockmgr_waiters(), which
returns whether the lockmgr has waiters or not. The name was chosen to
differ from the old lockwaiters() in order not to confuse them.
The KPI is extended by this commit, so a __FreeBSD_version bump and a
manpage update will follow soon.
'struct buf' also changes, so kernel ABI is disturbed.
Bug found by: jeff
Approved by: jeff, kib
allows the class to create a different GEOM for the same provider
as well as avoid that we end up with multiple GEOMs of the same
class with the same name.
For example, when a disk contains a PC98 partition table but
only MBR is supported, then the partition table can be treated
as a MBR. If support for PC98 is later loaded as a module, the
MBR scheme is pre-empted for the PC98 scheme as expected.
offload bugs by manual padding for short IP/UDP frames. Unfortunately
it seems that these workarounds do not work reliably on newer PCIe
variants of RealTek chips.
To work around the hardware bug, always pad short frames if Tx IP
checksum offload is requested. It seems that the hardware has a
bug in IP checksum offload handling. NetBSD manually pads short
frames only when the length of the IP frame is less than 28 bytes but I
chose 60 bytes for safety. Also unconditionally set the IP checksum
offload bit in the Tx descriptor if any TCP or UDP checksum offload is
requested. This is what Linux does, but it is not mentioned in the
data sheet.
Obtained from: NetBSD
Tested by: remko, danger
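The padding rule described above can be sketched roughly as follows. This
is a userland model, not the re(4) code: the CSUM_MODEL_* bits and both
helper names are hypothetical, and the sketch assumes padding is keyed on
the IP checksum request only. 60 bytes is the minimum Ethernet frame
length without the 4-byte CRC.

```c
#include <stddef.h>

/* Hypothetical offload request bits; the real driver uses mbuf csum flags. */
#define CSUM_MODEL_IP   0x1u
#define CSUM_MODEL_TCP  0x2u
#define CSUM_MODEL_UDP  0x4u

/* Minimum Ethernet frame length excluding the 4-byte CRC. */
#define MIN_FRAME_NOCRC 60u

/* Bytes of zero padding to append before handing the frame to the chip. */
static size_t
tx_pad_len(size_t frame_len, unsigned csum_flags)
{
	if ((csum_flags & CSUM_MODEL_IP) != 0 && frame_len < MIN_FRAME_NOCRC)
		return (MIN_FRAME_NOCRC - frame_len);
	return (0);
}

/* Offload bits to program: any TCP or UDP request also sets the IP bit. */
static unsigned
tx_desc_csum_bits(unsigned csum_flags)
{
	if ((csum_flags & (CSUM_MODEL_TCP | CSUM_MODEL_UDP)) != 0)
		csum_flags |= CSUM_MODEL_IP;
	return (csum_flags);
}
```

For example, a 42-byte frame with IP checksum offload requested would get
18 bytes of padding under this model.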
src/cddl and src/sys/cddl directories per the core@ decision following
the license review.
This change modifies the affected Makefiles to reference the sources
in their new location.
will never exit ngintr() while there are ready requests on the queue.
It was made years ago in the hope of parallel queue processing by several
net threads. But even if we sometimes have several threads, we must not
process the queue in parallel, as that would break the original request
serialization that is critically important for some setups.
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.
Submitted by: Alexey Popov
Reviewed by: rwatson
MFC after: 3 days
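One common way to keep userland from clearing such a flag is to merge the
requested flags with the current ones under a "cannot change" mask. The
sketch below assumes that approach; the IFM_* names and bit values are
hypothetical and are not the definitions from net/if.h.

```c
/* Hypothetical flag bits for illustration; not the values from net/if.h. */
#define IFM_UP          0x01u
#define IFM_PROMISC     0x02u
#define IFM_NEEDSGIANT  0x04u

/* Flags that a userland request is not allowed to change. */
#define IFM_CANTCHANGE  (IFM_NEEDSGIANT)

/* Keep protected bits from the current flags, take the rest from the request. */
static unsigned
apply_user_flags(unsigned cur, unsigned requested)
{
	return ((cur & IFM_CANTCHANGE) | (requested & ~IFM_CANTCHANGE));
}
```

With this merge, a request that omits the protected bit leaves it set, and
a request that includes it cannot turn it on either.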
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris
2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.
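The rule in the last sentence can be written as a small predicate. This is
an illustrative sketch only: pte_is_dirty() is a hypothetical helper name,
and the bit values are the usual x86 PTE bits for read/write and modified.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bits: read/write and modified (dirty). */
#define PG_RW 0x002u
#define PG_M  0x040u

/* Trust PG_M only when PG_RW is also set in the same PTE. */
static bool
pte_is_dirty(uint64_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```

A PTE with PG_M set but PG_RW clear is treated as clean here, which matches
the case where the processor set PG_M although the write did not occur.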
frequency generation and what frequency they generated was anyone's
guess.
In general the 32.768kHz RTC clock x-tal was the best, because that
was a regular wrist-watch Xtal, whereas the X-tal generating the
ISA bus frequency was much lower quality, often costing as much as
several cents a piece, so it made good sense to check the ISA bus
frequency against the RTC clock.
The other relevant property of those machines, is that they
typically had no more than 16MB RAM.
These days, CPU chips croak if their clocks are not tightly within
specs and all necessary frequencies are derived from the master
crystal by means of PLLs.
Considering that it takes on average 1.5 seconds to calibrate the
frequency of the i8254 counter, that more likely than not, we will
not actually use the result of the calibration, and as the final
clincher, we seldom use the i8254 for anything besides BEL in
syscons anyway, it has become time to drop the calibration code.
If you need to tell the system what frequency your i8254 runs,
you can do so from the loader using hw.i8254.freq or using the
sysctl kern.timecounter.tc.i8254.frequency.
The timer_spkr_*() functions take care of the enabling/disabling
of the speaker.
Test on the existence of timer_spkr_*() functions, rather than
architectures.
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).
MFC after: 4 months
Suggested by: csjp
(such as 'atime' vs 'noatime'). The filesystems will always see either
'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid
mount options should include 'nofoo' instead of 'foo'. With this fix,
you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as
noatime without getting an error. You can also update a noatime FFS
filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime'
using nmount(2) (e.g. 7.x /sbin/mount binary).
MFC after: 1 week
Reviewed by: crodig
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.
The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.
This eliminates the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.
In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms of 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].
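The conversion between a period counted in 1/1193182 s units and a tone
frequency in Hz is a single integer division. A minimal sketch, with
hypothetical helper names and integer truncation assumed to be acceptable:

```c
#include <assert.h>

/* Nominal i8254 input clock in Hz, the constant hardcoded above. */
#define I8254_NOMINAL_HZ 1193182u

/* Convert a period in 1/1193182 s units into a tone frequency in Hz. */
static unsigned
period_ticks_to_hz(unsigned period)
{
	assert(period != 0);
	return (I8254_NOMINAL_HZ / period);
}

/* Inverse direction: timer ticks per cycle for a tone of the given pitch. */
static unsigned
hz_to_period_ticks(unsigned hz)
{
	assert(hz != 0);
	return (I8254_NOMINAL_HZ / hz);
}
```

For example, a period of 1193 units corresponds to a 1000 Hz tone, and a
440 Hz tone needs 2711 timer ticks per cycle.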
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.
* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks
the owner of a queue to block and unblock execution of the tasks in the
queue while allowing tasks to continue to be added to the queue. Combining this
with taskqueue_drain() allows a queue to be safely disabled. The unblock
function may run (or schedule to run) the queue when it is called, just as
calling taskqueue_enqueue() would.
Reviewed by: jhb, sam
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.
MFC after: 4 months
of pptpgre and ksocket nodes for all calls between two peers. This patch
modifies the node's API by adding support for new "session_%04x" hook
names, while keeping backward compatibility.
Together with appropriate user-level support (by the latest mpd5) it gives
huge performance benefits in the case of multiple active calls between
two peers by avoiding data duplication and extra socket processing.
In my benchmarks I got a more than 10 times speedup for 200
simultaneous PPTP calls between two peers.
In conclusion, it is now possible to build effective "clients <=> PAC <=> PNS"
setups.
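A hook name following the "session_%04x" pattern can be produced with
snprintf(3). In this sketch the helper name, the buffer size, and the
assumption that the formatted value is a 16-bit session identifier are
illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical buffer size: "session_" + 4 hex digits + NUL fits easily. */
#define HOOK_NAME_MAX 16

/* Format a per-session hook name; returns the snprintf() result. */
static int
session_hook_name(char *buf, size_t len, uint16_t id)
{
	return (snprintf(buf, len, "session_%04x", (unsigned)id));
}
```

With an identifier of 0x2a this yields the name "session_002a".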
o sort mbuf flags together and extend values to 32 bits
o write M_COPYFLAGS in terms of M_PROTOFLAGS
o move M_COPYFLAGS and M_PROTOFLAGS up to be together with flag defs
Reviewed by: rwatson
MFC after: 3 weeks
- Take advantage of m_collapse(9).
- Sync with other NIC drivers and prepend a TX mbuf if the first attempt
to load it fails with an error other than EFBIG and stop trying instead
of freeing it and keeping on trying to enqueue more mbufs. Also ensure
the driver queue isn't empty before trying to enqueue mbufs in order to
reduce locking operations.
- In xl_ifmedia_upd() add a missing XL_UNLOCK(). [1]
- Const'ify the xl_devs array.
- Remove an outdated comment.
PR: 113406 [1]
MFC after: 1 month
- Correct the maxsize parameter when creating the mbufs busdma tag to
reflect the actual requirement of dc(4).
- Move the KASSERT in dc_newbuf() to the right spot.
- Also convert the TX side to take advantage of bus_dmamap_load_mbuf_sg(9).
- Move the comment regarding dc_start_locked() to the right spot.
MFC after: 2 weeks
- Resource allocation in aac_alloc (moved from aac_init)
- Interrupt setup in aac_setup_intr (from aac_attach)
- Container probing in aac_get_container_info (from aac_startup and
aac_handle_aif)
- Firmware status check moved to aac_check_firmware from aac_init
In case of "new SA", we must check the hard lifetime of the old SA
to find out if it is not permanent and we can delete it.
Submitted by: sakane via gnn
MFC after: 3 days
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.
The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.
The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patches to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.
These changes modify the base BPF implementation to (roughly) abstract
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring changes that break
the netstat monitoring ABI for BPF, will be MFC'd.
Zerocopy bpf buffers are still considered experimental and are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.
Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.
Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]
[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.
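The buffer handoff via atomic operations can be modelled in userland as
below. This is a sketch under stated assumptions, not the bpf(4) API: it
assumes a generation-counter handshake, and the structure, its field names
and all four functions are simplified stand-ins. bpf(4) remains the
reference for the real header layout and the required barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for a shared buffer header (hypothetical layout). */
struct zbuf_header_model {
	atomic_uint kernel_gen;	/* bumped by the kernel when it hands over data */
	atomic_uint kernel_len;	/* bytes of packet data in the buffer */
	atomic_uint user_gen;	/* set by the user process to return the buffer */
};

static void
zbuf_init(struct zbuf_header_model *h)
{
	atomic_init(&h->kernel_gen, 0);
	atomic_init(&h->kernel_len, 0);
	atomic_init(&h->user_gen, 0);
}

/* User side: the buffer is ours while the generation numbers differ. */
static bool
zbuf_owned_by_user(struct zbuf_header_model *h)
{
	unsigned kgen = atomic_load_explicit(&h->kernel_gen, memory_order_acquire);

	return (kgen != atomic_load_explicit(&h->user_gen, memory_order_relaxed));
}

/* User side: acknowledge the buffer so the kernel may reuse it. */
static void
zbuf_ack(struct zbuf_header_model *h)
{
	unsigned kgen = atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);

	atomic_store_explicit(&h->user_gen, kgen, memory_order_release);
}

/* Stand-in for the kernel side, so the model can be exercised in userland. */
static void
zbuf_kernel_publish(struct zbuf_header_model *h, unsigned len)
{
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}
```

The acquire load pairs with the release increment, so a reader that sees
the new generation also sees the length written before it. No system call
is needed for either direction of the handoff.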
references to a vnode with VI_OWEINACT set will force the vinactive()
call. The kernel makes no guarantees about which reference was the
last to close a file or when the actual inactive processing will
happen. The previous code was designed to preserve existing semantics
in the face of shared locks, however, this was unnecessary.
Discussed with: mckusick
is requested. Handle this case specially before the while loop.
- Use the held vnode lock to check for VI_DOOMED. The vnode lock and
interlock must both be held to set VI_DOOMED so either one held, even
shared, is sufficient to check it.
No objection by: kib
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.
Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futile attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.
While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.
A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)
This is still work-in-progress.
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.
The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set. However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.
(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)
Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.
Tested by: Peter Holm
vnodes belonging to the mountpoint. Also, yield when in the
softdep_process_worklist() even when we are not going to sleep due to
buffer drain.
It is believed that the ULE fixed the problem [1], but the yielding
seems to be needed at least for the 4BSD case.
Discussed: on stable@, with bde
Reviewed by: tegge, jeff [1]
MFC after: 2 weeks
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.
As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.
Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week
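The wrap-around hazard can be shown with a small userland model of a loop
that walks an address range one page-directory span at a time. The
function name and the 4MB span are assumptions for illustration; the point
is the check after computing the next boundary.

```c
#include <stdint.h>

/* Assumed span covered by one page-directory entry (4MB) for this model. */
#define NBPDR_MODEL   (UINT32_C(1) << 22)
#define PDRMASK_MODEL (NBPDR_MODEL - 1)

/* Walk [sva, eva) in directory-sized steps; returns the number of steps. */
static unsigned
walk_range(uint32_t sva, uint32_t eva)
{
	unsigned steps = 0;
	uint32_t pdnxt;

	for (; sva < eva; sva = pdnxt) {
		/* Next boundary; this addition wraps to 0 near the top. */
		pdnxt = (sva + NBPDR_MODEL) & ~PDRMASK_MODEL;
		if (pdnxt < sva)	/* wrapped around: clamp to the end */
			pdnxt = eva;
		if (pdnxt > eva)
			pdnxt = eva;
		steps++;
	}
	return (steps);
}
```

Without the wrap check, a range ending in the topmost span would restart
from address 0 and sweep almost the whole address space.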
multi-descriptor transmission attempt. The datasheet says nothing about
this requirement. This should fix long-standing VLAN hardware
tagging issues with re(4).
Reported by: Giulio Ferro ( auryn AT zirakzigil DOT org )
Tested by: Giulio Ferro ( auryn AT zirakzigil DOT org )
to declaring a proper module. The module event handler is part of the
gpart core and will add the scheme to an internal list on module load
and will remove the scheme from the internal list on module unload.
This makes it possible to dynamically load and unload partitioning
schemes.
to it for tasting. This is useful when the class, through means outside
the scope of GEOM, can claim providers previously unclaimed.
The g_retaste() function posts an event which is handled by the
g_retaste_event().
Event suggested by: phk
exhaustion is encountered. There was a fix made previously for this
problem but the solution (breaking out of the receive loop) does not
seem to work. mbuf reuse strategy is already adopted by other drivers
such as if_bge. The problem was recreated and the patch is also
verified in the same test environment.
layouts different than the defaults:
o hint.npe.0.mac="A", "B", etc. specifies the window for MAC register accesses
o hint.npe.0.mii="A", "B", etc. specifies PHY registers
o hint.npe.1.phy=%d specifies the PHY to map to a port
This allows devices like NSLU to be setup w/o code changes and will
also be used for forthcoming support for more Avila boards.
Reviewed by: imp
MFC after: 1 week
BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
- Create a new lock in the bufobj to lock bufobj fields independently.
This leaves the vnode interlock as an 'identity' lock while the bufobj
is an io lock. The bufobj lock is ordered before the vnode interlock
and also before the mnt ilock.
- Exploit this new lock order to simplify softdep_check_suspend().
- A few sync related functions are marked with a new XXX to note that
we may not properly interlock against a non-zero bv_cnt when
attempting to sync all vnodes on a mountlist. I do not believe this
race is important. If I'm wrong this will make these locations easier
to find.
Reviewed by: kib (earlier diff)
Tested by: kris, pho (earlier diff)
code.
The bug:
There exists a race condition for timeout/untimeout(9) due to the
way that the softclock thread dequeues timeouts.
The softclock thread sets the c_func and c_arg of the callout to
NULL while holding the callout lock but not Giant. It then drops
the callout lock and acquires Giant.
It is at this point where untimeout(9) on another cpu/thread could
be called.
Since c_arg and c_func are cleared, untimeout(9) does not touch the
callout and returns as if the callout is canceled.
The softclock then tries to acquire Giant and likely blocks due to
the other cpu/thread holding it.
The other cpu/thread then likely deallocates the backing store that
c_arg points to and finishes working and hence drops Giant.
Softclock resumes and acquires Giant and calls the function with
the now freed c_arg and we have corruption/crash.
The fix:
We need to track curr_callout even for timeout(9) (LOCAL_ALLOC)
callouts. We need to free the callout after the softclock processes
it to deal with the race here.
Obtained from: Juniper Networks, iedowse
Reviewed by: jhb, iedowse
MFC After: 2 weeks.
around the check for the BV_BKGRDINPROG in the brelse() and bqrelse().
See the comment for the explanation why it is safe.
Tested by: pho
Submitted by: jeff
ffs_extread() when setting the IN_ACCESS flag by checking whether the
IN_ACCESS is already set. The possible race there is admissible.
Tested by: pho
Submitted by: jeff
to enter thread_suspend_check().
- Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the
thread_suspend_check() to ast() rather than userret().
- Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so
that we don't miss a suspend request. If this is set use the
expensive signal path.
- Set NEEDSUSPCHK when creating a new thread in thr in case the
creating thread is due to be suspended as well but has not yet.
Reviewed by: davidxu (Authored original patch)
lock in the 8259A drivers as these drivers are only used on UP systems.
This slightly reduces the penalty of an SMP kernel (such as GENERIC) on
a UP x86 machine.
resource to a CPU. The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr(). Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.
Tested by: gallatin
putting the correct size in the fib header. Presumably the older firmware
silently ignored a bad size field.
(This change tested with a 3805 controller. Passthrough devices were
created when running firmware build 12814, but not 15323 or later. With
this change they're created for both old and new firmware versions.)
Submitted by: Adaptec
FSACTL_LNX_SEND_LARGE_FIB, and FSACTL_LNX_SEND_RAW_SRB, and correct size
checks on FIBs passed in from userspace. Both changes were obtained from
Adaptec's driver build 15317. Adaptec's commandline RAID tool arcconf uses
these ioctls when creating a RAID-10 array (and probably other operations
too).
so the annoying message is not printed.
o Don't warn about FUTEX_FD not being implemented
and return ENOSYS instead of 0 (i.e. success).
o Clear FUTEX_PRIVATE_FLAG as we actually implement
only private futexes so there is no reason to
return ENOSYS when app asks for a private futex.
We don't reject shared futexes because they worked
just fine with our implementation so far.
Approved by: kib (mentor)
Tested by: bsam
MFC after: 1 week
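The two futex items above amount to masking off the private flag before
dispatching on the operation, and rejecting FUTEX_FD. A minimal sketch:
futex_decode_op() is a hypothetical helper, and the constants are mirrored
from the Linux futex ABI rather than taken from this code base.

```c
#include <errno.h>

/* Values mirrored from the Linux futex ABI. */
#define FUTEX_WAIT          0
#define FUTEX_WAKE          1
#define FUTEX_FD            2
#define FUTEX_PRIVATE_FLAG  128

/* Returns 0 and stores the base operation, or an errno value. */
static int
futex_decode_op(int op, int *base_op)
{
	/* Private and shared futexes are handled the same way here. */
	op &= ~FUTEX_PRIVATE_FLAG;
	if (op == FUTEX_FD)
		return (ENOSYS);	/* not implemented, report it as such */
	*base_op = op;
	return (0);
}
```

A caller asking for FUTEX_WAKE with the private flag set is thus served
exactly like a plain FUTEX_WAKE.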
work on architectures with a write-back cache as the PIO writes end up
in the cache which the sync(BUS_DMASYNC_POSTREAD) in usb_transfer_complete
then discards; compensate in the xfer methods that do PIO by pushing the
writes out of the cache before usb_transfer_complete is called.
This fixes USB on xscale and likely other places.
Sponsored by: hobnob
Reviewed by: cognet, imp
MFC after: 1 month
obtain the reference. In particular, this fixes the panic reported in
the PR. Remove the comments stating that this needs to be done.
PR: kern/119422
MFC after: 1 week
all uses) involve a read but usbd_start_transfer only does a PREWRITE; change
this to BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE as I'm not sure if any
users do write+read.
Reviewed by: cognet, imp
MFC after: 1 month
rqindex back in struct thread.
- Compile kern_switch.c independently again and stop #include'ing it from
schedulers.
- Remove the ts_thread backpointers and convert most code to go from
struct thread to struct td_sched.
- Cleanup the ts_flags #define garbage that was causing us to sometimes
do things that expanded to td->td_sched->ts_thread->td_flags in 4BSD.
- Export the kern.sched sysctl node in sysctl.h
This one line change makes the following code found in many ethernet device drivers
(at least em, igb, ixgbe, and cxgb) gratuitous:
	case SIOCSIFADDR:
		if (ifa->ifa_addr->sa_family == AF_INET) {
			/*
			 * XXX
			 * Since resetting hardware takes a very long time
			 * and results in link renegotiation we only
			 * initialize the hardware only when it is absolutely
			 * required.
			 */
			ifp->if_flags |= IFF_UP;
			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
				EM_CORE_LOCK(adapter);
				em_init_locked(adapter);
				EM_CORE_UNLOCK(adapter);
			}
			arp_ifinit(ifp, ifa);
		} else
			error = ether_ioctl(ifp, command, data);
		break;
thread_fini(). The schedulers initialize themselves properly during
sched_fork_thread() anyhow. fini is only called when we're returning
the memory to the allocator which surely doesn't care what state the
memory is in.
is only used by 4bsd.
- Create a new runq_choose_fuzz() function rather than polluting runq_choose()
with 4BSD specific code.
- Move the fuzz sysctl into sched_4bsd.c
- Remove some dead code from kern_switch.c
maxsockets limit, not maxfiles limit. The question remains why those
limits are handled differently (with error code for maxfiles but with
sleep for maxsockets), but those would be addressed in a separate commit
if necessary.
Requested by: rwatson, jeff
before doing the very expensive cursig() and related locking. NEEDSIGCHK
is updated whenever our signal mask changes or when a signal is delivered and
should be sufficient to avoid the more expensive tests. This eliminates
another source of PROC_LOCK contention in multithreaded programs.
- In the last revision the code was changed to use maxfilesperproc rather than
the per-process file limit to restrict the size of the poll array. This
eliminates a significant source of process lock contention in multithreaded
programs and is cheaper. This had been committed with the wrong batch of
changes.
a simple (wmesg, count) tuple in a hash to keep track of how many times
we sleep at each wait message. We hash on message and not channel. No
line number information is given as typically wait messages are not used in
more than one place. Identical strings defined at different addresses will
show up with separate counters.
- Use debug.sleepq.enable to enable, .reset to reset, and .stats dumps stats.
- Do an unsynchronized check in sleepq_switch() prior to switching before
calling sleepq_profile() which uses a global lock to synchronize the hash.
Only sleeps which actually cause a context switch are counted.
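The (wmesg, count) bookkeeping can be sketched as a small open-addressed
table keyed on the address of the message string. This is a userland model
with hypothetical names and a fixed table size; the global lock mentioned
above is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROF_BUCKETS 64	/* hypothetical table size */

struct prof_entry {
	const char *wmesg;	/* key: address of the wait message */
	unsigned long count;	/* number of recorded sleeps */
};

static struct prof_entry prof_table[PROF_BUCKETS];

/* Find the entry for wmesg, optionally claiming a free slot for it. */
static struct prof_entry *
prof_lookup(const char *wmesg, int create)
{
	size_t start = ((uintptr_t)wmesg >> 4) % PROF_BUCKETS;

	for (size_t i = 0; i < PROF_BUCKETS; i++) {
		struct prof_entry *e = &prof_table[(start + i) % PROF_BUCKETS];

		if (e->wmesg == wmesg)
			return (e);
		if (e->wmesg == NULL) {
			if (!create)
				return (NULL);
			e->wmesg = wmesg;
			return (e);
		}
	}
	return (NULL);	/* table full */
}

/* Count one sleep at the given wait message. */
static void
prof_record(const char *wmesg)
{
	struct prof_entry *e;

	assert(wmesg != NULL);
	e = prof_lookup(wmesg, 1);
	if (e != NULL)
		e->count++;
}

/* Number of sleeps recorded for this exact string address. */
static unsigned long
prof_count(const char *wmesg)
{
	struct prof_entry *e = prof_lookup(wmesg, 0);

	return (e != NULL ? e->count : 0);
}
```

Because the key is the pointer and not the string contents, two equal
strings stored at different addresses get separate counters, as noted
above.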
1.38 in 2001. Break out of the FOREACH_THREAD_IN_PROC loop when we've
discovered a new proc in the chain.
- Increment i and check for maxlockdepth once per matching process not
once per thread. This didn't properly terminate the loop before.
- Fix a bug which has existed potentially since rev 1.1. waitblock->lf_next
can be NULL when a thread has been woken-up but not yet scheduled. Check
for this condition rather than blindly dereferencing.
Found by: libMicro
requiring the per-process spinlock to only requiring the process lock.
- Reflect these changes in the proc.h documentation and consumers throughout
the kernel. This is a substantial reduction in locking cost for these
fields and was made possible by recent changes to threading support.
2's complement.
The 2's complement transform is done so a "count down" sampling interval
can be converted into a "count up" PMC value. A 2's complemented 'count down'
value is written to the PMC counter; then the read-back counter is reverted
via another 2's complement.
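A minimal sketch of that transform, assuming a full 64-bit counter for
simplicity (real PMCs are usually narrower and would need masking to the
counter width). The function names are hypothetical and are not the hwpmc
ones.

```c
#include <stdint.h>

/*
 * Convert a "count down" sampling interval into the "count up" value
 * written to the counter.  After 'count' increments the counter wraps
 * to zero, which is when the overflow interrupt would fire.
 */
uint64_t
reload_count_to_pmc_value(uint64_t count)
{
	return (~count + 1);	/* 2's complement, i.e. -count modulo 2^64 */
}

/* Revert a read-back counter value to the remaining "count down" value. */
uint64_t
pmc_value_to_reload_count(uint64_t value)
{
	return (~value + 1);	/* the transform is its own inverse */
}
```

For an interval of 1000 the value written is 2^64 - 1000, and applying the
same transform to the read-back value recovers the interval.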
PR: kern/121660
Reviewed by: jkoshy
Approved by: jkoshy
MFC after: 1 week
vm/vm_contig.c, vm/vm_page.c, and vm/vm_pageq.c. Today, vm/vm_pageq.c
has withered to the point that it contains only four short functions,
two of which are only used by vm/vm_page.c. Since I can't foresee any
reason for vm/vm_pageq.c to grow, it is time to fold the remaining
contents of vm/vm_pageq.c back into vm/vm_page.c.
Add some comments. Rename one of the functions, vm_pageq_enqueue(),
that is now static within vm/vm_page.c to vm_page_enqueue().
Eliminate PQ_MAXCOUNT as it no longer serves any purpose.
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines to trim one extra indirection.
Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}
the referenced data is only obtained/changed in the device open handler,
and the ioctl handler can only run after the open handler. Also fix a
few nearby style issues.
Submitted by: Matt Jacob
drivers.
In the giant_XXX wrappers for the device methods of the D_NEEDGIANT
drivers, do not dereference the cdev->si_devsw. It is racing with
the destroy_devl() clearing of the si_devsw. Instead, use the
dev_refthread() and return ENXIO for the destroyed device. [1]
The check for D_INIT in prep_cdevsw() was not synchronized with
the call of fini_cdevsw() in destroy_devl(), which under rapid device
creation/destruction may result in the use of an uninitialized cdevsw [2].
Change the protocol for the prep_cdevsw(), requiring it to be called
under dev_mtx, where the check for D_INIT is done.
Do not free the memory allocated for the gianttrick cdevsw while holding
the dev_mtx, put it into the free list to be freed later. Reuse the
d_gianttrick pointer to keep the size and layout of the struct cdevsw
(requested by phk). Free the memory in the dev_unlock_and_free(), and do
all the free after the dev_mtx is dropped (suggested by jhb).
Reported by: bsdimp + many [1], pho [2]
Reviewed by: phk, jhb
Tested by: pho
MFC after: 1 week
for a configurable number of seconds, spin the disk down. Spin it back
up on the next request.
Notice that the timeout is only armed by a request, so to spin down a
disk you may have to do:
atacontrol spindown ad10 5
dd if=/dev/ad10 of=/dev/null count=1
To disable spindown, set timeout to zero:
atacontrol spindown ad10 0
In order to debug any trouble caused, this code is somewhat noisy on the
console.
Enabling spindown on a disk containing / or /var/log/messages is not
going to do anything sensible.
Spinning a disk up and down all the time will wear it out, use sensibly.
Approved by: sos
10 microseconds is too short.
Always set the cpu to the highest frequency so that we get through
boot and don't handicap cpus where powerd(8) is not used.
monitor mode. This solves a problem that sometimes mangled frames
are passed.
Submitted by: Werner Backes <werner_at_bit-1.de>
Tested by: Werner Backes <werner_at_bit-1.de>
PR: kern/121608
Approved by: thompsa (mentor)
will have a special section, named .PPC.EMB.apuinfo, which will
tell GDB that a BookE processor is targeted and which will
result in GDB using a different register definition. In order
to support remote GDB for BookE, we need the GDB stub in the
kernel to look for that section and use the BookE definitions.
uidinfo structure. This entirely removes contention observed on the
ui_mtxp mutex (as it is now gone).
- Convert the uihashtbl_mtx mutex to a rwlock, as most of the time we just
need to read-lock it.
Reviewed by: jhb, jeff, kris & others
Tested by: kris
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isn't needed.
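The two-table switch can be sketched with C11 atomics as below. This is a
userland illustration under stated assumptions: the names, the fixed table
size and the use of plain integers in place of port pointers are all
hypothetical, and updates are assumed to be serialized by the management
path.

```c
#include <stdatomic.h>

#define MAX_PORTS 8			/* hypothetical limit */

struct port_table {
	int nports;
	int ports[MAX_PORTS];		/* stand-in for port pointers */
};

static struct port_table tables[2];
static atomic_int active;		/* index of the table the Tx path reads */

/* Management path: fill the inactive table, then publish it atomically. */
void
tx_ports_update(const int *ports, int nports)
{
	int i, next;

	next = 1 - atomic_load(&active);
	if (nports > MAX_PORTS)
		nports = MAX_PORTS;
	for (i = 0; i < nports; i++)
		tables[next].ports[i] = ports[i];
	tables[next].nports = nports;
	atomic_store(&active, next);	/* readers now see the new table */
}

/* Tx path: pick a port by flow hash without taking any lock. */
int
tx_port_select(unsigned int hash)
{
	const struct port_table *pt;

	pt = &tables[atomic_load(&active)];
	if (pt->nports == 0)
		return (-1);		/* no active ports */
	return (pt->ports[hash % pt->nports]);
}
```

A real implementation also has to make sure that no transmitter is still
reading the old table when a second update starts rewriting it; this sketch
does not address that.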
a jail, etc. by simply calling setpriority(PRIO_PROCESS, <PID>, 0) and
checking the return value: 0 means that the process exists and -1 that
it doesn't exist.
Reviewed by: rwatson
MFC after: 1 week
Instead of checking each page for PG_UNMANAGED, perform a one-time
check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only
belong to OBJT_PHYS objects.)
with style(9) recommendation that macros not contain the
terminating ';', leaving that to the invoker. All SYSINIT()
consumers must now provide a trailing ';'.
Unlike the change to remove the ';'s from callers, this change
shouldn't be MFC'd unless we don't mind requiring source changes
to third party modules that might still depend on SYSINIT()
providing its own ';'.
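As an illustration of the convention, here is a hypothetical registration
macro (MY_SYSINIT is not the real SYSINIT() definition) whose expansion
deliberately ends without a ';', so that the invoker supplies it:

```c
/* One entry per registered initialization routine. */
struct init_entry {
	const char *name;
	int (*func)(void);
};

/* The expansion has no terminating ';'. */
#define MY_SYSINIT(uniq, fn)						\
	static const struct init_entry uniq##_init_entry = { #uniq, fn }

static int
hello_init(void)
{
	return (42);
}

MY_SYSINIT(hello, hello_init);	/* the invoker provides the ';' */

/* Invoke the registered routine through its entry. */
int
run_hello(void)
{
	return (hello_init_entry.func());
}
```

If the macro carried its own ';', the invocation above would leave a stray
empty declaration at file scope, which strict ISO C compilers and some
lightweight parsers complain about.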
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.
MFC after: 1 month
Discussed with: imp, rink
Otherwise the parameter is a no-op, since the zone by default limits the
number of descriptors to some 12K entries. Attempts to allocate more end up
sleeping on zonelimit.
MFC after: 2 weeks
all. The reference in ia64 code is due to cutNpaste in its history
and can safely be removed.
Reviewed by: cognet, raj, marcel, jhb and maybe one other whom I'm forgetting
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:
intr_bind(rman_get_start(irq_res), cpu);
however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.
Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}
In that case return and continue processing the packet without IPsec.
PR: 121384
MFC after: 5 days
Reported by: Cyrus Rahman (crahman gmail.com)
Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]
"Fast IPsec: Initialized Security Association Processing." printf.
People kept asking questions about this after the IPsec shuffle.
This still is the Fast IPsec implementation so no worries that it would
be any slower now. There are no functional changes.
Discussed with: sam
MFC after: 4 days
No need to compile 'dead' code.
I am leaving it in because we will have to review the concept and
should use the common function in various places.
MFC after: 5 days
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs. In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would work around the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.
MFC after: 1 week
Pointy hat to: jhb
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it
can be overridden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains a new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.
Discussed with: imp
Silence on: arch@
callout_* API (e.g. callout_init_mtx(9)). This was one of the numerous
items on the http://wiki.freebsd.org/SMPTODO list.
Reviewed by: imp, obrien, jhb
MFC after: 1 week
virtual 86 mode to query the BIOS directly. This is needed for certain
HP machines whose BIOS only provides an SMAP when invoked from real mode.
On such machines the loader will be able to query the SMAP successfully
due to the recent BTX changes, but the kernel will not.
One thing I'm not sure of is if we can skip the INT 12h probe altogether
if we have the SMAP from the loader as it seems that we do the INT 12h
probe to setup enough state so we can use vm86 to call the BIOS.
MFC after: 1 week
failing to load on a kernel that has "nodevice mem" in the config. It will
now properly bring in the mem(4) module.
Submitted by: antoine
Reviewed by: imp
MFC after: 1 week
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.
Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.
jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.
Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks
- Close a sleepqueue signal race by interlocking with the per-process
spinlock. This was mistakenly omitted from the thread_lock patch and
has been a race since.
MFC After: 1 week
PR: bin/117603
Reported by: Danny Braniss <danny@cs.huji.ac.il>
PhysMask fields based on the number of physical address bits supported
by the current CPU. The old code assumed 36 bits on i386 and 40 bits on
amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.
In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.). The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.
Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available. If that
feature is not available, then assume 36 PA bits.
While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).
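A minimal sketch of the run-time mask computation, assuming the PA bit
count comes from bits 7:0 of EAX for CPUID leaf 0x80000008 and that the low
12 bits of the PhysBase/PhysMask registers are not address bits. The
function and macro names are hypothetical, not the ones used in the MTRR
code.

```c
#include <stdint.h>

#define MTRR_PAGE_SHIFT	12	/* low 12 bits are not address bits */
#define MTRR_MAX_PA_BITS 52	/* current largest possible value */
#define MTRR_DEFAULT_PA_BITS 36	/* used when the CPUID leaf is unavailable */

/* Extract the physical address width from CPUID 0x80000008 EAX. */
unsigned int
pa_bits_from_cpuid(uint32_t eax)
{
	return (eax & 0xff);
}

/* Build the address mask for the PhysBase/PhysMask fields. */
uint64_t
mtrr_physmask(unsigned int pa_bits)
{
	if (pa_bits == 0)
		pa_bits = MTRR_DEFAULT_PA_BITS;
	if (pa_bits > MTRR_MAX_PA_BITS)
		pa_bits = MTRR_MAX_PA_BITS;
	return (((UINT64_C(1) << pa_bits) - 1) &
	    ~((UINT64_C(1) << MTRR_PAGE_SHIFT) - 1));
}
```

With 36 PA bits this gives 0xFFFFFF000, and with 40 bits 0xFFFFFFF000, which
correspond to the two widths the old code hardcoded.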
MFC after: 1 week
PR: i386/120516
Reported by: Nokia
hangs (one at boot, one at shutdown) in recent machines. First, only try
to take ownership of the EHCI controller if the BIOS currently owns the
controller. On a HP DL160 G5, the machine hangs when we try to take
ownership. Second, don't bother trying to give up ownership of the
controller during shutdown. It's not strictly required and a Dell DCS S29
hangs on shutdown after the config write.
Both of these changes match the behavior of the Linux EHCI driver. I also
think both of these hangs are caused by bugs in the BIOS' SMM handler
causing it to get stuck in an infinite loop in SMM.
MFC after: 1 week
accept a mouse using the boot subclass. Instead, restore the original
hid_is_collection() test and fallback to testing the interface class,
subclass, and protocol if that fails.
MFC after: 1 week
PR: usb/118670
might be currently programmed into the registers.
Underlying firmware (U-Boot) would typically program MAC address into the
first unit only, and others are left uninitialized. It is now possible to
retrieve and program MAC address for all units properly, provided they were
passed on in the bootinfo metadata.
Reviewed by: imp, marcel
Approved by: cognet (mentor)
We're now more robust against cases of non-sorted and/or non-continuous
numbering of those entries.
Reviewed by: imp, marcel
Approved by: cognet (mentor)
This was introduced as a workaround a long time ago for some Alpha firmware
(which is now gone), and actually prevented net_close() from ever being
called.
Certain firmwares (U-Boot) need local shutdown operations to be performed on a
network controller upon transaction end: such platform-specific hooks are
supposed to be called via netif_close() (from within net_close()).
This change effectively reverts the following CVS commit:
sys/boot/common/dev_net.c
revision 1.7
date: 2000/05/13 15:40:46; author: dfr; state: Exp; lines: +2 -1
Only probe network settings on the first open of the network device.
The alpha firmware takes a seriously long time to open the network device
the first time.
Also suppress excessive output while netbooting via loader, unless debugging.
While there, make sys/boot/uboot more style(9) compliant.
Reviewed by: imp
Approved by: cognet (mentor)
While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.
sched_sleep(). This removes extra thread_lock() acquisition and
allows the scheduler to decide what to do with the static boost.
- Change the priority arguments to cv_* to match sleepq/msleep/etc.
where 0 means no priority change. Catch -1 in cv_broadcastpri() and
convert it to 0 for now.
- Set a flag when sleeping in a way that is compatible with swapping
since direct priority comparisons are meaningless now.
- Add a sysctl to ule, kern.sched.static_boost, that defaults to on which
controls the boost behavior. Turning it off gives better performance
in some workloads but needs more investigation.
- While we're modifying sleepq, change signal and broadcast to both
return with the lock held as the lock was held on enter.
Reviewed by: jhb, peter
Before this patch the callback returned the result of the last finished call
chain. Now it returns the last nonzero result among all call chain results in
the request.
Since this improvement gives reliable error reporting, it is now possible
to remove the dirty workaround in ng_socket that was made to return ENOBUFS
error statuses of request-response operations. That workaround was
responsible for returning ENOBUFS errors to completely unrelated requests
working at the same time on the socket.
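The new result handling can be pictured with the following sketch. The
structure and function names are hypothetical and do not come from the
netgraph sources; the point is only that a later successful chain no longer
overwrites an earlier failure.

```c
#include <errno.h>

/* Per-request status shared by all call chains of that request. */
struct request_status {
	int error;		/* last nonzero result seen so far */
};

/* Called as each call chain of the request finishes. */
void
request_chain_done(struct request_status *rs, int result)
{
	if (result != 0)
		rs->error = result;	/* keep the most recent failure */
}
```

With the old behaviour a request whose chains finished with ENOBUFS and then
0 would have reported success; here it reports ENOBUFS.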
set a default name. If the IRQ is added as a consequence of
configuring the IRQ without there ever being a handler
assigned to it, we will not have a name. This breaks the
fragile intrcnt/intrnames logic.
state change and reliable error recovery.
o Moved vr_softc structure and relevant macros to header file.
o Use PCIR_BAR macro to get BARs.
o Implemented suspend/resume methods.
o Implemented automatic Tx threshold configuration, which is
activated when the chip suffers a Tx underrun. On a Tx underrun the
driver first tries to restart only the Tx path and resorts to the
previous full-reset (both Rx/Tx) operation if restarting the Tx path
fails.
o Removed old bit-banging MII interface. Rhine provides a simple and
efficient MII interface. While I'm here, show the PHY address and PHY
register number when a read/write operation fails.
o Define VR_MII_TIMEOUT constant and use it in MII access routines.
o Always honor link up/down state reported by mii layers. The link
state information is used in vr_start() to determine whether we
got a valid link.
o Removed vr_setcfg() which is now handled in vr_link_task(), link
state taskqueue handler. When mii layer reports link state changes
the taskqueue handler reprograms MAC to reflect negotiated duplex
settings. Flow-control changes are not handled yet and it should
be revisited when mii layer knows the notion of flow-control.
o Added a new sysctl interface to get statistics of an instance of
the driver.(sysctl dev.vr.0.stats=1)
o Chip name was renamed to reflect the official name of the chips
described in VIA Rhine I/II/III datasheet.
REV_ID_3065_A -> REV_ID_VT6102_A
REV_ID_3065_B -> REV_ID_VT6102_B
REV_ID_3065_C -> REV_ID_VT6102_C
REV_ID_3106_J -> REV_ID_VT6105_A0
REV_ID_3106_S -> REV_ID_VT6105M_A0
The following chip revisions were added.
#define REV_ID_VT6105_B0 0x83
#define REV_ID_VT6105_LOM 0x8A
#define REV_ID_VT6107_A0 0x8C
#define REV_ID_VT6107_A1 0x8D
#define REV_ID_VT6105M_B1 0x94
o Always show chip revision number in device attach. This shall help
identifying revision specific issues.
o Check whether EEPROM reloading is complete by inspecting the state
of VR_EECSR_LOAD bit. This bit is self-cleared after the EEPROM
reloading. Previously vr(4) blindly spun for 200us, which may or may
not be enough to complete the EEPROM reload.
o Removed if_mtu setup. It's done in ether_ifattach().
o Use our own callout to drive watchdog timer.
o In vr_attach disable further interrupts after reset. For VT6102 or
newer hardware, disable the MII state change interrupt as well because
mii state handling is done by the mii layer.
o Add more sane register initialization for VT6102 or newer chips.
- Have NIC report error instead of retrying forever.
- Let hardware detect MII coding error.
- Enable MODE10T mode.
- Enable memory-read-multiple for VT6107.
o PHY address for VT6105 or newer chips is located at fixed address 1.
For older chips the PHY address is stored in VR_PHYADDR register.
Armed with this information, there is no need to re-read
VR_PHYADDR register in miibus handler to get PHY address. This
saves one register access cycle for each MII access.
o Don't reprogram VR_PHYADDR register whenever access to a register
located at a PHY address is made. The Rhine family allows reprogramming
the PHY address location via the VR_PHYADDR register depending on the
VR_MIISTAT_PHYOPT bit of the VR_MIISTAT register. This used to lead to
numerous phantom PHYs being attached to miibus during the phy probe phase,
and the driver used to limit allowable PHY addresses in mii register accessors
for certain chip revisions. This removes one more register access
cycle for each MII access.
o Correctly set VLAN header length.
o bus_dma(9) conversion.
- Limit DMA access to be in range of 32bit address space. Hardware
doesn't support DAC.
- Apply descriptor ring alignment requirements(16 bytes alignment)
- Apply Rx buffer address alignment requirements(4 bytes alignment)
- Apply Tx buffer address alignment requirements(4 bytes alignment)
for Rhine I chip. Rhine II or III has no Tx buffer address
alignment restrictions, though.
- Reduce the allowable number of DMA segments to 8.
- Removed the atomic(9) operations used in descriptor ownership
management, as that is the job of bus_dmamap_sync(9).
With these changes vr(4) should work on all platforms.
o Rhine uses two separate 8-bit command registers to control Tx/Rx
MAC. So don't access them as a single 16-bit register.
o For non-strict alignment architectures vr(4) no longer requires a
time-consuming copy operation for received frames to align IP
header. This greatly improves Rx performance on i386/amd64
platforms. However the alignment is still necessary for
strict-alignment platforms(e.g. sparc64). The alignment is handled
in the new function vr_fixup_rx().
o vr_rxeof() now rejects multiple-segmented(fragmented) frames as
vr(4) is not ready to handle this situation. Datasheet said nothing
about the reason when/why it happens.
o In vr_newbuf() don't set VR_RXSTAT_FIRSTFRAG/VR_RXSTAT_LASTFRAG
bits as it's set by hardware.
o Don't pass checksum offload information to upper layer for
fragmented frames. The hardware assisted checksum is valid only
when the frame is a non-fragmented IP frame. Also mark the checksum
as valid for corrupted frames so that upper layers don't need
to recompute the checksum with a software routine.
o Removed vr_rxeoc(). RxDMA doesn't seem to need to be idle before
sending VR_CMD_RX_GO command. Previously it used to stop RxDMA
first which in turn resulted in long delays in Rx error recovery.
o Rewrote Tx completion handler.
- Always check VR_TXSTAT_OWN bit in status word prior to
inspecting other status bits in the status word.
- Collision counter updates were corrected as VT3071 or newer
ones use different bits to notify collisions.
- Unlike other chip revisions, VT86C100A uses different bit to
indicate Tx underrun. For VT3071 or newer ones, check both
VR_TXSTAT_TBUFF and VR_TXSTAT_UDF bits to see whether Tx
underrun has happened. In case of Tx underrun, requeue the failed
frame and restart stalled Tx SM. Also double Tx DMA threshold
size on each failure to mitigate future Tx underruns.
- Disarm watchdog timer only if we have no queued packets,
otherwise don't touch watchdog timer.
o Rewrote interrupt handler.
- status word in Tx/Rx descriptors indicates more detailed error
state required to recover from the specific error. There is no
need to rely on interrupt status word to recover from Tx/Rx
error except PCI bus error. Other event notifications like
statistics counter overflows or link state events will be
handled in main interrupt handler.
- Don't touch VR_IMR register if we are in suspend mode. Touching
the register may hang the hardware if we are in suspended state.
Previously it seems that touching VR_IMR register in interrupt
handler was to work-around panic occurred in system shutdown
stage on SMP systems. I think that work-around would hide
root-cause of the panic and I couldn't reproduce the panic
with multiple attempts on my box.
o While padding space to meet minimum frame size, zero the pad data
in order to avoid possibly leaking sensitive data.
o Rewrote vr_start_locked().
- Don't try to queue packets if the number of available Tx descriptors
is less than the number required.
o Don't reinitialize hardware whenever media configuration is
changed. Media/link state changes are reported from mii layer if
this happens and vr_link_task() will perform necessary changes.
o Don't reinitialize hardware if only PROMISC bit was changed. Just
toggling the PROMISC bit in hardware is sufficient to reflect the
request.
o Rearranged the IFCAP_POLLING/IFCAP_HWCSUM handling in vr_ioctl().
o Generate Tx completion interrupts for every VR_TX_INTR_THRESH-th
frames. This reduces Tx completion interrupts under heavy network
loads.
o Since vr(4) doesn't request Tx interrupts for every queued frames,
reclaim any pending descriptors not handled in Tx completion
handler before actually firing up watchdog timeouts.
o Added vr_tx_stop()/vr_rx_stop() to wait for the end of active
TxDMA/RxDMA cycles(draining). These routines are used in vr_stop()
to ensure sane state of MAC before releasing allocated Tx/Rx
buffers. vr_link_task() also takes advantage of these functions to
get to idle state prior to restarting Tx/Rx.
o Added vr_tx_start()/vr_rx_start() to restart Rx/Tx. By separating
Rx operation from Tx operation vr(4) no longer needs to full-reset
the hardware in case of Tx/Rx error recovery.
o Implemented WOL.
o Added VT6105M specific register definitions. VT6105M has the
following hardware capabilities.
- Tx/Rx IP/TCP/UDP checksum offload.
- VLAN hardware tag insertion/extraction. Due to lack of information
for getting extracted VLAN tag in Rx path, VLAN hardware support
was not implemented yet.
- CAM(Content Addressable Memory) based 32 entry perfect multicast/
VLAN filtering.
- 8 priority queues.
o Implemented CAM based 32 entry perfect multicast filtering for
VT6105M. If number of multicast entry is greater than 32, vr(4)
uses traditional hash based filtering.
o Reflect real Tx/Rx descriptor structure. Previously vr(4) used to
embed other driver (private) data into these structure. This type
of embedding makes it hard to work on LP64 systems.
o Removed unused vr_mii_frame structure and MII bit-banging
definitions.
o Added new PCI configuration registers that control mii operation
and mode selection.
o Reduced number of Tx/Rx descriptors to 128 from 256. From my
testing, increasing the number of descriptors above 64 didn't help
increase performance at all. Experiments show that 128 Rx
descriptors help a lot in reducing Rx FIFO overruns under
high system loads. It seems the poor Tx performance of Rhine
hardware comes from a limitation of the hardware. You can't
saturate the link with vr(4) no matter how fast the CPU is or how
many descriptors are used.
o Added vr_statistics structure to hold various counter values.
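One item from the list above, zeroing the pad bytes when a short frame is
extended to the minimum frame size, can be sketched as follows. The helper
name and the 60-byte minimum (the 64-byte Ethernet minimum less the 4-byte
CRC) are assumptions for illustration; the driver itself works on mbufs
rather than flat buffers.

```c
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60	/* assumed minimum, excluding the CRC */

/*
 * Pad the frame held in buf (capacity cap) up to the minimum length,
 * zeroing the pad bytes so stale buffer contents are not transmitted.
 * Returns the new frame length, or 0 if buf is too small to pad.
 */
size_t
pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	if (len >= MIN_FRAME_LEN)
		return (len);		/* nothing to do */
	if (cap < MIN_FRAME_LEN)
		return (0);
	memset(buf + len, 0, MIN_FRAME_LEN - len);
	return (MIN_FRAME_LEN);
}
```

Without the memset() the pad area would carry whatever data previously
occupied the buffer, which is the leak the change avoids.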
No regression was reported but one variant of Rhine III(VT6105M)
found on RouterBOARD 44 does not work yet(Reported by Milan Obuch).
I hope this would be resolved in near future.
I'd like to say big thanks to Mike Tancsa who kindly donated a Rhine
hardware to me. Without his enthusiastic testing and feedback,
overhauling vr(4) would never have been possible. Also thanks to Masayuki
Murayama who provided some good comments on the hardware's internals.
This driver is the result of the combined effort of many users who provided
much feedback, so I'd like to say special thanks to them.
Hardware donated by: Mike Tancsa (mike AT sentex dot net)
Reviewed by: remko (initial version)
Tested by: Mike Tancsa(x86), JoaoBR ( joao AT matik DOT com DOT br )
Marcin Wisnicki ( mwisnicki+freebsd AT gmail DOT com )
Stefan Ehmann ( shoesoft AT gmx DOT net )
Florian Smeets ( flo AT kasimir DOT com )
Phil Oleson ( oz AT nixil DOT net )
Larry Baird ( lab AT gta DOT com )
Milan Obuch ( freebsd-current AT dino DOT sk )
remko (initial version)
tdq_runq_add to select the runq rather than hoping we set it properly
when we adjusted the priority. This involves the same number of
branches as before so should perform identically without the extra
fragility.
Tested by: bz
Reviewed by: bz
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.
MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua
present in cpu_feature2. Also, use CPUID2_EST rather than a magic
number.
- Don't free the ACPI settings list in detach if we are going to fail the
request. Otherwise an attempt to kldunload est would free the array
but the driver would keep trying to use it.
MFC after: 1 week
routines (V86 requests from the client and hardware interrupt handlers):
- Install trampoline real mode interrupt handlers at IDT vectors 0x20-0x2f
to handle hardware interrupts by invoking the appropriate vector (0x8-0xf
or 0x70-0x78). This allows the 8259As to use vectors 0x20-0x2f in real
mode as well as protected mode while ensuring that the master 8259A
doesn't share IDT space with CPU exceptions in protected mode.
- Since we don't need to reserve space for page tables and a page directory
anymore since dropping paging support, move the TSS and protected mode
IDT up by 16k. Grow the ring 1 link stack by 16k as a result.
- Repurpose the ring 1 link stack to be used as a real mode stack when
invoking real mode routines either via a V86 request or a hardware
interrupt. This simplifies a few things as we avoid disturbing the
original user stack.
- Add some more block comments to explain how the code interacts with the
V86 structure as this wasn't immediately obvious from the prior comments
(e.g. that we explicitly copy the seg regs for real mode out of the V86
struct onto the stack to be popped off when going into real mode, etc.).
Also, document some of the stack frames we create going to real mode and
back.
- Remove all of the virtual 86 related code including having to simulate
various instructions and BIOS calls on a trap from virtual 86 mode.
- Explicitly panic if a user client attempts to perform a V86 CALL
request that isn't a far call.
- Bump version to 1.2.
Assuming this works ok this should fix some of the long standing issues
with USB booting as well as etherboot.
MFC after: 2 weeks
Submitted by: kib (some parts from his original real mode patch)
- Only calculate timeshare priorities once per tick or when a thread is woken
from sleeping.
- Keep the ts_runq pointer valid after all priority changes.
- Call tdq_runq_add() directly from sched_switch() without passing in via
tdq_add(). We don't need to adjust loads or runqs anymore.
- Sort tdq and ts_sched according to utilization to improve cache behavior.
Sponsored by: Nokia
- Normalize the preemption/ipi setting code by introducing sched_shouldpreempt()
so the logic is identical and not repeated between tdq_notify() and
sched_setpreempt().
- In tdq_notify() don't set NEEDRESCHED as we may not actually own the thread
lock; this could have caused us to lose td_flags settings.
- Garbage collect some tunables that are no longer relevant.
the NOPs used are 0x01.
While we could simply pad with EOLs (which are 0x00), rather use an
explicit 0x00 constant there to not confuse people with 'EOL padding'.
Put in a comment saying just that.
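A sketch of what terminating and padding an option block with an explicit
pad constant could look like. The OPT_* names, the helper and its buffer
handling are assumptions for illustration only and are not taken from the
TCP code.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0x00	/* end of option list */
#define OPT_NOP 0x01	/* no-operation, not suitable as trailing padding */
#define OPT_PAD 0x00	/* explicit padding constant; same value as EOL */

/*
 * Terminate an option block of optlen bytes and pad it to a multiple of
 * four bytes.  Returns the new length, or 0 if cap is too small.
 */
size_t
tcp_opts_terminate(uint8_t *opts, size_t optlen, size_t cap)
{
	if (optlen % 4 == 0)
		return (optlen);	/* already aligned */
	if (cap < ((optlen + 3) & ~(size_t)3))
		return (0);
	opts[optlen++] = OPT_EOL;	/* marks the end of the options */
	while (optlen % 4 != 0)
		opts[optlen++] = OPT_PAD;	/* zero padding, not NOPs */
	return (optlen);
}
```

Keeping OPT_PAD separate from OPT_EOL changes nothing on the wire, since
both are 0x00, but it lets the code say which of the two roles a byte plays.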
Problem discussed on: src-committers with andre, silby, dwhite as
follow up to the rev. 1.161 commit of tcp_var.h.
MFC after: 11 days
the appropriate bit in the DEVACTB register.
This change allows the C2 state on those systems to work as expected.
Reviewed by: njl
Submitted by: Andriy Gapon <avg at icyb.net.ua>
MFC after: 1 week
Specifically, since the delete-behind heuristic is never applied to a
device-backed object, there is no point in checking whether each of the
object's pages is fictitious. (Only device-backed objects have
fictitious pages.)
know if it has siblings that need an actual probe. Introduce a special
return value called BUS_PROBE_NOWILDCARD. If the driver returns
this, the probe is only successful for devices that have had a
specific devclass set for them.
Reviewed by: current@, jhb@, grehan@
in*() and out*() primitives should not be used, other than by
ISA drivers. In this case they were used for memory-mapped I/O
and were not even used in the spirit of the primitives.
if netgraph reported error while delivering to destination.
Reset 'next send' counter to the last requested by peer on ack timeout
to resend all subsequent packets after the lost one again without additional hints.
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionally equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
PR: 120233
Submitted by: Jukka Ukkonen
Approved by: rwatson (mentor)
MFC after: 1 month
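
A minimal sketch of the equivalence described above. The helper name
dup_onto() is hypothetical, and the #ifdef fallback is only there because
F_DUP2FD is not defined on every platform.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate oldfd onto newfd.  Where F_DUP2FD exists, use the fcntl(2)
 * spelling; otherwise fall back to dup2(2), which does the same job.
 * Returns newfd on success and -1 on error.
 */
int
dup_onto(int oldfd, int newfd)
{
#ifdef F_DUP2FD
	return (fcntl(oldfd, F_DUP2FD, newfd));
#else
	return (dup2(oldfd, newfd));
#endif
}
```

Either branch closes newfd first if it was open, so callers do not need to.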
HPT drivers would sometimes test the value of a preprocessor definition but
not always make sure that the definition existed in the first place, leading
to warnings on newer compilers. I blindly assumed the same with this driver,
and it turned out to be wrong and to enable some code that doesn't work.
process lock leading to a hang. This bug was introduced in
kern_sig.c:1.351, when the call to expand_name() was moved earlier
but this particular error case was not updated.
It so happens that U-Boot disables the D-cache when booting
an ELF image, so this change makes sure we run with the
D-cache enabled from now on. It shows too...
While here, remove the duplicate definition of the hw.model
sysctl.
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)
trashing and improve performance.
Remove waitflag argument from ng_ksocket_incoming2(), it means nothing
as function call was queued by netgraph.
Remove node validity check, as node validity is guaranteed by netgraph.
Update comments.
value at the requested address as a symbol. For example, "ex /S
aio_swake" prints the name of the function currently registered
via the aio_swake hook.
The change as committed differs slightly from the patch in the PR,
as I force the size of the retrieved value (and the automatic
address increment) to be sizeof(void *). This seems to provide
the most useful auto-increment behavior, and avoids using the
default size (4), which is not sizeof(void *) on 64-bit platforms.
MFC after: 3 days
PR: 57976
Submitted by: Dan Strick <strick at covad.net>
for all network interfaces, not just ethernet-like ones.
Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.
Upgrade if_watchdog to a WARNING.
> 0 rather than >= 0, or we will panic when trying to deliver the signal.
MFC after: 3 days
PR: 100802
Submitted by: Valerio Daelli <valerio.daelli at gmail.com>
to flush the TLB instead of hardcoding a size of 33 pages. Apertures of
32MB and 64MB only use a 16 page GATT and an aperture of 128MB only uses
a 32 page GATT, so without this the code could walk off the end of the
pointer and cause a page fault if the next page was unmapped. Also, for
aperture sizes > 128MB, not all of the pages would be read. The Linux
driver has the same bug.
MFC after: 1 week
Tested by: Frédéric PRACA frederic.praca of freebsd-fr.org
hold the newline and nul terminator. Otherwise, there are cases where
garbage may end up in the command history due to a lack of a nul
terminator, or input may end up without room for a newline.
MFC after: 3 days
PR: 119079
Submitted by: Michael Plass <mfp49_freebsd@plass-family.net>
TCP/UDP checksum in driver for short frames. For frames that require
hardware VLAN tag insertion, the checksum offload trick does not
work due to changes of checksum offset in mbuf after the VLAN tag.
Disable hardware checksum offload for VLAN interface to fix the bug.
Reported by: Christopher Cowart < ccowart AT rescomp DOT berkeley DOT edu >
Tested by: Christopher Cowart < ccowart AT rescomp DOT berkeley DOT edu >
MFC after: 5 days
returns EINVAL. Right now we return 0 or success for invalid commands,
which could be quite problematic in certain conditions.
MFC after: 1 week
Discussed with: rwatson
revision 1.6
date: 2004/08/21 18:50:34; author: alc; state: Exp; lines: +3 -1
Properly free the temporary sf_buf in uiomove_fromphys() if a copyin or
copyout fails.
Obtained from: DragonFlyBSD
Spotted out by: Mark Tinguely
MFC After: 3 days
restrict the utilization of direct pointers to the content of
ip packet. These modifications are functionally nop()s thus
can be merged with no side effects.
- Set M_BCAST|M_MCAST for incoming frames
- Send the frame to a local interface if the bridge returns the mbuf
Submitted by: Eugene Grosbein
Tested by: Boris Kochergin
private to the kernel, some ports define _KERNEL and include this
header. While arguably this is wrong, it's also reality. By having
the MD fields last, architectures that have CPU-specific variations
of PCPU_MD_FIELDS will at least have the MI fields at a constant
offset. Of course, having all MI fields first helps kernel debugging
as well, so this is not a change without some benefits to us.
This change does not result in an ABI breakage, because this header
is not part of the ABI. Recompilation of lsof is required though :-)
used in the kernel only (by virtue of checking for _KERNEL),
ports like lsof (part of gtop) cheat. It sets _KERNEL, but does
not set either AIM or E500. As such, PCPU_MD_FIELDS didn't get
defined and the build broke.
The catch-all is to define PCPU_MD_FIELDS with a dummy integer
when at the end of line we ended up without a definition for it.
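
A minimal sketch of the catch-all combined with the MI-first layout. The
structure is a hypothetical, cut-down stand-in for the real per-CPU
structure, and the dummy member name pc_md_dummy is an assumption.

```c
#include <stddef.h>

/* Catch-all: no CPU-specific variant was selected, so use a dummy integer. */
#ifndef PCPU_MD_FIELDS
#define	PCPU_MD_FIELDS	int pc_md_dummy
#endif

/* Hypothetical cut-down per-CPU structure: MI fields first, MD fields last. */
struct pcpu_sketch {
	void		*pc_curthread;	/* MI field, fixed offset */
	unsigned int	 pc_cpuid;	/* MI field, fixed offset */
	PCPU_MD_FIELDS;			/* MD fields, whatever they expand to */
};
```

With the MD fields last, the offsets of the MI members do not depend on which
PCPU_MD_FIELDS variant was compiled in.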
the input field from the current cursor location, rather than the end of
the input line, as the cursor may not be at the end of the line.
Otherwise, we may overshoot, overwriting a bit of the previous line and
failing to fully overwrite the current line.
MFC after: 3 days
PR: 119079
Submitted by: Michael Plass <mfp49_freebsd@plass-family.net>
allocator for jumbo frame. Also remove unneeded jlist lock which
is no longer required to protect jumbo buffers.
With these changes jumbo frame performance of nfe(4) was slightly
increased and users should not encounter jumbo buffer allocation
failure anymore.
to avoid terrible unpredicted effects for netgraph operation of their
exhaustion while allocating control messages.
Add separate configurable 512 items limit for data items allocation
for DoS/overload protection.
Discussed with: julian
it's probed first. The PowerPC platform code deals with everything.
As such, probe devices in order of their location in the memory map.
o Refactor the ocpbus_alloc_resource for readability and make sure we
set the RID in the resource as per the new convention.
- Even for the PCI Express host controller we need to use bus 0
for configuration space accesses to devices directly on the
host controller's bus.
- Pass the maximum number of slots to pci_ocp_init() because the
caller knows how many slots the bus has. Previously a PCI or
PCI-X bus underneath a PCI Express host controller would not
be enumerated properly.
o Pull the interrupt routing logic out of pci_ocp_init() and into
its own function. The logic is not quite right and is expected
to be a bit more complex.
o Fix/add support for PCI domains. The PCI domain is the unit
number as per other PCI host controller drivers. As such, we
can use logical bus numbers again and don't have to guarantee
globally unique bus numbers. Remove pci_ocp_busnr. Return the
highest bus number to the caller of pci_ocp_init() now that
we don't have a global variable anymore.
o BAR programming fixes:
- Non-type0 headers have at most 1 BAR, not 0.
- First write ~0 to the BAR in question and then read back its
size.
Obtained from: Juniper Networks (mostly)
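
The "write ~0, then read back" step can be sketched as below. This is not the
driver's code: bar_mem_size() is a hypothetical helper that only decodes the
value read back from a 32-bit memory BAR (I/O BARs reserve fewer low bits).

```c
#include <stdint.h>

/*
 * Decode the size of a 32-bit memory BAR from the value read back after
 * writing all ones to it.  Returns 0 if the BAR is not implemented.
 */
uint32_t
bar_mem_size(uint32_t readback)
{
	uint32_t mask;

	mask = readback & ~(uint32_t)0xf;	/* drop type/prefetch bits */
	if (mask == 0)
		return (0);
	return (~mask + 1);	/* lowest writable address bit is the size */
}
```

A read-back of 0xfff00000, for instance, decodes to a 1MB window.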
It is normally initialized by ffs_statfs() after ffs_mount finished.
The extattr autostart code calls the ufs_lookup(), that uses value above
to iterate over the directory blocks, see bmask initialization in the
ufs_lookup() and ufsdirhash. Having the filesystem with root directory
spanning more than one block would result in reading random kernel
memory.
PR: kern/120781
Test case provided by: rwatson
MFC after: 1 week
expressions on i386 are evaluated in the range of the long double type,
so this is wrong in a different but hopefully less worse way than
before. Since expressions are evaluated in long double registers,
there is no runtime cost to using long double instead of double to
declare intermediate values (except in cases where this avoids compiler
bugs), and by careful use of float_t or double_t it is possible to
avoid some of the compiler bugs in this area, provided these types are
declared as long double.
I was going to change float.h to be less broken and more usable in
combination with the change here (in particular, it is more necessary
to know the effective number of bits in a double_t when double_t !=
double, since DBL_MANT_DIG no longer logically gives this, and
LDBL_MANT_DIG doesn't give it either with FreeBSD-i386's default
rounding precision). However, this was too hard for now. In particular,
LDBL_MANT_DIG is used a lot in libm, so it cannot be changed. One
thing that is completely broken now is LDBL_MAX. This may have sort
of worked when it was changed from DBL_MAX in 2002 (adding 0 to it at
runtime gave +Inf, but you could at least compare with it), but starting
with gcc-3.3.1 in 2003, it is always +Inf due to evaluating it at
compile time in the default rounding precision.
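
A minimal sketch of "careful use of double_t" for intermediate values, using
only the C99 <math.h> type. The function name is hypothetical; how wide
double_t ends up being depends on the platform's declaration.

```c
#include <math.h>
#include <stddef.h>

/*
 * Sum an array, keeping the running total in double_t so that the declared
 * type of the intermediate matches the type expressions are evaluated in.
 */
double
sum_doubles(const double *v, size_t n)
{
	double_t acc = 0;	/* at least as wide as double */
	size_t i;

	for (i = 0; i < n; i++)
		acc += v[i];
	return ((double)acc);	/* round once, on the way out */
}
```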
mount options that mount_nfs could pass down, if it passed
down string mount options. Right now, mount_nfs just passes
down a single mount option named "nfs_args" with a fully
initialized 'struct nfs_args'.
In future commits, we will add code to the kernel for parsing stringified
NFS mount options, so that we can convert mount_nfs to pass string options
from userspace to kernel, instead of an initialized struct nfs_args.
the same way that it is default initialized in revision 1.77 of mount_nfs.c.
Right now, this is a no-op, because currently we initialize
struct nfs_args in mount_nfs in userspace, and pass it
down into the kernel via nmount(), so we overwrite whatever we initialize
here with the value passed in from userspace.
However, this lays the groundwork for moving away from passing
struct nfs_args from userspace to kernel via nmount(), so that we
can instead pass string mount options via nmount() which can be parsed in
the kernel. This will make it easier to add new NFS mount options.
passing it to cpuset_which(). Pass in 'set' instead. This argument
is not used but for convenience cpuset_which() nulls all incoming
parameters.
Submitted by: davidxu
Patch in the PR was modified to check active jumbo buffers in use
and other possible jumbo buffer leak.
Jumbo buffer usage in lge(4) still wouldn't be reliable due to lack
of driver lock in local jumbo buffer allocator. Either introducing
a new lock to protect jumbo buffers or switching to a UMA backed page
allocator for jumbo frames is required.
PR: kern/78072
mask none of the upper bits are set.
- Be more careful about enforcing the boundaries of masks and child sets.
- Introduce a few more CPU_* macros for implementing these tests.
- Change the cpusetsize argument to be bytes rather than bits to match
other apis.
Sponsored by: Nokia
IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values
10000 and 65535 respectively.
The rationale behind is that it makes the attacker's life more
difficult if he/she wants to guess the ephemeral port range and
also lowers the probability of a port collision (described in
draft-ietf-tsvwg-port-randomization-01.txt).
While there, remove code duplication in in_pcbbind_setup().
Submitted by: Fernando Gont <fernando at gont.com.ar>
Approved by: njl (mentor)
Reviewed by: silby, bms
Discussed on: freebsd-net
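
As an illustration of choosing a port inside the range named above, here is a
sketch that maps a caller-supplied random value onto it. The macro and
function names are hypothetical, and this is not the kernel's selection code;
the plain modulo also carries a small bias that a real implementation may
want to avoid.

```c
#include <stdint.h>

#define	EPH_FIRST	10000u	/* range limits named in the text above */
#define	EPH_LAST	65535u

/* Map a random 32-bit value onto the inclusive ephemeral port range. */
uint16_t
pick_ephemeral_port(uint32_t rnd)
{
	uint32_t span = EPH_LAST - EPH_FIRST + 1;

	return ((uint16_t)(EPH_FIRST + rnd % span));
}
```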
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. (Expect this to change.)
Reviewed by: ups
Tested by: kris, Peter Holm
the specific semantics of lockmgr aren't required: update UFS1 extended
attributes to protect its data structures using an sx lock.
While here, update comments on lock granularity.
MFC after: 2 weeks
The kernel config file is KERNCONF=MPC85XX, so the usual procedure applies:
1. make buildworld TARGET_ARCH=powerpc
2. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
This default config uses kernel-level FPU emulation. For the soft-float world
approach:
1. make buildworld TARGET_ARCH=powerpc TARGET_CPUTYPE=e500
2. disable FPU_EMU option in sys/powerpc/conf/MPC85XX
3. make buildkernel TARGET_ARCH=powerpc TARGET_CPUTYPE=e500 KERNCONF=MPC85XX
Approved by: cognet (mentor)
MFp4: e500
TSEC is the MAC engine offering 10, 100 or 1000 Mbps speed and is found on
different Freescale parts (MPC83xx, MPC85xx). Depending on the silicon version
there are up to four TSEC units integrated on the chip.
This driver also works with the enhanced version of the controller (eTSEC),
which is backwards compatible, but doesn't take advantage of its additional
features (various off-loading mechanisms) at the moment.
Approved by: cognet (mentor)
Obtained from: Semihalf
MFp4: e500
The QUICC engine is found on various Freescale parts including MPC85xx, and
provides multiple generic time-division serial channel resources, which are in
turn muxed/demuxed by the Serial Communications Controller (SCC).
Along with core QUICC/SCC functionality a uart(4)-compliant device driver is
provided which allows for serial ports over QUICC/SCC.
Approved by: cognet (mentor)
Obtained from: Juniper
MFp4: e500
The PQ3 is a high performance integrated communications processing system
based on the e500 core, which is an embedded RISC processor that implements
the 32-bit Book E definition of the PowerPC architecture. For details refer
to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E
This port was tested and successfully run on the following members of the PQ3
family: MPC8533, MPC8541, MPC8548, MPC8555.
The following major integrated peripherals are supported:
* On-chip peripherals bus
* OpenPIC interrupt controller
* UART
* Ethernet (TSEC)
* Host/PCI bridge
* QUICC engine (SCC functionality)
This commit brings the main functionality and will be followed by individual
drivers that are logically separate from this base.
Approved by: cognet (mentor)
Obtained from: Juniper, Semihalf
MFp4: e500
native extended attributes. This didn't interfere with the operation of
UFS2 extended attributes, but the code shouldn't be running for UFS2.
MFC after: 2 weeks
a queue entry field, just copy out the unsigned int that is the trigger
message. In practice, auditd always requested sizeof(unsigned int), so
the extra bytes were ignored, but copying them out was not the intent.
MFC after: 1 month
soft lifetime [1] introduced in rev. 1.21 of key.c.
Along with that, fix a related problem in key_debug
printing the correct data.
While there replace a printf by panic in a sanity check.
PR: 120751
Submitted by: Kazuaki ODA (kazuaki aliceblue.jp) [1]
MFC after: 5 days
Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:
<machine/pcpu.h>
<machine/pcb.h>
<machine/kdb.h>
<machine/hid.h>
<machine/frame.h>
Areas which use the above are adjusted and cleaned up.
Credits for this rework go to marcel@
Approved by: cognet (mentor)
MFp4: e500
- Move the assignment of the socket down before we first need it.
No need to do it at the beginning and then drop out the function
by one of the returns before using it 100 lines further down.
- Use t_maxopd which was assigned the "tcp_mssdflt" for the correct
AF already instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before
we possibly change mss and re-do the assignment without using t_maxseg
in between.
Reviewed by: silby
No objections: net@ (silence)
MFC after: 5 days
- When searching for affinity search backwards in the tree from the last
cpu we ran on while the thread still has affinity for the group. This
can take advantage of knowledge of shared L2 or L3 caches among a
group of cores.
- When searching for the least loaded cpu find the least loaded cpu via
the least loaded path through the tree. This load balances system bus
links, individual cache levels, and hyper-threaded/SMT cores.
- Make the periodic balancer recursively balance the highest and lowest
loaded cpu across each link.
Add support for cpusets:
- Convert the cpuset to a simple native cpumask_t while the kernel still
only supports cpumask.
- Pass the derived cpumask down through the cpu_search functions to
restrict the result cpus.
- Make the various steal functions resilient to failure since all threads
can not run on all cpus any longer.
General improvements:
- Precisely track the lowest priority thread on every runq with
tdq_setlowpri(). Before it was more advisory but this ended up having
pathological behaviors.
- Remove many #ifdef SMP conditions to simplify the code.
- Get rid of the old cumbersome tdq_group. This is more naturally
expressed via the cpu_group tree.
Sponsored by: Nokia
Testing by: kris
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a separate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.
Sponsored by: Nokia
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuset_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All threads initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.
Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine
not have VTOC information about the partitions, it will be created.
This is because the VTOC information is used for the partition type
and FreeBSD's sunlabel(8) does not create nor use VTOC information.
For this purpose, new tags have been added to support FreeBSD's
partition types.
structure. This allows per-CPU variations of struct pmap on a
single architecture without affecting the machine-independent
fields. As such, the PMAP variations don't affect the ABI. They
become part of it.
CPUFREQ_DRV_SETTINGS(). The value of count on input is used to
prevent overflow of the settings buffer passed into CPUFREQ_DRV_SETTINGS().
This corrects the "est: CPU supports Enhanced Speedstep, but is not recognized."
error on my system.
MFC after: 1 week
than rely on the lockmgr support [1]:
* bump the waiters only if the interlock is held
* let brelvp() return the waiters count
* rely on brelvp() instead of BUF_LOCKWAITERS() in order to check
for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
including lock.h by including lock.h directly in the consumers and
making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
* introduce LK_NOPROFILE which disables lock profiling for the
specified lockmgr
* introduce LK_QUIET which disables ktr tracing for the specified
lockmgr [2]
* disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
used
This patch breaks KPI so __FreeBSD_version will be bumped and manpages
updated by further commits. Additionally, the 'struct buf' changes also
result in a disturbed ABI.
[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.
[1] Submitted by: kib
Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
partition table is empty, check to see if we have something that
looks sufficiently like a BPB. On non-i386 machines, the boot
sector typically doesn't contain boot code; the end of the boot
sector is all zeroes. This is also where the partition table is
for MBRs.
We only check the sector size and cluster size, as that seems to
be the most reliable across implementations, BPB versions and
platforms.
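
A sketch of a sector size and cluster size check of the kind described. The
field offsets (11 and 13) are the standard FAT BPB ones; the accepted ranges
and the function name are assumptions, not necessarily what the committed
code tests.

```c
#include <stdint.h>

/* Return nonzero if x is a power of two. */
static int
is_pow2(unsigned int x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}

/* Return nonzero if the boot sector carries a plausible BPB. */
int
looks_like_bpb(const uint8_t *sector)
{
	unsigned int secsz, clsz;

	secsz = sector[11] | (sector[12] << 8);	/* bytes per sector, LE */
	clsz = sector[13];			/* sectors per cluster */
	if (secsz < 512 || secsz > 4096 || !is_pow2(secsz))
		return (0);
	return (is_pow2(clsz));
}
```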
just em, there is an igb driver (this follows behavior with our Linux drivers).
All adapters up to the 82575 are supported in em, and new client/desktop support
will continue to be in that adapter.
The igb driver is for new server NICs like the 82575 and its followons.
Advanced features for virtualization and performance will be in this driver.
Also, both drivers now have shared code that is up to the latest we have
released. Some stylistic changes as well.
Enjoy :)
code to add padlock features to the CPU model on VIA CPUs was no longer
effective. Change the code to instead output a separate printf during
dmesg for VIA Padlock features similar to other cpuid feature bitmasks.
MFC after: 1 week
Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.
while still restricting auto-channel select to only those channels
permitted by regulatory constraints (sorta, we're still missing the
checks to honor radar and noadhoc status on channels). This somehow
got lost in the initial merge of the revised scanning code.
Reviewed by: jhay
MFC after: 2 weeks
frames. This bug seems to happen on certain hardware model/revision
(e.g. 88E8053) but it's not identified which hardware is affected.
Revision 1.4 of if_mskreg.h was not enough to work around the bug.
To work around it, increase GMAC FIFO threshold by one FIFO word to
flush received pause frames.
Reported by: das, Kirill Nuzhdin < kirill.nuzhdin AT rad dot chem dot msu dot ru >
Tested by: das, Kirill Nuzhdin
only because there's a partition table where the boot sector has
boot code. Boot sectors without boot code look like a MBR for all
practical purposes. This change adds a check for the partition table
and fails the probe when it's obviously invalid. The assumption being
that the sector contains a boot sector and not a MBR.
More checks are needed to distinguish a boot sector without boot code
from a (empty) MBR.
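
One way to express "obviously invalid" is to look at the flag byte of each of
the four partition entries, which in a real MBR is either 0x00 or 0x80. This
is a sketch under that assumption; the names are hypothetical and the
committed check may differ.

```c
#include <stdint.h>

#define	MBR_PT_OFFSET	446	/* start of the partition table */
#define	MBR_PT_ENTRIES	4
#define	MBR_PT_ENTSIZE	16

/* Return 0 if any entry has a flag byte other than 0x00 or 0x80. */
int
mbr_table_plausible(const uint8_t *sector)
{
	uint8_t flag;
	int i;

	for (i = 0; i < MBR_PT_ENTRIES; i++) {
		flag = sector[MBR_PT_OFFSET + i * MBR_PT_ENTSIZE];
		if (flag != 0x00 && flag != 0x80)
			return (0);
	}
	return (1);
}
```

An all-zero table passes this test, which is why further checks are still
needed to tell an empty MBR from a boot sector without boot code.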
This fixes the panic which happens when mdcreate_vnode() calls vn_close()
and mddestroy() calls it again further down the error handling path.
Reviewed by: kris, kib
MFC after: 3 days
- Consolidate the code to humanize the size of a disk partition into a
single function based on the code for GPT partitions and use it for
GPT partitions, BSD slices, and BSD partitions.
- Teach the humanize code to use KB for small partitions (e.g. GPT boot
partitions now show up as 64KB rather than 0MB).
- Pad a few partition type names out so that things line up in the
common case.
MFC after: 1 week
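
A sketch of a single humanizing helper with a KB case for small partitions.
The function name, the 512-byte sector assumption and the unit thresholds are
all illustrative, not taken from the committed code.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a size given in 512-byte sectors as KB, MB or GB. */
void
humanize_sectors(uint64_t sectors, char *buf, size_t len)
{
	unsigned long long kb = sectors / 2;

	if (kb < 1024)
		snprintf(buf, len, "%lluKB", kb);
	else if (kb < 1024 * 1024)
		snprintf(buf, len, "%lluMB", kb / 1024);
	else
		snprintf(buf, len, "%lluGB", kb / (1024 * 1024));
}
```

With this, a 128-sector boot partition is reported as 64KB rather than being
rounded down to 0MB.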
weren't displayed on the new console. However, the config string has been
altered as part of being parsed so we only display the first option. Fix
this by saving a copy of /boot.config before parsing it and displaying the
saved copy after parsing.
MFC after: 1 week
PR: i386/103972
Submitted by: Alexandre Belloni alexandre.belloni of netasq.com
global audit mutex and condition variables, with an sx lock which protects
the trail vnode and credential while in use, and is acquired by the system
call code when rotating the trail. Previously, a "message" would be sent
to the kernel audit worker, which did the rotation, but the new code is
simpler and (hopefully) less error-prone.
Obtained from: TrustedBSD Project
MFC after: 1 month
the limit in bytes) hard coded into both the kernel and userland.
Make both these limits a sysctl, so it is easy to change the limit.
If the userland part of ipfw finds that the sysctls don't exist,
it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test
TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)
Note these sysctls in the man page and warn against increasing them
without thinking first.
MFC after: 3 weeks
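
The 12ms figure above can be reproduced with a little arithmetic, assuming
1500-byte packets (the packet size is an assumption; the text does not state
it). The helper name is hypothetical.

```c
#include <stdint.h>

/* Milliseconds of traffic that 'pkts' queued packets hold at 'bps' bits/s. */
uint64_t
queue_depth_ms(uint64_t pkts, uint64_t pkt_bytes, uint64_t bps)
{
	return (pkts * pkt_bytes * 8 * 1000 / bps);
}
```

For 100 packets at 100Mbps this is 100 * 1500 * 8 bits / 10^8 bits/s, or
12ms.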
device supports retrieving a serial number. Instead, first query the
list of VPD pages it does support, and only query the serial number if
it's supported, else silently move on. This eliminates a lot of noise
during verbose booting, and will likely eliminate the need for most
NOSERIAL quirks.
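
The "ask for the list first" step can be sketched as a check of the Supported
VPD Pages response (page 0x00) for the Unit Serial Number page (0x80). This
is an illustration of the idea with a hypothetical helper, not the CAM code.

```c
#include <stddef.h>
#include <stdint.h>

#define	VPD_SUPPORTED_PAGES	0x00
#define	VPD_UNIT_SERIAL		0x80

/* Return nonzero if 'page' is listed in a Supported VPD Pages response. */
int
vpd_page_supported(const uint8_t *buf, size_t len, uint8_t page)
{
	size_t i, n;

	if (len < 4 || buf[1] != VPD_SUPPORTED_PAGES)
		return (0);
	n = buf[3];		/* page length: number of page codes */
	if (n > len - 4)
		n = len - 4;	/* never read past the data returned */
	for (i = 0; i < n; i++)
		if (buf[4 + i] == page)
			return (1);
	return (0);
}
```

A caller would issue the serial number inquiry only when
vpd_page_supported(buf, len, VPD_UNIT_SERIAL) is true.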
pmap_remove_all() must not be called on fictitious pages. To date,
fictitious pages have been allocated from zeroed memory, effectively
hiding this problem because the fictitious pages appear to have an empty
pv list. Submitted by: Kostik Belousov
Rewrite the comments describing vm_object_page_remove() to better
describe what it does. Add an assertion. Reviewed by: Kostik Belousov
MFC after: 1 week
the vnode interlock is not held. vn_printf() already correctly handles
locked and unlocked vnode interlocks, and all the in-tree vop_print
methods are interlock-agnostic.
Some code calls vprintf() with the vnode interlock held, which causes
unjustified panics with INVARIANTS (ffs_syncvnode() as an example).
Reported by: Peter Holm
want to adjust this code to just assume that all CPUs >= Esther should
be checked for the extended cpuid flags register.
MFC after: 3 days
PR: i386/119491
The code seems pretty MPSAFE and Giant is held over kproc_exit() which
at a lower level calls exit1(). exit1() requires Giant to be unowned so this
opens a window for races.
Reported by: Bryan Venteicher <bryanv at daemoninthecloset dot org>
Tested by: Bryan Venteicher <bryanv at daemoninthecloset dot org>
always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
source upgrades by falling back to GNU ar(1) as necessary. Option
WITH_BSDAR is gone. Option _WITH_GNUAR to aid in upgrades is *not*
supposed to be set by the user.
Stop bootstrapping BSD ar(1) on the next __FreeBSD_version bump, as
there are no known bugs in it. Bump __FreeBSD_version to anticipate
this and to flag the switch to BSD ar(1), should it be needed for
something.
Input from: obrien, des, kaiw
first before they can be set to Explorer mode.
PR: kern/118578
Submitted by: Andriy Gapon <avg@icyb.net.ua> (I added some comments)
Reviewed by: philip
MFC after: 1 month
variations (e500 currently), this provides a gcc-level FPU emulation and is an
alternative approach to the recently introduced kernel-level emulation
(FPU_EMU).
Approved by: cognet (mentor)
MFp4: e500
only anonymous default (OBJT_DEFAULT) and swap (OBJT_SWAP) objects should
ever have OBJ_ONEMAPPING set. However, vm_object_deallocate() was
setting it on device (OBJT_DEVICE) objects. As a result,
vm_object_page_remove() could be called on a device object and if that
occurred pmap_remove_all() would be called on the device object's pages.
However, a device object's pages are fictitious, and fictitious pages do
not have an initialized pv list (struct md_page).
To date, fictitious pages have been allocated from zeroed memory,
effectively hiding this problem. Now, however, the conversion of rotting
diagnostics to invariants in the amd64 and i386 pmaps has revealed the
problem. Specifically, assertion failures have occurred during the
initialization phase of the X server on some hardware.
MFC after: 1 week
Discussed with: Kostik Belousov
Reported by: Michiel Boland
namespace in order to handle lockmgr fields in a controlled way instead
of spreading bogus stubs all around:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode
In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock
Side note: union_subr.c::unionfs_node_update() is the only other function
directly handling lockmgr fields. As this is not simple to fix, it has
been left behind as "sole" exception.
the same order that FreeBSD 6 and before did. Doug
White and the other bloodhounds at ISC discovered that
while FreeBSD 7's ordering of options was more efficient,
it caused some cable modem routers to ignore the
SYN-ACKs ordered in this fashion.
The placement of sackOK after the timestamp option seems
to be the critical difference:
FreeBSD 6:
<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>
FreeBSD 7.0:
<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>
FreeBSD 7.0 + this change:
<mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>
MFC after: 1 week
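
To make the byte layout concrete, here is a sketch that emits options in the
"FreeBSD 7.0 + this change" order shown above. It is not the tcp_output code:
the function and constant names are hypothetical, and only the standard
option kinds and lengths are assumed.

```c
#include <stddef.h>
#include <stdint.h>

enum { OPT_EOL = 0, OPT_NOP = 1, OPT_MSS = 2, OPT_WSCALE = 3,
       OPT_SACK_PERMITTED = 4, OPT_TIMESTAMP = 8 };

/* Store a 32-bit value in network byte order. */
static void
put32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Write SYN options into p (at least 24 bytes); returns the length used. */
size_t
build_syn_options(uint8_t *p, uint16_t mss, uint8_t wscale,
    uint32_t tsval, uint32_t tsecr)
{
	size_t n = 0;

	p[n++] = OPT_MSS; p[n++] = 4;
	p[n++] = (uint8_t)(mss >> 8); p[n++] = (uint8_t)mss;
	p[n++] = OPT_NOP;
	p[n++] = OPT_WSCALE; p[n++] = 3; p[n++] = wscale;
	p[n++] = OPT_NOP; p[n++] = OPT_NOP;	/* align the timestamp */
	p[n++] = OPT_TIMESTAMP; p[n++] = 10;
	put32(p + n, tsval); n += 4;
	put32(p + n, tsecr); n += 4;
	p[n++] = OPT_SACK_PERMITTED; p[n++] = 2;	/* sackOK last */
	p[n++] = OPT_EOL;
	while (n % 4 != 0)
		p[n++] = 0x00;	/* pad to a 32-bit boundary */
	return (n);
}
```

The result is 24 bytes: sackOK sits after the timestamp, followed by EOL and
one byte of 0x00 padding.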
the provided trailers. This has been broken since revision 1.240.
Submitted by: Dan Nelson
PR: kern/120948
"sounds ok to me" from: phk
MFC after: 3 days
can run on processors that don't have a FPU. This is typically the
case for Book E processors. While a tuned system will probably want
to use soft-float (or use a processor that has a FPU if the usage is
FP intensive enough), allowing hard-float on FPU-less systems gives
great portability and flexibility.
Obtained from: NetBSD
o Disable interrupts while not running U-Boot code. We clobber
registers that the U-Boot interrupt handlers assume to be
fixed as per the U-Boot register usage. At this time this only
applies to r14. U-Boot uses r2 now for what they used r29 for.
After we restore r14 in preparation of doing the syscall, we
re-enable interrupts. When we return from the syscall, we
disable interrupts and restore the callee-saved r14.
(link) address and the physical (load) address. Ideally, the mapping
between link and load addresses should be abstracted by the copyin(),
copyout() and readin() functions, so that we don't have to add kluges
in __elfN(loadimage)(). Then, we could also have paged virtual memory
for the kernel. This can be important under EFI, where you need to
allocate physical memory from the firmware if you want to work in all
scenarios.
o Move the API prototypes to a separate header (glue.h)
o Allow the platform to hint libuboot about where to look
for the API signature. The uboot_address variable is
expected to be defined by the platform.
- add support for T3C
- add DDP support (zero-copy receive)
- fix TOE transmit of large requests
- fix shutdown so that sockets don't remain in CLOSING state indefinitely
- register listeners when an interface is brought up after tom is loaded
- fix setting of multicast filter
- enable link at device attach
- exit tick handler if shutdown is in progress
- add helper for logging TCB
- add sysctls for dumping transmit queues
- note that TOE will not be MFC'd until after 7.0 has been finalized
MFC after: 3 days
consists of the null-terminated name and the contents of any structure
you wish to record. A new ktrstruct() function constructs and emits a
KTR_STRUCT record. It is accompanied by convenience macros for struct
stat and struct sockaddr.
In kdump(1), KTR_STRUCT records are handled by a dispatcher function
that runs stringent sanity checks on their contents before handing them
over to individual decoding functions for each type of structure.
Currently supported structures are struct stat and struct sockaddr for
the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK
and AF_IPX is present but disabled, as I am unable to test it properly.
Since 's' was already taken, the letter 't' is used by ktrace(1) to
enable KTR_STRUCT trace points, and in kdump(1) to enable their
decoding.
Derived from patches by Andrew Li <andrew2.li@citi.com>.
PR: kern/117836
MFC after: 3 weeks
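
As a rough userland sketch of the record layout described above (a NUL-terminated name followed by the raw structure contents), the packing and the kind of sanity check a dispatcher might run could look like this. struct_record_pack() and struct_record_payload() are hypothetical names and do not reflect the actual ktrstruct() or kdump(1) code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder: name, NUL, then the raw structure bytes. */
static size_t
struct_record_pack(char *buf, size_t cap, const char *name,
    const void *data, size_t datalen)
{
	size_t namelen;

	assert(buf != NULL && name != NULL && data != NULL);
	namelen = strlen(name) + 1;		/* include the NUL */
	if (namelen + datalen > cap)
		return (0);			/* record would not fit */
	memcpy(buf, name, namelen);
	memcpy(buf + namelen, data, datalen);
	return (namelen + datalen);
}

/*
 * Hypothetical decoder-side check: the name must be non-empty and
 * NUL-terminated inside the record.  Returns a pointer to the payload
 * and stores its length, or returns NULL if the record is malformed.
 */
static const char *
struct_record_payload(const char *buf, size_t buflen, size_t *datalen)
{
	const char *nul = memchr(buf, '\0', buflen);

	if (nul == NULL || nul == buf)
		return (NULL);
	*datalen = buflen - (size_t)(nul - buf) - 1;
	return (nul + 1);
}
```

A per-type decoder would additionally compare the payload length against the size of the structure it expects before interpreting the bytes.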
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.
Submitted by: rdivacky
MFC after: 1 week
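
The two checks can be illustrated with a small standalone function. This is a sketch under assumptions: the flag values are the Linux ones defined locally, a 4096-byte page is assumed, EINVAL is assumed as the error returned, and mremap_check_args() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Linux flag values, defined locally so the sketch builds anywhere. */
#define	SK_MREMAP_MAYMOVE	0x1UL
#define	SK_MREMAP_FIXED		0x2UL
#define	SK_PAGE_SIZE		4096UL	/* assumed page size */

/* Hypothetical argument validation; returns 0 or an errno value. */
static int
mremap_check_args(uintptr_t addr, unsigned long flags)
{
	/* The mask trick below relies on a power-of-two page size. */
	assert((SK_PAGE_SIZE & (SK_PAGE_SIZE - 1)) == 0);

	/* Reject any flag other than MREMAP_FIXED and MREMAP_MAYMOVE. */
	if (flags & ~(SK_MREMAP_FIXED | SK_MREMAP_MAYMOVE))
		return (EINVAL);
	/* Reject an address that is not page aligned. */
	if (addr & (SK_PAGE_SIZE - 1))
		return (EINVAL);
	return (0);
}
```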
- Added loose RX MTU functionality to allow frames larger than 1500 bytes
to be accepted even though the interface MTU is set to 1500.
- Implemented new TCP header splitting/jumbo frame support which uses
two chains for receive traffic rather than the original single receive
chain.
- Added additional debug support code.
binutils ar and ranlib to gar and granlib, respectively.
* Introduce a temporary variable WITH_GNUAR as a safety net.
When building world with -DWITH_GNUAR, GNU binutils ar and ranlib
will be installed as the default ones and 'BSD' ar will be disabled.
* Bump __FreeBSD_version to reflect the import of 'BSD' ar(1).
Approved by: jkoshy (mentor)
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. G_LINUX_LVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.
Reviewed by: Ivan Voras
Previously known as geom_lvm(4), rename requested by des, phk.
file system. In particular, stop overwriting mount point
flags in nfs_mountdiskless() because now they are set
elsewhere. (They were _initialized_ by that function in
the 4.4BSD days, when mount structures were not allocated
in a centralized manner -- see rev. 1.1 of this file.)
Fix nfs_mount(), which happened to depend on the loss of
MNT_ROOTFS when it came to update handling.
Also note that mountnfs() no longer handles updates. Now
they shouldn't reach this function, so printf a diagnostic
message if that happens due to a coding error.
macros. The only semantic change was the need to add a vc_opened field
to struct vcomm since we can no longer use the request queue returning
to an uninitialized state to hold whether or not the device is open.
MFC after: 1 month
the same operation of lockmgr() but accepting a custom wmesg, prio and
timo for the particular lock instance, overriding default values
lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo.
- Use lockmgr_args() in order to implement BUF_TIMELOCK()
- Cleanup BUF_LOCK()
- Remove LK_INTERNAL as it is no longer used in the lockmgr namespace
Tested by: Andrea Barberio <insomniac at slackware dot it>
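
The override behaviour can be sketched as follows. This is a minimal illustration, assuming sentinel values that mean "use the value stored in the lock"; the SK_* names, struct sk_lock and sk_resolve_args() are hypothetical and are not the lockmgr KPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinels meaning "use the value stored in the lock". */
#define	SK_WMESG_DEFAULT	NULL
#define	SK_PRIO_DEFAULT		(-1)
#define	SK_TIMO_DEFAULT		0

struct sk_lock {			/* stand-in for struct lock */
	const char *lk_wmesg;
	int lk_prio;
	int lk_timo;
};

struct sk_sleep_params {		/* values actually used to sleep */
	const char *wmesg;
	int prio;
	int timo;
};

/* Pick each per-call value, falling back to the lock's own default. */
static struct sk_sleep_params
sk_resolve_args(const struct sk_lock *lkp, const char *wmesg, int prio,
    int timo)
{
	struct sk_sleep_params p;

	assert(lkp != NULL);
	p.wmesg = (wmesg == SK_WMESG_DEFAULT) ? lkp->lk_wmesg : wmesg;
	p.prio = (prio == SK_PRIO_DEFAULT) ? lkp->lk_prio : prio;
	p.timo = (timo == SK_TIMO_DEFAULT) ? lkp->lk_timo : timo;
	return (p);
}
```

A timed buffer lock in the spirit of BUF_TIMELOCK() would pass its own message and timeout for a single acquisition and leave the lock's stored values untouched.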
fundamentally fairly confused about how signals work and when it is
appropriate for upcalls to be interrupted. In particular, we should
be exempting certain upcalls from interruption, we should not always
eventually time out sleeping on an upcall, and we should not be
interrupting the sleep for certain signals that we currently are
(including SIGINFO). This code needs to be reworked in the style of
NFS interruptible mounts.
MFC after: 1 month
coherent with the data caches. Implement a quick fix to allow
us to boot on Montecito, while I'm working on a better fix in
the mean time.
Commit made on Montecito-based Itanium...
is to be requested via a "ro" option. At the same time, MNT_RDONLY
is gradually becoming an indicator of the current state of the FS
instead of a command flag. Today passing MNT_RDONLY alone to the
kernel's mount machinery will lead to various glitches. (See the
PRs for examples.)
Therefore mount the root FS with a "ro" option instead of the
MNT_RDONLY flag. (Note that MNT_RDONLY still is added to the mount
flags internally, by vfs_donmount(), if "ro" was specified.)
To be able to pass "ro" cleanly to kernel_vmount(), teach the latter
function to accept options with NULL values.
Also correct the comment explaining how mount_arg() handles length
of -1.
PR: bin/106636 kern/120319
Submitted by: Jaakko Heinonen <see PR kern/120319 for email> (originally)
legacy interrupts rather than MSI as a special case. Prior to this
commit, the interrupt handler was doing the slow handshaking with
the device to ensure the legacy interrupt was lowered in both
the legacy and MSI-X case. This handshaking was not
required for MSI-X.
allocator for jumbo frame.
o Removed unneeded jlist lock which was used to manage jumbo
buffers.
o Don't reinitialize hardware if MTU was not changed.
o Added additional check for minimal MTU size.
o Added a new tunable hw.skc.jumbo_disable to disable jumbo frame
support for the driver. The tunable could be set for systems that
do not need to use jumbo frames and it would save
(9K * number of Rx descriptors) bytes kernel memory.
o Jumbo buffer allocation failure is no longer a critical error for
the operation of sk(4). If sk(4) encounters an allocation failure,
it just disables jumbo frame support and continues to work without
user intervention.
With these changes jumbo frame performance of sk(4) was slightly
increased and users should not encounter jumbo buffer allocation
failure. Previously sk(4) tried to allocate physically contiguous
memory, 3388KB for 256 Rx descriptors. On a running system that much
contiguous memory was sometimes unavailable, which in turn caused the
driver to fail to load.
Tested by: Cy Schubert < Cy.Schubert () komquats dot com >
modules using invalid ABI versions (e.g. a 7.x module with an 8.x kernel)
for a given kernel:
- Add a 'kernel' module version whose value is __FreeBSD_version.
- Add a version dependency on 'kernel' in every module that has an
acceptable version range of __FreeBSD_version up to the end of the
branch __FreeBSD_version is part of. E.g. a module compiled on 701000
would work on kernels with versions between 701000 and 799999 inclusive.
Discussed on: arch@
MFC after: 1 week
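
The acceptable range in the example above follows from simple arithmetic on __FreeBSD_version. A minimal sketch, assuming the branch is identified by the leading digit (version / 100000); sk_kernel_dep() and struct sk_verrange are hypothetical names, not the macro the commit adds.

```c
#include <assert.h>

struct sk_verrange {
	int lo;		/* lowest acceptable kernel version */
	int hi;		/* highest acceptable kernel version */
};

/* Compute the kernel versions a module built on 'osversion' accepts. */
static struct sk_verrange
sk_kernel_dep(int osversion)
{
	struct sk_verrange r;

	assert(osversion > 0);
	r.lo = osversion;
	/* End of the branch: same leading digit, remaining digits all 9. */
	r.hi = (osversion / 100000) * 100000 + 99999;
	return (r);
}
```

For a module compiled on 701000 this yields the range 701000 to 799999, so an 8.x kernel falls outside it.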
A couple of notes for this:
* WITNESS support, when enabled, is only used for shared locks in order
to avoid problems with the "disowned" locks
* KA_HELD and KA_UNHELD only exist in the lockmgr namespace in order
to assert that a generic thread (not curthread) owns or does not own
the lock. Really, this kind of check is bogus but it seems very
widespread in the consumers' code. So, for the moment, we cater to
this untrusted behaviour until the consumers are fixed and the
options can be removed (hopefully during the 8.0-CURRENT lifecycle)
* Implementing KA_HELD and KA_UNHELD (not supported natively by
WITNESS) made it necessary to introduce LA_MASKASSERT, which
specifies the range for default lock assertion flags
* In other respects, lockmgr_assert() follows exactly what other
locking primitives offer for this operation.
- Build real assertions for buffer cache locks on the top of
lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp)
paradigm.
- Add checks at lock destruction time and use a cookie for verifying
lock integrity at any operation.
- Redefine BUF_LOCKFREE() in order to not use a direct assert but
let it rely on the aforementioned destruction time check.
The KPI is evidently broken, so a __FreeBSD_version bump and a
manpage update are necessary and will be committed soon.
Side note: lockmgr_assert() will be used soon in order to implement
real assertions in the vnode namespace replacing the legacy and still
bogus "VOP_ISLOCKED()" way.
Tested by: kris (earlier version)
Reviewed by: jhb
access cache improvements:
- Flush just access control state on CODA_PURGEUSER, not the full
namecache for /coda.
- When replacing a fid on a cnode as a result of, e.g.,
reintegration after offline operation, we no longer need to
purge the namecache entries associated with its vnode.
MFC after: 1 month
modeled on the access cache found in NFS, smbfs, and the Linux coda
module. This is a positive access cache of a single entry per file,
tracking recently granted rights, but unlike NFS and smbfs,
supporting explicit invalidation by the distributed file system.
For each cnode, maintain a C_ACCCACHE flag indicating the validity
of the cache, and a cached uid and mode tracking recently granted
positive access control decisions.
Prefer the cache to venus_access() in VOP_ACCESS() if it is valid,
and when we must fall back to venus_access(), update the cache.
Allow Venus to clear the access cache, either the whole cache on
CODA_FLUSH, or just entries for a specific uid on CODA_PURGEUSER.
Unlike the Coda module on Linux, we don't flush all entries on a
user purge using a generation number, we instead walk present
cnodes and clear only entries for the specific user, meaning it is
somewhat more expensive but won't hit all users.
Since the Coda module is aggressive about not keeping around
unopened cnodes, the utility of the cache is somewhat limited for
files, but works well for directories. We should make Coda less
aggressive about GCing cnodes in VOP_INACTIVE() in order to improve
the effectiveness of in-kernel caching of attributes and access
rights.
MFC after: 1 month
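
The single-entry positive access cache described above can be sketched in userland C. This is an illustration only: struct sk_cnode, sk_access() and sk_purge_user() are hypothetical names, sk_grant_all() is a counting stub standing in for the upcall to Venus, and accumulating granted mode bits for the same uid is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_cnode {
	bool acc_valid;		/* plays the role of C_ACCCACHE */
	unsigned acc_uid;	/* uid of the last granted request */
	int acc_mode;		/* mode bits granted to that uid */
};

/* Slow path: ask the cache manager.  Returns 0 when access is granted. */
typedef int (*sk_upcall_t)(unsigned uid, int mode);

static int sk_upcalls;		/* counts slow-path calls, for demonstration */

static int
sk_grant_all(unsigned uid, int mode)
{
	(void)uid; (void)mode;
	sk_upcalls++;
	return (0);
}

static int
sk_access(struct sk_cnode *cp, unsigned uid, int mode, sk_upcall_t upcall)
{
	int error;

	assert(cp != NULL && upcall != NULL);
	/* Positive hit: same uid, and every requested bit was granted. */
	if (cp->acc_valid && cp->acc_uid == uid &&
	    (cp->acc_mode & mode) == mode)
		return (0);
	error = upcall(uid, mode);
	if (error == 0) {
		/* Assumed policy: accumulate for the same uid, else replace. */
		if (cp->acc_valid && cp->acc_uid == uid)
			cp->acc_mode |= mode;
		else {
			cp->acc_uid = uid;
			cp->acc_mode = mode;
			cp->acc_valid = true;
		}
	}
	return (error);
}

/* Purge-by-user: walk the cnodes and clear only entries for one uid. */
static void
sk_purge_user(struct sk_cnode *cnodes, size_t n, unsigned uid)
{
	for (size_t i = 0; i < n; i++)
		if (cnodes[i].acc_valid && cnodes[i].acc_uid == uid)
			cnodes[i].acc_valid = false;
}
```

Denials are never cached here, so a failed request always reaches the slow path again, which is what makes this a positive cache.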
VFS namecache, as is done by the Coda module on Linux. Unlike the Coda
namecache, the global VFS namecache isn't tagged by credential, so use
more conservative flushing behavior (for now) when CODA_PURGEUSER is
issued by Venus.
This improves overall integration with the FreeBSD VFS, including
allowing __getcwd() to work better, procfs/procstat monitoring, and so
on. This improves shell behavior in many cases, and improves ".."
handling. It may lead to some slowdown until we've implemented a
specific access cache, which should net improve performance, but in the
mean time, lookup access control now always goes to Venus, whereas
previously it didn't.
MFC after: 1 month
When ntfs_ntput() reaches 0 in the refcount the inode lockmgr is not
released and directly destroyed. Fix this by unlocking the lockmgr() even
in the case of zero-refcount.
Reported by: dougb, yar, Scot Hetzel <swhetzel at gmail dot com>
Submitted by: yar
nfs_xid_gen() function instead of duplicating the logic in both
nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request().
MFC after: 1 week
Submitted by: mohans (a long while ago)
through the FreeBSD ABI. IPC_INFO, SHM_INFO, SHM_STAT were added
specifically for Linux binary support. They are not documented
as being a part of the FreeBSD ABI, also, the structures necessary
for them have been hidden away from the users for a long time.
Also, the Linux ABI layer uses its own structures to populate the
responses back to the user to ensure that the ABI is consistent.
I think there is a bit more separation work that needs to happen.
Reviewed by: jhb
Discussed with: jhb
Discussed on: freebsd-arch@ (very briefly)
MFC after: 1 month
the PIC also informs the platform at which IRQ level it can start
assigning IPIs, since this can depend on the number of IRQs
supported for external interrupts.
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages. Adjust to be correct. Review of other architectures is
forthcoming.
Submitted by: Joseph Golio
With write-allocate cache we get into the following scenario:
1. data has been updated in the memory by the USB HC, but
2. D-cache holds an un-flushed value of it
3. when the affected cache line is replaced, the old (un-flushed) value is
written back and overwrites the newly arrived data
This is possible due to how write-allocate works with virtual caches (ARM for
example).
In the case of USB transfers it leads to fatal tag discrepancies in umass(4)
operation, which look like the following:
umass0: Invalid CSW: tag 1 should be 2
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 3
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 4
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 5
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): Retrying Command
umass0: Invalid CSW: tag 1 should be 6
(probe0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(probe0:umass-sim0:0:0:0): error 5
(probe0:umass-sim0:0:0:0): Retries Exausted
To eliminate this, a BUS_DMASYNC_PREREAD sync operation is required in
usbd_start_transfer().
Credits for nailing this down go to Grzegorz Bernacki gjb AT semihalf DOT com.
Reviewed by: imp
Approved by: cognet (mentor)
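
The three-step scenario can be reproduced with a toy model of one write-back cache line in front of one memory word. Everything here is a simplification for illustration: the sk_* names are hypothetical, and sk_sync_preread() only models the assumed effect of the pre-read sync (write back and drop the line before the device writes), not the bus_dma implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sk_line {		/* one cache line covering one memory word */
	bool valid, dirty;
	uint32_t data;
};

static uint32_t sk_mem;		/* the memory word the device writes via DMA */

static void
sk_cpu_write(struct sk_line *l, uint32_t v)
{
	l->valid = l->dirty = true;	/* write-allocate: line is now dirty */
	l->data = v;
}

static void
sk_evict(struct sk_line *l)
{
	if (l->valid && l->dirty)
		sk_mem = l->data;	/* write-back clobbers memory */
	l->valid = l->dirty = false;
}

/* Assumed effect of the pre-read sync: write back, then drop the line. */
static void
sk_sync_preread(struct sk_line *l)
{
	sk_evict(l);
}

static void
sk_dma_write(uint32_t v)
{
	sk_mem = v;			/* the device bypasses the cache */
}

/* Run the scenario without and with the sync; returns the final word. */
static uint32_t
sk_scenario(bool with_sync)
{
	struct sk_line l = { false, false, 0 };

	sk_mem = 0;
	sk_cpu_write(&l, 1);		/* CPU leaves tag 1 dirty in the cache */
	if (with_sync)
		sk_sync_preread(&l);
	sk_dma_write(2);		/* device delivers tag 2 */
	sk_evict(&l);			/* line gets replaced later */
	return (sk_mem);
}
```

Without the sync the late write-back restores the stale tag, which matches the "tag 1 should be 2" pattern in the log; with it, the device's value survives.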
historical relic, and are no longer appropriate for either LAN or WAN
mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss
from socket buffer fill events is common, and sequence numbers wrap
quickly enough that data corruption is possible. TCP solves both of
these problems without imposing significant overhead.
MFC after: 1 month
sectors so the geometry of large IDE disks has to be adjusted. This
corresponds to what the OpenSolaris dad(7D) driver does except that
the latter only tweaks sectors and effectively limits the mediasize
to 128GB so the cylinders and heads fields won't ever overflow. Not
limiting the mediasize is a compromise between allowing the Sun disk
label to be used as far as possible and being able to use the entire disk
with another disk label.
This allows the full capacity of large IDE disks to be used if they were
not labeled under (Open)Solaris (in both senses of the word).
MFC after: 2 weeks
Turn off TFTP support by default: when both TFTP and NFS are enabled in the
loader, strange interactions occur in the pure netbooting scenario (i.e.
loader is TFTP-ed, kernel+world mounted over NFS), leading to very slow access
to the NFS-exported files.
Reviewed by: grehan
Approved by: cognet (mentor)
The logical disks will appear as /dev/lvm/<vol group>-<logical vol>, for
instance /dev/lvm/vg0-home. GLVM currently supports linear stripes with
segments on multiple physical disks. The metadata is read only, logical
volumes can not be allocated or resized.
Reviewed by: Ivan Voras
- Include lock.h in lockmgr.h as nested header in order to safely use
LOCK_FILE and LOCK_LINE. Since this code will be replaced soon,
we can tolerate this namespace pollution for a while, even if the real
fix would be to let lockmgr() depend on lock.h as a separate header.
tree, restyle everything but coda.h (which is more explicitly shared
across systems) into a closer approximation to style(9).
Remove a few more unused function prototypes.
Add or clarify some comments.
MFC after: 1 month
NOP-message polling in ciss_periodic().
Note that setting the tunable to non-zero is only a workaround for the
`ADAPTER HEARTBEAT FAILED' problem, and may freeze a system that does
not have the problem.
Reviewed by: scottl
Reported by: Attila Nagy
MFC after: 3 days
owned by a NULL owner. This causes the subsequent VOP_ISLOCKED() call in
nfs_upgrade_vnlock() to panic, as it now only accepts curthread.
Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() so that they no longer
use the struct thread pointer passed as an argument (it is no longer
required there, as vn_lock() and VOP_UNLOCK() no longer take it).
Using curthread in its place introduces no ambiguity, as LK_EXCLOTHER should
be handled as a "not locked" request by both functions.
Reported by: kris
Tested by: kris
Reviewed by: ups
- Rename print_vattr to coda_print_vattr and make static, rename
print_cred to coda_print_cred.
- Remove unused coda_vop_nop.
- Add XXX comment because coda_readdir forwards to the cache vnode's
readdir rather than venus_readdir, and annotate venus_readdir as
unused.
- Rename vc_nb_* to vc_*.
- Use d_open_t, d_close_t, d_read_t, d_write_t, d_ioctl_t and d_poll_t
for prototyping vc_* as that is the intent, don't use our own
definitions.
- Rename coda_nb_statfs to coda_statfs, rename NB_SFS_SIZ to
CODA_SFS_SIZ.
- Replace one more OBE reference to NetBSD with a reference to FreeBSD.
- Tidy up a little vertical whitespace here and there.
- Annotate coda_nc_zapvnode as unused.
- Remove unused vcodattach.
- Annotate VM_INTR as unused.
- Annotate that coda_fhtovp is unused and doesn't match the FreeBSD
prototype, so isn't hooked up to vfs_fhtovp. If we want NFS export of
Coda to work someday, this needs to be fixed.
- Remove unused getNewVnode.
- Remove unused coda_vget, coda_init, coda_quotactl prototypes.
MFC after: 1 month
the mountpoint for a specific device. This was implemented incorrectly,
a bad idea in a fundamental sense, and also never used, so presumably
a long-idle debugging function.
MFC after: 1 month
for vop_bmap; delete the existing stub that returned either EINVAL
or EOPNOTSUPP, and had unreachable calls to VOP_BMAP on the cache
vnode.
MFC after: 1 month
directory, and jail directory within procstat. While this functionality
is available already in fstat, encapsulating it in the kern.proc.filedesc
sysctl makes it accessible without using kvm and thus without needing
elevated permissions.
The new procstat output looks like:
PID COMM FD T V FLAGS REF OFFSET PRO NAME
76792 tcsh cwd v d -------- - - - /usr/src
76792 tcsh root v d -------- - - - /
76792 tcsh 15 v c rw------ 16 9130 - -
76792 tcsh 16 v c rw------ 16 9130 - -
76792 tcsh 17 v c rw------ 16 9130 - -
76792 tcsh 18 v c rw------ 16 9130 - -
76792 tcsh 19 v c rw------ 16 9130 - -
I am also bumping __FreeBSD_version for this as this new feature will be
used in at least one port.
Reviewed by: rwatson
Approved by: rwatson
then later to FreeBSD. Update various NetBSD-related comments: in some
cases delete them because they don't apply, in others update to say
FreeBSD as they still apply but in FreeBSD (and might for that matter
no longer apply on NetBSD), and flag one case where I'm not sure
whether it applies.
MFC after: 1 month
locks of those vnodes. Probably, Coda should do the same lock sharing/
pass-through that is done for nullfs, but in the mean time this ensures
that locks are adequately held to prevent corruption of data structures
in the cache file system.
Assuming most operations came from the top layer of Coda and weren't
performed directly on the cache vnodes, in practice this corruption was
relatively unlikely as the Coda vnode locks were ensuring exclusive
access for most consumers.
This causes WITNESS to squeal like a pig immediately when Coda is used,
rather than waiting until file close; I noticed these problems because
of the lack of said squealing.
MFC after: 1 month
vget() calls using inode numbers to query the root of /coda, which is not
needed since we now cache the root vnode with the mountpoint.
MFC after: 1 month
VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should
only accept curthread as an argument; this will lead to axing the additional
argument from both functions, making the code cleaner.
Reviewed by: jeff, kib
the provided lock or &blocked_lock. The thread may be temporarily
assigned to the blocked_lock by the scheduler so a direct comparison
can not always be made.
- Use THREAD_LOCKPTR_ASSERT() in the primary consumers of the scheduling
interfaces. The schedulers themselves still use more explicit asserts.
Sponsored by: Nokia
obtained from OpenBSD with an algorithm suggested
by Amit Klein. The OpenBSD algorithm has a few
flaws; see Amit's paper for more information.
For a description of how this algorithm works,
please see the comments within the code.
Note that this commit does not yet enable random IP ID
generation by default. There are still some concerns
that doing so will adversely affect performance.
Reviewed by: rwatson
MFC After: 2 weeks
- Move recursion checking into rwlock inlines to free a bit for use with
adaptive spinners.
- Clear the RW_LOCK_WRITE_SPINNERS flag whenever the lock state changes
causing write spinners to restart their loop.
- Write spinners are limited by a count while readers hold the lock as
there is no way to know for certain whether readers are running still.
- In the read path block if there are write waiters or spinners to avoid
starving writers. Use a new per-thread count, td_rw_rlocks, to skip
starvation avoidance if it might cause a deadlock.
- Remove or change invalid assertions in turnstiles.
Reviewed by: attilio (developed parts of the patch as well)
Sponsored by: Nokia
This support tries to be as parallel as possible with other locking
primitives, but there are differences; more specifically:
- The base witness support is already equipped to allow duplicate lock
acquisition, as lockmgr relies on this.
- In the case of lockmgr_disown(), the lock appears unlocked to witness
even if it is still held by the "kernel context"
- In the case of upgrading we can have 3 different situations:
* Total unlocking of the shared lock and nothing else
* Real witness upgrade if the owner is the first upgrader
* Shared unlocking and exclusive locking if the owner is not the first
upgrader but it is still allowed to upgrade
- LK_DRAIN is basically handled like an exclusive acquisition
Additionally, the new options LK_NODUP and LK_NOWITNESS can now be used with
lockinit(): LK_NOWITNESS disables WITNESS for the specified lock while
LK_NODUP enables tracking of duplicated locks. This will require a manpage
update and a __FreeBSD_version bump (addressed by further commits).
This patch also fixes a problem occurring if a lockmgr is held in
exclusive mode and the same owner tries to acquire it in shared mode:
currently there is a spurious shared lock acquisition while what
we really want is a lock downgrade. Probably, this situation would be
better served by failing with an EDEADLK errno return.
Side note: first testing of this patch has already revealed several
LORs, so please expect cascades of LORs until they are resolved. NTFS is
also reported broken by the WITNESS introduction. BTW, NTFS is exposing a
lock leak which needs to be fixed, and this patch can help with that if
rightly tweaked.
Tested by: kris, yar, Scot Hetzel <swhetzel at gmail dot com>
done in consumers code: using locks properties is much more appropriate.
Fix current code doing these bogus checks.
Note: Really, callouts are not usable by all !(LC_SPINLOCK | LC_SLEEPABLE)
primitives, as some (like rmlocks) don't implement the generic lock layer
functions, but they can be equipped for this, so the check is still
valid.
Tested by: matteo, kris (earlier version)
Reviewed by: jhb
This fixes a problem with the ARM kernel.bin not having the MFS image
embedded: it is objcopied from the kernel.noheader temporary ELF file, which
was not subject to embedding the MFS image previously.
Reviewed by: imp
Approved by: cognet (mentor)
De-hardcode usage of ARM_TP_ADDRESS and RAS local storage, and move this
special purpose page to a more convenient place i.e. after the vectors high
page, more towards the end of address space. Previous location (0xe000_0000)
caused grief if KVA was to go beyond the default limit.
Note that ARM world rebuilding is required after this change since the
location of ARM_TP_ADDRESS is shared between kernel and userland.
Submitted by: Grzegorz Bernacki (gjb AT semihalf dot com)
Reviewed by: imp
Approved by: cognet (mentor)
- Expose sbrelease_internal(), a variant of sbrelease() with no
expectations about the validity of locks in the socket buffer.
- Use sbrelease_internal() in sorflush(), and as a result avoid initializing
and destroying a socket buffer lock for the temporary stack copy of the
actual buffer, asb.
- Add a comment indicating why we do what we do, and remove an XXX since
things have gotten less ugly in sorflush() lately.
This makes socket close cleaner, and possibly also marginally faster.
MFC after: 3 weeks
referencing the file's VM pages are returned from the network stack,
making changes to the file safe.
This flag does not guarantee that the data has been transmitted to the
other end.
- Rename rt2560_read_eeprom to rt2560_read_config, we already have
rt2560_eeprom_read
- If hardware gives us wrong encryption done index, shout out loudly and
terminate the processing loop
- Process encryption done if RX done bit is set in interrupt status register
(according to Ralink Linux driver)
- Turn VALID/BUSY bits in TX descriptor only after TX descriptor is fully setup
- Fix BBP read: RT2560_BBPCSR can't be written until its RT2560_BBP_BUSY bit is
off (according to Ralink Linux driver)
- Skip invalid (0 or 0xffff) BBP register/value entries stored in EEPROM
- Fix channel TX power location in EEPROM, if channel TX power is above 31 set
it to 24 (TX power only has 5bits in RF register, "24" is according to Ralink
Linux driver)
- Configure BBP according to the BBP register/value stored in EEPROM, restore
BBP17 (RX sensitivity tuning) to default value after this.
- Set TX/RX antenna after BBP is initialized; these two operations will try to
set BBP registers
- Reconfigure ACK TX time registers according to 802.11g standard (TX @36Mb,
other side's ACK should be sent @24Mb).
- 2560 parts have two TX rings: one for management/control packets, one for data
packets. Add private OACTIVE flag for each of them. Turn on IFF_DRV_OACTIVE
if one of private OACTIVE is on; turn off IFF_DRV_OACTIVE iff all of them are
off.
- Rework watchdog to mimic old if_watchdog action. Process TX done/encryption
done in watchdog function (according to Ralink Linux driver)
Obtained from: DragonFly
Approved by: sam (mentor)
Tested by: sam
Related to PR: kern/117655
# Forcing long slot time setting is not included in this commit, comment and
# related code is in place, so if problem pops up, quick tests could be done.
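
The per-ring OACTIVE rule from the list above (set IFF_DRV_OACTIVE if any private flag is on, clear it only when all are off) can be sketched as follows. The flag values, struct sk_softc and sk_set_oactive() are hypothetical stand-ins and not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define	SK_IFF_DRV_OACTIVE	0x400	/* stand-in for IFF_DRV_OACTIVE */
#define	SK_OACTIVE_DATA		0x1	/* hypothetical: data TX ring full */
#define	SK_OACTIVE_PRIO		0x2	/* hypothetical: mgmt/control ring full */

struct sk_softc {
	int drv_flags;		/* stand-in for ifp->if_drv_flags */
	int oactive;		/* private per-ring OACTIVE bits */
};

/* Record one ring's state, then derive the interface-wide flag. */
static void
sk_set_oactive(struct sk_softc *sc, int ring, bool full)
{
	assert(ring == SK_OACTIVE_DATA || ring == SK_OACTIVE_PRIO);
	if (full)
		sc->oactive |= ring;
	else
		sc->oactive &= ~ring;
	if (sc->oactive != 0)
		sc->drv_flags |= SK_IFF_DRV_OACTIVE;	/* any ring full */
	else
		sc->drv_flags &= ~SK_IFF_DRV_OACTIVE;	/* all rings free */
}
```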
down some DCMD's without any data. Thanks to Dell and LSI for helping
to provide clues to figure out this problem. Now MegaCli can upgrade
the firmware and should work identically to when run on Linux.
Reviewed by: scottl, LSI
MFC after: 1 day
ipsec*_set_policy and do the privilege check only if needed.
Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.
Reviewed by: rwatson
a variety of bootloaders. This sometimes means that different loader
scripts are required within one ${MACHINE_ARCH}, which makes the
current practice of using ldscript.${MACHINE_ARCH} unsuitable.
Instead, make the default the current convention and allow the ld
scripts to be overridden as necessary.
If we aren't arm, pc98 or sun4v, then enable treating warnings like
errors. That doesn't mean these platforms aren't -Werror clean, just
that we haven't enforced it before. Someone with some spare time
should investigate these three platforms to see if any can be removed.
PCI-express chipset (and thus has functional MSI) if there are any
PCI-express devices in the system, not requiring a root port device.
With PCI-X the chipset detection has to be very conservative because there
are known systems with PCI-X devices that do not appear to have PCI-X
chipsets. However, with PCI-express I'm not sure it is possible to have
a PCI-express device in a system with a non-PCI-express chipset. If we
assume that is the case then this change is valid. It is also required
for at least some PCI-express systems that don't have any devices with
a root port capability (some ICH9 systems).
MFC after: 1 week
Reported by: jfv
free function controllable, instead of passing the KVA of the buffer
storage as the first argument.
Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.
Likely break the only non-conventional user of the API, after informing
the relevant committer.
Update the mbuf(9) manual page, which was already out of sync on
this point.
Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.
This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.
This does not affect the memory layout or size of mbufs.
Parental oversight by: sam and rwatson.
No MFC is anticipated.
aligned (or at least not cross a page boundary). However, it turns out
that on at least one machine one table header does cross a page boundary.
This caused problems with the MADT early probe as it uses the crash dump
map to load ACPI tables by loading the RSDT/XSDT into pages 1 ... N and
loading the header of each ACPI table into page 0 while looking for the
MADT. However, if a table header crossed a page boundary, then page 1
would get trashed resulting in a panic. Fix this by reserving the first
2 pages for ACPI table headers (headers are less than a page in size,
so 2 pages will be sufficient) and use pages 2 .. N for the RSDT and XSDT.
Note: amd64 should probably be simplified to just use pmap_mapbios()
for all these tables which will use the direct map and not need the
crash dump hack.
MFC after: 5 days
Tested on: i386
Reported by: Pete French petefrench of ticketswitch.com
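
Why two pages are enough for a header can be checked with a little arithmetic. The sketch below assumes 4096-byte pages; sk_pages_spanned() is a hypothetical helper, and the 36-byte length used in the note is only an illustrative header size.

```c
#include <assert.h>
#include <stdint.h>

#define	SK_PAGE_SIZE	4096u		/* assumed page size */

/* Number of pages touched by 'len' bytes starting at physical address 'pa'. */
static unsigned
sk_pages_spanned(uint64_t pa, uint32_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = pa / SK_PAGE_SIZE;
	last = (pa + len - 1) / SK_PAGE_SIZE;
	return ((unsigned)(last - first + 1));
}
```

A 36-byte header at 0x1000 touches one page, while the same header at 0x1ff0 touches two; no object of at most one page in size can touch more than two.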
read socket buffers in shutdown() and close():
- Call socantrcvmore() before sblock() to dislodge any threads that
might be sleeping (potentially indefinitely) while holding sblock(),
such as a thread blocked in recv().
- Flag the sblock() call as non-interruptible so that a signal
delivered to the thread calling sorflush() doesn't cause sblock() to
fail. The sblock() is required to ensure that all other socket
consumer threads have, in fact, left, and do not enter, the socket
buffer until we're done flushing it.
To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag. When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of the setting of the disposition of SB_NOINTR
on the socket buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.
Reviewed by: bz, kmacy
Reported by: Jos Backus <jos at catnook dot com>
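
The flag handling described above can be sketched as a decision function. The numeric values, the enum and sk_sblock_mode() are hypothetical; the sketch only shows how the per-call SBL_NOINTR request and the socket buffer's SB_NOINTR setting are assumed to combine.

```c
#include <assert.h>

#define	SK_SBL_WAIT	0x1	/* hypothetical values for the sblock() flags */
#define	SK_SBL_NOINTR	0x2
#define	SK_SB_NOINTR	0x40	/* stand-in for the buffer's SB_NOINTR */

enum sk_acquire {
	SK_TRY,			/* do not sleep, just try the lock */
	SK_SLEEP_INTR,		/* sleep, a signal may interrupt */
	SK_SLEEP_NOINTR		/* sleep, signals do not interrupt */
};

/* Decide how the lock should be acquired for a given call. */
static enum sk_acquire
sk_sblock_mode(int sb_flags, int flags)
{
	assert((flags & ~(SK_SBL_WAIT | SK_SBL_NOINTR)) == 0);
	if ((flags & SK_SBL_WAIT) == 0)
		return (SK_TRY);
	/* The caller's request wins regardless of the buffer's setting. */
	if ((flags & SK_SBL_NOINTR) || (sb_flags & SK_SB_NOINTR))
		return (SK_SLEEP_NOINTR);
	return (SK_SLEEP_INTR);
}
```

Because the caller states its intent in the flags argument, the outcome no longer depends on another thread changing the buffer's flags in the window before the lock is taken.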
The only downside is that it renames pmap_vac_me_harder() to pmap_fix_cache().
From Mark's email on -arm :
pmap_get_vac_flags(), pmap_vac_me_harder(), pmap_vac_me_kpmap(), and
pmap_vac_me_user() have been rewritten as pmap_fix_cache() to be more
efficient in the kernel map case. I also removed the reference to
the md.kro_mappings, md.krw_mappings, md.uro_mappings, and md.urw_mappings
counts.
In pmap_clearbit(), we can also skip over tests and writeback/invalidations
in the PVF_MOD and PVF_REF cases if those bits are not set in the pv_flag.
PVF_WRITE will turn caching back on and remove the PV_MOD bit.
In pmap_nuke_pv(), the vm_page_flag_clear(pg, PG_WRITEABLE) has been moved
to the pmap_fix_cache().
We can be more aggressive in attempting to turn caching back on by calling
pmap_fix_cache() at times that may be appropriate to turn cache on
(a kernel mapping has been removed, a write has been removed or a read
has been removed and we know the mapping does not have multiple write
mappings to a page).
In pmap_remove_pages() the cpu_idcache_wbinv_all() is moved to happen
before the page tables are NULLed because the caches are virtually
indexed and virtually tagged.
In pmap_remove_all(), the pmap_remove_write(m) is added before the
page tables are NULLed because the caches are virtually indexed and
virtually tagged. This also removes the need for the caches fixing routine
(whichever is being used pmap_vac_me_harder() or pmap_fix_cache()) to be
called on any of these mappings.
In pmap_remove(), I simplified the cache cleaning process and removed
extra TLB removals. Basically if more than PMAP_REMOVE_CLEAN_LIST_SIZE
are removed, then just flush the entire cache.
This implementation is made for a grows-down stack organization such as the
i386/amd64 platforms have, but a machine-dependent version is preferred if
one is present.
o conversion to callout(9) API.
o add a missing driver lock in bfe_ifmedia_sts().
o use our callout to drive watchdog timer.
o restart Tx routine if pending queued packets are present in
watchdog handler.
o unarm watchdog timer only if there are no queued packets.
o don't blindly reset phy and let phy driver handle link change
request in bfe_init_locked().
o return the status of mii_mediachg() to caller in
bfe_ifmedia_upd(). Previously it always returned 0 to caller.
o add check for IFF_DRV_RUNNING flag as well as IFF_DRV_OACTIVE
in bfe_start_locked().
o implement miibus_statchg method that keeps track of current
link state changes as well as negotiated speed/duplex/
flow-control configuration.
Reprogram MAC to appropriate duplex state. Flow-control
configuration was also implemented but commented out at the
moment. The flow-control configuration will be enabled again
after we have general flow-control framework in mii layer.
Reported by: Yousif Hassan < yousif () alumni ! jmu ! edu >
Tested by: Yousif Hassan < yousif () alumni ! jmu ! edu >
This makes sure that process tokens credentials with un-initialized
audit contexts are handled correctly. Currently, when invariants are
enabled, this change fixes a panic by ensuring that we have a valid
termid family. Also, this fixes token generation for process tokens
making sure that userspace is always getting a valid token.
This is consistent with what Solaris does when an audit context is
un-initialized.
Obtained from: TrustedBSD Project
MFC after: 1 week
relabel check for MLS rather than returning 0 directly.
This problem didn't result in a vulnerability currently as the central
implementation of ifnet relabeling also checks for UNIX privilege, and
we currently don't guarantee containment for the root user in mac_mls,
but we should be using the MLS definition of privilege as well as the
UNIX definition in anticipation of supporting root containment at some
point.
MFC after: 3 days
Submitted by: Zhouyi Zhou <zhouzhouyi at gmail dot com>
Sponsored by: Google SoC 2007
- Fix whitespace according to style(9).
- Sync the comment describing why we have to wait in nsphy_reset()
with nsphyter_reset(). It's true that the manual says not to do a
reset within 500us of applying power, but that's unlikely to be the cause
of the problems seen here. Generally having to wait 500us after a reset,
however, is.
DP83847 PHYs. The main reason for using a specific driver for these
PHYs is reset quirks similar to those of the nsphy(4) driven DP83840A.
PR: 112654
Obtained from: NetBSD
MFC after: 2 weeks
Thanks to: mlaier for testing w/ DP83815
overridden at compile-time using kernel options of the same names.
Rather than doing a compile-time CTASSERT of buffer sizes being
even multiples of block sizes, just adjust them at boottime, as
the failure mode is more user-friendly.
MFC after: 2 months
PR: 119993
Suggested by: Scot Hetzel <swhetzel at gmail dot com>
fields in FTS and FTSENT structs being too narrow. In addition,
the narrow types creep from there into fts.c. As a result, fts(3)
consumers, e.g., find(1) or rm(1), can't handle file trees an ordinary
user can create, which can have security implications.
To fix the historic implementation of fts(3), OpenBSD and NetBSD
have already changed <fts.h> in somewhat incompatible ways, so we
are free to do so, too. This change is a superset of changes from
the other BSDs with a few more improvements. It doesn't touch
fts(3) functionality; it just extends integer types used by it to
match modern reality and the C standard.
Here are its points:
o For C object sizes, use size_t unless it's 100% certain that
the object will be really small. (Note that fts(3) can construct
pathnames _much_ longer than PATH_MAX for its consumers.)
o Avoid the short types because on modern platforms using them
results in larger and slower code. Change shorts to ints as
follows:
- For variables that count simple, limited things like states,
use plain vanilla `int' as it's the type of choice in C.
- For a limited number of bit flags use `unsigned' because signed
bit-wise operations are implementation-defined, i.e., unportable,
in C.
o For things that should be at least 64 bits wide, use long long
and not int64_t, as the latter is an optional type. See
FTSENT.fts_number aka FTS.fts_bignum. Extending fts_number `to
satisfy future needs' is pointless because there is fts_pointer,
which can be used to link to arbitrary data from an FTSENT.
However, there already are fts(3) consumers that require fts_number,
or fts_bignum, have at least 64 bits in it, so we must allow for them.
o For the tree depth, use `long'. This is a trade-off between making
this field too wide and allowing for 64-bit inode numbers and/or
chain-mounted filesystems. On the one hand, `long' is almost
enough for 32-bit filesystems on a 32-bit platform (our ino_t is
uint32_t now). On the other hand, platforms with a 64-bit (or
wider) `long' will be ready for 64-bit inode numbers, as well as
for several 32-bit filesystems mounted one under another. Note
that fts_level has to be signed because -1 is a magic value for it,
FTS_ROOTPARENTLEVEL.
o For the `nlinks' local var in fts_build(), use `long'. The logic
in fts_build() requires that `nlinks' be signed, but our nlink_t
currently is uint16_t. Therefore let's make the signed var wide
enough to be able to represent 2^16-1 in pure C99, and even 2^32-1
on a 64-bit platform. Perhaps the logic should be changed just
to use nlink_t, but it can be done later w/o breaking fts(3) ABI
any more because `nlinks' is just a local var.
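To make the first point concrete, here is a small sketch of what happens
when a pathname length is stored in a narrow field. It assumes a 16-bit
unsigned short, as found on common platforms; the helper name
len_as_ushort() is made up for this example and is not part of fts(3).

```c
#include <limits.h>
#include <stddef.h>

/*
 * Store a length in an unsigned short, the way a narrow struct member
 * would, and return what can be read back from it.  The conversion is
 * well defined in C: the value is reduced modulo USHRT_MAX + 1.
 */
static size_t
len_as_ushort(size_t len)
{
	unsigned short narrow = (unsigned short)len;

	return ((size_t)narrow);
}
```

A length that fits in the narrow type survives the round trip, while a
longer pathname silently comes back as a smaller number. Keeping such
lengths in size_t avoids that truncation.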
This commit also includes supporting stuff for the fts change:
o Preserve the old versions of fts(3) functions through libc symbol
versioning because the old versions appeared in all our former releases.
o Bump __FreeBSD_version just in case. There is a small chance that
some ill-written 3rd-party apps may fail to build or work correctly
if compiled after this change.
o Update the fts(3) manpage accordingly. In particular, remove
references to fts_bignum, which was a FreeBSD-specific hack to work
around the too narrow types of FTSENT members. Now fts_number is
at least 64 bits wide (long long) and fts_bignum is an undocumented
alias for fts_number kept around for compatibility reasons. According
to Google Code Search, the only big consumers of fts_bignum are in
our own source tree, so they can be fixed easily to use fts_number.
o Mention the change in src/UPDATING.
PR: bin/104458
Approved by: re (quite a while ago)
Discussed with: deischen (the symbol versioning part)
Reviewed by: -arch (mostly silence); das (generally OK, but we didn't
agree on some types used; assuming that the lack of objections on
-arch lets me stick to my opinion)
exposed as kernel compile options, they have more meaningful names.
PR: 119993
MFC after: 2 months
Suggested by: Scot Hetzel <swhetzel at gmail dot com>
bug that caused us to reintroduce it is believed to be fixed, and Kris
says he no longer sees problems with fifofs in highly parallel builds.
If this works out, we'll MFC it for 7.1.
MFC after: 3 months
Pointed out by: kris
resulted in the argument to the make_dev() to be a unit number.
Correct this by supplying a minor number to make_dev(), and using
the unit number for the calculation of the slave tty name.
Reported and tested by: Peter Holm
Reviewed by: jhb
Yet another pointy hat to: kib
MFC after: 1 day
while the thread does not hold the thread lock would stop blocking for
subsequent interruptible sleeps and would always immediately fail the
sleep with EWOULDBLOCK instead (even sleeps that didn't have a timeout).
Some background:
- KSE has a facility for allowing one thread to interrupt another thread.
During this process, the target thread aborts any interruptible sleeps
much as if the target thread had a pending signal. Once the target
thread acknowledges the interrupt, normal sleep handling resumes. KSE
manages this via the TDF_INTERRUPT flag. Specifically, it sets the
flag when it sends an interrupt to another thread and clears it when
the interrupt is acknowledged. (Note that this is purely a software
interrupt sort of thing and has no relation to hardware interrupts
or kernel interrupt threads.)
- The old code for handling the sleep timeout race handled the race
by setting the TDF_INTERRUPT flag and faking a KSE-style thread
interrupt to the thread in the process of going to sleep. It probably
should have just checked the TDF_TIMEOUT flag in sleepq_catch_signals()
instead.
- The bug was that the sleepq code would set TDF_INTERRUPT but it was
never cleared. The sleepq code couldn't safely clear it in case there
actually was a real KSE thread interrupt pending for the target thread
(in fact, the sleepq timeout actually stomped on said pending interrupt).
Thus, any future interruptible sleeps (*sleep(.. PCATCH ..) or
cv_*wait_sig()) would see the TDF_INTERRUPT flag set and immediately
fail with EWOULDBLOCK. The flag could be cleared if the thread belonged
to a KSE process and another thread posted an interrupt to the original
thread. However, in the more common case of a non-KSE process, the
thread would pretty much stop sleeping.
- Fix the bug by just setting TDF_TIMEOUT in the sleepq timeout code and
not messing with TDF_INTERRUPT and td_intrval. With yesterday's change that
made sleepq_switch() check TDF_TIMEOUT, this is now sufficient.
MFC after: 3 days
exposing them to all consumers of ip_fw.h. These structures are
used in both ipfw(8) and ipfw(4), but not part of the user<->kernel
interface for other applications to use, rather, shared
implementation.
MFC after: 3 days
Reported by: Paul Vixie <paul at vix dot com>
being properly cancelled by a timeout. In general there is a race
between the sleepq timeout handler firing while the thread is still
in the process of going to sleep. In 6.x with sched_lock, the race was
largely protected by sched_lock. The only place it was "exposed" and had
to be handled was while checking for any pending signals in
sleepq_catch_signals().
With the thread lock changes, the thread lock is dropped in between
sleepq_add() and sleepq_*wait*() opening up a new window for this race.
Thus, if the timeout fired while the sleeping thread was in between
sleepq_add() and sleepq_*wait*(), the thread would be marked as timed
out, but the thread would not be dequeued and sleepq_switch() would
still block the thread until it was awakened via some other means. In
the case of pause(9) where there is no other wakeup, the thread would
never be awakened.
Fix this by teaching sleepq_switch() to check if the thread has had its
sleep canceled before blocking by checking the TDF_TIMEOUT flag and
aborting the sleep and dequeueing the thread if it is set.
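The check described in the last paragraph can be pictured with a tiny
userspace model. Everything here is hypothetical and simplified: the struct
thread_sketch type, the TDF_TIMEOUT_SKETCH value and the function
sleepq_switch_sketch() only mimic the decision, not the kernel's locking or
queue handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TDF_TIMEOUT_SKETCH 0x1	/* hypothetical stand-in for TDF_TIMEOUT */

/* Minimal model of a thread that has been put on a sleep queue. */
struct thread_sketch {
	int	flags;		/* TDF_TIMEOUT_SKETCH if the timeout fired */
	bool	on_queue;	/* still linked on the sleep queue? */
};

/*
 * Return true if the thread should block.  If the timeout already
 * fired between queueing and blocking, dequeue the thread and abort
 * the sleep instead of blocking with nobody left to wake it.
 */
static bool
sleepq_switch_sketch(struct thread_sketch *td)
{
	assert(td != NULL);
	if ((td->flags & TDF_TIMEOUT_SKETCH) != 0) {
		td->on_queue = false;
		return (false);
	}
	return (true);
}
```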
MFC after: 3 days
Reported by: dwhite, peter
`kn_sdata' member of the newly registered knote. The problem is that
this member is overwritten by a call to kevent(2) with the EV_ADD flag,
targeted at the same kevent/knote. For instance, a userland application
may set the pointer to NULL, leading to a panic.
A testcase was provided by the submitter.
PR: kern/118911
Submitted by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 1 day
- Remove the "thread" argument from the lockmgr() function as it is
always curthread now
- Axe lockcount() function as it is no longer used
- Axe LOCKMGR_ASSERT() as it is really bogus and not currently used.
Hopefully this will soon be replaced by something suitable for it.
- Remove the prototype for dumplockinfo() as the function is no longer
present
Additionally:
- Introduce a KASSERT() in lockstatus() in order to let it accept only
curthread or NULL as they should only be passed
- Do a little bit of style(9) cleanup on lockmgr.h
The KPI is heavily broken by this change, so manpages and
FreeBSD_version will be modified accordingly by further commits.
Tested by: matteo
doesn't overflow in arc.c in this check:
if (kmem_used() > (kmem_size() * 4) / 5)
return (1);
With this bug ZFS almost doesn't cache.
Only 32bit machines are affected that have vm.kmem_size set to values >=1GB.
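The overflow can be reproduced in isolation. The sketch below is not the
ZFS code; threshold_32() and threshold_64() are hypothetical helpers that
mirror the "(size * 4) / 5" expression with a 32-bit and a 64-bit
intermediate product, and widening is only one possible remedy, not
necessarily the one this commit applies.

```c
#include <stdint.h>

/* Overflow-prone form: the product is computed in 32 bits and wraps. */
static uint32_t
threshold_32(uint32_t kmem_size)
{
	return ((kmem_size * 4) / 5);
}

/* Widen before multiplying so the intermediate product cannot wrap. */
static uint64_t
threshold_64(uint32_t kmem_size)
{
	return (((uint64_t)kmem_size * 4) / 5);
}
```

With a size of exactly 1 GB the 32-bit product wraps to zero, so a check
of the form "used > threshold" is true for any nonzero usage. That is
consistent with the cache being kept almost empty.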
Reported by: David Taylor <davidt@yadt.co.uk>
Introduce a new privilege that allows setting certain IP header options
(hop-by-hop, routing headers).
Leave a few comments to be addressed later.
Reviewed by: rwatson (older version, before addressing his comments)
- Improve error handling for load operations.
- Fix a memory corruption bug when using certain linux management apps.
- Allocate all commands up front to avoid OOM deadlocks later on.
while in principle a good idea, opened us up to a race inherent to
the syncache's direct insertion of incoming TCP connections into the
"completed connection" listen queue, as it transpires that the socket
is inserted before the inpcb is fully filled in by syncache_expand().
The bug manifested with the occasional returning of 0.0.0.0:0 in the
address returned by the accept() system call, which occurred if accept
managed to execute tcp_usr_accept() before syncache_expand() had copied
the endpoint addresses into inpcb connection state.
Re-add tcbinfo locking around the address copyout, which has the effect
of delaying the copy until syncache_expand() has finished running, as
it is run while the tcbinfo lock is held. This is undesirable in that
it increases contention on tcbinfo further, but a more significant
change will be required to how the syncache inserts new sockets in
order to fix this and keep more granular locking here. In particular,
either more state needs to be passed into sonewconn() so that
pru_attach() can fill in the fields *before* the socket is inserted, or
the socket needs to be inserted in the incomplete connection queue
until it is actually ready to be used.
Reported by: glebius (and kris)
Tested by: glebius
a run-queue. If the priority is numerically raised only change lowpri
if we're certain it will be correct. Some slop is allowed; however,
previously we could erroneously raise lowpri for an idle cpu that a
thread had recently run on, which led to errors in load balancing
decisions.
to files, such as ktrace output, under CODA_VERBOSE. Otherwise, each
such call to VOP_WRITE() results in a kernel printf.
MFC after: 3 days
Obtained from: NetBSD
checksum offload by downloading AIC-6915 firmware. Changes are
o Header file cleanup.
o Simplified probe logic.
o s/u_int{8,16,32}_t/uint{8,16,32}_t/g
o K&R -> ANSI C.
o In register access function, added support for both memory mapped and
IO space register access. The function will dynamically detect
which method is to be used.
o sf_setperf() was modified to support strict-alignment
architectures.
o Use SF_MII_DATAPORT instead of hardcoded value 0xffff.
o Added link state/speed, duplex changes handling task q. The task q
is also responsible for flow control settings.
o Always honor link up/down state reported by mii layers. The link
state information is used in sf_start() to determine whether we
got a valid link.
o Added experimental flow-control setup. It was commented out but
will be activated once we have flow-control infrastructure in mii
layer.
o Simplify IFF_UP/IFCAP_POLLING and IFF_PROMISC handling logic. Rx
filter always honors promiscuous mode.
o Implemented suspend/resume methods.
o Reorganized Rx filter routine so promiscuous mode changes don't
require interface re-initialization.
o Reimplemented driver probe routine such that it looks for matching
device from supported hardware list table. This change will help to
add newer hardware revision to the driver.
o Use ETHER_ADDR_LEN instead of hardcoded value.
o Prefer memory space register mapping over I/O space as the hardware
requires lots of register access to get various consumer/producer
index. Failing to get memory space mapping, sf(4) falls back to I/O
space mapping. Use of memory space register mapping requires
somewhat large memory space(512K), though.
o Switch to simpler bus_{read,write}_{1,2,4}.
o Use PCIR_BAR macro to get BARs.
o Program PCI cache line size if the cache line size was set to 0
and enable PCI MWI.
o Add a new sysctl node 'dev.sf.N.stats' that shows various MAC
counters for Rx/Tx statistics.
o Add a sysctl node to configure the interrupt moderation timer. The
timer defers interrupt generation until the time specified in the timer
control register has expired. The value in the timer register is in
units of 102.4us. The allowable range for the timer is 0 - 31
(0 ~ 3.174ms).
The default value is 1(102.4us). Users can change the timer value
with dev.sf.N.int_mod sysctl(8) variable/loader(8) tunable.
o bus_dma(9) conversion
- Enable 64bit DMA addressing.
- Enable 64bit descriptor format support.
- Apply descriptor ring alignment requirements(256 bytes alignment).
- Apply Rx buffer address alignment requirements(4 bytes alignment).
- Apply 4GB boundary restrictions(Tx/Rx ring and its completion ring
should live in the same 4GB address space.)
- Set the number of allowable DMA segments to 16. In fact,
AIC-6915 doesn't have a limit for number of DMA segments but it
would be a waste of Tx descriptor resource if we allow more than 16.
- Rx/Tx side bus_dmamap_load_mbuf_sg(9) support.
- Added alignment fixup code for strict-alignment architectures.
- Added endianness support code in Tx/Rx descriptor access.
With these changes sf(4) should work on all platforms.
o Don't set if_mtu in device attach, it's handled in ether_ifattach.
o Use our own callout to drive watchdog timer.
o Enable VLAN oversized frames and announce sf(4)'s VLAN capability
to upper layer.
o In sf_detach(), remove mtx_initialized KASSERT as it's not possible
to get there without initializing the mutex. Also mark that we're
about to detach so active bpf listeners do not panic the system.
o To reduce PCI register access cycles, Rx completion ring is
directly scanned instead of reading consumer/producer index
registers. In theory, Tx completion ring also can be directly
scanned. However the completion ring is composed of two types of
completion (1 for Tx done and 1 for DMA done). So reading the producer
index via register access would be a safer way to detect the
ring wrap-around.
o In sf_rxeof(), don't use m_devget(9) to align received frames. The
alignment is required only for strict-alignment architectures and
now the alignment is handled by sf_fixup_rx() if required. The
removal of the copy operation in fast path should increase Rx
performance a lot on non-strict-alignment architectures such as
i386 and amd64.
o In sf_newbuf(), don't set descriptor valid bit as sf(4) is
programmed to run with normal mode. In normal mode, the valid bit
has no meaning. The valid bit should be used only when the
hardware uses polling(prefetch) mode. The end of descriptor queue
bit could be used if needed, but sf(4) relies on auto-wrapping of
hardware on 256 descriptor queue entries so both valid and
descriptor end bit are not used anymore.
o Don't disable generation of Tx DMA completion as said in datasheet
and use the Tx DMA completion entry instead of relying on Tx done
completion entry. Also added additional Tx completion entry type
check in Tx completion handler.
o Don't blindly reset watchdog timer in sf_txeof(). sf(4) now unarms
the watchdog only if there are no active Tx descriptors in Tx
queue.
o Don't manually update various counters in driver, instead, use
built-in MAC statistic registers to update them. The statistic
registers are updated every second.
o Modified Tx underrun handlers to increase the threshold value
in units of 256 bytes. Previously it used to increase 16 bytes
at a time which seems to take too long to stabilize whenever a Tx
underrun occurs.
o In the interrupt handler, an additional check for the interrupt is
performed such that only interrupts for this device are allowed to
process descriptor rings. Because reading the SF_ISR register clears
all interrupts, nuke the write to the SF_ISR register.
o Tx underrun is an abnormal condition and SF_ISR_ABNORMALINTR includes
the interrupt. So there is no need to inspect the Tx underrun again
in main interrupt loop.
o Don't blindly reinitialize hardware for abnormal interrupt
condition. sf(4) reinitializes the hardware only when it encounters
DMA error which requires an explicit hardware reinitialization.
o Fix a long standing bug that incorrectly clears MAC statistic
registers in sf_init_locked.
o Added strict-alignment safe way of ethernet address reprogramming
as IF_LLADDR may return unaligned address.
o Move sf_reset() to sf_init_locked in order to always reset the
hardware to a known state prior to configuring hardware.
o Set default Rx DMA, Tx DMA parameters as shown in datasheet.
o Enable PCI busmaster logic and autopadding for VLAN frames.
o Rework sf_encap.
- Previously sf(4) used type 0 Tx descriptors with padding
enabled to store driver private data. Embedding private data
structures into descriptors is a bad idea as the structure size
would be different between 64bit and 32bit architectures. The
type 0 descriptor allows fixed number of DMA segments in
a descriptor format and provides relatively simple interface to
manage multi-fragmented frames.
However, it wastes lots of Tx descriptors as not all frames are
fragmented as the number of allowable segments in a descriptor.
- To overcome the limitation of type 0 descriptor, switch to type
2 descriptor which allows 64bit DMA addressing and can handle
an unlimited number of fragmented DMA segments. The drawback of
type 2 descriptor is in its complexity in managing descriptors
as driver should handle the end of Tx ring manually.
- Manually set Tx descriptor queue end mark and record number of
used descriptors to reclaim used descriptors in sf_txeof().
o Rework sf_start.
- Honor link up/down state before attempting transmission.
- Because sf(4) uses only one of two Tx queues, use low priority
queue instead of high one. This will remove one shift operation
in each Tx kick command.
- Cache the last producer index in the softc such that subsequent Tx
operations don't need to access the producer index register.
o Rewrote sf_stats_update to include all available MAC statistic
counters.
o Employ AIC-6915 firmware from Adaptec and implement firmware
download routine and TCP/UDP checksum offload.
Partial checksum offload support was commented out due to the
possibility of firmware bug in RxGFP.
The firmware can strip VLAN tag in Rx path but the lack of firmware
assistance of VLAN tag insertion in transmit side made it useless
on FreeBSD. Unlike checksum offload, FreeBSD requires both Tx/Rx
hardware VLAN assistance capability. The firmware may also detect
wakeup frame and can wake system up from states other than D0.
However, the lack of wakeup support from D3cold state keeps me from
adding WOL capability. Also detecting WOL frame requires firmware
support but it's not yet known to me whether the firmware can
process the WOL frame.
o Changed *_ADDR_HIADDR to *_ADDR_HI to match other definitions of
registers.
o Added definitions for interrupt moderation related constants.
o Redefined SF_INTRS to include Tx DMA done and DMA errors. Removed
Tx done as it's not needed anymore.
o Added definition for Rx/Tx DMA high priority threshold.
o Nuked unused macros SF_IDX_LO, SF_IDX_HI.
o Added complete MAC statistic register definition.
o Modified sf_stats structure to hold all MAC statistic registers.
o Nuke various driver private padding data in Tx/Rx descriptor
definition. sf(4) no longer requires private padding. Also remove
unused padding related definitions. This greatly simplifies
descriptor manipulation on 64bit architectures.
o Because we no longer pad driver private data into descriptor,
remove deprecated/not-applicable comments for padding.
o Redefine Rx/Tx descriptor status. sf(4) doesn't use bit fields
anymore to support endianness.
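The interrupt moderation timer mentioned in the list above counts in units
of 102.4us with an allowable range of 0 - 31. A minimal sketch of the
conversion, using hypothetical names (IM_MIN, IM_MAX, IM_TICK_NS,
im_valid(), im_to_ns()) that are not taken from the driver, might look like
this:

```c
#include <assert.h>
#include <stdint.h>

#define IM_MIN		0	/* lowest allowable timer value */
#define IM_MAX		31	/* highest allowable timer value */
#define IM_TICK_NS	102400u	/* one timer unit: 102.4us in nanoseconds */

/* Return nonzero if `val` is inside the allowable range. */
static int
im_valid(int val)
{
	return (val >= IM_MIN && val <= IM_MAX);
}

/* Convert a timer register value to a delay in nanoseconds. */
static uint32_t
im_to_ns(int val)
{
	assert(im_valid(val));
	return ((uint32_t)val * IM_TICK_NS);
}
```

Working in nanoseconds keeps the arithmetic in integers; the default value
of 1 maps to 102400 ns, that is, 102.4us.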
Tested by: bruffer (initial version)
be wrong but I couldn't find a way to make it work. In addition, the
number of TxGFP instructions does not match the firmware image size,
so I guess something was wrong when Adaptec generated the TxGFP
firmware from their DDK.
According to datasheet, normally, the first GFP instruction would be
opcode C, WaitForStartOfFrame, to synchronize checksumming with
incoming frame. But the first instruction in TxGFP firmware was
opcode 1, BrToImmIfTrue, so it could not process checksum correctly,
I guess. Checking for RxGFP firmware also indicates the first
instruction should be opcode C. Since the TxGFP firmware is short by
exactly one instruction, I prepended the opcode
C to the TxGFP firmware image. With this change, the resulting image size
perfectly matches the number of instructions and Tx checksum
offload seems to work without problems.
lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED() which does the same for bufobj
locks, built on top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED() which works like the counterpart
VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj
BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus
BUF_REFCNT() in a more explicative and SMP-compliant way.
This allows us to axe BUF_REFCNT(), leaving the function
lockcount() totally unused in our stock kernel. Further commits will
axe lockcount() as well as part of lockmgr() cleanup.
The KPI is, obviously, broken, so further commits will update manpages
and FreeBSD_version.
Tested by: kris (on UFS and NFS)
- Don't specify vnode operations for mknod, lease, and advlock--let them
fall through to vop_default.
- Implement vop_default with &default_vnodeops, rather than with VOP_PANIC,
so that unimplemented vnode operations are handled in more sensible ways
than panicking, such as EOPNOTSUPP on ACL queries generated by bsdtar,
or mknod.
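The fall-through behaviour can be pictured with a small userspace model of
an operations table that defers to a default table. This is only a sketch:
struct ops_sketch, the function names and the two-entry table are
hypothetical and far simpler than the kernel's vnode operation vectors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*op_fn)(void);

/* Hypothetical operations table; `fallback` plays the role of vop_default. */
struct ops_sketch {
	const struct ops_sketch	*fallback;
	op_fn			 mknod;
	op_fn			 read;
};

static int default_mknod(void) { return (EOPNOTSUPP); }
static int default_read(void) { return (EOPNOTSUPP); }
static int fs_read(void) { return (0); }

/* Default table: every operation has a sensible, non-panicking answer. */
static const struct ops_sketch default_ops = {
	NULL, default_mknod, default_read
};

/* File system table: mknod is left unset so it falls through. */
static const struct ops_sketch fs_ops = {
	&default_ops, NULL, fs_read
};

/* Walk the fallback chain until a table implements mknod. */
static int
call_mknod(const struct ops_sketch *ops)
{
	while (ops != NULL && ops->mknod == NULL)
		ops = ops->fallback;
	assert(ops != NULL);
	return (ops->mknod());
}
```

In this model an unimplemented mknod yields EOPNOTSUPP from the default
table instead of aborting, which is the kind of outcome the change aims
for.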
MFC after: 3 days
fill out all fields, just fill out the ones the file system knows
about. Among other things, this causes the output of "mount" and
"df" to make quite a bit more sense as /dev/cfs0 is specified as the
mountfrom name.
MFC after: 3 days
vnodes during coda_unmount() in order to detect errant use of them
after the vnode references may no longer be valid.
No need to clear the VV_ROOT flag on mi_rootvp (especially after
the vnode reference is no longer valid) as this isn't done on other
file systems.
MFC after: 3 days
and then release it when it is closed: we rely on the caller to keep the
vnode around with a valid reference. This avoids vrele() destroying the
vnode that vop_close() is being called on while that call is in progress, and
a crash due to lockmgr recursing the vnode lock when a Coda unmount
occurs.
MFC after: 3 days
Move all extern variable definitions to associated .h files, move some
extern variable definitions between include files to place them more
appropriately.
MFC after: 3 days
Coda vnode derived from it, in the style of nullfs. This allows files
in the Coda file system to be memory-mapped, such as with execve(2) or
mmap(2).
MFC after: 3 days
Reported by: Rune <u+openafsdev-sr55 at chalmers dot se>
"BSM conversion requested for unknown event 43140"
It should be noted that we need to audit the fd argument for this system
call.
Obtained from: TrustedBSD Project
MFC after: 1 week
unp_connect(): it is expected to return with the lock held, and two
possible error paths otherwise returned with it unlocked.
The fix committed here is slightly different from the patch in the
PR, but along an alternative line suggested in the PR.
PR: 119778
MFC after: 3 days
Submitted by: James Juran <james dot juran at baesystems dot com>
was missed. As result, pty_create_slave() may index out of the names[]
bounds, creating wrong slave tty names.
Tested by: kensmith
Reviewed by: jhb
MFC after: 3 days
since the command and data that is being built to be sent to or read
from the HW lives in the softc. Commands are later run via an_setdef etc.
In the ioctl path various references are kept to the data stored in
the softc so it needs to be protected. Think of the command
in the softc as a global variable, since it essentially is one. Since locking
wasn't done in this type of context the commands would get corrupted.
Thanks to avatar@ for catching some lock issues and dhw@ for testing.
Things are a lot more stable except for the MPI-350 cards. My an(4)
remote laptop stays on the network now.
The driver should be changed so that it uses private memory that is passed
to the functions that talk to the card. Then only those functions would
really need to grab locks.
Reviewed by: avatar@
drop the lock and then re-acquire it, revalidating TCP connection state
assumptions when we do so. This avoids a potential lock order reversal
(and potential deadlock, although none have been reported) due to the
inpcb lock being held over a page fault.
MFC after: 1 week
PR: 102752
Reviewed by: bz
Reported by: Václav Haisman <v dot haisman at sh dot cvut dot cz>
shortest possible chain of mbufs of m_defrag(9). What we want is
chains of mbufs that can be safely stored to a Tx descriptor which
can have up to STGE_MAXTXSEGS mbufs. The ethernet controller does
not need to align Tx buffers on a 32bit boundary, so the use of
m_defrag(9) was a waste of time.
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.
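A check of this kind could look roughly like the sketch below. It assumes
that the reserved range in question is 01-80-C2-00-00-00 through
01-80-C2-00-00-0F; the function name is_reserved_dst() is hypothetical and
this is not the code added to bridge_forward().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Return nonzero if the 6-byte destination address lies in the range
 * 01-80-C2-00-00-00 .. 01-80-C2-00-00-0F (assumed reserved range).
 */
static int
is_reserved_dst(const uint8_t dst[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	assert(dst != NULL);
	return (memcmp(dst, prefix, sizeof(prefix)) == 0 && dst[5] <= 0x0f);
}
```

A forwarding path would drop the frame when such a helper returns nonzero,
before choosing an outgoing interface.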
PR: kern/119744
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)
Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.
Eliminate \n from a nearby panic string.
Use KASSERT() to reimplement pmap_copy()'s two assertions.
in the range and precision of their type(s) on amd64, but FLT_EVAL_METHOD
said that they were evaluated in the "interesting" (buggy) i387 methods.
float_t was broken compatibly with FLT_EVAL_METHOD.
These definitions seem to be broken on powerpc and possibly on arm.
float_t is float on powerpc with gcc [-notraditional] according to
glibc, and FLT_EVAL_METHOD is marked with XXX on arm.
problems when the DRM driver is loaded and the AIGLX extension is loaded:
the AIGLX driver requests a drm_close and this will cause the radeon
driver to fail while starting X windows.
PR: kern/114688
Submitted by: vehemens <vehemens at verizon dot net>
Prodded by: Robert Noland
Approved by: imp (mentor, a while ago already), anholt
MFC After: 1 week
encounters a syntax error, and add a tip about adding first
the `vital' options and then experimental ones.
PR: docs/119658
Submitted by: Julian Stacey, jhs at berklix.org
other. The first one survives, the rest are removed. So far, it appears
only some acpi_perf(4) BIOS tables have these invalid states, but address
this in the core to be sure to handle other potential driver data.
PR: kern/114722
Tested by: stefan.lambrev / moneybookers.com
MFC after: 3 days
- Track packet zone mbufs separately from other mbufs
- free packet zone buffers via m_free rather than trying to manage the refcount
as with clusters - its refcount and management seems to be "special"
but reread it from the device_t every time the device list is fetched.
Previously the device name in pciconf -l would not be updated when a driver
was unloaded or if a device was detached and attached to a different
driver.
MFC after: 1 week
PR: kern/104777
Submitted by: "Iasen Kostoff" tbyte | otel net
queues (which we call slices). The NIC will steer traffic into up to
hw.mxge.max_slices different receive rings based on a configurable
hash type (hw.mxge.rss_hash_type).
Currently the driver defaults to using a single slice, so the default
behavior is unchanged. Also, transmit from non-zero slices is
disabled currently.
spec:
- Use read/modify/write cycles to enable and disable the HPET instead of
writing 0 to reserved bits.
- Shutdown the HPET during suspend as encouraged by the spec.
- Fail to attach to an HPET with a period of zero.
MFC after: 1 week
PR: kern/119675 [3]
Reported by: Leo Bicknell | bicknell ufp.org
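The read/modify/write rule and the zero-period check above can be
sketched on a plain integer that stands in for the register contents.
This is only a model: it assumes the enable flag is bit 0 of the general
configuration register, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: the enable flag is bit 0 of the general configuration register. */
#define HPET_CNF_ENABLE	0x00000001u

/* Set only the enable bit; every other bit keeps the value that was read. */
uint32_t
hpet_cnf_enable(uint32_t cnf)
{
	return (cnf | HPET_CNF_ENABLE);
}

/* Clear only the enable bit instead of writing 0 to the whole register. */
uint32_t
hpet_cnf_disable(uint32_t cnf)
{
	return (cnf & ~HPET_CNF_ENABLE);
}

/* An attach routine would refuse a timer that reports a period of zero. */
bool
hpet_period_valid(uint32_t period)
{
	return (period != 0);
}
```

A driver would read the register, pass the value through one of these
helpers, and write the result back, so reserved bits are never zeroed.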
- Fix a bug introduced in 1.4.20 where speculative read by the processor in the
write-only doorbell region would cause a target-abort (as opposed to simply
returning random data). This could manifest itself as NMI or machine freeze
depending on how the BIOS/OS/chipset configuration handles target-abort.
- Add support for new revisions of -R cards (with AEL1002/AEL1010 xaui->xfi)
- Increase an internal timing (dispatch engine): fix possible spurious reset
(seen on very few cards).
lowest priority on the queue for the current cpu vs curthread's
priority. In the case that curthread is waking up many threads of a
lower priority as would happen with a turnstile_broadcast() or wakeup()
of many threads this prevents them from all ending up on the current cpu.
- In sched_add() make the relationship between a scheduled ithread and
the current cpu advisory rather than strict. Only give the ithread
affinity for the current cpu if it's actually being scheduled from
a hardware interrupt. This prevents it from migrating when it simply
blocks on a lock.
Sponsored by: Nokia
- increase asserts for mbuf accounting
- track outstanding mbufs (maps very closely to leaked)
- actually only create one thread per port if !multiq
Oddly enough this fixes the use after free
- move txq_segs to stack in t3_encap
- add checks that pidx doesn't move past cidx
- simplify mbuf free logic in collapse mbufs routine
- move cxgb_tx_common in to cxgb_multiq.c and rename to cxgb_tx
- move cxgb_tx_common dependencies
- further simplify cxgb_dequeue_packet for the non-multiqueue case
- only launch one service thread per port in the non-multiq case
- remove dead cleaning code from cxgb_sge.c
- simplify PIO case substantially by returning directly from mbuf collapse
and just using m_copydata
- remove gratuitous m_gethdr in the rx path
- clarify freeing of mbufs in collapse
o Increased number of Rx/Tx descriptors to 256 for 8169 GigEs
because it's hard to push the hardware to the limit with default
64 descriptors.
TSO requires a large number of Tx descriptors to pass a full-sized
TCP segment (a 65535-byte IP packet) to the hardware. Previously it
consumed 32 Tx descriptors, assuming an MCLBYTES DMA segment size,
to send the TCP segment, which means re(4) couldn't queue more
than two full-sized IP packets.
For 8139C+ it still uses 64 Rx/Tx descriptors due to its hardware
limitations. With these changes there is a (very) small waste of
memory for 8139C+ users, but I don't think it would affect 8139C+
users in most cases.
o Various bus_dma(9) fixes.
- The hardware supports DAC so allow 64bit DMA operations.
- Removed BUS_DMA_ALLOC_NOW flag.
- Increased DMA segment size to 4096 from MCLBYTES because TSO
consumes too many descriptors with MCLBYTES DMA segment size.
- Tx/Rx side bus_dmamap_load_mbuf_sg(9) support. With these
changes the code is more readable than previous one and got a
(slightly) better performance as it doesn't need to pass/
decode arguments to/from callback function.
- Removed unnecessary callback function re_dmamap_desc() and
nuked rl_dmaload_arg structure which was used in the callback.
- Additional protection for DMA map load failure. In case of
failure reuse current map instead of returning a bogus DMA
map.
- Deferred DMA map unloading/sync operation for maximum
performance until we really need to load new DMA map. If we
happen to reuse current map(e.g. input error) there is no need
to sync/unload/load again.
- The number of allowable Tx DMA segments for an mbuf chain is
now 32 instead of a magic nseg value. If too few Tx descriptors
are available to send a highly fragmented mbuf chain, an
optimized re_defrag() is called to collapse the chain, which is
supposed to be much faster than m_defrag(9).
re_defrag() was borrowed from ath(4).
- Separated Rx/Tx DMA tag from a common DMA tag such that Rx DMA
tag correctly uses DMA maps that were created with DMA alignment
restriction(8bytes alignments). Tx DMA tag does not have such
alignment limitation.
- Added additional sanity checks for DMA ring map load failure.
- Added additional spare Rx DMA map for graceful handling of Rx
DMA map load failure.
- Fixed misused bus_dmamap_sync(9) and added missing
bus_dmamap_sync(9) in re_encap()/re_txeof()/re_rxeof().
o Enabled TSO again as re(4) now has a reasonable number of Tx
descriptors.
o Don't touch DMA address of a Tx descriptor in re_txeof(). It's
not needed.
o Fix incorrect update of if_ierrors counter. For Rx buffer
shortage it should update if_qdrops as the buffer is reused.
o Added checks for unsupported H/W revisions and return ENXIO for
such hardware. This is required to remove resource allocation
code in re_probe, as other drivers do in their device probe routines.
o Modified descriptor index manipulation macros as it's now possible
to have different number of descriptors for Rx/Tx.
o In re_start, to save a lock operation, use IFQ_DRV_IS_EMPTY before
trying to invoke IFQ_DRV_DEQUEUE. Also don't blindly call re_encap
since we already know the number of available Tx descriptors in
advance.
o Removed RL_TX_DESC_THLD which was used to reserve RL_TX_DESC_THLD
descriptors in Tx path. There is no such limitation mentioned in the
8139C+/8169/8110/8168/8101/8111 datasheet and it seems to work ok
without reserving RL_TX_DESC_THLD descriptors.
o Fix a comment for RL_GTXSTART. The register is an 8-bit register.
o Added comments for 8169/8139C+ hardware restrictions on descriptors.
o Removed forward declaration for "struct rl_softc", it's not needed.
o Added a new structure rl_txdesc for Tx descriptor managements and
a structure rl_rxdesc for Rx descriptor managements.
o Removed unused member variable rl_intlock in driver softc. There are
still several unused member variables which are supposed to be used
to access hardware statistics counters. But it seems that accessing
hardware counters was not implemented yet.
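The descriptor counts quoted in the TSO item above can be checked with a
little arithmetic. This is a minimal sketch, assuming MCLBYTES is 2048
and that every DMA segment is completely filled (the best case). The
function names are hypothetical and not taken from re(4).

```c
#define TSO_MAX_BYTES	65535u	/* full-sized IP packet */
#define OLD_SEG_SIZE	2048u	/* assumed value of MCLBYTES */
#define NEW_SEG_SIZE	4096u	/* new DMA segment size */

/* Descriptors needed for one packet: ceil(pkt_bytes / seg_size). */
unsigned
descs_per_packet(unsigned pkt_bytes, unsigned seg_size)
{
	return ((pkt_bytes + seg_size - 1) / seg_size);
}

/* How many such packets fit in a ring of ring_size descriptors. */
unsigned
packets_per_ring(unsigned ring_size, unsigned pkt_bytes, unsigned seg_size)
{
	return (ring_size / descs_per_packet(pkt_bytes, seg_size));
}
```

With these assumptions, descs_per_packet(TSO_MAX_BYTES, OLD_SEG_SIZE) is
32 and a 64-descriptor ring holds two such packets, matching the text;
with 4096-byte segments the per-packet cost halves to 16 descriptors.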
the kernel's direct map instead of the pmap's recursive mapping to access
the lowest level in the page table. The direct map is preferable for two
reasons: (1) The TLB is more likely to hold the required direct mapping
because pmap_enter() has already used the direct map to access a nearby
PTE and (2) loading a direct mapping into the TLB involves walking only 2
or 3 levels of the page table instead of 4.
- Turn on WOL bits in suspend/shutdown method.
- WOL is disabled in resume routine as WOL can interfere with normal
Rx operation.
- Move stge_reset() to stge_init_locked() as resetting hardware
clears configured Rx information which in turn results in
non-working Rx module after suspend/shutdown operation.
conjunction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
The KPI is broken by this change, which should affect several ports, so
a version bump and manpage update will be committed separately.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
zone code. The GPE handler method (i.e. _L00) generates various Notify
events that need to be run to completion before the GPE is re-enabled.
In ACPI-CA, we queue an asynch callback at the same priority as a Notify
so that it will only run after all Notify handlers have completed. The
callback re-enables the GPE afterwards. We also changed the priority of
Notifies to be the same as GPEs, given the possibility that another GPE
could arrive before the Notifies have completed and we don't want it to
get queued ahead of the rest.
The ACPI-CA change was submitted by Alexey Starikovskiy (SUSE) and will
appear in a later release. Special thanks to him for helping track this
bug down.
MFC after: 1 week
Tested by: jhb, Yousif Hassan <yousif / alumni.jmu.edu>
memcpy/memset/memcmp and friends from libkern/arm to arm/arm/support.S, and so
I did, but in the process, I didn't add the appropriate copyrights.
This is a major oversight on my part, and I apologize to the NetBSD people for it.
MFC After: 1 day
via a new socket during an NFS operation as that reconnect takes place in
the context of an arbitrary thread with an arbitrary credential. Ideally
we would like to use the mount point's credential for the entire process
of setting up the socket to connect to the NFS server. Since some of the
APIs (sobind(), etc.) only take a thread pointer and infer the credential
from that instead of a direct credential, work around the problem by
temporarily changing the current thread's credential to that of the mount
point while connecting the socket and then reverting back to the original
credential when we are done.
Reviewed by: rwatson
Tested on: UDP, TCP, TCP with forced reconnect
of fpget*() and fpset*()).
The i386 fpget*() were efficient but a bit obfuscated (using macros
and a case statement to demultiplex them through a single inline).
The demultiplexing mainly gave smaller source code.
The i386 fpset*() were obfuscated in the same way and were very
inefficient due to the case statement not having enough cases or
complexity so all cases used the FP environment.
This also fixes a harmless bug in rev.1.12. fpsetmask() extracted the
old value from the bit-field twice, but the doubled shift was harmless
since the shift count is 0.
All fp*() interfaces are now inline functions on i386. They used to
be macros that call (a different set of) inline functions. This is a
small ABI change which shouldn't cause problems since cases where
inlining fails (mainly -O0) only give (working) static functions.
others can be replaced cleanly by the amd64 versions. There is no
current amd64 version to merge, but there is an old one which is
similar.
Fix the following bugs in fpresetsticky():
- garbage args clobbered non-sticky bits in the status register
- the return value was usually garbage since it was masked with the
arg instead of with the field selector.
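The two fixes listed above can be modelled on a plain integer that
stands in for the FPU status word. This is a sketch only: it assumes the
sticky exception flags occupy the low six bits (0x3f), and the function
names are hypothetical rather than the real inlines.

```c
#include <stdint.h>

/* Assumed field selector: the six sticky exception flags in the low bits. */
#define FP_STKY_FLD	0x3fu

/*
 * New status word after clearing the requested sticky flags. Garbage
 * bits in mask are discarded first, so non-sticky bits are preserved.
 */
uint16_t
fp_clear_sticky(uint16_t sw, unsigned mask)
{
	return (sw & (uint16_t)~(mask & FP_STKY_FLD));
}

/*
 * Return value: the old sticky flags, selected with the field
 * selector rather than with the caller's argument.
 */
unsigned
fp_old_sticky(uint16_t sw)
{
	return (sw & FP_STKY_FLD);
}
```

Masking the argument with the field selector addresses the first bug;
masking the return value with the selector instead of the argument
addresses the second.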
Optimize fpresetsticky() to avoid using the environment as in
feclearexcept() (use only fnclex() if possible) and also to avoid
using fnclex() for null changes. The second of these optimizations
might not be so good since its branch might cost more than it saves.
Unmasked exceptions (which can be fixed up using fpset*() before they
trap) are very rare, especially on amd64 since SSE exceptions trap
synchronously, but I want to merge the faster amd64 implementations of
fpset*() back to i386 without introducing the bug on i386.
The i386 implementation has always avoided the trap automatically by
changing things using load/store of the FP environment, but this is
very slow. Most changes only affect the control word, so they can
usually be done much more efficiently, and amd64 has always done this,
but loading the control word can trap.
This version uses the fast method only in the usual case where it will
not trap. This only costs a couple of integer instructions (including
one branch which I haven't optimized carefully yet) in the usual case,
but bloats the inlines a lot. The inlines were already a bit too large
to handle both the FPU and SSE.
a panic race on module unload. The wakeup() is internal to
kproc_exit/kthread_exit. The correct fix is to fix the msleep() in
detach to sleep on fdc->fdc_thread instead of &fdc->fdc_thread.
Noted and reviewed by: jhb
Pointy hat to: kib
MFC after: 1 week
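The difference between the two sleep channels can be illustrated in user
space. This is a model under stated assumptions, not driver code: it
assumes the exiting worker wakes the channel equal to its own thread
object, and every name below except fdc_thread is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct fdc_model {
	void *fdc_thread;	/* pointer to the worker thread object */
};

/* Channel used before the fix: the address of the pointer field. */
const void *
sleep_chan_buggy(const struct fdc_model *fdc)
{
	return (&fdc->fdc_thread);
}

/* Channel used after the fix: the worker thread object itself. */
const void *
sleep_chan_fixed(const struct fdc_model *fdc)
{
	return (fdc->fdc_thread);
}

/* A wakeup only reaches sleepers whose channel is the same address. */
bool
wakeup_reaches(const void *sleep_chan, const void *wake_chan)
{
	return (sleep_chan == wake_chan);
}

/* Demo: the worker exits and wakes the channel equal to its own object. */
bool
demo_detach_is_woken(bool use_fixed_channel)
{
	int worker;	/* stands in for the worker thread object */
	struct fdc_model fdc = { &worker };
	const void *wake_chan = &worker;
	const void *chan = use_fixed_channel ?
	    sleep_chan_fixed(&fdc) : sleep_chan_buggy(&fdc);

	return (wakeup_reaches(chan, wake_chan));
}
```

In this model the detach path is only woken when it sleeps on the value
of fdc_thread; sleeping on the address of the field never matches.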
panic but it won't actually lock anything.
This can lead some paths to reach lockmgr_disown() with an inconsistent
lock, which will trigger the related assertions.
Fix those assertions so that they recognize the panic situation and do not trigger.
Reported by: pho
Submitted by: kib
- fix a previous style fix: shifts should be in the correct direction even
if they are null.
- restore a comment about namespace pollution from floatingpoint.h 1.12 and
update it.
- remove unused namespace pollution FP_*REG.
- improve some comments.
- sort macro definitions for entry points.
- don't use underscores for macro args.
Wakeup the thread doing the fdc_detach() when the fdc worker thread exits [1].
Write access to the write-protected floppy shall call device_unbusy() to
pair the device_busy() in the fd_access() [2].
PR: 116537 [1], 116539 [2]
MFC after: 1 week
rspq lock. Not doing so was causing us to skip re-enabling the interrupt.
- remove duplicate credits sysctl
- add support for dumping hardware context of the txq
- decrement budget_left when we break out of the process_responses loop
interrupt handlers for child devices by adding a dummy handler that is
always present so that the underlying interrupt thread is always around
avoiding panics from stray interrupts.
MFC after: 3 days
soconnect()) instead of &thread0 when establishing a connection to the NFS
server. Otherwise inconsistent credentials may be used when setting up
the NFS socket.
MFC after: 1 week
Reviewed by: rwatson
maintain a separate td_incruntime to hold unbilled CPU usage for
the thread that has the previous properties of td_runtime.
When thread information is requested using the thread monitoring
sysctls, export thread td_runtime instead of process rusage runtime
in kinfo_proc.
This restores the display of individual ithread and other kernel
thread CPU usage since inception in ps -H and top -SH, as well as for
libthr user threads, valuable debugging information lost with the
move to true kthreads since they are no longer independent processes.
There is universal agreement that we should rewrite the process and
thread export sysctls, but this commit gets things going a bit
better in the meantime. Likewise, there are reservations about the
continued validity of statclock given the speed of modern processors.
Reviewed by: attilio, emaste, jhb, julian
address space in kmem map call vm_lowmem event in a loop and wait a bit for
subsystems to reclaim some memory which in turn will reclaim address space as
well.
Note, this is a work-around.
Reviewed by: alc
Approved by: alc
MFC after: 3 days
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This change makes the code cleaner and, in
particular, removes an annoying dependency, helping the next lockmgr() cleanup.
The KPI, obviously, changes as a result.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, it is worth saying that the next commits will address
a similar cleanup of VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
- return the error from cxgb_tx_common so that when an error is hit we dont
spin forever in the taskq thread
- remove unused rxsd_ref
- simplify header_offset calculation for embedded mbuf headers
- fix memory leak by making sure that mbuf header initialization took place
- disable printf's for stalled queue, don't do offload/ctrl queue restart
when tunnel queue is restarted
- add more diagnostic information about the txq state
- add facility to dump the actual contents of the hardware queue using sysctl
- fix this to compile with C++ by casting ints to enums in a few places
and by using the correct parameter type for _fpsetprec(). Remove
__cplusplus ifdefs which disabled the buggy code.
- remove __CC_SUPPORTS___INLINE ifdefs. `__inline' vs `inline', and either
of these #defined away, are supposed to be handled by very old ifdefs
in <sys/cdefs.h>. Thus the __CC_SUPPORTS___INLINE macro is not needed
here (or anywhere else that it is used). It is less needed here than in
most places, since this file is userland-only and userland is far from
supporting INTEL_COMPILER. The __CC_SUPPORTS___INLINE__ macro which
was used here is even less needed. It is to support spelling `inline'
as `__inline__' instead of the usual spelling `__inline'.
Fix some style bugs that I missed in the previous commit (remove unused
asms and sort more variables).
Now the lockmgr() function can only be called with curthread, and the
KASSERT() is updated accordingly.
In order to support on-the-fly owner switching, the new function
lockmgr_disown() has been introduced and gets used in BUF_KERNPROC().
The KPI changes as a result, and the FreeBSD version will be bumped soon.
Unlike the previous code, we assume the idle thread cannot try to
acquire the lockmgr as it cannot sleep, so drop the related check [1]
in BUF_KERNPROC().
Tested by: kris
[1] kib asked for a KASSERT in the lockmgr_disown() about this
condition, but after thinking about it, as this is a well known general
rule, I found it not really necessary.
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.
Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
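The pathname hashing described above can be sketched in user space. This
is a minimal model, assuming the classic FNV-1 variant (multiply, then
XOR) with the standard 32-bit offset basis and prime; the text does not
say which variant is used, and the bucket count and shm_bucket() helper
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1_32_INIT		0x811c9dc5u	/* standard 32-bit offset basis */
#define FNV_32_PRIME		0x01000193u	/* standard 32-bit FNV prime */
#define SHM_HASH_BUCKETS	64u		/* hypothetical power-of-two size */

/* 32-bit FNV-1 hash of a NUL-terminated string. */
uint32_t
fnv1_32_str(const char *s)
{
	uint32_t h = FNV1_32_INIT;

	for (; *s != '\0'; s++) {
		h *= FNV_32_PRIME;
		h ^= (uint8_t)*s;
	}
	return (h);
}

/* Map a pathname to a bucket index in the hypothetical hash table. */
size_t
shm_bucket(const char *path)
{
	return (fnv1_32_str(path) & (SHM_HASH_BUCKETS - 1));
}
```

A lookup would hash the path, walk the chain in that bucket, and compare
the stored pathnames to resolve collisions.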
assertion hit in swapoff_one() when we un-mount a swap partition. We
should be using curthread where we used thread0 before. This change
also replaces the thread argument with a credential argument, as the
MAC framework only requires the cred.
It should be noted that this allows the machine to be rebooted without
panicking with "cannot differ from curthread or NULL" when MAC is enabled.
Submitted by: rwatson
Reviewed by: attilio
MFC after: 2 weeks
dev2udev() when a tty was being detached concurrently with the sysctl
handler:
- Hold the 'tty_list_mutex' lock while we read all the fields out of the
struct tty for copying out later. Previously the pty(4) and pts(4)
destroy routines could set t_dev to NULL, drop their reference on the
tty and destroy the cdev while the sysctl handler was attempting to
invoke dev2udev() on the cdev being destroyed. This happened when the
sysctl handler read the value of t_dev prior to it being set to NULL
either due to it being stale or due to timing races. By holding the
list lock we guarantee that the destroy routines will block in ttyrel()
in that case and not destroy the cdev until after we've copied all of our
data. We may see a NULL cdev pointer or we may see the previous value,
but the previous value will no longer point to a destroyed cdev if we
see it.
- Fix the ttyfree() routine used by tty device drivers in their detach
methods to use ttyrel() on the tty so we don't leak them. Also, fix it
to use the same order of operations as pty/pts destruction (set t_dev
NULL, ttyrel(), destroy_dev()) so it cooperates with the sysctl handler.
MFC after: 3 days
Tested by: avatar
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.
Submitted by: rwatson
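The dispatch described above can be modelled with a function pointer in
an ops vector. This is a user-space sketch, not the kernel structures:
all names carry a _model suffix or demo_ prefix and are hypothetical,
and only the truncate method is shown.

```c
#include <errno.h>
#include <stdint.h>

struct file_model;

/* Hypothetical stand-in for the ops vector; only the new method is shown. */
struct fileops_model {
	int (*fo_truncate)(struct file_model *fp, int64_t length);
};

struct file_model {
	const struct fileops_model *f_ops;
	int64_t f_size;
};

/* Vnode-backed files actually change their size. */
int
vn_truncate_model(struct file_model *fp, int64_t length)
{
	if (length < 0)
		return (EINVAL);
	fp->f_size = length;
	return (0);
}

/* Non-vnode file types reject the request. */
int
invalid_truncate_model(struct file_model *fp, int64_t length)
{
	(void)fp;
	(void)length;
	return (EINVAL);
}

const struct fileops_model vnops_model = { vn_truncate_model };
const struct fileops_model socketops_model = { invalid_truncate_model };

/* The generic entry point just invokes the per-type method. */
int
ftruncate_model(struct file_model *fp, int64_t length)
{
	return (fp->f_ops->fo_truncate(fp, length));
}

struct file_model demo_vnode_file = { &vnops_model, 0 };
struct file_model demo_socket_file = { &socketops_model, 0 };
```

Supporting a new file type later only requires supplying a different
fo_truncate implementation in that type's ops vector.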
add correct locking to the operation of unmounting.
This will prevent debugging kernels from panicking if mounting a
non-hpfs partition (I'm not sure if this can be a problem with a
successful mounting operation though).
MFC: 3 days
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.
- spell 16384 as 16384 and not as BKVASIZE. 16384 is (not quite) just a
magic size that works well in practice. BKVASIZE should be MAXBSIZE
(65536), but is 16384 because i386's don't have enough kva for it to
be MAXBSIZE; 16384 works (not so well) for it for much the same reasons
that it works well in the heuristic.
- expand and/or add comments about this and other details.
- don't explicitly inline this function.
- fix some other style bugs.
ABI override binary isn't found. This could probably be smoother, but
it is what I did in p4 change #126891 on 2007/09/27. It should solve
the "ld-elf32.so.1"-in-chroot problem.
allocation, free the indirect blocks before clearing the disk pointers,
that could lead to the softupdate inconsistencies in the case of the
machine or disk crash at the wrong time.
Rearrange the recover code to do the ffs_blkfree() after the second
ffs_syncvnode(), that clears the pointers chain.
Proposed and reviewed by: tegge
Tested by: Peter Holm
MFC after: 3 weeks
happen if there are no files open. Accounting for these can
eventually return a negative value for olenp causing sysctl to
crash with a bad malloc.
Reported by: Pawel Worach <pawel.worach@gmail.com>
set, announce BIO_DELETE capability and issue ATA_CFA_ERASE when we get one.
Once we issue more BIO_DELETE, this will improve lifetime, and
possibly write speed of Flash based devices which have usable flash
adaptation layers.
For now, about the only usage is the newfs(1) -E flag.
Approved by: sos
people's code with irrelevant changes[1]:
Use bus_{read|write}_*() instead of bus_space_{read|write}_*() for
purely stylistic reasons.
Due to compiler optimizations and inlining, this is for all practical
purposes without effect in the compiled code.
[1] NB: Approved by: sos
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.
Correct a nearby style error: Pointers should be compared to NULL.
queues lock is acquired. Otherwise, the state of a reservation's
pages' flags and its population count can be inconsistent. That could
result in a page being freed twice.
Reported by: kris
- Clear all of the gc flags before doing a run. Stale flags were causing
us to skip some descriptors.
- If a unp socket has been marked REF in a gc pass it can't be dead.
Found by: rwatson's test tool.
of two compares against 0. The negative effect of cache flushing
is probably more than the gain by not doing the two compares (the
value is almost certainly in register or at worst, cache).
Note that the uses of m_freem() are in error cases and m_freem()
handles NULL anyhow. So fast-path really isn't changed much at all.
feature is represented by a node in the new 'kern.features' sysctl node.
A feature is present if the corresponding node is present and evaluates to
true.
A FEATURE() wrapper macro is added which takes the sysctl node name and
a description of the feature as the sole arguments and creates a read-only
sysctl node with a value of 1.
Discussed on: arch
correct number of acpi_thermalX devices. Having this wrong caused the
acpi_thermal thread to realloc the array of devices on each loop iteration.
MFC after: 1 week
PR: kern/118497
Submitted by: Pasi Parviainen
- Introduce a finit() which is used to initialize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.
Tested by: kris, pho
possible to end up in the interrupt handler again while processing the
previous RX interrupt in ifp->if_input() because the MD interrupt code
disables the delivery of the respective interrupt until all associated
handlers were called (in the INTR_FILTER case the MI code supposedly
does the same). Toggling the NIC interrupt enable bit in these handlers
still is necessary though as some chips (f.e. the VMware emulated one)
require this to be done in order to keep issuing interrupts.
MFC after: 1 month
implemented with macros. This patch improves code readability. Reasoning
behind vidd_* is a sort of "video discipline".
List of macros is supposed to be complete--all methods of video_switch
ought to have their respective macros from now on.
Functionally, this code should be no-op. My intention is to leave current
behaviour of touched code as is.
No objections: rwatson
Silence on: freebsd-current@
Approved by: cognet
implemented with macros. This patch improves code readability. Reasoning
behind kbdd_* is a "keyboard discipline".
List of macros is supposed to be complete--all methods of keyboard_switch
should have their respective macros from now on.
Functionally, this code should be no-op. My intention is to leave current
behaviour of code as is.
Glanced at by: rwatson
Reviewed by: emax, marcel
Approved by: cognet
machine-independent support for superpages. (The earlier part was
the rewrite of the physical memory allocator.) The remainder of the
code required for superpages support is machine-dependent and will
be added to the various pmap implementations at a later date.
Initially, I am only supporting one large page size per architecture.
Moreover, I am only enabling the reservation system on amd64. (In
an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0
in amd64/include/vmparam.h or your kernel configuration file.)
argument. It allows ppp, mpd or any other node consumer to request
connection to specified access concentrator.
Proposed by: Alexander A. Burylov <burylov@mail.ru>
Without it, code has two problems:
- the behaviour of the old and new [l]stat differs with regard to
/compat/linux
- directly accessing userspace data from the kernel invites
panics.
Reported and tested by: Peter Holm
Reviewed by: rdivacky
MFC after: 3 days
the inode, do the rollback in case the allocation failed (due to
insufficient free space or quota limits). However, the code leaves the
buffers corresponding to the indirect blocks on the vnode bufobj list.
This causes several assertions (for instance, "ffs_truncate3"
in ffs_truncate()) to fail, and could result in an indirect block
aliasing problem, such as writing the contents of such blocks to a random
disk location.
Remove the buffers from the bufobj properly.
Reported and tested by: Peter Holm
Reviewed by: tegge
MFC after: 3 weeks
so that the results end up in the DDB output stream rather than the
console output stream.
This should likely also be done for the vprint() function it calls.
MFC after: 3 months
This option just adds complexity and the new implementation no longer
will support it, so axing it now that it is unused is probably the
better idea.
FreeBSD version is bumped in order to reflect the KPI breakage introduced
by this patch.
In the ports tree, kris found that only old OSKit code uses it, but as
it is thought to work only on the 2.x kernel series, version bumping will
solve any problem.
with the interlock), owner of the lock should be only curthread or at
least, for its limited usage, NULL which identifies LK_KERNPROC.
The thread "extra argument" for the lockmgr interface is going to be
removed in the near future, but for the moment, just let kernel run for
some days with this check on in order to find potential deadlocking
places around the kernel and fix them.
p_candebug() will return EAGAIN which, if the other process never
leaves execve(), will result in the sysctl spinning and never returning
to userspace. Processes should always eventually leave execve(), but
spinning in kernel while we wait is bad for countless reasons, and
particularly harmful if execve() itself is deadlocked.
Possibly we should return another error, or return a marker indicating
the thread is in execve() so it can be reported that way in userspace.
Reported by: kris
equivalent with this and so operate the switch.
That call is the only one remaining LK_EXCLUPGRADE consumer and removing
it will prepare the ground for LK_EXCLUPGRADE axing and further
lockmgr improvements.
Discussed with: jeff, ups
Recycle the vm object's "pg_color" field to represent the color of the
first virtual page address at which the object is mapped instead of the
color of the object's first physical page. Since an object may not be
mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color"
is valid.
mounted FS' problems. These are more along the lines of 'avoiding an
avoidable panic' than a complete solution to removable devices. We
now close the barn door after the horse has gotten loose and has been
hit by a truck, as it were. The barn no longer catches fire in this
case, but the horse is still dead :-).
The vfs_bio.c fix causes us not to put a failed write back into the
dirty pool if the error returned was ENXIO. In that case, the buffer
is treated like any other clean buffer that's being returned. ENXIO
means the device isn't there anymore and will never be there again in
the future, so retrying is futile.
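The vfs_bio.c decision described above can be sketched in a few lines. This
is a minimal userland illustration, not the committed kernel code; the enum
and the function name failed_write_action() are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcome of a failed buffer write. */
enum write_fail_action {
	REQUEUE_DIRTY,	/* put the buffer back into the dirty pool, retry later */
	RELEASE_CLEAN	/* treat it like any other clean buffer */
};

/*
 * ENXIO means the device is gone and will not come back, so a retry
 * cannot succeed; every other error keeps the old retry behaviour.
 */
static enum write_fail_action
failed_write_action(int error)
{
	return (error == ENXIO ? RELEASE_CLEAN : REQUEUE_DIRTY);
}
```

With this shape, a write that fails with EIO is still requeued, while one
that fails with ENXIO is dropped instead of being retried forever.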
The vfs_mount.c fix treats 'ENXIO' as success for unmounting a file
system. If the device is gone, retrying later won't help and we'll
never be able to unmount the device.
These two are part of a larger patch set submitted by the author. The
other patches will be forthcoming. I added comments to these two
patches.
Submitted by: Henrik Gulbrandsen
Reviewed by: phk@
PR: usb/46176 (partial)
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.
Reviewed by: net@
MFC after: 1 week
- Use the correct offsets when copying out the results of PCIOCGETCONF_OLD.
This happened to not affect the 64-bit architectures because there the
addition of pc_domain to struct pcisel didn't change the overall size of
struct pci_conf. [1]
- Always copy the name and unit information to conf_old so it's also part
of the output once this information is cached in dinfo.
- Use the correct type for flags in struct pci_match_conf_old. This
change is more or less cosmetic though.
Reported and tested by: bde [1]
Reviewed by: imp
MFC after: 3 days
Committed from: 24C3
If a mouse has both a wheel and a Z direction we report both.
XXX Due to tradition the wheel is reported as the Z direction (and the Z
direction as W).
Now Apple's Mighty Mouse is fully supported, except the X11 mouse driver
doesn't know what to do with the new coordinate.
MFC after: 3 months
Approved by: njl (mentor), imp
dump using mechanically generated/extracted debugging output rather than
a simple memory dump. Current sources of debugging output are:
- DDB output capture buffer, if there is captured output to save
- Kernel message buffer
- Kernel configuration, if included in kernel
- Kernel version string
- Panic message
Textdumps are stored in swap/dump partitions as with regular dumps, but
are laid out as ustar files in order to allow multiple parts to be stored
as a stream of sequentially written blocks. Blocks are written out in
reverse order, as the size of a textdump isn't known a priori. As with
regular dumps, they will be extracted using savecore(8).
One new DDB(4) command is added, "textdump", which accepts "set",
"unset", and "status" arguments. By default, normal kernel dumps are
generated unless "textdump set" is run in order to schedule a textdump.
It can be canceled using "textdump unset" to restore generation of a
normal kernel dump.
Several sysctls exist to configure aspects of textdumps;
debug.ddb.textdump.pending can be read to check whether a textdump is
pending, or set/unset from userspace in order to control whether the
next kernel dump will be a textdump.
While textdumps don't have to be generated as a result of a DDB script
run automatically as part of a kernel panic, this is a particularly useful
way to use them, as instead of generating a complete memory dump, a
simple transcript of an automated DDB session can be captured using the
DDB output capture and textdump facilities. This can be used to
generate quite brief kernel bug reports rich in debugging information
but not dependent on kernel symbol tables or precisely synchronized
source code. Most textdumps I generate are less than 100k including
the full message buffer. Using textdumps with an interactive debugging
session is also useful, with capture being enabled/disabled in order to
record some but not all of the DDB session.
MFC after: 3 months
to identify textdumps in the swap/dump partition. While textdumps
aren't really an architecture, they are architecture-neutral and so
don't really correspond to any existing architecture.
Define a version number for textdumps, KERNELDUMP_TEXT_VERSION, of 1.
MFC after: 3 months
define a set of named scripts. Each script consists of a list of DDB
commands separated by ";"s that will be executed verbatim. No higher
level language constructs, such as branching, are provided for:
scripts are executed by sequentially injecting commands into the DDB
input buffer.
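The "list of commands separated by ';'" model can be sketched as a simple
in-place splitter. This is an illustration under assumptions, not the DDB
implementation: split_script() is a hypothetical name, and the whitespace
handling (leading spaces skipped, empty commands ignored) is a guess.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split `script` in place on ';' and store a pointer to each command in
 * `cmds`, in order. Leading spaces are skipped and empty commands are
 * ignored; trailing spaces are left alone. At most `max` commands are
 * stored. Returns the number of commands found.
 */
static size_t
split_script(char *script, char *cmds[], size_t max)
{
	size_t n = 0;
	char *cmd = script;

	while (cmd != NULL && n < max) {
		char *sep = strchr(cmd, ';');

		if (sep != NULL)
			*sep = '\0';		/* terminate this command */
		while (*cmd == ' ')
			cmd++;
		if (*cmd != '\0')
			cmds[n++] = cmd;
		cmd = (sep != NULL) ? sep + 1 : NULL;
	}
	return (n);
}
```

Given the script "show pcpu; bt;ps", the splitter yields the three commands
"show pcpu", "bt" and "ps", which would then be injected one after another.
There is deliberately no branching or looping construct to interpret.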
Four new commands are present in DDB: "run" to run a specific script,
"script" to define or print a script, "scripts" to list currently
defined scripts, and "unscript" to delete a script, modeled on shell
alias commands. Scripts may also be manipulated using sysctls in the
debug.ddb.scripting MIB space, although users will prefer to use the
soon-to-be-added ddb(8) tool for usability reasons.
Scripts with certain names are automatically executed on various DDB
events, such as entering the debugger via a panic, a witness error,
watchdog, breakpoint, sysctl, serial break, etc, allowing customized
handling.
MFC after: 3 months
(dummynet), ipsec_filter() would return the empty error code and the ipsec code
would continue to forward/dereference the null mbuf.
Found by: m0n0wall
Reviewed by: bz
MFC after: 3 days
captured to a memory buffer for later inspection using sysctl(8), or in the
future, to a textdump.
A new DDB command, "capture", is added, which accepts arguments "on", "off",
"reset", and "status".
A new DDB sysctl tree, debug.ddb.capture, is added, which can be used to
resize the capture buffer and extract buffer contents.
MFC after: 3 months
kern.console format as is. Thus, no difference in output format should
appear after this commit.
Reviewed by: cognet@ (mentor)
Approved by: cognet@ (mentor)
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the
kdb_enter() interface.
fget() call, that is a sleeping point, and possibly dropping Giant.
The snp_target == NULL implies the snp_tty == NULL. Remove the code
that is put under snp_target == NULL and snp_tty != NULL clause.
In snpclose(), do the snp_detach() before scheduling the snp device
destruction. Otherwise, after the return from snpclose(), the snp
device is already removed from the snp_list, but tty is still in
snooped state. Any attempt to do I/O on such a tty causes a panic because
ttytosnp() returns NULL.
Tested by: Peter Holm
MFC after: 1 week
o BSD disklabels have relative offsets. Even for the BSD in MBR slice
setup, except when the mbroffset ioctl is supported. Since we don't
support that ioctl, bsdlabel(8) expects relative offsets. So, when
reading an existing disklabel, correct for disklabels that mistakenly
have the mbroffset offsets.
o Don't take the geometry seriously, because it's untrustworthy. We do
expect the numbers to be within range. This means that the secperunit
field will not be computed from secpercyl and ncyls, but simply is
the mediasize in sectors.
o Don't enforce partitions to be aligned to track boundaries. The
default label, constructed by bsdlabel(8), puts partition a at offset
BBSIZE bytes, which commonly means sector 16.
free the MAC label on the inpcb before freeing the inpcb.
MFC after: 3 days
Submitted by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
old code special cased them too early which caused a few differences for
these sort of links relative to other PCI links:
- They were always re-routed via the BIOS call instead of assuming that
they were already routed if the BIOS had programmed the IRQ into a
matching device during POST.
- If the BIOS did route that link to a different IRQ that was marked as
invalid, we trusted the $PIR table rather than the BIOS IRQ.
This change moves the special casing for "unique IRQ" links to only take
that into account when picking an IRQ for an unrouted link so that these
links will now not be routed if the BIOS appears to have routed it already
(some BIOSen have problems with that) and so that if the BIOS uses a
different IRQ than the $PIR, we trust the BIOS routing instead (this is
what we do for all other links as well).
Reported by: Bruce Walter walter of fortean com
MFC after: 1 week
page to be in the free lists. Instead, it now returns TRUE if it
removed the page from the free lists and FALSE if the page was not
in the free lists.
This change is required to support superpage reservations. Specifically,
once reservations are introduced, a cached page can either be in the
free lists or a reservation.
as multicast/broadcast frames. Previously re(4) ignored multicast
frames in promiscuous mode. The RTL8169 datasheet was not clear
how it handles multicast frames in promiscuous mode.
PR: kern/118572
MFC after: 3 days
NULL and doesn't point to a NULL pointer before dereferencing it. This
fixes a panic triggered by Xorg 7.3.
Reported and tested by: Bill Green
MFC after: 3 days
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.
The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.
With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.
Obtained from: Juniper, Semihalf
is actually a circular log. Deal with it rolling around. Fortunately,
the log area is big and I haven't seen any roll over yet. Update and
get rid of the obsolete comment.
When system ticks are positive, for entries in the cache
bucket, syncache_timer() ran on every tick (doing nothing
useful) instead of the supposed 3, 6, 12, and 24 seconds
later (when it's time to retransmit SYN,ACK).
When ticks are negative, syncache_timer() was scheduled
too far in the future (up to ~25 days on systems with
HZ=1000), no SYN,ACK retransmits were attempted at all,
and syncache entries added in that period that correspond
to non-established connections stay there forever.
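Bugs of this kind usually come from comparing tick values directly once the
counter has gone negative. A common wrap-safe comparison looks like the
sketch below. It is not necessarily the fix that was committed, and
ticks_reached() is a hypothetical name; it also assumes a 32-bit two's
complement int, as on all FreeBSD platforms.

```c
#include <stdbool.h>

/*
 * Return true if tick value `now` is at or past `deadline`, treating
 * both as points on a wrapping 32-bit counter. The subtraction is done
 * in unsigned arithmetic so that it cannot overflow; the sign of the
 * difference then tells which value comes first.
 */
static bool
ticks_reached(int now, int deadline)
{
	return ((int)((unsigned)now - (unsigned)deadline) >= 0);
}
```

A plain `now >= deadline` gives the wrong answer when the deadline was
computed just before the counter wrapped and `now` is already negative;
the difference-based test still orders the two values correctly.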
Only HEAD and RELENG_7 are affected.
Reviewed by: silby, kmacy (earlier version)
Submitted by: Maxim Dounin, ru
- Rename output routines tcp_gen_* -> tcp_output_*.
- Rename notification routines that turn into no-ops in the absence of TOE
from tcp_gen_* -> tcp_offload_*.
- Fix some minor comment nits.
- Add a /* FALLTHROUGH */
Reviewed by: Sam Leffler, Robert Watson, and Mike Silbersack
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.
Reported by: trasz
Tested by: trasz
MFC after: 3 days
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path; this didn't actually matter in practice as sc_primary is
always the first on the port list.
MFC after: 3 days
would be properly disposed of, but the global label structure for the
semaphore wouldn't be freed.
MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
destroy call; this transpired because the inpcb alloc path for IPv4/IPv6
is the same code, but IPv6 has a separate free path. The result was
that as new IPv6 TCP connections were created, kernel memory would
gradually leak.
MFC after: 3 days
Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>,
zhouzhouyi
and t3_push_frames).
- Import latest changes to cxgb_main.c and cxgb_sge.c from toestack p4 branch
- make driver local copy of tcp_subr.c and tcp_usrreq.c and override tcp_usrreqs so
TOE can also function on versions with unmodified TCP
- add cxgb back to the build
- rename tcp_ofld.[ch] to tcp_offload.[ch]
- document usage and locking conventions of the functions in the
toe_usrreqs function vector
- document tcpcb, inpcb, and socket fields used by toe
- widen the listen interface into 2 functions
- rename DISABLE_TCP_OFFLOAD to TCP_OFFLOAD_DISABLE
- shrink conditional compilation to reduce the likelihood of bitrot
- replace sc->sc_toepcb checks in tcp_syncache.c with TOEPCB_ISSET
or any other bio chopping geom a reasonable size of work.
Check for delivered signals between chunks, because the request size
and service time are unbounded.
details from consumers.
- Track individual selecters on a per-descriptor basis such that there
are no longer collisions and after sleeping for events only those
descriptors which triggered events must be rescanned.
- Protect the selinfo (per descriptor) structure with a mtx pool mutex.
mtx pool mutexes were chosen to preserve api compatibility with
existing code which does nothing but bzero() to setup selinfo
structures.
- Use a per-thread wait channel rather than a global wait channel.
- Hide select implementation details in a seltd structure which is
opaque to the rest of the kernel.
- Provide a 'selsocket' interface for those kernel consumers who wish to
select on a socket when they have no fd so they no longer have to
be aware of select implementation details.
Tested by: kris
Reviewed on: arch
processors (it's the PowerPC Operating Environment Architecture).
AIM designates the processors made by the Apple-IBM-Motorola
alliance and those we typically support.
While here, remove the NetBSD option IPKDB. It's not an option
used by us. Also, PPC_HAVE_FPU is not used by us either. Remove
that too.
Obtained from: Juniper, Semihalf
the ABI when enabled. There is no longer an embedded lock_profile_object
in each lock. Instead a list of lock_profile_objects is kept per-thread
for each lock it may own. The cnt_hold statistic is now always 0 to
facilitate this.
- Support shared locking by tracking individual lock instances and
statistics in the per-thread per-instance lock_profile_object.
- Make the lock profiling hash table a per-cpu singly linked list with a
per-cpu static lock_prof allocator. This removes the need for an array
of spinlocks and reduces cache contention between cores.
- Use a separate hash for spinlocks and other locks so that only a
critical_enter() is required and not a spinlock_enter() to modify the
per-cpu tables.
- Count time spent spinning in the lock statistics.
- Remove the LOCK_PROFILE_SHARED option as it is always supported now.
- Specifically drop and release the scheduler locks in both schedulers
since we track owners now.
In collaboration with: Kip Macy
Sponsored by: Nokia
cards:
o RocketRAID 172x series
o RocketRAID 174x series
o RocketRAID 2210
o RocketRAID 222x series
o RocketRAID 2240
o RocketRAID 230x series
o RocketRAID 231x series
o RocketRAID 232x series
o RocketRAID 2340
o RocketRAID 2522
Many thanks to Highpoint for their continued support of FreeBSD.
Submitted by: Highpoint
its multi DAC / playback channels is not that good. Enabling vchans
makes the bug more visible since playback allocation will look for
possible free hardware channels first (i.e: the next DAC, the very first
has been consumed by vchan mixer) which in this case has been proven faulty.
Tested by: Dominic Fandrey <LoN_Kamikaze at gmx dot de>
URL: http://lists.freebsd.org/pipermail/freebsd-stable/2007-December/039022.html
The HT1000 DMA engine seems to not always like 64K transfers and sometimes barfs data all over memory leading to instant crash and burn.
Also fix 48bit addressing issues; apparently newer chips need 16bit writes and not the usual fifo thing.
HW donated by: Travis Mikalson at TerraNovaNet
- make necessary changes to release offload resources when a syncache
entry is removed before connection establishment
- disable checks for offloaded connection where insufficient information
is available
Reviewed by: silby
register (MacBooks only).
This allows MacBooks to boot in SMP mode without any trick and solves
the timer problems with HZ=1000.
MFC after: 1 week
Reviewed by: njl (mentor), jhb
Approved by: njl (mentor), jhb
The previous value of 16 was too small for a real LAC, as a temporary
activity spike could easily overflow the queue, demanding tunnel
disconnection due to possible state inconsistency.
that favours true hardware channel, the first instance of recording
request will grab this channel (the first channel is being used as
vchan master). In many cases, this does not really work as intended and
gives a false impression of broken recording.
PR: kern/118546
MFC after: 3 days
7.2.3, bytes 0-3 and 5-15 are used to calculate the checksum of a descriptor
tag.
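A sketch of that checksum, assuming (as in ECMA-167 7.2.3) that it is the
sum modulo 256 of the selected bytes of the 16-byte tag, with byte 4 being
the checksum field itself. The names tag_checksum(), TAG_LEN and
TAG_CKSUM_OFF are illustrative and not taken from the UDF code.

```c
#include <stddef.h>
#include <stdint.h>

#define	TAG_LEN		16	/* descriptor tag size in bytes */
#define	TAG_CKSUM_OFF	4	/* byte 4 holds the checksum itself */

/*
 * Sum bytes 0-3 and 5-15 of the tag. Storing the result in a uint8_t
 * reduces the sum modulo 256.
 */
static uint8_t
tag_checksum(const uint8_t tag[TAG_LEN])
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < TAG_LEN; i++)
		if (i != TAG_CKSUM_OFF)
			sum += tag[i];
	return (sum);
}
```

Skipping byte 4 matters: whatever value is stored in the checksum field
must not influence the checksum that is being verified against it.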
PR: kern/90521
Submitted by: Björn König <bkoenig@cs.tu-berlin.de>
Reviewed by: scottl
Approved by: emax (mentor)
header, then don't try to pullup anything, because there is no next
header if we hit IPPROTO_NONE. Set ulp to a non-NULL value so the
search for an upper layer header terminates.
This is based on Pekka's diagnosis, but I chose a simpler fix.
PR: 115261
Submitted by: Pekka Savola <pekkas@netcore.fi>
Reviewed by: mlaier
MFC after: 2 weeks
Ethernet Controller. Multicast filtering wasn't tested and needs more
exploration. While I'm here, replace complex if statements with a switch
statement, which improves readability.
Reported by: Abdullah Ibn Hamad Al-Marri < wearabnet AT yahoo DOT ca >
Tested by: Abdullah Ibn Hamad Al-Marri < wearabnet AT yahoo DOT ca >
pass back the desired buffer length. This fixes scanning with the Marvell
88W8335 and BCM4328 wireless cards.
PR: kern/118370
Submitted by: Weongyo Jeong
Tested by: Ed Schouten
the sent_queue. Sometimes I wonder why any code
ever works :-)
- Fix the pad of the last mbuf routine. It was working improperly
on non-4 byte aligned chunks which could cause memory overruns.
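For reference, the amount of padding needed to bring a chunk up to a 4-byte
boundary can be computed as below. This is only a sketch of the arithmetic,
not the SCTP routine that was fixed; pad_to_4() is a hypothetical name.

```c
/*
 * Number of pad bytes (0-3) needed to round `len` up to the next
 * multiple of 4. The final "& 3" maps the already-aligned case,
 * where 4 - 0 would be 4, back to 0.
 */
static unsigned
pad_to_4(unsigned len)
{
	return ((4 - (len & 3)) & 3);
}
```

Writing more pad bytes than this into the last mbuf is one way to end up
with the kind of memory overrun mentioned above.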
MFC after: 1 week
by Daniel Kamm.
Adaptec RAID 51245
Adaptec RAID 51645
Adaptec RAID 52445
Adaptec RAID 5405
Sun STK RAID REM
Sun STK RAID EM
SG-XPCIESAS-R-IN
SG-XPCIESAS-R-EX
XXX: This only works currently with GEOM_GPT which only exists in 6.x.
XXX: I didn't add 'mbroffset' support for a GPT partition holding a BSD
label as I'm not sure if they use relative or absolute offsets.
MFC after: 3 days
o Disklabels can have between 8 and 20 partitions (inclusive).
o No device special file is created for the raw partition.
o Switch ia64 to use this backend.
o No support for boot code yet.
- Missing lock when sending data and moving it to the
outqueue.
- If a mbuf alloc fails during moving to outqueue the
reassembly of the old mbuf chain was incorrect.
- some_taken becomes a counter in sctputil.c instead of a set to 1.
- Fix a panic to be only under INVARIANTS and have a proper recovery.
- msg_flags needed to be set to the value collected, not or'd.
MFC after: 1 week
initialized before use and returned integrally instead of up to size.
Submitted by: Ilja van Sprundel <ilja -at- netric.org>
Reviewed by: secteam
MFC after: 1 day
on 1/2 of each of the successive limits tied to the limit for
2k clusters.
- Adds real functionality in so that doing a sysctl to change these
actually changes them :-)
MFC after: 1 week
when applicable.
Acquire Giant slightly later for vnlru.
In the syncer, acquire Giant only when a vnode belongs to the
non-MPsafe fs.
In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from
the lbolt sleep queue after the syncer state is modified, not before.
Herded by: attilio
Tested by: Peter Holm
Reviewed by: ups
MFC after: 1 week
This makes update mounts such as:
"mount -u -o rdonly" work more like, "mount -u -o ro".
References to "-o rdonly" were changed to "-o ro" in revision 1.60 of
the mount(8) man page,
but some people still like to use "-o rdonly" since it was documented
in earlier versions of FreeBSD.
Requested by: rwatson
MFC after: 1 week
within the jail are never freed. si_cred is only used by the MAC framework so
make the cred reference conditional on it being compiled in, this is not a fix
and will need to be reviewed for any new consumers of si_cred.
This will quell some user complaint when using jails with a default kernel.
Reviewed by: rwatson
MFC after: 3 days
INCLUDE_CONFIG_FILE. Point the user at what config(8) actually does,
and how one can fetch the actual configuration file.
Reported by: many
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
is what gcc3 complains about.
Without this change, it's impossible to build the kernel with gcc3.
Tested by: cognet@ (mentor)
Approved by: cognet@ (mentor)
test incorrect.
- Fix the initial buf calculation to be more friendly; the calculation is
the same but we use a different variable to make it easier amongst the
different code versions.
MFC after: 1 week
sending, once the locks are all unlocked to
do the copies in, it's possible that other
events could then raise the number of bytes
outstanding pushing it so not all the message
would fit. This would then cause us to send
only part of the message. This fix makes it
so we keep a "reserved" amount that can be
kept in mind when making calculations to send.
- rcv msg args with a NULL/NULL for to/tolen will return an error incorrectly
for the 1-2-1 model.
- We were not doing 0 len return correctly and not setting cantrcv more
correctly. Previously we "fixed" this area by taking out the socantrcv
since we then could not get the data out. The correct fix is to still
flag the socket but allow a by-pass route to continue to read until
all data is consumed.
MFC after: 1 week
with insufficient protection mode.
For the i386 and amd64, create the tunable, machdep.prot_fault_translation,
with the following behaviour:
0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE
from vm_fault based on the ELF OSABI note:
no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT
note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR
1 = always SIGBUS/BUS_PAGE_FAULT
2 = always SIGSEGV/SEGV_ACCERR
This would do mostly automatic correction of ABI breakage, with the exception
of the untagged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For
them, the sysctl allows running the binary with manual settings.
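The selection logic for machdep.prot_fault_translation can be summarised in
a small sketch. It is not the committed trap code: prot_fault_signal() and
its parameters are hypothetical, only the signal number is modelled (the
si_code values BUS_PAGE_FAULT and SEGV_ACCERR are left out), and treating
any value other than 1 or 2 as autodetect is an assumption.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Pick the signal for KERN_PROTECTION_FAILURE.
 * `translation` is the tunable value; `has_note` and `osrel` describe
 * the binary's ELF OSABI note (osrel being its __FreeBSD_version).
 */
static int
prot_fault_signal(int translation, bool has_note, int osrel)
{
	switch (translation) {
	case 1:
		return (SIGBUS);	/* always the historical behaviour */
	case 2:
		return (SIGSEGV);	/* always the new behaviour */
	default:			/* 0 = autodetect from the note */
		return ((has_note && osrel >= 700004) ? SIGSEGV : SIGBUS);
	}
}
```

An untagged 7-CURRENT binary falls into the "no note" branch and gets
SIGBUS under autodetection, which is the case where setting the value to 2
by hand is needed.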
Discussed with: portmgr (kris)
PR: kern/118304
MFC after: 3 days
dereferencing. Unaligned access could cause panic on strict alignment
architectures.
Reviewed by: marcel, marius (also tested on sparc64, thanks !)
MFC after: 3 days
Before this fix, FreeBSD would negotiate SACK on outgoing
connections, but would always fail to negotiate it on incoming
connections.
Discovered by: James Healy and Lawrence Stewart
Submitted by: James Healy and Lawrence Stewart
MFC after: 3 days
attached. Otherwise, the snp->snp_tty would be overwritten, while the
tty line discipline still set to the snpdisc. Then snplwrite() causes
panic because ttytosnp() cannot find the snp.
MFC after: 1 week
support its -k argument:
kern.proc.kstack - dump the kernel stack of a process, if debugging
is permitted.
This sysctl is present if either "options DDB" or "options STACK" is
compiled into the kernel. Having support for tracing the kernel
stacks of processes from user space makes it much easier to debug
(or understand) specific wmesg's while avoiding the need to enter
DDB in order to determine the path by which a process came to be
blocked on a particular wait channel or lock.
- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).
Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.
Update stack(9) man page.
Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)
We used to allocate the domains 0-14 for userland, and leave the domain 15
for the kernel. Now supersections require the use of domain 0, so we
switched the kernel domain to 0, and use 1-15 for userland.
How it's done currently, the kernel domain could be allocated for a
userland process.
So switch back to the previous way we did things, set the first available
domain to 0, and just add 1 to get the real domain number in the struct pmap.
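The "allocate from 0, then add 1" scheme can be sketched as follows. This
is an illustration only; the bitmap representation and the names
alloc_user_domain(), KERNEL_DOMAIN and NUM_USER_SLOTS are assumptions and
do not come from the ARM pmap code.

```c
#include <stdint.h>

#define	KERNEL_DOMAIN	0	/* reserved, never handed to userland */
#define	NUM_USER_SLOTS	15	/* allocator slots 0-14 = domains 1-15 */

/*
 * Bit i of *used set means allocator slot i is taken. The allocator
 * works on slots starting at 0; the real domain number is slot + 1,
 * so KERNEL_DOMAIN can never be returned. Returns -1 when all user
 * domains are in use.
 */
static int
alloc_user_domain(uint16_t *used)
{
	int slot;

	for (slot = 0; slot < NUM_USER_SLOTS; slot++) {
		if ((*used & (1u << slot)) == 0) {
			*used |= (uint16_t)(1u << slot);
			return (slot + 1);
		}
	}
	return (-1);
}
```

Because the offset is applied at the single point where a slot becomes a
domain number, the allocator itself does not need to know which domain the
kernel uses.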
Reported by: Mark Tinguely <tinguely AT casselton DOT net>
MFC After: 3 days
1. A packet comes in that is to be forwarded
2. The destination of the packet is rewritten by some firewall code
3. The next link's MTU is too small
4. The packet has the DF bit set
Then the current code is such that instead of setting the next
link's MTU in the ICMP error, ip_next_mtu() is called and a guess
is sent as to which MTU is supposed to be tried next. This is because
in this case ip_forward() is called with srcrt set to 1. In that
case the ia pointer remains NULL but it is needed to get the MTU
of the interface the packet is to be sent out from.
Thus, we always set ia to the outgoing interface.
MFC after: 2 weeks
The RAS implementation would set the end address, then the start
address. These were used by the kernel to restart a RAS sequence if
it was interrupted. When the thread switching code ran, it would
check these values and adjust the PC and clear them if it did.
However, there's a small flaw in this scheme. Thread T1, sets the end
address and gets preempted. Thread T2 runs and also does a RAS
operation. This resets end to zero. Thread T1 now runs again and
sets start and then begins the RAS sequence, but is preempted before
the RAS sequence executes its last instruction. The kernel code that
would ordinarily restart the RAS sequence doesn't because the PC isn't
between start and 0, so the PC isn't set to the start of the sequence.
So when T1 is resumed again, it is at the wrong location for RAS to
produce the correct results. This causes the wrong results for the
atomic sequence.
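The failing check can be modelled in a few lines. This is a simplified
model of the restart logic described above, not the ARM context-switch
code; ras_fixup_pc() is a hypothetical name and the exact boundary
handling (end treated as exclusive) is an assumption.

```c
#include <stdint.h>

/*
 * If the interrupted PC lies inside the registered RAS range, restart
 * the sequence from its beginning; otherwise leave the PC alone.
 */
static uintptr_t
ras_fixup_pc(uintptr_t pc, uintptr_t start, uintptr_t end)
{
	if (pc >= start && pc < end)
		return (start);
	return (pc);
}
```

With start = 0x1000 and end = 0x1010, a thread interrupted at 0x1008 is
restarted at 0x1000. If another thread has reset end to 0 in the meantime,
the same call returns 0x1008 unchanged, and the interrupted sequence
resumes in the middle, which is the failure described above.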
The window for the first race is 3 instructions. The window for the
second race is 5-10 instructions depending on the atomic operation.
This makes this failure fairly rare and hard to reproduce.
Mutexes are implemented in libthr using atomic operations. When the
above race would occur, a lock could get stuck locked, causing many
downstream problems, as you might expect.
Also, make sure to reset the start and end address when doing a syscall, or
a malicious process could set them before doing a syscall.
Reviewed by: imp, ups (thanks guys)
Pointy hat to: cognet
MFC After: 3 days
its -f and -v arguments:
kern.proc.filedesc - dump file descriptor information for a process, if
debugging is permitted, including socket addresses, open flags, file
offsets, file paths, etc.
kern.proc.vmmap - dump virtual memory mapping information for a process,
if debugging is permitted, including layout and information on
underlying objects, such as the type of object and path.
These provide a superset of the information historically available
through the now-deprecated procfs(4), and are intended to be exported
in an ABI-robust form.
January 1, 1601. The 1601 - 1970 period was in seconds rather than 100ns
units.
Remove duplication by having NdisGetCurrentSystemTime call ntoskrnl_time.
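The unit mismatch is easy to see in a sketch of the conversion. This is a
userland illustration, not the ndis(4) code; unix_to_nt_time() and the
macro names are hypothetical. The 1601 to 1970 offset of 11644473600
seconds has to be scaled to 100ns units together with the seconds value.

```c
#include <stdint.h>

#define	NT_EPOCH_DELTA_SEC	11644473600ULL	/* 1601-01-01 to 1970-01-01 */
#define	NT_TICKS_PER_SEC	10000000ULL	/* 100ns units per second */

/*
 * Convert a Unix time (seconds and nanoseconds since 1970) into the
 * number of 100ns intervals since January 1, 1601.
 */
static uint64_t
unix_to_nt_time(uint64_t sec, uint32_t nsec)
{
	return ((sec + NT_EPOCH_DELTA_SEC) * NT_TICKS_PER_SEC + nsec / 100);
}
```

Adding the epoch offset after the multiplication, without scaling it, would
reproduce the kind of bug described here: the result would be off by almost
the entire 1601 to 1970 period.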
linker interfaces for looking up function names and offsets from
instruction pointers. Create two variants of each call: one that is
"DDB-safe" and avoids locking in the linker, and one that is safe for
use in live kernels, by virtue of observing locking, and in particular
safe when kernel modules are being loaded and unloaded simultaneously with
their use. This will allow them to be used outside of debugging
contexts.
Modify two of three current stack(9) consumers to use the DDB-safe
interfaces, as they run in low-level debugging contexts, such as inside
lockmgr(9) and the kernel memory allocator.
Update man page.
sx driver), change a magic value in the PLX bridge chip. Apparently later
builds of the PCI cards had corrected values in the configuration eeprom.
This change supposedly fixes some pci bus problems.
information in support of DDB(4); these functions bypass normal linker
locking as they may run in contexts where locking is unsafe (such as the
kernel debugger).
Add a new interface linker_ddb_search_symbol_name(), which looks up a
symbol name and offset given an address, and also
linker_search_symbol_name() which does the same but *does* follow the
locking conventions of the linker.
Unlike existing functions, these functions place the name in a
caller-provided buffer, which is stable even after linker locks have been
released. These functions will be used in upcoming revisions to stack(9)
to support kernel stack trace generation in contexts as part of a live,
rather than suspended, kernel.
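The caller-provided buffer pattern can be illustrated with a toy lookup.
Everything here is hypothetical: the symbol table, struct sym and
search_symbol_name() are stand-ins and are not the linker's real data
structures or signatures. Locking is not modelled; the point is only that
the name is copied out before the function returns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sym {
	uintptr_t	 addr;
	const char	*name;
};

/* Hypothetical symbol table, sorted by address. */
static const struct sym symtab[] = {
	{ 0x1000, "foo" },
	{ 0x1040, "bar" },
	{ 0x1100, "baz" },
};

/*
 * Find the closest symbol at or below `value`, copy its name into the
 * caller's buffer and report the offset from the symbol's start.
 * Returns 0 on success, -1 if no symbol matches or the buffer is empty.
 */
static int
search_symbol_name(uintptr_t value, char *buf, size_t buflen, long *offset)
{
	const struct sym *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (symtab[i].addr <= value)
			best = &symtab[i];
	if (best == NULL || buflen == 0)
		return (-1);
	strncpy(buf, best->name, buflen - 1);
	buf[buflen - 1] = '\0';		/* always NUL-terminate */
	*offset = (long)(value - best->addr);
	return (0);
}
```

Because the caller owns the copy, it stays valid even if the table that
held the original string changes after the lookup returns.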
gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently
nobody uses). From Tor's description:
This happens when the block range spans two block maps, the first in the
inode (mapping up to NDADDR direct blocks) and the second being the first
indirect block. The current check assumes that both block maps are
indirect blocks.
Work done by: tegge
Tested by: kris, kensmith
in the tcp header. With relevant parts of the tcp header changing after
the 'signature' was computed, the signature becomes invalid.
Reviewed by: tools/regression/netinet/tcpconnect
MFC after: 3 days
Tested by: Nick Hilliard (see net@)
is required by the X.Org PCI domains code and additionally needs
a workaround for Hummingbird and Sabre bridges as these don't
allow their config headers to be read at any width, which is an
unusual behavior.
- In psycho(4) take advantage of DEFINE_CLASS_0 and use more
appropriate types for some softc members.
MFC after: 3 days
hack means you can get the units and flags to match up more easily with
serial consoles on machines with acpi tables that cause the com ports
to be probed in the wrong order (and hence get the wrong sio unit number).
This replaces the common alternative hack of editing the code to comment
out the acpi attachment. This could go away entirely when device wiring
patches are committed.
stomping on the units intended for the motherboard sio ports. This is
no real substitute for the not-yet-committed device wiring enhancements.
Code taken from sio's pci attachment.
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.
Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.
MFC after: 6 weeks
Intel CPUs with family 0x6, model 0xE and later (i.e., Intel Core(TM))
have a PMC architecture that differs somewhat from previous CPUs in
family 0x6. Even though the basic programming model is similar, the
documented set of legal values that may be loaded into their PMC MSRs
differs from that of the previous PMCs in family 0x6 and reusing bit
values valid for the older PMCs could result in undefined behaviour in
the general case.
per-cpu area. cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc. The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.
sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.
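The merged, backwards compatible array is conceptually just a per-state sum
over all CPUs. A minimal sketch follows; merge_cp_time() and the
array-of-arrays layout are assumptions made for illustration and are not
the kernel's per-cpu data structures.

```c
#include <stddef.h>

#define	CPUSTATES	5	/* user, nice, sys, intr, idle */

/*
 * Sum the per-cpu counters state by state into a single cp_time-like
 * array, as a consumer of the old kern.cp_time format would expect.
 */
static void
merge_cp_time(long percpu[][CPUSTATES], size_t ncpu, long merged[CPUSTATES])
{
	size_t cpu, state;

	for (state = 0; state < CPUSTATES; state++)
		merged[state] = 0;
	for (cpu = 0; cpu < ncpu; cpu++)
		for (state = 0; state < CPUSTATES; state++)
			merged[state] += percpu[cpu][state];
}
```

Each CPU updates only its own row, which is why the atomic operations on a
shared array are no longer needed; the merge cost is paid only when a
consumer asks for the combined view.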
I have pending changes to make top and vmstat optionally show per-cpu
stats.
I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.
I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.
Reviewed by: jhb
Partly obtained from: jhb
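The merge of per-cpu counters into a backwards compatible cp_time-like
array can be pictured with a small userland model. This is only a sketch:
the function name merge_cp_times, the sample numbers and the five-state
layout are assumptions made for illustration, not the kernel's code.

```python
# Userland model only; names and sample data are hypothetical.
CPUSTATES = 5  # assumed layout: user, nice, system, interrupt, idle


def merge_cp_times(per_cpu):
    """Sum per-CPU tick counters into a single cp_time-like list."""
    merged = [0] * CPUSTATES
    for cpu_ticks in per_cpu:
        for state, ticks in enumerate(cpu_ticks):
            merged[state] += ticks
    return merged


per_cpu = [
    [10, 0, 5, 1, 84],  # CPU 0
    [20, 2, 3, 0, 75],  # CPU 1
]
merged = merge_cp_times(per_cpu)
```

In this model the per-cpu lists play the role of what kern.cp_times
reports, and the merged list the role of kern.cp_time.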
since the branch caches on at least Athlon XP through Athlon 64 CPUs
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles. Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).
Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this. I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each. For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.
unused except to obfuscate disassemblies. -mprofiler-epilogue is
currently broken with gcc-4 (it does too little), but -finstrument-functions
is broken in a different way (it does too much).
amd64 version: merge whitespace fixes from i386 version.
Call uma_sel_align() there as well.
Set CPU_CONTROL_VECRELOC if we're using the high vectors page.
Submitted by: Rafal Jaworowski <raj AT semihalf DOT com>
MFC After: 1 week
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.
This is useful in bpf to avoid overlong lines when debugging
or selecting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.
Last but not least it gives you if_enc(4) for IPv6 as well.
[ As some auxiliary state was not available in the later
input path we save it in the tdbi. That way tcpdump can give a
consistent view of either of (authentic,confidential) for both
before and after states. ]
Discussed with: thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by: thompsa, gnn
- On amd64, just assume type #1 is always used. PCI 2.0 deprecated
type #2 and required type #1 for all future bridges, which
was well before amd64 existed.
- For i386, ignore whatever value was in 0xcf8 before testing for type #1
and instead rely on the other tests to determine if type #1 works. Some
newer machines leave garbage in 0xcf8 during boot and as a result the
kernel doesn't find PCI at all (which greatly confuses ACPI which expects
PCI to exist when PCI busses are in the namespace).
MFC after: 3 days
Discussed with: scottl
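For context, configuration mechanism #1 selects a register by writing an
address word to port 0xcf8 and then accessing the data port. The layout
of that word can be sketched as below. The helper name conf1_address is
hypothetical, and the code only computes the value; it does not touch
any hardware.

```python
# Illustrative model of the PCI configuration mechanism #1 address word.
CONF1_ADDR_PORT = 0x0CF8    # address port named in the commit message
CONF1_ENABLE = 0x80000000   # bit 31 enables the configuration cycle


def conf1_address(bus, slot, func, reg):
    """Build the 32-bit value written to 0xcf8 for one config register."""
    if not (0 <= bus <= 255 and 0 <= slot <= 31
            and 0 <= func <= 7 and 0 <= reg <= 255):
        raise ValueError("bus/slot/func/reg out of range")
    # The register offset is dword aligned, so the low two bits are dropped.
    return (CONF1_ENABLE | (bus << 16) | (slot << 11)
            | (func << 8) | (reg & 0xFC))
```

Because the word has a fixed format, whatever value the firmware left in
0xcf8 says little about whether mechanism #1 works.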
ZFS porting style didn't extend this, instead using a heap of additional
header files that don't get installed.
My intention had been to allow OpenSolaris external code to build on
FreeBSD out of the box (i.e. without a src tree).
Make clear that this is not a good idea when called from
tcp_output()->ipsec_hdrsiz_tcp()->ipsec4_hdrsize_tcp()
as we do not know if IPsec processing is needed at that point.
T_DIRECT filtering so that disk drives can be attached via the
pass driver. Add CAM locking. Don't mark CAM commands as SG64
since the hardware isn't designed to deal with 64-bit passthru
commands. Hopefully the bounce buffer changes that were done
for the management/ioctl interface are robust enough to handle
this deficiency for CAM as well.
- Enable pcbeep control for Acer + ALC268 (nid 29). Give enough (fake)
hints so the parser will grab it and allocate "speaker" control.
- Fix regression while preparing DAC and ADC for multichannel
format. Since playback policy is to output to every possible path,
ensure that each DAC is started.
Reported / Tested by: Guy Brand
Currently, Giant is not heavily contended, so it is OK to treat it
like any other mutex.
Please don't forget to update your own custom kernel config files.
Approved by: cognet, marcel (maintainers of arches where option is
not enabled at the moment)
of some old programs. Since sigval is union type, this change will not have
binary compatibility problem.
MFC after: 3 days
Discussed with: rwatson, glebius
It should just contain the value we want to add, as if we're interrupted
between the add and the str, we will restart from the beginning. Just use
a register we can scratch instead.
MFC After: 1 week
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.
MFC after: 1 week
Reviewed by: scottl
The call should happen with the driver lock held. We don't hold the driver
lock in newstate as it's a separate thread where we can't sleep (and we only
call wpi_cmd in async mode).
Discovered By: Attilio's callout rework
Approved By: mlaier (comentor)
currently, before spinning, the turnstile spinlock is acquired and the
waiters flag is set.
This is not strictly necessary, so just spin before acquiring the
spinlock and setting the flags.
This also simplifies many other functions, as the waiters flag is now
set only if there are actually waiters.
This should make the wakeup/sleep pair faster under an intensive mutex
workload.
This also fixes a bug in rw_try_upgrade() in the adaptive case, where
turnstile_lookup() will recurse on the ts_lock lock that will never be
really released [1].
[1] Reported by: jeff with Nokia help
Tested by: pho, kris (earlier, bugged version of rwlock part)
Discussed with: jhb [2], jeff
MFC after: 1 week
[2] John had a similar patch about 6.x and/or 7.x about mutexes probably
sends frames up the stack after changing the current channel then
the lookup by ieee channel number may fail leaving a null ptr in
se_chan; if this happens fallback to the channel recorded when the
frame is processed (curchan). Since the frame doesn't contribute
to scan results for the sta this is acceptable.
Reviewed by: thompsa
MFC after: 3 days
1837014 Kernel panics after authentication of an outgoing packet
1836992 Potential bugs in packet auth code (w/patches)
1836967 Kernel panic when using auth rule with keep state
and another reported only to FreeBSD by Andriy (see PR)
PR: kern/118251
Submitted by: Andriy Syrovenko <andriys@gmail.com>
Reviewed by: darrenr
MFC after: 5 days
cast as uint32_t which is defined as unsigned int. gcc doesn't want to
consider that there might not be much difference between an int and
a long on a 32 bit architecture.
vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better
handle a lock-ordering problem. Consequently, trylock's failure on the
page's containing object no longer implies that the page cannot be
laundered.
MFC after: 6 weeks
This has the benefit that rmlocks have proper support for reader recursion
(in contrast to rwlock(9) which could potentially lead to writer starvation).
It also means a significant performance gain, even though only visible in
microbenchmarks at the moment.
Discussed on: -arch, -net
malloc_type_allocated(..., 0) calls that occur when contigmalloc() has
failed. Eliminate the acquisition and release of the page queues lock
from vm_page_release_contig(). Rename contigmalloc2() to
contigmapping(), reflecting what it does.
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.
PR: kern/113956
MFC after: 3 days
the inpcb when there's an inpcb without associated timewait state, and
not unlocking when the inpcb has been freed. This avoids a kernel panic
when tcpdrop(8) is run on a socket in the TIMEWAIT state.
MFC after: 3 days
Reported by: Rako <rako29 at gmail dot com>
should never be moved by one lock to another.
As, luckily, nothing in our tree is using it, axe the function.
This breaks lockmgr KPI, so interested, third-party modules should update
their source code with appropriate replacement.
Ok'ed by: ups, rwatson
MFC after: 3 days
comments from vnode_pager_setsize(). This call was introduced in
revision 1.140 to address a problem that no longer exists.
Specifically, pmap_zero_page_area() has replaced a (possibly)
problematic implementation of page zeroing that was based on
vm_pager_map(), bzero(), and vm_pager_unmap().
while the global callout spinlock is not held, and can lead to PF#.
Reported by: dougb, Mark Atkinson <atkin901 at yahoo dot com>
Tested by: dougb
Diagnosed by: jhb
The lookup hurts a bit for connections but had been there anyway
if IPSEC was compiled in. So moving the lookup up a bit gives us
TSO support at no extra cost.
PR: kern/115586
Tested by: gallatin
Discussed with: kmacy
MFC after: 2 months
a good job of it) in the copypktopts() function, just call ip6_clearpktopts()
directly. Otherwise, the callers of this function would end up freeing the
memory twice.
Reviewed by: jinmei
PR: kern/116360
o Acer Aspire 4520 laptop
- jack sensing / automute
o Toshiba Satellite A135-S4527 laptop
- jack sensing / automute
Tested by: lioux
o Apple Macbook 3 (is it?)
- require gpio0 (for speakers) and ovref50 (for headphone)
to make it work
- jack sensing / automute
Tested by: Ed Schouten
* Add Nvidia MCP67 controller ids.
* Be sensible about similar controller with multiple pci ids.
* Connect unused DAC/ADC to stream#0 rather than forcing each of them
managing their own stream.
MFC after: 3 days
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheduler might see it as having such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a scheduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.
Reviewed by: jhb, jeff, silby
- Bring HEAD up to the latest shared code
- Fix TSO problem using limited MSS and forwarding
- Dual lock implementation
- New device support
- For my ease, this code can compile in either 6.x or later
- brings this driver in sync with the 6.3
prepend a data mbuf in front of a header mbuf without moving the header
to the new mbuf, and (2) a possible alignment problem on architectures
with strict alignment as reported in kern/4184.
PR: kern/4184 (1)
addresses as the source of an AARP request. While this PR was submitted
in the context of work in OpenBSD to port netatalk (in 1997), I've
synchronized the code more to our ARP input routine, which had similar
requirements.
Submitted by: Denton Gentry
PR: kern/4184
MFC after: 1 week
only at address 0 which is supposed to be the only valid phy address
on Marvell PHY. The more correct solution would be masking PHY
address ranges allowable in PHY probe routine. Unfortunately,
FreeBSD has no way to restrict the PHY address ranges or to pass special
flags to PHY driver.
This change assumes that the PHY hardware attached to msk(4) is a
Marvell-made 88E11xx PHY.
With this change the phantom PHYs attached on 88E8036 (Yukon FE)
should disappear.
Reported by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
Tested by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
only 4KB SRAM.
o Rework setting Tx/Rx RAM buffer size. Give receiver 2/3 of memory
and round it down to the multiple of 1024. The RAM buffer size of
Yukon II should be multiple of 1024. This fixes bogus RAM buffer
configuration used in Yukon FE.
Reported by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
Tested by: Oleg Lomaka < oleg AT lomaka DOT org DOT ua >
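The Tx/Rx split described above (receiver gets 2/3 of the RAM buffer,
rounded down to a multiple of 1024) works out as in the following
sketch. The function name, the use of bytes as the unit, and giving the
remainder to the transmitter are assumptions made for illustration; the
driver itself may account for the sizes differently.

```python
def split_ram_buffer(ram_bytes):
    """Return (rx_bytes, tx_bytes) for a RAM buffer of ram_bytes bytes."""
    # Receiver gets 2/3 of the memory, rounded down to a multiple of 1024.
    rx = (ram_bytes * 2 // 3) // 1024 * 1024
    # Assumption of this sketch: the transmitter gets whatever remains.
    tx = ram_bytes - rx
    return rx, tx
```

With only 4KB of SRAM, for example, 2/3 is 2730 bytes, which rounds
down to 2048 bytes for the receiver.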
timestamps in the initial SYN packet actually use them in the rest of the
connection. Unfortunately, during the 7.0 testing cycle users have already
found network devices that violate this constraint.
RFC 1323 states 'and may send a TSopt in other segments' rather than
'and MUST send', so we must allow it.
Discovered by: Rob Zietlow
Tracked down by: Kip Macy
PR: bin/118005
publicly available datasheet for Yukon II and don't know what
bug/workaround exist for the specific hardware revision. Also I don't
think the vendor will release hardware errata in near future.
The hardware feature lists were not used at all except setting water
mark registers. Since msk(4) should know exact chip model/revision
number to decide which hardware capability could be used the extra
feature lists were redundant.
o Enable jumbo frame support for EC Ultra and disable jumbo frame
for FE.
o Enable store and forward mode for standard MTU sized frame.
o Enable TSO for EC Ultra. However TSO/checksum offload is disabled
for jumbo frame case. Because EC Ultra can't use store and forward
mode for jumbo frame TSO/checksum offload is not available.
o Adjust Tx GMAC almost empty threshold value and add a jumbo frame
water mark. The magic value was obtained from Marvell's sk98lin
driver.
o Fix EC Ultra chip revision number.
rwlocks in conjunction with callouts. The function does basically what
callout_init_mtx() already does, with the difference of using a rwlock
as extra argument.
The CALLOUT_SHAREDLOCK flag can now be used to acquire the lock only
in read mode when running the callout handler. It has no effect when used
in conjunction with a mtx.
In order to implement this, underlying callout functions have been made
completely lock type-unaware, so accordingly with this, sysctl
debug.to_avg_mtxcalls is now changed in the generic
debug.to_avg_lockcalls.
Note: currently the allowed lock classes are mutexes and rwlocks because
callout handlers run in softclock swi, so they cannot sleep and they
cannot acquire sleepable locks like sx or lockmgr.
Requested by: kmacy, pjd, rwatson
Reviewed by: jhb
Revert the probe in atapi-cd.c to the old usage now that it's fixed on AHCI.
This change also fixes using virtual CDs on, e.g., Parallels.
This still leaves the GEOM problem of telling media vs device access apart in the access function.
server-side RPC retransmission cache for non-idempotent operations: these
hacks substituted 0 (success) for the expected EEXIST in the event that
a target name already existed for LINK, SYMLINK, and MKDIR operations,
under the assumption that EEXIST represented a second application of the
original RPC rather than a true failure.
Background: certain NFS operations (in this case, LINK, SYMLINK, and
MKDIR) are not idempotent, as they leave behind persisting state on the
server that prevents them from being replayed without an error; if a UDP
RPC reply is lost, leading to a retransmission by the client, the second
reply will return EEXIST rather than success, as the new object has
already been created. The NFS client previously silently mapped the
EEXIST return into success to paper over this problem.
However, in all modern NFS server implementations, a reply cache is kept
in order to retransmit the original reply to a retransmitted request,
rather than performing the operation a second time, allowing this hack
to be avoided. This allows link()-based file locking over NFS to operate
correctly, as an application requesting the creation of a new link for a
file can tell whether it succeeded atomically or not.
Other NFS clients, including Solaris and Linux, generally follow this
behavior for the same reasons. Most clients also now default to TCP,
which also helps avoid the issue of retransmitted but non-idempotent
requests in most cases.
Reported by: Adam McDougall <mcdouga9 at egr dot msu dot edu>,
Timo Sirainen <tss at iki dot fi>
Reviewed by: mohans
MFC after: 1 week
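The effect of a server-side reply cache on a retransmitted MKDIR can be
modelled in a few lines. This is a toy sketch, not NFS server code: the
class ToyServer, its in-memory directory set and the use of the RPC xid
alone as the cache key are assumptions made for illustration.

```python
import errno


class ToyServer:
    """Toy model of a server that caches replies to non-idempotent RPCs."""

    def __init__(self):
        self.dirs = set()
        self.reply_cache = {}  # xid -> reply already sent for that request

    def mkdir(self, xid, name):
        # A retransmitted request gets the original reply back; the
        # operation is not performed a second time.
        if xid in self.reply_cache:
            return self.reply_cache[xid]
        if name in self.dirs:
            reply = errno.EEXIST
        else:
            self.dirs.add(name)
            reply = 0
        self.reply_cache[xid] = reply
        return reply
```

In this model a retransmission of xid 1 returns 0 again, while a new
request (a different xid) for the same name returns EEXIST, so the
client no longer needs to guess which case it is seeing.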
o buffered write, for chunks smaller than PIPE_MINDIRECT bytes
o direct write, for everything else
A call to writev(2) may receive struct iov of various size and the
kernel may have to switch from one solution to the other. Before doing
this, it must wake reader processes and any select/poll/kqueue up.
This commit fixes a bug where select/poll/kqueue are not triggered
when switching from buffered write to direct write. It adds calls to
pipeselwakeup().
I give more details on freebsd-arch@:
http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html
This should fix issues with Erlang (lang/erlang) and kqueue.
Reported by: Rickard Green (Erlang)
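The mode switch inside a single writev(2) call can be sketched as
follows. This models only the decision and the wakeup point described
above; the function plan_writev and the value used for PIPE_MINDIRECT
are assumptions of the sketch, and "wakeup" stands in for the call to
pipeselwakeup().

```python
PIPE_MINDIRECT = 8192  # assumed threshold in bytes, for illustration only


def plan_writev(chunk_sizes):
    """Return the sequence of write modes and wakeups for the chunks."""
    plan = []
    previous = None
    for size in chunk_sizes:
        mode = "buffered" if size < PIPE_MINDIRECT else "direct"
        # Switching between the two solutions must first wake readers
        # and any select/poll/kqueue waiters.
        if previous is not None and mode != previous:
            plan.append("wakeup")
        plan.append(mode)
        previous = mode
    return plan
```

A small chunk followed by a large one therefore yields a wakeup between
the buffered and the direct write, which is the case the fix addresses.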
time ago (2002 according to the gcc log). Using the proper name
fixes a warning in src/lib/libc/gen/ulimit.c about the second
argument of va_start() not being the last named (when it really
was).
This has the following benefits:
- allows to use the AT keyboard maps in share/syscons/keymaps with
sunkbd(4),
- allows to use kbdmux(4) with sunkbd(4),
- allows Sun RS232 keyboards to be configured and used the same
way as Sun USB keyboards driven by ukbd(4) (which also does AT
keyboard emulation) with X.Org, putting an end to the problem
of native support for the former in X.Org being broken over and
over again.
MFC after: 3 days
an unified way for all the lock primitives to express lock assertions.
Currently, lockmgrs and rmlocks don't have assertions, so just panic in
that case.
This will be a base for more callout improvements.
Ok'ed by: jhb, jeff
stress2 suite, to quote his letter, this change:
1. It removes the tn_lookup_dirent stuff. I think this cannot be fixed,
because nothing protects vnode/tmpfs node between lookup is done, and
actual operation is performed, in the case the vnode lock is dropped.
At least, this is the case with the from vnode for rename.
For now, we do the linear lookup in the parent node. This has its own
drawbacks. Not mentioning speed (that could be fixed by using hash), the
real problem is the situation where several hardlinks exist in the dvp.
But, I think this is fixable.
2. The patch restores the VV_ROOT flag on the root vnode after it became
reclaimed and allocated again. This fixes MPASS assertion at the start
of the tmpfs_lookup() reported by many.
Submitted by: kib
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().
Reviewed by: tegge
MFC after: 6 weeks
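The accounting problem and its fix can be illustrated with a toy page
model. Everything here is hypothetical naming for illustration: the
Page class is not a vm_page, and wired_mappings() only stands in for the
role that pmap_page_wired_mappings() plays in the description above.

```python
class Page:
    """Toy page: a wire count plus a list of managed mappings."""

    def __init__(self):
        self.wire_count = 0
        self.mappings = []  # one bool per mapping: True if it is wired

    def map(self, wired):
        self.mappings.append(wired)
        if wired:
            self.wire_count += 1

    def wired_mappings(self):
        """Number of managed mappings of this page that are wired."""
        return sum(1 for wired in self.mappings if wired)

    def remove_mappings_old(self):
        # Old behaviour: mappings are destroyed, wire count is untouched.
        self.mappings.clear()

    def remove_mappings_fixed(self):
        # Fixed behaviour: drop the wire count by the number of wired
        # mappings that are about to be destroyed.
        self.wire_count -= self.wired_mappings()
        self.mappings.clear()
```

With the old behaviour a page that was wired only through its mappings
keeps a non-zero wire count after they are gone, so it can never be
freed; with the fix the count returns to zero.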
If it is set to zero value (default) dummynet module will try to emulate
real link as close as possible (bandwidth & latency): packet will not leave
pipe faster than it should be on real link with given bandwidth.
(This is original behaviour of dummynet which was altered in previous commit)
If it is set to non-zero value only bandwidth is enforced: packet's latency
can be lower comparing to real link with given bandwidth.
- Document recently introduced dummynet(4) sysctl variables.
Requested by: luigi, julian
MFC after: 3 months
with ACCESSPERMS. Document in mount_ntfs(8) only the nine
low-order bits of mask are used (taken from mount_msdosfs(8)).
PR: kern/114856
Submitted by: Ighighi
MFC after: 1 month
In case attach fails because of the priv check we leaked the
memory and left so_pcb as fodder for invariants.
Reported by: Pawel Worach
Reviewed by: rwatson
- Implement timing out of VPD register access.[1]
- Fix an off-by-one error of freeing malloc'd space when checksum is invalid.
- Fix style(9) bugs, i.e., sizeof cannot be followed by space.
- Retire now obsolete 'hw.pci.enable_vpd' tunable.
Submitted by: cokane (initial revision)[1]
Reviewed by: marius (intermediate revision)
Silence from: jhb, jmg, rwatson
Tested by: cokane, jkim
MFC after: 3 days
The register layout is little different from memory-mapped stats
in the previous generation chips. In fact, it is bad because
registers in this range are cleared after reading them.
Reviewed by: scottl
MFC after: 3 days
- Try to eliminate another race by replacing timeout(9) with the
callout APIs. In addition, the callout_drain() in an_detach()
helps us avoid a possible panic-on-free caused by the callout API trying
to lock a destroyed mutex.
- In an_stats_update(), check the return value of an_read_record(). This
should reduce the chance of device removal(PCCARD) panic [2].
- Adding a comment to state the fact that an_stats_update() is now called
via callout(9) with a lock held [2].
Submitted by: jhb [1], ambrisko [2]
Reviewed by: jhb, ambrisko
Reported by: dhw
Tested by: dhw
MFC after: 3 days
priorities of the technologies supported by 802.3 Selector Field
value.
1000BASE-T full duplex
1000BASE-T
100BASE-T2 full duplex
100BASE-TX full duplex
100BASE-T2
100BASE-T4
100BASE-TX
10BASE-T full duplex
10BASE-T
However, PHY drivers didn't honor this order, so 100BASE-T4 had
higher priority than 100BASE-TX full duplex. Fix this long-standing
bug so that PHY drivers choose the highest common denominator
ability.
Fix a bug in dcphy which inadvertently accepts 100BASE-T4.
PR: 92599
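Choosing the highest common denominator from the priority list above
can be sketched like this. The list is copied from the commit message;
the function name resolve and the use of plain strings for abilities
are assumptions of the sketch, not the PHY drivers' representation.

```python
# Highest priority first, in the order given in the commit message.
MEDIA_PRIORITY = [
    "1000BASE-T full duplex",
    "1000BASE-T",
    "100BASE-T2 full duplex",
    "100BASE-TX full duplex",
    "100BASE-T2",
    "100BASE-T4",
    "100BASE-TX",
    "10BASE-T full duplex",
    "10BASE-T",
]


def resolve(local, partner):
    """Return the best ability advertised by both sides, or None."""
    common = set(local) & set(partner)
    for media in MEDIA_PRIORITY:
        if media in common:
            return media
    return None
```

When both sides advertise 100BASE-T4 and 100BASE-TX full duplex, this
picks 100BASE-TX full duplex, which is the ordering the drivers
previously got wrong.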
- Populate the register values for the trapframe put on the stack by the
double fault handler.
- Teach DDB's trace routine to treat a double fault like other trap frames.
MFC after: 3 days
process_fini, thread_ctor, thread_dtor, thread_init, thread_fini. This
will allow us to extend dynamically areas in proc/thread for dtrace ;-)
Reviewed by: rwatson
- process_ctor,dtor, init and fini
- thread_ctor,dtor, init and fini
This allows the ability to add on additional things
during construction/destruction of threads and processes.
Reviewed by: rwatson
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().
PR: ia64/118024
removing some copy&pasted code.
- Reduce copy and paste in ng_apply_item().
- Resurrect ng_send_fn() as a valid symbol, not a define.
Reviewed by: mav, julian
opposed to what process. Since threads by default have the name of the
process unless over-written with more useful information, just print the
thread name instead.
sys/dev/acpica/acpi.c rev 1.196 a while ago:
Grab Giant around calls to DEVICE_SUSPEND/RESUME in
acpi_SetSleepState().
If we are resuming non-MPSAFE drivers, they need Giant held for them.
This may fix some obscure suspend/resume problems. It has fixed keyrate
setting problems that were triggered by cardbus (MPSAFE) changing the
ordering for syscons resume (non-MPSAFE). Also, add some asserts that
Giant is held in our suspend/resume and shutdown methods.
Submitted by: Marko Zec
amd64 mechanism over. Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do. Get rid of 'struct privatespace' and a
whole mess of #ifdef SMP garbage that set it up. As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.
Background information: Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address. In order to share page tables, we switched
to using the GDT and %fs/%gs to access it. But we still did the evil
magic to set it up for the old way. The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored. (exercise for reader: free these afterwards).
that the driver will handle WEP encryption. However, this does not seem to be
implemented by this driver (or maybe the chipset doesn't support it?)
Removing the flag makes my wpi card work using wpa_supplicant(8) on a
network with 802.1x security (without this change it authenticated fine, but
tcpdump only saw garbage packets)
Reviewed by: benjsc, imp (mentor)
Approved by: imp (mentor), sam
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().
This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.
Reported by: numerous
Reviewed by: marcel
MFC after: 1 day
Enhanced Disk Drive Specification Ver 3.0 defines that the version
of extension in AH would be 30h.
Correct the check for that to be >=30h instead of >3h.
MFC after: 2 months
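The difference between the two comparisons is easy to see with a few
sample values. The two function names are hypothetical, and the AH
values other than 30h are example inputs chosen for this sketch to
stand for older extension versions.

```python
def old_check(ah):
    """Comparison before the fix: any AH value above 3h passes."""
    return ah > 0x3


def new_check(ah):
    """Comparison after the fix: only version 30h or later passes."""
    return ah >= 0x30


# Sample AH values; 0x30 is the version 3.0 value from the specification.
samples = [0x01, 0x20, 0x21, 0x30]
old_results = [old_check(ah) for ah in samples]
new_results = [new_check(ah) for ah in samples]
```

The old comparison accepts values such as 20h and 21h that are below
30h, while the corrected one rejects them.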
from messing with the spdb and sadb.
Problem sneaked in with the fast_ipsec+v6->ipsec merger by no
longer going via raw_usrreqs.pr_attach.
Reported by: Pawel Worach
Identified by: rwatson
Reviewed by: rwatson
MFC after: 3 days
bumped to 800004 to note the change though userland apps should not be
affected since they use <sys/agpio.h> rather than the headers in
sys/dev/agp.
Discussed with: anholt
Repocopy by: simon
and update the rx code to handle multiple frames in a single usb
transfer. AX772 parts (at least) exhibit many input errors when
operated with a 2K rx buffer and no errors w/ a 4K rx buffer (it's
unclear what the cause of the errors is for 2K so this may just be
covering up the real issue). Larger rx buffer sizes show no
significant performance improvement for AX772. Bypassing the common
buffer management routines also eliminates an extra context switch
on every packet which noticeably improves performance (TCP netperf
rx goes from 45 Mb/s to 85 Mb/s).
Submitted by: "J.R. Oldroyd" <fbsd@opal.com>
Reviewed by: imp
Obtained from: openbsd (partly)
MFC after: 3 weeks
The reliability of its multi DAC / playback channels is
not that good. Enabling vchans make the bug more visible
since playback allocation will look for possible free
hardware channels first (i.e: the next DAC, the very first
has been consumed by vchan mixer) which in this case has
been proven faulty.
Reported / Tested by: Sascha Klauder
MFC after: 3 days
This includes:
o mtree (for legal/intel_wpi)
o manpage for i386/amd64 archs
o module for i386/amd64 archs
o NOTES for i386/amd64 archs
Approved by: mlaier (comentor)
proc_rwmem.
Otherwise copy on write may create an anonymous page that is
not marked as dirty. Since writing data to these pages
in this function also does not dirty these pages they may be
later discarded by the pagedaemon.
- Use unit2minor() and minor2unit() to generate minor numbers to support
unit numbers higher than 255.
- Use simple string operations on the 'names' array rather than hard-coded
constants and switch statements so that more ptys can be added by simply
expanding the 'names' array.
MFC after: 1 week
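Deriving a pty name from a 'names' array with simple string operations,
rather than hard-coded constants and switch statements, might look like
the sketch below. The contents of NAMES and SUFFIXES and the function
pty_name are hypothetical; they are not taken from the driver.

```python
# Hypothetical tables; adding a letter to NAMES adds another bank of ptys.
NAMES = "pqrsPQRS"
SUFFIXES = "0123456789abcdefghijklmnopqrstuv"


def pty_name(unit):
    """Map a unit number to a name such as 'ptyp0' using the tables."""
    bank, index = divmod(unit, len(SUFFIXES))
    if not 0 <= bank < len(NAMES):
        raise ValueError("unit out of range for the names table")
    return "pty" + NAMES[bank] + SUFFIXES[index]
```

Supporting more units in this model only requires extending NAMES; the
lookup code itself does not change.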
lock optimized for almost exclusive reader access. (see also rmlock.9)
TODO:
Convert to per cpu variables linkerset as soon as it is available.
Optimize UP (single processor) case.
- Patch registers CR47 and CR157 on devices that require it.
- Fix power calibration setting on ZD1211B.
Obtained from: OpenBSD
- Fix multicast transfer by properly reprogramming the multicast global
hash table, which in turn fixes promiscuous mode and IPv6
autoconfiguration / local networking.
Reviewed by: sam, Weongyo Jeong
Tested using: Aztech WL230 , Belkin F5D7050, Unicorn WL-54G,
3COM 3CRUSB10075
MFC after: 1 week
I've tried to move md(4) to use geom_disk class, like real disks do, but
this requires major rework of some of the existing features such as
configuration dumping for example. Therefore just putting devstat support
directly into md(4) seems to be optimal solution.
Now you can see md(4) stats in `systat -vm' again.
MFC after: 2 weeks
present on the MacBook, MacBook Pro, and Intel MacMini.
This driver exports information via sysctl in its private sysctl tree
dev.asmc.*. You can get information about temperatures, fan speeds, the
keyboard light sensor and the Sudden Motion Sensor (SMS).
The SMS is very useful to park the disk heads when the laptop is
moved. Basically, the SMS is setup so that, under movement, we get an
interrupt on irq 6 and a devd notification is sent.
Sponsored by: Google Summer of Code 2007
Approved by: njl (mentor)
Reviewed by: attilio (previous version, but very similar), jhb (interrupt
specific review)
LINUX_SIOCGIFCOUNT just returns 0 since it is not implemented in the
Linux 2.6.16.
LINUX_SIOCGIFINDEX/LINUX_SIOGIFINDEX are mapped to the FreeBSD native
SIOCGIFINDEX.
Tested by: Peter Kostouros <kpeter@melbpc.org.au>
Reviewed by: brooks, rpaulo (on net@)
Submitted by: rdivacky
MFC after: 1 week
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:
- lower latency: if packet flow does not exceed pipe bandwidth, packets
will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaustion) eliminated. This
fixes a long-standing panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
is not changed yet.
MFC after: 3 months
while other variants have inorder ethernet address for the same
chipset. Override ethernet address ordering if we already know how
it was stored. This fixes the use of a reversed ethernet address on
MCP67.
Submitted by: ariff
MFC after: 3 days
Allocate space in the keyboard state structure instead, to prevent a random
byte from a possibly overwritten stack location from being shoved into the
USB device when the transfer actually takes place.
This fixes at least one instance of LEDs not working with USB keyboards.
characters (mostly "&"). Because top(1) shows only first six characters of
wait channel, without this change we saw only one meaningful character.
Requested by: kris & others
MFC after: 1 week
must be globally performed before calling any of the TLB invalidation
functions.
With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)
Reported and reviewed by: ups
MFC after: 3 days
o do not override the home channel recorded for the sta when the frame is
received off-channel; this fixes a problem where we might think the sta
was operating on the channel the frame was received on causing association
requests to be ignored/rejected (likely cause of kern/99036)
o don't include rssi of off-channel frames in the avg rssi used to select
a bss; this gives us a better estimate of the signal we will see for the
station when on-channel
PR: kern/99036
Found by: Yubin Gong
Reviewed by: sephe
MFC after: 1 week
This import includes:
o wpi Wireless driver for the Intel 3945 Wireless Lan Controller (802.11abg) (sys/dev/wpi)
o Intel firmware revision 2.14.4 & associated LICENSE (sys/dev/contrib/wpi, sys/contrib/dev/wpi/LICENSE)
o wpifw Firmware driver (sys/modules/wpifw)
Approved by: mlaier, sam (co-mentors)
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return an error instead of panicking or dereferencing NULL.
As a consequence, vmspace_exec() and vmspace_unshare() return an errno
int. A struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).
In collaboration with: Peter Holm
Reviewed by: jhb
default object rather than cache it was to have
vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is
no cached page in object at pindex. This allows us to avoid explicit
checks for cached pages in vm_object_backing_scan().
For now, we need the same bandaid for the swap object, otherwise both
the vm_page_lookup() and the pager can report that there is no page at
offset, while page is stored in the cache. Also, this fixes another
instance of the KASSERT("object type is incompatible") failure in the
vm_page_cache_transfer().
Reported and tested by: Peter Holm
Reviewed by: alc
MFC after: 3 days
interface. Once the limit is reached packets with unknown source addresses are
dropped until an existing host cache entry expires or is removed. Useful to
use with the STICKY cache option.
Sponsored by: miniSuperHappyDevHouse NZ
reset problem when we reboot the system with the zyd device inserted.
Submitted by: Weongyo Jeong
Reported by: Ted Lindgreen (ted@tednet.nl)
MFC after: 1 week
it's been printing out scary messages about "Unhanded Event Notify Frame"
that are needlessly worrisome to users. Change this warning to only print
out at an elevated debugging level.
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)
Reported by: Jeremy Chadwick
Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.
MFC after: 5 days
Put in a little comment explaining why it went away.
Re-enable it in the case where an existing process is just splitting
off its address space and file descriptors.
(I don't think anything uses that code but it needs some sort of locking
and this does the job.)
Reviewed by: Davidxu, alc, others
MFC after: 3 days
CPUs to make sure idle threads are evicted from the softc before returning
from acpi_cpu_shutdown(). However, this is unnecessary since stop_cpus()
handles this for itself and at this point it's possible that our IPI will be
blocked (interrupts disabled).
Thanks to: Glen Leeder <glen.leeder / nokia.com>
MFC after: 3 days
don't do this right; instead go to the scan cache so we pass through
auth state (if the cache is warm we can do this w/o an actual scan)
MFC after: 1 week
(BIO_WRITE and BIO_FLUSH) as it is done in Solaris. The difference is
that Solaris calls it only for sync requests, but we can't tell in GEOM
if the request is sync or async, so we do it for every request.
MFC after: 1 week
to change the freq before the other CPUs are active. The current code
always attempts to change all CPUs to match each other, and the requisite
sched_bind() call won't work before APs are launched.
/dev/agpgart and agp_free_res() frees resources like the BAR for the
aperture. Splitting this up lets chipset-specific detach routines
manipulate the aperture during their detach routines without panicking.
MFC after: 1 week
Reviewed by: anholt
* Do not hold any locks over calls to copyin/copyout.
* Clean up some #ifdefs
* fix a possible mbuf leak when NAT fails on policy routed packets
PR: 117216
- Select a tag gains ability to optionally save new tags
off in the timewait system.
- When looking up associations do not give back a stcb that
is in the about-to-be-freed state, and instead continue
looking for other candidates.
- New function to query to see if value is in time-wait.
- Timewait had a time comparison error that caused very
few vtags to actually stay in time-wait.
- When setting tags in time-wait, we now use the time
requested NOT a fixed constant value.
- sstat now gets the proper associd when we do the query.
- When we process an association, we expect the tag chosen
(if we have one from a cookie) to be in time-wait. Before
we would NOT allow the assoc up by checking if it's good.
In theory this should have caused almost all assoc not
to come up except for the time-comparison bug above (this
bug was hidden by the time comparison bug :-D).
- Don't save tags for nonce values in the time-wait cache
since these are used only during cookie collisions and do
not matter if they are unique or not.
MFC after: 1 week
set this flag and it was more or less just copied and pasted from
another FreeBSD driver while porting this driver from NetBSD, whose
gentbi(4) doesn't set MIIF_NOISOLATE either.
- Fix spelling in a comment.
OK'ed by: yongari
MFC after: 3 months
zero (0). Actual RFCOMM channel will be assigned after listen(2)
call is done on a RFCOMM socket bound to a ''wildcard'' RFCOMM
channel zero (0).
Address locking issues in ng_btsocket_rfcomm_bind()
Submitted by: Heiko Wundram (Beenic) < wundram at beenic dot net >
MFC after: 1 week
- Remove AU_.* hard-coded audit class constants, as audit classes are now
entirely dynamically configured using /etc/security/audit_class.
Obtained from: TrustedBSD Project
supports the removal of hard-coded audit class constants in OpenBSM
1.0. All audit classes are now dynamically configured via the
audit_class database.
Obtained from: TrustedBSD Project
changes:
01 - Enhanced LRO:
LRO feature is extended to support multi-buffer mode. Previously,
Ethernet frames received in contiguous buffers were offloaded.
Now, frames received in multiple non-contiguous buffers can be
offloaded, as well. The driver now supports LRO for jumbo frames.
02 - Locks Optimization:
The driver code was re-organized to limit the use of locks.
Moreover, lock contention was reduced by replacing wait locks
with try locks.
03 - Code Optimization:
The driver code was re-factored to eliminate some memcpy
operations. Fast path loops were optimized.
04 - Tag Creations:
Physical Buffer Tags are now optimized based upon frame size.
For better performance, Physical Memory Maps are now re-used.
05 - Configuration:
Features such as TSO, LRO, and Interrupt Mode can be configured
either at load or at run time. Rx buffer mode (mode 1 or mode 2)
can be configured at load time through kenv.
06 - Driver Statistics:
Run time statistics are enhanced to provide better visibility
into the driver performance.
07 - Bug Fixes:
The driver contains fixes for the problems discovered and
reported since last submission.
08 - MSI support:
Added Message Signaled Interrupt feature which currently uses 1
message.
09 - Removed feature:
Rx 3 buffer mode feature has been removed. Driver now supports 1,
2 and 5 buffer modes of which 2 and 5 buffer modes can be used
for header separation.
10 - Compiler warning:
Fixed compiler warning when compiled for 32 bit system.
11 - Copyright notice:
Source files are updated with the proper copyright notice.
MFC after: 3 days
Submitted by: Alicia Pena <Alicia dot Pena at neterion dot com>,
Muhammad Shafiq <Muhammad dot Shafiq at neterion dot com>
made by Michael Eisele and the patch was slightly modified by me.
With this change several NVIDIA ethernet controllers (e.g. MCP61)
work.
RTL8211B(L) is RealTek's new gigabit PHY. The PHY has several
features including crossover correction, polarity correction as
well as supporting triple speed (10/100/1000Mbps). Data transfer
between MAC and PHY is via RGMII for 1000baseT, MII for
10baseT/100baseTX.
Unfortunately, RealTek used the same model number for RTL8211B(L)
PHY so there is no way to discriminate between RTL8211B(L) and its
predecessors. ATM RTL8211B uses revision number 2 so checking the
revision number seems to be only way to identify it.
Obtained from: Michael Eisele [1]
Tested by: clemens fischer < ino-qc AT spotteswoode DOT de DOT eu DOT org >
mii_anegticks to MII_ANEGTICKS_GIGE and use it. Previously it used
MII_ANEGTICKS, which may not be long enough to wait before retrying
the autonegotiation process at 1000Mbps.
o Reset autonegotiation timer if media option is not IFM_AUTO or we
got a valid link.
o Announce link loss right after it happens.
o Autonegotiation is retried every mii_anegticks seconds.
o Report link state changes right after setting autonegotiation.
Blade 1500/SX1500 boards have inherited the firmware bug of the
AX1105 mainboards to not include an interrupt map entry for the
parallel port controller (for the AX1105 the heuristic code for
E450s probably erroneously kicks in and guesses an interrupt).
- Take advantage of bus_generic_setup_intr(9).
- Fix some whitespace bugs.
entry point, which is no longer required now that we don't support
old-style multicast tunnels. This removes the last mbuf object class
entry point that isn't init/copy/destroy.
Obtained from: TrustedBSD Project
Framework by moving from mac_mbuf_create_netlayer() to more specific
entry points for specific network services:
- mac_netinet_firewall_reply() to be used when replying to in-bound TCP
segments in pf and ipfw (etc).
- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and
add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite
a label in place, but in others we apply the label to a new mbuf.
Obtained from: TrustedBSD Project
in the TrustedBSD MAC Framework:
- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send()
for AARP packet labeling, rather than using a generic link layer
entry point.
- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send()
for ND6 packet labeling, rather than using a generic link layer entry
point.
- Add explicit entry point mac_netinet_arp_send() for ARP packet
labeling, and mac_netinet_igmp_send() for IGMP packet labeling,
rather than using a generic link layer entry point.
- Remove previous generic link layer entry point,
mac_mbuf_create_linklayer() as it is no longer used.
- Add implementations of new entry points to various policies, largely
by replicating the existing link layer entry point for them; remove
old link layer entry point implementation.
- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global
to the MAC Framework rather than static to mac_net.c as it is now
needed outside of mac_net.c.
Obtained from: TrustedBSD Project
reason (not all BIOSen have _DIS methods for all link devices for example).
This matches the behavior of attach() with respect to _DIS as well.
Submitted by: njl
userland preemption directly from hardclock() via sched_clock() when a
thread uses up a full quantum instead of using a periodic timeout to cause
a userland preemption every so often. This fixes a potential deadlock
when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
by a thread pinned or bound to another CPU. The current thread on that
CPU will never be preempted while softclock is blocked.
Note that ULE already drives its round-robin userland preemption from
sched_clock() as well and always enables IPI_PREEMPTION.
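
A minimal sketch of quantum-driven preemption from the clock tick, as
described above. This is not the scheduler code itself: struct thr,
clock_tick() and QUANTUM_TICKS are hypothetical names chosen for
illustration, and the quantum length is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define QUANTUM_TICKS 10	/* hypothetical quantum length, in clock ticks */

/* Hypothetical per-thread scheduling state. */
struct thr {
	int ticks_run;		/* ticks consumed since the quantum started */
	int need_resched;	/* set when the thread should be preempted */
};

/*
 * Called once per clock tick for the thread currently running on this
 * CPU.  No timeout or softclock thread is involved, so the request to
 * reschedule cannot get stuck behind a blocked softclock.
 */
static void
clock_tick(struct thr *td)
{
	assert(td != NULL);
	if (++td->ticks_run >= QUANTUM_TICKS) {
		td->ticks_run = 0;
		td->need_resched = 1;	/* full quantum used: ask for a switch */
	}
}
```

The point of the design is that the decision is made in the tick handler
of the CPU that runs the thread, rather than by a periodic timeout.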
MFC after: 1 week
a private softc list is needed neither for tracking clones in general
nor for destroying all clones before the module unload -- if_clone
takes care of all that. (Note that some other interface drivers do
need a softc list to be able to scan it for their private purposes.)
noatime, noexec, suiddir, nosuid, nosymfollow, union,
noclusterr, noclusterw, multilabel, acls, force, update,
async. These options correspond to MOPT_STDOPTS, MOPT_FORCE, MOPT_UPDATE,
and MOPT_ASYNC.
Currently, mount_nfs converts these "-o" options from strings
to MNT_ flags via getmntopts(),
and passes the flags from userspace to the kernel.
This change will allow us in future to pass these mount options
as strings directly to the kernel via nmount() when doing NFS mounts.
out instead of returning an error.
(1) This makes the behavior consistent with mount(2).
(2) This makes update mounts on the root file system work properly.
(3) The explicit checks for MNT_ROOTFS in src/sbin/fsck_ffs/main.c
and src/usr.sbin/mountd/mountd.c which were put in to
eliminate errors during update mounts on the root file system
can be removed.
The only place where MNT_ROOTFS can be validly set
is inside the kernel, i.e. with vfs_mountroot_try().
Reviewed by: phk
MFC after: 3 days
handle to the PCI device_t if the ACPI device_t is already attached to a
driver. This happens on the Tablet TC1000 which for some reason includes
two PCI-ISA bridges and treats the second bridge as an ACPI system resource
device.
Reviewed by: njl (a while ago)
MFC after: 3 days
that would have an offset beyond the end of the target object. Such
pages should remain in the source object.
MFC after: 3 days
Diagnosed and reviewed by: Kostik Belousov
Reported and tested by: Peter Holm
defined. This lets each boot program choose which version of cgbase() it
wants to use rather than forcing ufsread.c to have that knowledge.
MFC after: 1 week
Discussed with: imp
saves about 500 bytes in the boot code. While the AT91RM9200 has 12k
of space for the boot loader, which is more than i386's 8k, the code
generated by gcc is a bit bigger.
I've had this in p4 for about two years now.
we move towards netinet as a pseudo-object for the MAC Framework.
Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to
reflect general object-first ordering preference.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
kthread_add() takes the same parameters as the old kthread_create()
plus a pointer to a process structure, and adds a kernel thread
to that process.
kproc_kthread_add() takes the parameters for kthread_add,
plus a process name and a pointer to a pointer to a process instead of just
a pointer, and if the proc * is NULL, it creates the process to the
specifications required, before adding the thread to it.
All other old kthread_xxx() calls return, but act on (struct thread *)
instead of (struct proc *). One reason to change the name is so that
any old kernel modules that are lying around and expect kthread_create()
to make a process will not just accidentally link.
fix top to show kernel threads by their thread name in -SH mode
add a tdnam formatting option to ps to show thread names.
make all idle threads actual kthreads and put them into their own idled process.
make all interrupt threads kthreads and put them in an interd process
(mainly for aesthetic and accounting reasons)
rename proc 0 to be 'kernel' and its swapper thread is now 'swapper'
man page fixes to follow.
refactored it to be a generic device.
Instead of being part of the standard kernel, there is now a 'nvram' device
for i386/amd64. It is in DEFAULTS like io and mem, and can be turned off
with 'nodevice nvram'. This matches the previous behavior when it was
first committed.
This change introduces audit_proc_coredump() which is called by coredump(9)
to create an audit record for the coredump event. When a process
dumps a core, it could be security relevant. It could be an indicator that
a stack within the process has been overflowed with an incorrectly constructed
malicious payload or a number of other events.
The record that is generated looks like this:
header,111,10,process dumped core,0,Thu Oct 25 19:36:29 2007, + 179 msec
argument,0,0xb,signal
path,/usr/home/csjp/test.core
subject,csjp,csjp,staff,csjp,staff,1101,1095,50457,10.37.129.2
return,success,1
trailer,111
- We allocate a completely new record to make sure we aren't clobbering
the audit data associated with the syscall that produced the core
(assuming the core is being generated in response to SIGABRT and not
an invalid memory access).
- Shuffle around expand_name() so we can use the coredump name at the very
beginning of the coredump call. Make sure we free the storage referenced
by "name" if we need to bail out early.
- Audit both successful and failed coredump creation efforts
Obtained from: TrustedBSD Project
Reviewed by: rwatson
MFC after: 1 month
primary object type, and then secondarily by method name. This sorts
entry points relating to particular objects, such as pipes, sockets, and
vnodes together.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
the PS/2 mouse controller. Thus, when acpi_ibm(4) claimed the mouse
device, the mouse would stop working. The one ACPI dump of an R40 that
I've looked at includes an HKEY device with the proper "IBM0068" ID, so
I'm not sure how the "IBM0057" ID could have helped at all.
MFC after: 1 week
Approved by: njl
Rework the read/write support in the bios disk driver some to cut down
on duplicated code.
- All of the bounce buffer and retry logic duplicated in bd_read() and
bd_write() is merged into a single bd_io() routine that takes an
extra direction argument. bd_read() and bd_write() are now simple
wrappers around bd_io().
from mac_vfs.c to mac_process.c to join other functions that set up
process labels for specific purposes. Unlike the two proc create calls,
this call is intended to run after creation when a process registers as
the NFS daemon, so remains an _associate_ call.
Obtained from: TrustedBSD Project
than mac_<policy>_whatever, as this shortens the names and makes the code
a bit easier to read.
When dealing with label structures, name variables 'mb', 'ml', 'mm' rather
than the longer 'mac_biba', 'mac_lomac', and 'mac_mls', likewise making
the code a little easier to read.
Obtained from: TrustedBSD Project
order. The kernel used to shuffle them around to get things right,
but that was recently fixed. This makes our boot loader match the
behavior of most other boot loaders for the atmel parts. This bug was
inherited from the Kwikbyte loader that we started from.
This bug was discovered by Björn König back in June, but fell on the
floor. He provided patches to the kernel, including backwards
compatibility options that were similar to Olivier's if_ate.c commit.
in the same order as it's set in ate_set_mac.
I remember a discussion about this on -arm, but apparently nothing was done.
Warner, is this wrong ?
X-MFC After: proper review
on i386 and amd64 machines. The overall process is that /boot/pmbr lives
in the PMBR (similar to /boot/mbr for MBR disks) and is responsible for
locating and loading /boot/gptboot. /boot/gptboot is similar to /boot/boot
except that it groks GPT rather than MBR + bsdlabel. Unlike /boot/boot,
/boot/gptboot lives in its own dedicated GPT partition with a new
"FreeBSD boot" type. This partition does not have a fixed size in that
/boot/pmbr will load the entire partition into the lower 640k. However,
it is limited in that it can only be 545k. That's still a lot better than
the current 7.5k limit for boot2 on MBR. gptboot mostly acts just like
boot2 in that it reads /boot.config and loads up /boot/loader. Some more
details:
- Include uuid_equal() and uuid_is_nil() in libstand.
- Add a new 'boot' command to gpt(8) which makes a GPT disk bootable using
/boot/pmbr and /boot/gptboot. Note that the disk must have some free
space for the boot partition.
- This required exposing the backend of the 'add' function as a
gpt_add_part() function to the rest of gpt(8). 'boot' uses this to
create a boot partition if needed.
- Don't cripple cgbase() in the UFS boot code for /boot/gptboot so that
it can handle a filesystem > 1.5 TB.
- /boot/gptboot has a simple loader (gptldr) that doesn't do any I/O
unlike boot1 since /boot/pmbr loads all of gptboot up front. The
C portion of gptboot (gptboot.c) has been repocopied from boot2.c.
The primary changes are to parse the GPT to find a root filesystem
and to use 64-bit disk addresses. Currently gptboot assumes that the
first UFS partition on the disk is the / filesystem, but this algorithm
will likely be improved in the future.
- Teach the biosdisk driver in /boot/loader to understand GPT tables.
GPT partitions are identified as 'disk0pX:' (e.g. disk0p2:) which is
similar to the /dev names the kernel uses (e.g. /dev/ad0p2).
- Add a new "freebsd-boot" alias to g_part() for the new boot UUID.
MFC after: 1 month
Discussed with: marcel (some things might still change, but am committing
what I have so far)
the PCIOCGETCONF, PCIOCREAD and PCIOCWRITE IOCTLs, which was broken
with the introduction of PCI domain support.
As the size of struct pci_conf_io wasn't changed with that commit,
this unfortunately requires the ABI of PCIOCGETCONF to be broken
again in order to be able to provide backwards compatibility to
the old version of that IOCTL.
Requested by: imp
Discussed with: re (kensmith)
Reviewed by: PCI maintainers (imp, jhb)
MFC after: 5 days
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:
mac_<object>_<method/action>
mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.
All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
on duplicated code and support 64-bit LBAs for GPT.
- The code to manage an EDD or C/H/S I/O request is now in separate
routines. The EDD routine now handles a full 64-bit LBA instead of
truncating LBAs to the lower 32-bits. (MBRs and BSD labels only
have 32-bit LBAs anyway, so the only LBAs ever passed down were 32-bit).
- All of the bounce buffer and retry logic duplicated in bd_read() and
bd_write() is merged into a single bd_io() routine that takes an
extra direction argument. bd_read() and bd_write() are now simple
wrappers around bd_io().
- If a disk supports EDD then always use it rather than only using it if
the cylinder is > 1023. Other parts of the boot code already do
something similar to this. Also, GPT just uses LBAs, so for a GPT disk
it's probably best to ignore C/H/S completely. Always using EDD when
it is supported by a disk is an easy way to accomplish this.
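
A rough sketch of the two addressing paths discussed above, under stated
assumptions: struct dap, dap_fill() and lba_to_chs() are hypothetical
names, not the routines from the bios disk driver. The packet follows the
usual 16-byte EDD request layout, and the packed attribute is a GCC/Clang
extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte request packet in the EDD (INT 13h extensions) layout. */
struct dap {
	uint8_t  size;		/* packet size in bytes (16) */
	uint8_t  reserved;	/* must be zero */
	uint16_t count;		/* number of sectors to transfer */
	uint16_t offset;	/* real-mode buffer offset */
	uint16_t segment;	/* real-mode buffer segment */
	uint64_t lba;		/* starting sector, all 64 bits kept */
} __attribute__((packed));

/* Fill an EDD request without truncating the LBA to 32 bits. */
static void
dap_fill(struct dap *p, uint64_t lba, uint16_t count, uint16_t seg,
    uint16_t off)
{
	assert(p != NULL);
	p->size = (uint8_t)sizeof(*p);
	p->reserved = 0;
	p->count = count;
	p->offset = off;
	p->segment = seg;
	p->lba = lba;
}

/*
 * Convert an LBA to C/H/S for the legacy interface.  Returns 0 on
 * success, or -1 when the cylinder does not fit in the 10 bits the
 * legacy call provides (cylinder > 1023), where only EDD can be used.
 */
static int
lba_to_chs(uint64_t lba, unsigned heads, unsigned sectors,
    unsigned *c, unsigned *h, unsigned *s)
{
	uint64_t cyl;

	assert(heads > 0 && sectors > 0);
	cyl = lba / ((uint64_t)heads * sectors);
	if (cyl > 1023)
		return (-1);
	*c = (unsigned)cyl;
	*h = (unsigned)((lba / sectors) % heads);
	*s = (unsigned)(lba % sectors) + 1;	/* sectors are 1-based */
	return (0);
}
```

Since the EDD path never needs the geometry, preferring it whenever the
disk supports it sidesteps the cylinder limit entirely.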
MFC after: 1 week
Slightly clean up the 'bootdev' concept on x86 by changing the various
macros to treat the 'slice' field as a real part of the bootdev instead
of as a hack that spans two other fields (adaptor (sic) and controller)
that are not used in any modern FreeBSD boot code.
MFC after: 1 week
audit it at the beginning of the syscall. This fixes a problem
where the user supplies an invalid process ID which is > 0 which
results in the PID argument not being audited.
Obtained from: TrustedBSD Project
MFC after: 1 week
state is stored in an extended subject token now. Make sure
that we are using the extended data. This fixes the termID
for process tokens.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
After discussions with jeff, alc, (various Ironport people), david Xu,
and mostly Alfred (who found the problem) it has been demonstrated that this
is not needed for our implementations of threads and represents a real
(as in we've seen it happen a lot) deadlock danger.
Several points:
Since forking multiple threads is not allowed, and posix states that
any mutexes owned by other threads will be owned in the child by
phantom threads, and threads shouldn't be accessing shared structures without
protection, it can be proved that if this leads to the child process accessing
inconsistent data, it's a programming error.
The mode of thread_single() being used in fork() is the wrong one.
It is using SINGLE_NO_EXIT when it should be using SINGLE_BOUNDARY.
Even if this were used, system processes have no need to do it as they have
no userland to get inconsistent.
This commit first fixes the above bugs to get them correct in CVS,
then removes them with #ifdef.
This is so that history contains the corrected version should it
be needed in the future.
This code may be needed if we implement the forkall() syscall from
Solaris. It may be needed for other non-posix thread libraries
at some time in the future, so let the code sit for a short while
while I do some work on it anyhow.
This removes a reproducible lockup in NFS.
It may be argued that maybe doing a fork while holding a vnode lock may
not be the best idea in the first place but it shouldn't cause a deadlock.
The removal has been running under soak test for several days now.
This removal should be seriously considered for 7.0 and RELENG_6.
Note: There is code in the core-dumping code that may have a similar problem
with coredumping threaded processes.
MFC After: 4 days
kern/sched_ule.c - Add __powerpc__ to the list of supported architectures
powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE
powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct
powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
updating the old thread's lock. Note: uniprocessor-only, will require
modification for MP support.
powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.
Reviewed by: marcel, jeffr (slightly older version)
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route. The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow. Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock. Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread. In practice this would probably just
fix a lost reference that would result in a route never being freed.
This fixes panics observed in rt_check() and rtexpunge().
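
The first half of the fix can be sketched as follows. This is an
illustration only: struct route, route_release() and drop_gateway() are
hypothetical names, locking is omitted, and the caller is assumed to hold
the lock on the "root" route for the whole of drop_gateway().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced route entry. */
struct route {
	int refcnt;
	struct route *gwroute;	/* cached route to the gateway, may be NULL */
};

static void
route_release(struct route *rt)
{
	assert(rt->refcnt > 0);	/* a reference underflow would trip here */
	rt->refcnt--;
}

/*
 * Drop the cached gateway route.  The pointer is cleared while the lock
 * is still held, so a second thread that gets the lock afterwards sees
 * NULL and does not drop the same reference again.
 */
static void
drop_gateway(struct route *root)
{
	struct route *gw = root->gwroute;

	if (gw == NULL)
		return;		/* somebody else already dropped it */
	route_release(gw);
	root->gwroute = NULL;
}
```

Calling drop_gateway() twice in a row models the two racing lookups: the
second call finds the pointer already cleared and leaves the count alone.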
MFC after: 1 week
PR: kern/112490
Insight from: mehuljv at yahoo.com
Reviewed by: ru (found the "not-setting it to NULL" part)
Tested by: several
- markvoldirty() needs to write to underlying GEOM provider. We
have to do that *before* g_access() which sets the GEOM provider
to read-only.
- Remove dirty flag before free'ing iconv related resources. The
dirty flag removal could fail, and it is hard to revert the
iconv-free after the fail.
- Mark volume as dirty if we have failed to mark it clean, for safety.
- Other style fixes to the touched functions.
cache: vnode_pager_setsize() must handle the case where a file is
truncated to a non-page-size-aligned boundary and there is a cached
page underlying the new end of file.
Reported by: kris, tegge
Tested by: kris
MFC after: 3 days
since revision 1.1. Specifically, neither traversal of the vm map checks
whether the end of the vm map has been reached. Consequently, the first
traversal can wrap around and bogusly return an error.
This error has gone unnoticed for so long because no one had ever before
tried msync(2)ing a region above the stack.
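
To illustrate the class of bug, here is a small sketch of a traversal
over a circular entry list with a header sentinel. It assumes that the
map keeps its entries on such a list; struct entry, struct map and
count_entries() are hypothetical names and not the code that was fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address-range entry on a circular list. */
struct entry {
	unsigned long start;
	unsigned long end;
	struct entry *next;
};

/* Hypothetical map: "header" is the sentinel that closes the circle. */
struct map {
	struct entry header;
};

/*
 * Count the entries that begin below "end", starting at "first".  The
 * comparison against the sentinel is the end-of-map check: without it,
 * a range reaching past the last entry would wrap around to the start
 * of the list.
 */
static int
count_entries(struct map *m, struct entry *first, unsigned long end)
{
	struct entry *e;
	int n = 0;

	assert(m != NULL && first != NULL);
	for (e = first; e != &m->header && e->start < end; e = e->next)
		n++;
	return (n);
}
```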
Reported by: peter
MFC after: 1 week
for kldstat(2).
This allows libdtrace to determine the exact file from which
a kernel module was loaded without having to guess.
The kldstat(2) API is versioned with the size of the
kld_file_stat structure, so this change creates version 2.
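
Size-based versioning of this kind can be sketched roughly as below. The
structures are deliberately simplified and hypothetical (file_stat_v1,
file_stat_v2 and stat_fill() are not the real kld_file_stat layout or
kernel code); the only property carried over is that the caller stores
the size of its structure in the version field before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical version 1 layout. */
struct file_stat_v1 {
	int  version;		/* caller sets this to sizeof(struct) */
	char name[32];
	int  id;
};

/* Hypothetical version 2: same prefix, with a pathname appended. */
struct file_stat_v2 {
	int  version;
	char name[32];
	int  id;
	char pathname[64];
};

/*
 * Fill the caller's buffer.  The version field tells us how large the
 * caller's structure is, so old binaries get only the fields they know
 * about and unknown sizes are rejected.
 */
static int
stat_fill(void *buf, const char *name, int id, const char *path)
{
	struct file_stat_v2 full;
	int version;

	assert(buf != NULL && name != NULL && path != NULL);
	memcpy(&version, buf, sizeof(version));
	if (version != (int)sizeof(struct file_stat_v1) &&
	    version != (int)sizeof(struct file_stat_v2))
		return (-1);

	memset(&full, 0, sizeof(full));
	full.version = version;
	full.id = id;
	snprintf(full.name, sizeof(full.name), "%s", name);
	snprintf(full.pathname, sizeof(full.pathname), "%s", path);

	/* Copy only as many bytes as the caller's structure holds. */
	memcpy(buf, &full, (size_t)version);
	return (0);
}
```

The scheme works only as long as new fields are appended, so that every
older layout remains a prefix of the newest one.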
Add the pathname to the verbose output of kldstat(8) too.
MFC: 3 days
to kproc_xxx as they actually make whole processes.
This makes way for us to add REAL kthread_create() and friends
that actually make threads. It turns out that most of these
calls actually end up being moved back to the thread version
when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the
new kthread_xxx() calls.
optimization level (-march=pentium-mmx for example) does not insert
intermediate ops which would trash the carry.
Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.
To my best understanding the same problem was addressed in rev. 1.16
of src/sys/i386/include/in_cksum.h for just a single function 3y ago.
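
For readers unfamiliar with why the carry matters: the Internet checksum
is a ones' complement sum, so every carry out of the top bit has to be
added back in. The sketch below is portable C, not the i386 assembly that
was changed; cksum16() is a hypothetical name. It avoids the problem by
accumulating in a wider type and folding the carries at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement checksum over "len" bytes, read as big-endian 16-bit
 * words.  Carries collect in the upper half of "sum" and are folded
 * back in afterwards; this is safe for packet-sized buffers (fewer than
 * 65536 words), beyond which the 32-bit accumulator could overflow.
 */
static uint16_t
cksum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* end-around carry */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```

An assembly version that chains add-with-carry instructions gets the same
result only if nothing between the additions modifies the carry flag.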
Reviewed by: jhb
Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (initial version of [1])
MFC after: 5 days
PR: 115678, 69257
codecs. Codec at address 0 seems purely digital, or perhaps an HDMI
interface. Let the driver skip it and continue scanning the codecs
starting with address 2 (Realtek ALC885).
* Due to possibilities of future similar cases, put enough logic
in hdac_scan_codecs() to force codec scanning starting from
XX address via tunable "hint.pcm.%d.codec_index".
Reported / Tested by: Toomas Pelberg <toomasp@gmx.net>
- Trivial headphone / speaker automute fixup for Fujitsu-Siemens
AMILO Si 1848 laptop.
Reported / Tested by: Ed <ed@bsd.it>
- Trivial headphone / speaker automute fixup for Fujitsu-Siemens
Lifebook S7020D laptop.
Reported / Tested by: Jaromir Dvoracek <jarek@ataxo.com>
- Some smart vendor tried to create an interplanetary wormhole by
screwing up the pci config space during their BIOS update. The side
effects of their failed attempt include a mutilated hardware id, broken
speaker automuting and loss of the entire analog CD connectivity,
thus causing enough collateral damage to collapse the entire
universe. Move along with it.
Please exercise extra caution when applying BIOS updates.
Reported / Tested by: Pietro Cerutti <gahr@gahr.ch>
- assembled laptop, based on the MSI-1034
(662) which is now becoming MSI-034A.
- Fix no sound issues (on headphones) for Lenovo ThinkCentre A55 due
to global automute table entry which is not applicable for
non-laptops.
Reported / Tested by: Piotr Smyrak <piotr.smyrak@heron.pl>
- Speaker mute control for HP DC7700 since the front headphone jack
does not generate any interesting unsolicited signal/response.
Reported / Tested by: tyop @ irc.freenode.net
Approved by: re (kensmith)
MFC after: 3 days
When an item is forwarded, the reference counter is incremented; when an
item is processed, the counter is decremented. When the counter reaches
zero, the apply handler is called.
This now allows the right status of a connect() call from user level to
be reported at the right time.
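
The counting scheme can be sketched like this. All names here (struct
item, item_forward(), item_done(), record_status()) are hypothetical and
stand in for whatever the real code uses; keeping the first non-zero
error is likewise an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item with a completion ("apply") handler. */
struct item {
	int   refs;				/* copies still in flight */
	int   error;				/* first error seen, if any */
	void (*apply)(void *ctx, int error);	/* run when refs hits 0 */
	void *ctx;
};

/* The item is handed on to another consumer: one more reference. */
static void
item_forward(struct item *it)
{
	assert(it != NULL);
	it->refs++;
}

/* One consumer finished; the last one to finish triggers the handler. */
static void
item_done(struct item *it, int error)
{
	assert(it != NULL && it->refs > 0);
	if (error != 0 && it->error == 0)
		it->error = error;
	if (--it->refs == 0 && it->apply != NULL)
		it->apply(it->ctx, it->error);
}

/* Example handler: store the final status where the caller can see it. */
static void
record_status(void *ctx, int error)
{
	*(int *)ctx = error;
}
```

Because the handler runs only after the last reference is gone, the
status it reports covers every hop the item went through.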
This is much simpler than for ffs since there are many fewer places
where we need to choose between a delayed write and a sync write --
just 5 in msdosfs and more than 30 in ffs.
This is more complete and correct than in ffs. Several places in ffs
are still missing the choice. ffs_update() has a layering violation
that breaks callers which want to force a sync update (mainly fsync(2)
and O_SYNC write(2)).
However, fsync(2) and O_SYNC write(2) are still more broken than in
ffs, since they are broken for default (non-sync non-async) mounts
too. Both fail to sync the FAT in all cases, and both fail to sync
the directory entry in some cases after losing a race. Async everything
is probably safer than the half-baked sync of metadata given by default
mounts.
us to scale up to sb_max, aka kern.ipc.maxsockbuf.
We do this because there are broken firewalls that will corrupt the window
scale option, leading to the other endpoint believing that our advertised
window is unscaled. At scale factors larger than 5 the unscaled window will
drop below 1500 bytes, leading to serious problems when traversing these
broken firewalls.
With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by
this algorithm. Those who choose a larger maxsockbuf should watch out
for the compatibility problems mentioned above.
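The figures quoted above can be reproduced with a small sketch. The loop
below is an assumption about the selection rule (pick the smallest shift
whose scaled 16-bit window reaches sb_max); pick_rcv_scale() and
unscaled_window() are hypothetical helper names, not the functions in
the TCP stack. TCP_MAXWIN and TCP_MAX_WINSHIFT carry the usual values of
65535 and 14.

```c
#include <assert.h>

#define TCP_MAXWIN		65535	/* largest unscaled (16-bit) window */
#define TCP_MAX_WINSHIFT	14	/* largest window scale shift */

/* Smallest shift that lets the advertised window reach sb_max. */
static int
pick_rcv_scale(unsigned long sb_max)
{
	int scale = 0;

	assert(sb_max > 0);
	while (scale < TCP_MAX_WINSHIFT &&
	    ((unsigned long)TCP_MAXWIN << scale) < sb_max)
		scale++;
	return (scale);
}

/* Window as seen by a peer that lost the scale option in transit. */
static unsigned long
unscaled_window(int scale)
{
	return ((unsigned long)TCP_MAXWIN >> scale);
}
```

For sb_max = 256K the loop stops at 3, since 65535 << 3 is the first
value that is not below 262144. A shift of 5 still leaves an unscaled
window of 2047 bytes, while a shift of 6 leaves only 1023, which is
below a 1500 byte frame.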
Reviewed by: andre
queue so the output network card must support the same tagging mechanism as
how the frame was input (prepended Ethernet header tag or stripped HW mflag).
Now the vlan Ethernet header is _always_ stripped in ether_input and the mbuf
flagged, so only network cards with VLAN_HWTAGGING enabled would properly
re-tag any outgoing vlan frames.
If the outgoing interface does not support hardware tagging then re-add the vlan
header to the front of the frame. Move the common vlan encapsulation into
ether_vlanencap().
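As a rough illustration of what re-adding the header means on the wire,
the sketch below inserts a 4-byte 802.1Q tag (TPID 0x8100 followed by
the TCI) after the two MAC addresses of a frame held in a flat buffer.
It is not ether_vlanencap() itself, which operates on mbufs;
vlan_encap() and the flat-buffer layout are assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6
#define VLAN_TAG_LEN	4	/* TPID (2 bytes) + TCI (2 bytes) */
#define ETHERTYPE_VLAN	0x8100

/*
 * Insert an 802.1Q tag into an untagged frame of 'len' bytes.
 * The buffer must have room for VLAN_TAG_LEN additional bytes.
 * Returns the new frame length.
 */
static size_t
vlan_encap(uint8_t *frame, size_t len, uint16_t tci)
{
	const size_t off = 2 * ETHER_ADDR_LEN;

	assert(len >= off + 2);	/* need both addresses and an ethertype */
	/* Shift the original ethertype and payload to make room. */
	memmove(frame + off + VLAN_TAG_LEN, frame + off, len - off);
	frame[off] = ETHERTYPE_VLAN >> 8;
	frame[off + 1] = ETHERTYPE_VLAN & 0xff;
	frame[off + 2] = tci >> 8;
	frame[off + 3] = tci & 0xff;
	return (len + VLAN_TAG_LEN);
}
```

The original ethertype ends up right behind the tag, so a receiver that
understands 802.1Q sees the same frame the hardware would have produced.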
Reported by: Erik Osterholm, Jon Otterholm
MFC after: 1 week
leaving space for adding missing options. Negative options are sorted
after removing their "no" prefix, and generic options are sorted before
msdosfs-specific ones.
(except indirectly for the size pseudo-attribute). If anything deserves
a sync update, then it is ids and immutable flags, since these are
related to security, but ffs never synced these and msdosfs doesn't
support them. (ufs_setattr() only does an update in one case where
it is least needed (for timestamps); it did pessimal sync updates for
timestamps until 1998/03/08 but was changed for unlogged reasons related
to soft updates.)
Now msdosfs calls deupdat() with waitfor == 0, which normally gives a
delayed update to disk but always gives a sync update of timestamps
in core, while for ffs everything is delayed until the syncer daemon
or other activity causes an update (except for timestamps).
This gives a large optimization mainly for things like cp -p, where
attribute adjustment could easily triple the number of physical I/O's
if it is done synchronously (but cp -p to msdosfs is not as bad as
that, since msdosfs doesn't support many attributes so null adjustments
are more common, and msdosfs doesn't support ctimes so even if cp
doesn't weed out null adjustments they don't become non-null after
clobbering the ctime).
in the way we implement handling of relocations.
As for the kernel part this fixes the loading of lots of modules,
which failed to load due to unresolvable symbols when built after
the GCC 4.2.0 import. This wasn't due to a change in GCC itself
though but one of several changes in configuration done along the
import. Specifically, HAVE_AS_REGISTER_PSEUDO_OP, which causes GCC
to denote global registers used for scratch purposes and in turn
GAS uses R_SPARC_OLO10 relocations for, is now defined.
While at it replace some more ELF_R_TYPE which should have been
ELF64_R_TYPE_ID but didn't cause problems so far.
- Sync a sanity check between kernel and rtld(1) and change it to be
maintenance free regarding the type used for the lookup table.
- Sprinkle const on lookup tables.
- Use __FBSDID.
Reported and tested by: yongari
MFC after: 5 days
- fix a bug during cookie collision that prevented an
association from coming up in a specific restart case.
- Fix it so the shutdown-pending flag gets removed (this is
more for correctness than needed) when we enter shutdown-sent
or shutdown-ack-sent states.
- Fix a bug that caused the receiver to sometimes NOT send
a SACK when a duplicate TSN arrived. Without this fix
it was possible for the association to fall down if the
- The deleted primary destination is now also stored when SCTP_MOBILITY_BASE
is set. (Previously, it was stored only with SCTP_MOBILITY_FASTHANDOFF.)
- Fix a locking issue where we might call send_initiate_ack() and
incorrectly state the lock held/not held. Also fix it so that
when we release the lock the inp cannot be deleted on us.
- Add the debug option that can cause the stack to panic instead
of aborting an assoc. This does not and should never show up
in options but is useful for debugging unexpected aborts.
- Add cumack_log sent to track sending cumack information for
the debug case where we are running a special log per assoc.
- Added extra () around the sctp_sbspace macro to avoid compile warnings.
MFC after: 1 week
This avoids back-to-back faults for all TLB misses. This can be
improved further in the future by also setting PTE_DIRTY for TLB
misses for write accesses.
MFC after: 1 week
ukbd_poll to mark this keyboard instance as polling before calling
usbd_set_polling at USB level. usbd_set_polling runs softintr before
returning, stealing our input and making consequent polling getchar
kind of pointless.
This allows USB keyboards to coexist peacefully with serial console in DDB
and other contexts where polling is used.
MFC after: 1 week
properly due to the shortage of the RX buffer size. In a case of zyd
devices, up to 3 frames can be combined in an USB transaction. So, RX
buffer should be at least ((MCLBYTES + extra structs) * 3)
Submitted by: Weongyo Jeong <weongyo.jeong@gmail.com>
MFC after: 3 days
(it is established practice) and ``-o whiteout=whenneeded'' is a mode
that uses less disk space, which suits resource-restricted environments
such as embedded systems. (Contributed by Ed Schouten. Thanks)
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
Some folks who reported issues have solved them with transparent mode.
We guess it is time to change the default copy mode. The transparent-mode is
the best in most situations.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
applications that use procfs on unionfs.
- Removed the unionfs internal cache mechanism because vfs_cache
support is used instead. As a result, the unionfs code is
simpler.
- Fixed kern/111262 issue.
Submitted by: Masanori Ozawa <ozawa@ongs.co.jp> (unionfs developer)
Reviewed by: jeff, kensmith
Approved by: re (kensmith)
MFC after: 1 week
make sure to never call sched_bind() for uninitialised CPUs.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
This commit includes the following core components:
* sample configuration file for sensorsd
* rc(8) script and glue code for sensorsd(8)
* sysctl(3) doc fixes for CTL_HW tree
* sysctl(3) documentation for hardware sensors
* sysctl(8) documentation for hardware sensors
* support for the sensor structure for sysctl(8)
* rc.conf(5) documentation for starting sensorsd(8)
* sensor_attach(9) et al documentation
* /sys/kern/kern_sensors.c
o sensor_attach(9) API for drivers to register ksensors
o sensor_task_register(9) API for the update task
o sysctl(3) glue code
o hw.sensors shadow tree for sysctl(8) internal magic
* <sys/sensors.h>
* HW_SENSORS definition for <sys/sysctl.h>
* sensors display for systat(1), including documentation
* sensorsd(8) and all applicable documentation
The userland part of the framework is entirely source-code
compatible with OpenBSD 4.1, 4.2 and -current as of today.
All sensor readings can be viewed with `sysctl hw.sensors`,
monitored in semi-realtime with `systat -sensors` and also
logged with `sensorsd`.
Submitted by: Constantine A. Murenin <cnst@FreeBSD.org>
Sponsored by: Google Summer of Code 2007 (GSoC2007/cnst-sensors)
Mentored by: syrinx
Tested by: many
OKed by: kensmith
Obtained from: OpenBSD (parts)
which is ukbd0. Specifically, the keyboard driver structures for ukbd0
are not allocated/freed but are statically allocated via a persistent
global variable. There is some additional magic for the ukbd0 such that
if the keyboard is marked as probed in this global variable, then we
don't check to see if the device_t we are probing has an interface.
This causes a problem if an attach of ukbd0 fails without fully clearing
the state in the global variable. Specifically, if the keyboard fails to
initialize in init_keyboard() or kbd_register(), then the keyboard will
still be marked as probed. The USB layer will then try to offer the
"generic" version of the USB keyboard device (as opposed to the
per-interface sub-devices) and the ukbd(4) driver will see that the
keyboard is marked probed and will skip the "is this a per-interface device"
check. Later in ukbd_attach() it panics because it tries to dereference
the interface pointer which is NULL.
The fix is to clear the flags in the persistent keyboard data for ukbd0
when init_keyboard() or kbd_register() fail.
MFC after: 1 week
Reviewed by: imp
- Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders
thru the sndlock.
- Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add
logic to wait for pending request senders to finish sending before
reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec.
- Break out the nfs xid manipulation under a new nfs xid lock, rather than
overloading the nfs request lock for this purpose.
- Fix some of the locking in nfs_request.
Many thanks to Kris Kennaway for his help with this and for initiating the
MP scaling analysis and work. Kris also tested this patch thoroughly.
Approved by: re@ (Ken Smith)
on multiple different audit pipes. The old method used cv_signal()
which would result in only one thread being woken up after we
appended a record to its queue. This resulted in un-timely wake-ups
when processing audit records real-time.
- Assign PSOCK priority to threads that have been sleeping on a read(2).
This is the same priority threads are woken up with when they select(2)
or poll(2). This yields fairness between various forms of sleep on
the audit pipes.
Obtained from: TrustedBSD Project
Discussed with: rwatson
MFC after: 1 week
This fixes the process portion of the bpf(4) stats if the peer forks
into the background after it's opened the descriptor. This bug
results in the following behavior for netstat -B:
# netstat -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
netstat: kern.proc.pid failed: No such process
78023 em0 p--s-- 2237404 43119 2237404 13986 0 ??????
MFC after: 1 week
- Add proper scanning support rather than letting the firmware grab the first
access point
- Overhaul state changes
- Use macros for locking and provide _locked() versions of some functions
- Increase debugging output
- Use a callout rather than the old watchdog interface
- Improve style, function names and defines
- Add WPA (TKIP) support
Based heavily on a patchset provided by Sam Leffler.
VR_STICKHW register would result in unexpected results on these
hardware. wpaul said the following about the issue:
The vr_attach() routine unconditionally does this for all supported
chips:
/*
* Windows may put the chip in suspend mode when it
* shuts down. Be sure to kick it in the head to wake it
* up again.
*/
VR_CLRBIT(sc, VR_STICKHW, (VR_STICKHW_DS0|VR_STICKHW_DS1));
The problem is, the VR_STICKHW register is not valid on all Rhine
devices. The VT86C100A chip, which is present on the D-Link DFE-530TX
boards, doesn't support power management, and its register space is
only 128 bytes wide. The VR_STICKHW register offset falls outside this
range. This may go unnoticed in most scenarios, but if you happen to have
another PCI device in your system which is assigned the register
space immediately after that of the Rhine, the vr(4) driver will
incorrectly stomp it. In my case, the BIOS on my test board decided
to put the register space for my PRO/100 ethernet board right next
to the Rhine, and the Rhine driver ended up clobbering the IMR register
of the PRO/100 device. (Long story short: the board kept locking up on
boot. Took me the better part of the morning to suss out why.)
The strictly correct thing to do would be to check the PCI config space
to make sure the device supports the power management capability and only
write to the VR_STICKHW register if it does.
Instead of inspecting chip revision numbers for the availability of
VR_STICKHW register, check the existence of power management capability
of the hardware as wpaul suggested.
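The check wpaul describes amounts to walking the standard PCI
capability list and looking for the power management capability (ID
0x01). The sketch below does that over a 256-byte snapshot of
configuration space rather than through the kernel's own capability
lookup, so find_cap() and may_touch_stickhw() are hypothetical helpers;
the register offsets and the capability ID are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIR_STATUS		0x06	/* status register (low byte) */
#define PCIM_STATUS_CAPPRESENT	0x10	/* capability list is present */
#define PCIR_CAP_PTR		0x34	/* offset of the first capability */
#define PCIY_PMG		0x01	/* power management capability ID */

/*
 * Return the config space offset of capability 'id', or 0 if the
 * device does not advertise it.
 */
static uint8_t
find_cap(const uint8_t cfg[256], uint8_t id)
{
	uint8_t ptr;
	int guard = 48;	/* bound the walk in case the list loops */

	assert(cfg != NULL);
	if ((cfg[PCIR_STATUS] & PCIM_STATUS_CAPPRESENT) == 0)
		return (0);
	ptr = cfg[PCIR_CAP_PTR] & 0xfc;
	while (ptr >= 0x40 && guard-- > 0) {
		if (cfg[ptr] == id)
			return (ptr);
		ptr = cfg[ptr + 1] & 0xfc;	/* next-capability pointer */
	}
	return (0);
}

/* Only chips with power management have a valid VR_STICKHW register. */
static int
may_touch_stickhw(const uint8_t cfg[256])
{
	return (find_cap(cfg, PCIY_PMG) != 0);
}
```

A chip such as the VT86C100A, which has no power management capability,
makes may_touch_stickhw() return 0, so the out-of-range register write
is skipped.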
Reported by: wpaul
Suggested by: wpaul
OK'ed by: jhb
1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.
2. IFQ_HANDOFF will free the mbuf if the queue is full, this will then be freed
again by lagg_start() and panic. Reorganised the error handling and freeing
to fix this.
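The first fix can be illustrated with C11 atomics in userland. The
sketch assumes a per-softc sequence number that every transmit bumps
atomically; struct rr_state and rr_pick_port() are hypothetical names,
and the kernel would use its own atomic(9) primitives rather than
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

struct rr_state {
	atomic_uint	seq;	/* bumped once per transmitted packet */
	unsigned	nports;	/* number of active ports */
};

/* Pick the next tx port without writing any shared pointer. */
static unsigned
rr_pick_port(struct rr_state *rr)
{
	assert(rr->nports > 0);
	/* fetch_add returns the previous value, so no lock is needed. */
	return (atomic_fetch_add(&rr->seq, 1) % rr->nports);
}
```

Because each caller receives a distinct sequence value, concurrent
senders holding only the shared lock cannot corrupt the state the way
the unlocked pointer update could.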
MFC after: 3 days
SAS-enabled cards. It also makes the driver MPSAFE, eliminating some
problems that resulted from CAM becoming MPSAFE. Many thanks to 3Ware/AMCC
for continuing to support FreeBSD.
Submitted by: Manjunath Ranganathaiah
Approved by: re
voltage of 0. This can result in a divide by zero trap. Add a guard
for this case. The value of lfcap is checked in acpi_battery_bif_valid()
just before this, so it is safe.
Reported by: sam
Approved by: re
MFC after: 3 days
of directly from acpi0. Before it would attach prior to the sysresource
devices, causing the later allocation of its memory range to fail and
print a warning like "acpi0: reservation of fed00000, 1000 (3) failed".
Use an explicit define for our probe order base value of 10.
Help from: jhb
Tested by: Abdullah Ibn Hamad Al-Marri <almarrie / gmail.com>
MFC after: 3 days
Approved by: re
fixes a bug on UP machines with SMP kernels where the idle thread
constantly switches after trying to steal work from the local cpu.
- Make the idle stealing code more robust against self selection.
- Prefer to steal from the cpu with the highest load that has at least one
transferable thread. Before we selected the cpu with the highest
transferable count which excludes bound threads.
Collaborated with: csjp
Approved by: re
to simply switch rather than lowering priority and switching. This allows
threads of equal priority to run but not lesser priority.
Discussed with: davidxu
Reported by: NIIMI Satoshi <sa2c@sa2c.net>
Approved by: re
critical_exit() owepreempt check. ULE will always use owepreempt to
preempt the idle thread. This change does not affect 4BSD since it will
never set owepreempt without PREEMPTION enabled.
- Remove some unused code from choosethread().
Discussed with: jhb
Approved by: re
it must first ensure that the page is no longer mapped. This is
trivially accomplished by calling pmap_remove_all() a little earlier
in vm_page_cache(). While I'm in the neighborhood, make a related
panic message a little more useful.
Approved by: re (kensmith)
Reported by: Peter Holm and Konstantin Belousov
Reviewed by: Konstantin Belousov
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs
when uma_small_free() frees a page. The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.
Approved by: re (kensmith)
PR: 116794
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received data after socket was closed, sending RST and removing tcpcb
So that it also includes how many bytes of data were received. It now looks
like this:
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received X bytes of data after socket was closed, sending RST and removing tcpcb
Approved by: re (gnn)
not being independently freeable. This allows one to embed an mbuf in
the cluster itself. This confers the benefits of the packet zone on
all cluster sizes. Embedded mbufs currently suffer from the same
limitation that packet zone mbufs do in that one cannot disconnect
them and pass them around independently of the cluster. It would
likely be possible to eliminate this limitation in the future by
adding a second reference for the mbuf itself.
Approved by: re(gnn)
problems with the syncache, it produces a lot of console noise and has led
to quite a few false positive bug reports. It can be selectively
re-enabled when debugging specific problems by frobbing the same sysctl.
Discussed with: silby
Approved by: re (gnn)
directory itself (rather than any of its contents) is visible to the
current thread.
MFC after: 1 week
PR: kern/90063
Submitted by: john of 8192.net
Approved by: re (kensmith)
with all functions supported. This is done by adding usb device IDs
to the table of recognised devices (because there is no standard
'scanner' class, so no other way to recognise them), and with
a small change to the uscanner attach routine that prevents
reconfiguring the whole USB device while we are dealing only with
one of its USB interfaces.
The latter part has been suggested by Steinar Hamre in
http://www.freebsd.org/cgi/query-pr.cgi?pr=107665 , I have
only added a bit of explanation to the code.
I have personally tried this on the Epson DX-5050 and DX-6000
devices (on the US market they have different names, CX-something).
I have good reasons to think that, possibly with the mere addition
of more USB ids to the table in uscanner.c, this should work with
all Epson multifunction devices in that family (from DX-3800 to
DX-7000 - these units are in the 50-120$ price range).
More details on related topics (SANE configuration, OCR, etc.)
at http://info.iet.unipi.it/~luigi/FreeBSD/dx5050.html
Manpage updates coming soon.
Approved by: re, imp
MFC after: 3 days
UDMA modes.
Please note that Soekris NET5501 BIOS versions before 1.32f have a bug
that prevents this from working.
Approved by: re (gnn)
MFC: 2 weeks
Before that fix, it was possible for the function to fail if the number
of sharers changed between the 'x = sx->sx_lock' step and the
atomic_cmpset_acq_ptr() call.
This fixes a ZFS problem where ZFS returns strange EIO errors under load.
ZFS has code that depends on the fact that sx_try_slock() can
only fail if there is an exclusive owner.
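The retry behaviour can be sketched with C11 atomics. The lock word
layout below (low bit for an exclusive owner, sharer count above it) and
the names struct slock and try_slock() are assumptions for the example,
not the sx(9) implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LK_EXCLUSIVE	((uintptr_t)1)	/* hypothetical: exclusive owner bit */
#define LK_ONE_SHARER	((uintptr_t)2)	/* hypothetical: one sharer unit */

struct slock {
	_Atomic uintptr_t word;
};

/* Try to take a shared lock; fail only if exclusively owned. */
static bool
try_slock(struct slock *l)
{
	uintptr_t x;

	assert(l != NULL);
	x = atomic_load_explicit(&l->word, memory_order_relaxed);
	while ((x & LK_EXCLUSIVE) == 0) {
		/*
		 * A failed compare-and-swap reloads x with the current
		 * value, so a change in the sharer count between the
		 * load and the swap causes a retry, not a failure.
		 */
		if (atomic_compare_exchange_weak_explicit(&l->word, &x,
		    x + LK_ONE_SHARER, memory_order_acquire,
		    memory_order_relaxed))
			return (true);
	}
	return (false);
}
```

With the loop in place the only way out with a false result is the
exclusive-owner test, which is the guarantee the ZFS code relies on.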
Discussed with: attilio
Reviewed by: jhb
Approved by: re (kensmith)
after the switch leads to a race where the outgoing thread still owns
the local queue lock while another cpu may switch it in. This race
is only possible on machines where cpu_switch can take significantly
longer on different cpus which in practice means HTT machines with
unfair thread scheduling algorithms.
Found by: kris (of course)
Approved by: re
starvation caused by unbalanced interrupt loads.
- Change the rebalancer to work on stathz ticks but retain randomization.
- Simplify locking in tdq_idled() to use the tdq_lock_pair() rather than
complex sequences of locks to avoid deadlock.
Reported by: kris
Approved by: re
retransmission by handover event (fast mobility code)
- Fixed problem of mobility code which is caused by remaining
parameters in the deleted primary destination.
- Add a missing lock. When a peer sends an INIT, and while we
are processing it to send an INIT-ACK the socket is closed,
we did not hold a lock to keep the socket from going away.
Add protection for this case.
- Fix so that arwnd always uses the minimal rwnd if the user
has set the socket buffer smaller. Found this when the test
org decided to see what happens when you set in a rwnd of 10
bytes (which is not allowed per RFC .. 4k is minimum).
- Fixes so a cookie-echo ootb will NOT cause an abort to
be sent. This was happening in a MPI collision case.
- Examined all panics and unless there was no recovery, moved
any that were not already to INVARIANTS.
Approved by: re@freebsd.org (gnn)
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
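The relationship between the two lookups can be sketched over a plain
array of device locations. The real functions return a device_t and walk
the kernel's device list; struct pci_loc, find_dbsf() and find_bsf()
here are hypothetical stand-ins that only show how the legacy call
reduces to a domain 0 search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pci_loc {		/* hypothetical stand-in for a device record */
	uint32_t domain;
	uint8_t	 bus, slot, func;
};

/* Lookup by domain, bus, slot and function. */
static const struct pci_loc *
find_dbsf(const struct pci_loc *devs, size_t n, uint32_t domain,
    uint8_t bus, uint8_t slot, uint8_t func)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (devs[i].domain == domain && devs[i].bus == bus &&
		    devs[i].slot == slot && devs[i].func == func)
			return (&devs[i]);
	return (NULL);
}

/* Legacy lookup: the same search, restricted to domain 0. */
static const struct pci_loc *
find_bsf(const struct pci_loc *devs, size_t n,
    uint8_t bus, uint8_t slot, uint8_t func)
{
	return (find_dbsf(devs, n, 0, bus, slot, func));
}
```

Two devices that share bus, slot and function numbers but sit in
different domains are now told apart, which is what removes the false
positives mentioned above.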
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.
Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)
After discussion with Sam, switch back to use firmware(9) instead of
having the firmware in hex format.
Put the binary firmware uuencoded into sys/contrib/dev/npe, and slap a
LICENSE file, as found on the Intel website.
Approved by: re (blanket), mux (mentor)
MFC After: 1 week
Without this change the following situation was possible:
1. Provider is orphaned from within class' access() method on last write
close - orphan provider event is sent.
2. GEOM detects last write close on a provider and sends new provider event.
3. g_orphan_register() is called, and calls all orphan methods of attached
consumers.
4. New provider event is executed on orphaned provider, all classes can
taste already orphaned provider, and some may attach consumers to it.
Those consumers will never go away, because the g_orphan_register()
was already called.
We end up with a zombie provider.
With this change, at step 3, we will cancel new provider event.
How to repeat this problem:
# mdconfig -a -t malloc -s 10m
# geli init -i 0 md0
# geli attach md0
# newfs -L test /dev/md0.eli
# mount /dev/ufs/test /mnt/tmp
# geli detach -l md0.eli
# umount /mnt/tmp
# glabel status
Name Status Components
ufs/test N/A N/A
Reviewed by: phk
Approved by: re (kensmith)
value for kern.sched.preempt_thresh appropriately. It can still be
adjusted at runtime. ULE will still use IPI_PREEMPT in certain
migration situations.
- Assert that we're not trying to compile ULE on an unsupported
architecture. To date, I believe only i386 and amd64 have implemented
the third cpu switch argument required.
Approved by: re
cache: vm_object_page_remove() should convert any cached pages that
fall within the specified range to free pages. Otherwise, there could
be a problem if a file is first truncated and then regrown.
Specifically, some old data from prior to the truncation might reappear.
Generalize vm_page_cache_free() to support the conversion of either a
subset or the entirety of an object's cached pages.
Reported by: tegge
Reviewed by: tegge
Approved by: re (kensmith)
to gem_attach() as the former access softc members not yet initialized
at that time and gem_reset() actually is enough to stop the chip. [1]
o Revise the use of gem_bitwait(); add bus_barrier() calls before calling
gem_bitwait() to ensure the respective bit has been written before we
start polling on it and poll for the right bits to change, e.g. even
though we only reset RX we have to actually wait for both GEM_RESET_RX
and GEM_RESET_TX to clear. Add some additional gem_bitwait() calls in
places we've been missing them according to the GEM documentation.
Along with this some excessive DELAYs, which probably only were added
because of bugs in gem_bitwait() and its use in the first place, as
well as half of a gem_bitwait() reimplementation in gem_reset_tx()
were removed.
o Add gem_reset_rxdma() and use it to deal with GEM_MAC_RX_OVERFLOW errors
more gracefully as unlike gem_init_locked() it resets the RX DMA engine
only, causing no link loss and the FIFOs not to be cleared. Also use it to
deal with GEM_INTR_RX_TAG_ERR errors, which previously were unhandled.
This was based on information obtained from the Linux GEM and OpenSolaris
ERI drivers.
o Turn on workarounds for silicon bugs in the Apple GMAC variants.
This was based on information obtained from the Darwin GMAC and Linux GEM
drivers.
o Turn on "infinite" (i.e. maximum 31 * 64 bytes in length) DMA bursts.
This greatly improves especially RX performance.
o Optimize the RX path, this consists of:
- kicking the receiver as soon as we've a spare descriptor in gem_rint()
again instead of just once after all the ready ones have been handled;
- kicking the receiver the right way, i.e. as outlined in the GEM
documentation in batches of 4 and by pointing it to the descriptor
after the last valid one;
- calling gem_rint() before gem_tint() in gem_intr() as gem_tint() may
take quite a while;
- doubling the size of the RX ring to 256 descriptors.
Overall the RX performance of a GEM in a 1GHz Sun Fire V210 was improved
from ~100Mbit/s to ~850Mbit/s.
o In gem_add_rxbuf() don't assign the newly allocated mbuf to rxs_mbuf
before calling bus_dmamap_load_mbuf_sg(). Otherwise, if
bus_dmamap_load_mbuf_sg() fails we free the newly allocated mbuf and
are then unable to recycle the previous one, ending up with a NULL
pointer dereference instead.
o In gem_init_locked() honor the return value of gem_meminit().
o Simplify gem_ringsize() and don't return garbage in the default case.
Based on OpenBSD.
o Don't turn on MAC control, MIF and PCS interrupts unless GEM_DEBUG is
defined as we don't need/use these interrupts for operation.
o In gem_start_locked() sync the DMA maps of the descriptor rings before
every kick of the transmitter and not just once after enqueuing all
packets as the NIC might instantly start transmitting after we kicked
it the first time.
o Keep state of the link state and use it to enable or disable the MAC
in gem_mii_statchg() accordingly as well as to return early from
gem_start_locked() in case the link is down. [3]
o Initialize the maximum frame size to a sane value.
o In gem_mii_statchg() enable carrier extension if appropriate.
o Increment if_ierrors in case of an GEM_MAC_RX_OVERFLOW error and in
gem_eint(). [3]
o Handle IFF_ALLMULTI correctly; don't set it if we've turned promiscuous
group mode on and don't clear the flag if we've disabled promiscuous
group mode (these were mostly NOPs though). [2]
o Let gem_eint() also report GEM_INTR_PERR errors.
o Move setting sc_variant from gem_pci_probe() to gem_pci_attach() as
device probe methods are not supposed to touch the softc.
o Collapse sc_inited and sc_pci into bits for sc_flags.
o Add CTASSERTs ensuring that GEM_NRXDESC and GEM_NTXDESC are set to
legal values.
o Correctly set up for 802.3x flow control, though #ifdef out the code
that actually enables it as this needs more testing and mainly a proper
framework to support it.
o Correct and add some conversions from hard-coded functions names to
__func__ which were borked or forgotten in if_gem.c rev. 1.42.
o Use PCIR_BAR instead of a homegrown macro.
o Replace sc_enaddr[6] with sc_enaddr[ETHER_ADDR_LEN].
o In gem_pci_attach() in case attaching fails release the resources in
the opposite order they were allocated.
o Make gem_reset() static to if_gem.c as it's not needed outside that
module.
o Remove the GEM_GIGABIT flag and the associated code; GEM_GIGABIT was
never set and the associated code was in the wrong place.
o Remove sc_mif_config; it was only used to cache the contents of the
respective register within gem_attach().
o Remove the #ifdef'ed out NetBSD/OpenBSD code for establishing a suspend
hook as it will never be used on FreeBSD.
o Also probe Apple Intrepid 2 GMAC and Apple Shasta GMAC, add support for
Apple K2 GMAC. Based on OpenBSD.
o Add support for Sun GBE/P cards, or in other words actually add support
for cards based on GEM to gem(4). This mainly consists of adding support
for the TBI of these chips. Along with this the PHY selection code was
rewritten to hardcode the PHY number for certain configurations as for
example the PHY of the on-board ERI of Blade 1000 shows up twice causing
no link as the second incarnation is isolated.
These changes were ported from OpenBSD with some additional improvements
and modulo some bugs.
o Add code to if_gem_pci.c allowing to read the MAC-address from the VPD on
systems without Open Firmware.
This is an improved version of my variant of the respective code in
if_hme_pci.c
o Now that gem(4) is MI enable it for all archs.
Pointed out by: yongari [1]
Suggested by: rwatson [2], yongari [3]
Tested on: i386 (GEM), powerpc (GMACs by marcel and yongari),
sparc64 (ERI and GEM)
Reviewed by: yongari
Approved by: re (kensmith)
33MHz for calculating the latency timer values for its children.
Inspired by NetBSD doing the same and Linux as well as OpenSolaris
using a similar approach.
While at it rename a variable and change its type to be more
appropriate for values of PCI properties so the variable can be
more easily reused.
- Initialize the cache line size register of PCI devices to a
legal value; the cache line size is limited to 64 bytes by the
Fireplane/Safari, JBus and UPA interconnection busses. Setting
it to an unsupported value caused bad performance at least with
GEM as it causes them to not do cache line bursts and to not
issue cache line commands on the PCI bus.
Approved by: re (kensmith)
MFC after: 1 week
timeout occurring at exactly the same time. If this happens, the nfsiod
exits although there may be a queued async IO request for it.
Found by: Kris Kennaway
Approved by: re
ways:
(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.
This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.
Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.
(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.
Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.
Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)
This patch was part of the ACPI-CA 20070508 release and the
following is an excerpt from its change log:
Fixed a problem where the Global Lock handle was not properly
updated if a thread that acquired the Global Lock via executing
AML code then attempted to acquire the lock via the
AcpiAcquireGlobalLock interface. Reported by Joe Liu.
Approved by: re (kensmith)
Tested by: ambrisko
Obtained from: Intel
polling/interrupt-driven fallback and instead use polling only during
boot and pure interrupt-driven mode after boot. Polled mode could be
relegated completely to a legacy role if we could enable interrupts
during boot. Polled mode can be forced after boot by setting
debug.acpi.ec.polled="1", e.g. if there are timeouts.
- Use polling only during boot, shutdown, or if requested by the user.
Otherwise, use a generation count of GPEs, incremented atomically. This
prevents an old status value from being used if the EC is really slow
and the same condition (i.e. multiple IBEs for a write transaction) is
being checked.
- Check for and run the query handler directly if the SCI bit is set in
the status register during boot. Previously, the query handler wouldn't
run until interrupts were finally enabled late in boot.
- During boot and after starting a command, check if the event appears
to already have occurred before we even start waiting. If so, it's
possible the EC is very slow and we might accept an old status value.
Print a warning in this case. Once we've booted, interrupt-driven mode
should work just fine but polled mode could be unreliable. There's not
much more we can do about this until interrupts are enabled during boot.
- In the above case, we also do one final check if the interrupt-driven
mode gets a timeout. If the status is complete, it will force the
system back into polled mode since interrupt mode doesn't work. For
polled mode during boot, if the status appears to be already complete
before beginning the check loop, it waits 10 us before actually checking
the status, just in case the EC is really slow and hasn't gotten to work
on the new request yet.
- Use upper-case hex for the _Qxx method
- Use device_printf for errors, don't hide them under verbose
- Increase default total timeout to 750 ms and decrease polling interval
to 5 us.
- Don't pass the status value via the softc. Just read it directly.
- Remove the mutex. We use the sx lock for transaction serialization
with the query handler.
- Remove the Intel copyright notice as no code of theirs was ever
present in this file (verified against rev 1.1)
- Allow KTR module-only builds for ease of testing
Thanks to jkim and Alexey Starikovskiy for helpful discussions and testing.
Approved by: re
MFC after: 2 weeks
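The generation-count idea from the list above can be sketched roughly as
follows. This is only an illustration using C11 atomics, not the driver's
code; the names ec_gen_count, ec_gpe_event(), ec_gen_snapshot() and
ec_event_seen_since() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter, bumped once for every GPE received from the EC. */
static atomic_uint ec_gen_count;

/* Called from the (hypothetical) GPE handler: note that an event arrived. */
static void
ec_gpe_event(void)
{
	atomic_fetch_add(&ec_gen_count, 1);
}

/* Take a snapshot of the generation before starting a command. */
static unsigned
ec_gen_snapshot(void)
{
	return (atomic_load(&ec_gen_count));
}

/*
 * Only trust a status value if at least one GPE arrived after the
 * snapshot; otherwise the status may be left over from an older event.
 */
static bool
ec_event_seen_since(unsigned snapshot)
{
	return (atomic_load(&ec_gen_count) != snapshot);
}
```

A waiter would take the snapshot, start the command, and accept the status
only once ec_event_seen_since() reports a newer generation.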
- Reintegrate the ANSI C function declaration change
from tcp_timer.c rev 1.92
- Reorganize the tcpcb structure so that it has a single
pointer to the "tcp_timer" structure which contains all
of the tcp timer callouts. This change means that when
the single tcp timer change is reintegrated, tcpcb will
not change in size, and therefore the ABI between
netstat and the kernel will not change.
Neither of these changes should have any functional
impact.
Reviewed by: bmah, rrs
Approved by: re (bmah)
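Why a single pointer keeps the control block's size stable can be shown
with a small sketch. The structures below are hypothetical stand-ins, not
the real tcpcb or tcp_timer definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer container, first layout: several separate timers. */
struct timers_v1 {
	long rexmt, persist, keep, twomsl, delack;
};

/* Hypothetical timer container, later layout: one combined timer. */
struct timers_v2 {
	long single;
};

/* The control block holds only a pointer to the timer container ... */
struct cb_v1 {
	int state;
	struct timers_v1 *timers;
	int flags;
};

/* ... so changing what the pointer refers to leaves its layout alone. */
struct cb_v2 {
	int state;
	struct timers_v2 *timers;
	int flags;
};

/* Returns nonzero if both control block versions have the same layout. */
static int
cb_layout_is_stable(void)
{
	return (sizeof(struct cb_v1) == sizeof(struct cb_v2) &&
	    offsetof(struct cb_v1, flags) == offsetof(struct cb_v2, flags));
}
```

A consumer that only looks at the control block sees the same size and
field offsets with either timer layout.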
route and once they are done with it, call rtfree(). rtfree() should
only be used when we are certain we hold the last reference to the
route. This bug results in console messages like the following:
rtfree: 0xc40f7000 has 1 refs
This patch switches the rtfree() to use RTFREE_LOCKED() instead,
which should handle the reference counting on the route better.
Approved by: re@ (gnn)
Reviewed by: bms
Reported by: many via net@ and current@
Tested by: many
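The difference between "free because I hold the last reference" and "drop
my reference, free only if it was the last" can be sketched as below. This
is a generic reference-counting illustration with hypothetical names
(struct obj, obj_alloc(), obj_hold(), obj_release()); it is not the
definition of rtfree() or RTFREE_LOCKED().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a route entry. */
struct obj {
	int refcnt;
};

/* Allocate an object holding one reference for the caller. */
static struct obj *
obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o != NULL)
		o->refcnt = 1;
	return (o);
}

/* Take an additional reference. */
static void
obj_hold(struct obj *o)
{
	o->refcnt++;
}

/*
 * Drop one reference. The object is freed only when the caller held the
 * last reference. Returns 1 if the object was freed, 0 otherwise.
 */
static int
obj_release(struct obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt > 0)
		return (0);	/* other holders remain */
	free(o);
	return (1);
}
```

A caller that is not certain it holds the last reference uses the
release-style path, so an extra holder never sees the object freed early.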
All active fields in fsi are advisory/optional, so we shouldn't do
extra work to make them valid at all times, but instead we write to
the fsi too often (we still do), and we searched for a free cluster
for fsinxtfree too often.
This commit just removes the whole search and its results, so that we
write out our in-core copy of fsinxtfree instead of writing a "fixed"
copy and clobbering our in-core copy. This saves fixing 3 bugs:
- off-by-1 error for the end of the search, resulting in fsinxtfree
not actually being adjusted iff only the last cluster is free.
- missing adjustment when no clusters are free.
- off-by-many error for the start of the search. Starting the search
at 0 instead of at (the in-core copy of) fsinxtfree did more than
defeat the reasons for existence of fsinxtfree. fsinxtfree exists
mainly to avoid having to start at 0 for just the first search per
mount, but has the side effect of reducing bias towards allocating
near cluster 0. The bias would normally only be generated by the
first search per mount (if fsinxtfree is not supported), but since
we also adjusted the in-core copy of fsinxtfree here, we were doing
extra work to maximize the bias.
Approved by: re (kensmith)
providers with limited physical storage and add physical storage as
needed.
Submitted by: Ivan Voras
Sponsored by: Google Summer of Code 2006
Approved by: re (kensmith)
- Improve the long-term load balancer by always IPIing exactly once.
Previously the delay after rebalancing could cause problems with
uneven workloads.
- Allow nice to have a linear effect on the interactivity score. This
allows negatively niced programs to stay interactive longer. It may be
useful with very expensive Xorg servers under high loads. In general
it should not be necessary to alter the nice level to improve interactive
response. We may also want to consider never allowing positively niced
processes to become interactive at all.
- Initialize ccpu to 0 rather than 0.0. The decimal point was leftover
from when the code was copied from 4bsd. ccpu is 0 in ULE because ULE
only exports weighted cpu values.
Reported by: Steve Kargl (Load balancing problem)
Approved by: re
Eliminates panics due to locking issues.
Idea taken from src/sys/gnu/fs/xfs/FreeBSD/xfs_super.c.
PR: 89966, 92000, 104393
Reported by: H. Matsuo <hiroshi50000 yahoo co jp>,
Chris <m2chrischou gmail.com>,
Andrey V. Elsukov <bu7cher yandex ru>,
Jan Henrik Sylvester <me janh de>
Approved by: re (kensmith)
simplifies the code and should speed up the pppoe_findsession() function,
which is called for every incoming packet.
Approved by: re (kensmith), glebius (mentor)
changes the units from seconds to the value of 'ticks' when swapped
in/out. ULE does not have a periodic timer that scans all threads in
the system and as such maintaining a per-second counter is difficult.
- Change computations requiring the unit in seconds to subtract ticks
and divide by hz. This does make the wraparound condition hz times
more frequent but this is still in the range of several months to
years and the adverse effects are minimal.
Approved by: re
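The "subtract ticks and divide by hz" computation can be sketched as
follows. The helper name ticks_to_seconds() is hypothetical, and the
unsigned subtraction is one common way to stay correct across a single
counter wraparound; the commit itself does not say how wraparound is
handled.

```c
#include <assert.h>
#include <limits.h>

/*
 * Elapsed whole seconds between two samples of a tick counter that
 * advances hz times per second.
 */
static unsigned
ticks_to_seconds(int then, int now, int hz)
{
	assert(hz > 0);
	/* Unsigned arithmetic is modular, so one wraparound is harmless. */
	return (((unsigned)now - (unsigned)then) / (unsigned)hz);
}
```

For example, with hz set to 1000, samples of 1000 and 4000 ticks are three
seconds apart.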
last interface should own the address, but the current code
fumbles the handoff. This fixes that.
- move address related debugs to PCB4 and add additional ones to
help in debugging address problems.
Approved by: re@freebsd.org (K Smith)
In particular:
- smp_tlb_mtx is no longer used, so it is axed.
- smp rendezvous lock isn't really a leaf spin-mutex. Its bad placement in
the table has been the source of false positive LOR reports involving
the dt_lock. Besides, with the older locking the smp rendezvous lock would
have had sched_lock there, so it still wasn't a leaf lock.
- allpmaps is only used in ia32 architecture, so it is inserted in the
appropriate stub.
Additionally:
- kse_zombie_lock is no longer present, so its definition is axed out.
- zombie_lock doesn't need to have an exported symbol, so just declare it
as static.
Tested by: kris
Approved by: jeff (mentor)
Approved by: re
Together with the sys/i386/i386/trap.c rev. 1.306 it fixes the PR.
Submitted by: rdivacky
Suggested by: jhb
Sponsored by: Google Summer of Code 2007
PR: kern/77710
Approved by: re (kensmith)
- The output routine of low level console is not protected by any lock
by default.
- Increment and decrement of sc->write_in_progress are not atomic and
this may cause console hang.
- We also have many other states used by emulator that should be protected
by the lock.
- This change does not fix interspersed messages which PRINTF_BUFR_SIZE
kernel option should fix.
Approved by: re (bmah)
MFC after: 1 week
commented out until I can re-test them on all our architectures. I
had re@ approval to commit this a long time ago, but that's before we
were this close to the branch.
Approved by: re@
controller if its sole child device has the "usb" device class.
Previously ehci(4) would think that PCI-ISA bridges on the same slot
(such as in some Intel ICHs) were "neighbors" resulting in spurious
warnings about neighbor count mismatches.
- Fix a memory leak when looking for neighbors.
MFC after: 1 week
Approved by: re (kensmith)
Tested by: phk
to become unkillable when that process is sent a termination signal. The
process will sit in waitvt looping in the kernel, and chewing up all
available CPU until the system is rebooted.
Submitted by: Jilles Tjoelker <jilles@stack.nl>
Reviewed by: bde
Approved by: re (kensmith)
MFC after: 1 week
to the node before starting the work, otherwise the node may go
away before a reference is made in ieee80211_send_mgmt.
Approved by: re (blanket wireless)
Obtained from: Atheros
3 arguments, but we had forgotten the second argument. Also make the
Linux statfs64 struct depend on the architecture because it has an
extra 4 bytes padding on amd64 compared to i386.
The three argument fix is from David Taylor, the struct statfs64
stuff is my fault. With this patch I can install i386 Linux matlab
on an amd64 machine.
Submitted by: David Taylor <davidt_at_yadt.co.uk>
Approved by: re (kensmith)
the mfi(4) LSI MegaSAS RAID card. Looking at the Linux driver for the
mpt(4) it should be 0x0062 and not 0x0060. Tested with an mfi card
of this device id.
Approved by: re (bmah)
Reviewed by: scottl
MFC after: 3 days
also involves macro changes to have a RLOCK and a WLOCK
and placing the correct version within the code.
- The INP-INFO lock is changed to a rwlock.
- When sctp_shutdown() is called on Mac OS X, the socket lock is held.
So call sctp_chunk_output with SCTP_SO_LOCKED and
not SCTP_SO_NOT_LOCKED.
- Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X.
- u_int64_t -> uint64_t
- add missing addr unlock for error return path
Approved by: re@freebsd.org (K Smith)
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.
The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.
This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.
Approved by: re (kensmith)
o add driver callback to handle notification of beacon changes;
this is required for devices that manage beacon frames themselves
(devices must override the default handler which does nothing)
o move beacon update-related flags from ieee80211com to the beacon
offsets storage (or handle however a driver wants)
o expand beacon offsets structure with members needed for 11h/dfs
and appie's
o change calling convention for ieee80211_beacon_alloc and
ieee80211_beacon_update
o add overlapping bss support for 11g; requires driver to pass
beacon frames from overlapping bss up to net80211 which is not
presently done by any driver
o move HT beacon contents update to a routine in the HT code area
Reviewed by: avatar, thompsa, sephe
Approved by: re (blanket wireless)
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.
Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)
does not have a rate table in older hal's so if we scan such a
channel the driver will hit an assertion or crash; for old hal's
fallback to using the static turbo rate table for this mode
(not correct but good enough for now given none of the rate
control algorithms understand how to switch between base+boost)
Approved by: re (blanket wireless)
mode works properly, previously the hostap channel could not be changed off #3.
Fix an ifp/sc misuse while I am here.
Reported by: many
Approved by: re (bmah)
- Fix panic from mutex unlock on freed lock when ASCONF-ACK
aborts an assoc
- Fix panic from addr lock recursion when ASCONFs are queued
in the front states
- ASCONFs "queued" in the front states should really be
bundled after the COOKIE-ACK, not in front of it
- Fix issue with addresses deleted in the front states from
being sent with ASCONF(DELETE)-- replaced
sctp_asconf_queue_add_sa() with delete specific function
- Comment change in sctp.h: the drafts are now RFCs
Approved by: re@freebsd.org (B Mah)
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This change fixes that.
It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().
Approved by: re (kensmith)
status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
already been recycled, i.e., freed and reallocated to a new purpose;
thus, asserting that such pages cannot be written is inappropriate.
Reported by: kris
Submitted by: tegge
Approved by: re (kensmith)
MFC after: 1 week
change interrupt if the link is established with the link partner. However,
the interrupt handler didn't acknowledge the interrupt if nfe(4) was not
running at the time of interrupt delivery. This caused endless
interrupt generation. Fix the bug by acknowledging the interrupt
regardless of running state of the driver.
PR: kern/116295
Submitted by: Mark Derbyshire (mark At taom dot com)
Approved by: re (kensmith)
per-primitive macros like MTX_NOPROFILE, SX_NOPROFILE or RW_NOPROFILE) is
not really honoured. In particular, lock_profile_obtain_lock_failure() and
lock_profile_obtain_lock_success() do not check this flag.
The bug causes locks marked as no-profiling to be profiled as well.
In the case of the clock_lock, used by the timer i8254 this leads to
unpredictable behaviour both on amd64 and ia32 (double faults panic,
sudden reboots, etc.). The amd64 clock_lock is also not marked as
not profilable as it should be.
Fix these bugs by adding proper checks in the lock profiling code and at
clock_lock initialization time.
i8254 bug pointed out by: kris
Tested by: matteo, Giuseppe Cocomazzi <sbudella at libero dot it>
Approved by: jeff (mentor)
Approved by: re
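The kind of check described above can be sketched as follows. Everything
here is hypothetical (the flag LK_NOPROFILE, struct lock_object_sketch and
both functions); it only illustrates returning early from the profiling
hooks when a lock is marked as not profilable.

```c
#include <assert.h>

#define LK_NOPROFILE 0x01u	/* hypothetical "do not profile" flag */

/* Hypothetical lock object carrying flags and profiling counters. */
struct lock_object_sketch {
	unsigned flags;
	unsigned long contended;
	unsigned long acquired;
};

/* Record a failed acquisition attempt, unless profiling is disabled. */
static void
profile_obtain_failure(struct lock_object_sketch *lo)
{
	if (lo->flags & LK_NOPROFILE)
		return;
	lo->contended++;
}

/* Record a successful acquisition, unless profiling is disabled. */
static void
profile_obtain_success(struct lock_object_sketch *lo)
{
	if (lo->flags & LK_NOPROFILE)
		return;
	lo->acquired++;
}
```

With the early return in both hooks, a lock created with the flag set never
touches the profiling state.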
incorrect and should be OFF letting IP fragment
large cookie-echos.
- Rename sysctl variable logging to log_level.
- Fix description of sysctl variable stats.
- Add sysctl variable log to make sctp_log readable via sysctl
mechanism (this is by compile switch and targets non KTR platforms or
when someone wants to do performance wise tracing).
- Removed debug code
Approved by: re@freebsd.org (B Mah)
stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of
static (this is needed for Panda).
Approved by: re@freebsd.org (B Mah)
races for some struct thread members.
More specifically, this bug seems responsible for some memory dumping
problems people were experiencing.
Fix this by adding correct thread locking.
Tested by: rwatson
Submitted by: tegge
Approved by: jeff
Approved by: re
matches the BPF registers (which are the only thing that is assigned
to/from BPF memory). This is a pedantic change that shouldn't change
any behaviour.
PR: 115931
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (bmah)
MFC after: 3 weeks
Fix a few while (!uart_getreg() & SR1_TNF) where
while (!(uart_getreg() & SR1_TNF)) was really meant.
This driver should die anyway, it's awful, and uart_ns8250 should be fine
for the StrongArm 1110. I'll kill it later.
Submitted by: Mikhael Skvorts
Approved by: re (blanket)
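The precedence problem fixed here is easy to reproduce in isolation. In C,
unary '!' binds tighter than binary '&', so the unparenthesized form tests
the wrong thing. The sketch below uses a hypothetical status bit TNF_BIT
and plain integers instead of the driver's register accessors.

```c
#include <assert.h>
#include <stdbool.h>

#define TNF_BIT 0x04u	/* hypothetical "transmit FIFO not full" bit */

/*
 * Buggy form. "!status & TNF_BIT" is parsed as "(!status) & TNF_BIT";
 * it is written here with the implied parentheses. Since !status is
 * either 0 or 1, and-ing it with 0x04 always yields 0.
 */
static bool
must_wait_buggy(unsigned status)
{
	return (((!status) & TNF_BIT) != 0);
}

/* Intended form: test the bit first, then negate the result. */
static bool
must_wait_fixed(unsigned status)
{
	return (!(status & TNF_BIT));
}
```

The buggy form never asks the caller to wait, whatever the status register
holds, so a busy-wait loop built on it exits immediately.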
prevents insmntque() from placing reallocated syncer vnode on mount
list, which causes a panic in vfs_allocate_syncvnode().
Introduce the MNTK_NOINSMNTQ flag, which marks the period when insmntque() is
not allowed to succeed, instead of MNTK_UNMOUNT. MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on the umount error
path, where it is cleared just before the syncer vnode is going to be
allocated.
Reported by: Peter Jeremy <peterjeremy optushome com au>
Suggested by: tegge
Approved by: re (rwatson)
other changes too).
(without any real order)
1. Use device_get_nameunit for mutex naming
2. Add timer for low-latency playback
3. Move most mixer controls from sysctls to mixer(8) controls.
This is the largest part of this patch.
4. Add analog/digital switch (as a temporary sysctl)
5. Get back support for low-bitrate playback (with help of (2))
6. Change locking for exclusive I/O. Writing to non-PTR register
is almost safe and does not need to be ordered with PTR operations.
7. Disable MIDI until we get it to detach properly and fix memory
management problems.
8. Enable multichannel playback by default. It is as stable as
single-channel mode. Multichannel recording is still an
experimental feature.
9. Multichannel options can be changed by loader tunables.
10. Add a way to disable card from a loader tunable.
11. Add new PCI IDs.
12. Debugger settings are loader tunables now.
14. Remove some unused variables.
15. Mark pcm sub-devices MPSAFE.
16. Partially revert (bus_setup_intr -> snd_setup_intr) since it needs
to be done independently
Submitted by: Yuriy Tsibizov (driver maintainer)
Approved by: re (bmah)
The control input routine passes a NULL as its void argument when it
has reached the innermost header, which terminates the loop.
Reported by: Pawel Worach <pawel.worach@gmail.com>
Approved by: re
topology foo functions.
Working on the patch for topology problems in ia32/amd64 exposed some
problems regarding function ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with future modifications to the involved
functions, a correct ordering is now semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )
Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re
- duplicate #define in header, thanks to Kevin Lo for pointing out.
- incorrect BUSMASTER enable logic, thanks Patrick Oeschger
- 82543 fails due to bogus IO BAR logic
- Allow 82571 to use MSI interrupts
- Checksum Offload for UDP not working on 82575
Approved by: re
value, so we don't run out of KVA. The default vnodes limit fits better for
UFS, but ZFS allocates more file system specific memory for a vnode than UFS.
Don't touch the vnodes limit if we detect it was tuned by the system
administrator, and restore the original value when ZFS is unloaded.
This isn't the final fix, but until we implement something better, this will
help to stabilize ZFS under heavy load on i386.
Approved by: re (bmah)
code comes from.
- Fix a LOR on Mac OS X: Do not hold an stcb lock when
calling soisconnected for a socket which has the
SS_INCOMP bit set on so_state.
- fix a comment to be non c++ style.
Approved by: re@freebsd.org (B Mah)
jumping to dropunlock to avoid a panic. While here move the calls to
ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain
the lock on inp.
Original patch to avoid panic: pjd
Review of locking adjustments: gnn, sam
Approved by: re (rwatson)
- Resort includes a bit.
- Correct typos and wording problems in comments.
- Rename udpcksum to udp_cksum to be consistent with other UDP-related
configuration variables.
- Remove indirection of udp_notify through local notify variable in
udp_ctlinput(), which is presumably due to copying and pasting from TCP,
where multiple notify routines exist.
Approved by: re (kensmith)
is given (with newfs or tunefs) and dirsize overflows.
In case dirsize is <= 0 because of an overflow set maxcontigdirs
to 0 so it will be 1 later. This is what would happen for large
fs_avgfilesize. [1]
Identified with help from: roberto, pjd
Submitted by: pjd [1]
Approved by: re (rwatson)
MFC after: 8 days
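The overflow guard described above can be sketched as follows. This is not
the file system code: max_contig_dirs() and its parameters are
hypothetical, and the 32-bit wrap is modelled with unsigned arithmetic so
that the example avoids undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive how many directories to place contiguously
 * from the expected directory size (avgfilesize * avgfpdir).
 */
static int64_t
max_contig_dirs(int32_t avgfilesize, int32_t avgfpdir, int64_t freebytes)
{
	/* Unsigned multiply wraps modulo 2^32, like a 32-bit dirsize would. */
	uint32_t u = (uint32_t)avgfilesize * (uint32_t)avgfpdir;
	/* Equivalent to "dirsize <= 0" for a signed 32-bit dirsize. */
	bool nonpositive = (u == 0 || u > (uint32_t)INT32_MAX);
	int64_t maxcontig;

	if (nonpositive)
		maxcontig = 0;		/* overflow: fall back below */
	else
		maxcontig = freebytes / u;
	if (maxcontig == 0)
		maxcontig = 1;		/* 0 becomes 1 later */
	return (maxcontig);
}
```

With an overflowing product the helper returns 1 instead of dividing by a
zero or negative size. A product that wraps to a positive value is not
caught by this check.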
ioctl().
Note that other information provided by ifconfig(8), such as "list chan"
or "list ap", is still not available at this moment.
Before an(4) is connected to wlan(4), users are encouraged to use
ancontrol(8) to retrieve aforementioned information.
Reported by: dhw (http://lists.freebsd.org/pipermail/freebsd-current/2007-July/074848.html)
Reviewed by: ambrisko
Tested by: dhw
Approved by: re (bmah)
flags, the absence of these flags causes problems in other areas such as
bridging which expect them to be correct.
At the moment only Ethernet DLTs are checked.
Reviewed by: bms, csjp, sam
Approved by: re (bmah)
point to mac_check_vnode_unlink(), reflecting UNIX naming conventions.
This is the first of several commits to synchronize the MAC Framework
in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X
Leopard.
Reviewed by: csjp, Samy Bahra <sbahra at gwu dot edu>
Submitted by: Jacques Vidrine <nectar at apple dot com>
Obtained from: Apple Computer, Inc.
Sponsored by: SPARTA, SPAWAR
Approved by: re (bmah)
- fix the use after free seen when sending packets small enough to fit as an immediate
and bpf peers are present
- update to firmware rev 4.7 along with various small vendor fixes
Supported by: Chelsio
Approved by: re (blanket)
MFC after: 3 days
the recent send code, but uio may be NULL on sendfile
calls. Change to use sndlen variable.
- EMSGSIZE is not being returned in non-blocking mode
and needs a small tweak to look if the msg would
ever fit when returning EWOULDBLOCK.
- FWD-TSN has a bug in stream processing which could
cause a panic. This is a follow on to the codenomicon
fix.
- PDAPI level 1 and 2 do not work unless the reader
gets his returned buffer full. Fix so we can break
out when at level 1 or 2.
- Fix fast-handoff features to copy across properly on
accepted sockets
- Fix sctp_peeloff() system call when no true system call
exists to screen arguments for errors. In cases where a
real system call exists the system call itself does this.
- Fix raddr leak in recent add-ip code change for bundled
asconfs (even when non-bundled asconfs are received)
- Make sure ipi_addr lock is held when walking global addr
list. Need to change this lock type to a rwlock().
- Add don't wake flag on both input and output when the
socket is closing.
- When deleting an address verify the interface is correct
before allowing the delete to process. This protects panda
and unnumbered.
- Clean up old sysctl stuff and get rid of the old Open/Net
BSD structures.
- Add a function to watch the ranges in the sysctl sets.
- When appending in the reassembly queue, validate that
the assoc has not gone to the about-to-be-freed state. If so
(in the middle) abort out. Note this especially affects
MAC I think due to the lock/unlock they do (or with
LOCK testing in place).
- Netstat patch to get rid of warnings.
- Make sure that no data gets queued to inactive/unconfirmed
destinations. This especially affects CMT but also has an
impact on regular SCTP.
- During init collision when we detect seq number out
of sync we need to treat it like Case C and discard
the cookie (no invariant needed here).
- Atomic access to the random store.
- When we declare a vtag good, we need to shove it
into the time wait hash to prevent further use. When
the tag is put into the assoc hash, we need to remove it
from the twait hash (where it will surely be). This prevents
duplicate tag assignments.
- Move decr-ref count to better protect sysctl out of
data.
- ltrace error corrections in sctp6_usrreq.c
- Add hook for interface up/down to be sent to us.
- Make sysctl() exported structures independent of processor
architecture.
- Fix route and src addr cache clearing for delete address case.
- Make sure address marked SCTP_DEL_IP_ADDRESS is never selected
as src addr.
- ICMP handling fixed so that we actually look at the ICMP codes
to figure out what to do.
- Modified mobility code.
Reception of DELETE IP ADDRESS for a primary destination and
SET PRIMARY for a new primary destination is used for
retransmission trigger to the new primary destination.
Also, in this case, destination of chunks in send_queue are
changed to the new primary destination.
- Fix so that we disallow sending by mbuf to ever have EEOR
mode set upon it.
Approved by: re@freebsd.org (B Mah)
additional flags to many function calls. The flags only
get used in BSD when we compile with lock testing. These
flags allow apple to escape the "giant" lock it holds on
the socket and have more fine-grained locking in the NKE.
It also allows us to test (with witness) the locking used
by apple via a compile switch (manually applied).
Approved by: re@freebsd.org (B Mah)
- Fix copyrights, comments in UDPv6.
- Remove macro defines for in6pcb and udp6stat.
- Consistently refer to inpcbs as 'inp' and not also 'in6p'.
Reviewed by: gnn, jinmei, bz
Approved by: re (bmah)
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change. This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).
In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI. The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.
This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.
Approved by: re (kensmith)
Submitted by: silby
Tested by: rwatson, kris
Problems reported by: peter, kris, others
that can lead to a panic when the stick is yanked.
- make sure that zyd_attach() returns 0 or errno.
Submitted by: Weongyo Jeong <weongyo.jeong@gmail.com>
Reported by: Ted Lindgreen <ted@tednet.nl>
Reviewed by: sam
Approved by: re (blanket wireless)
with the INTR_FILTER-enabled MI code. Basically this consists of
registering an interrupt controller (of which there can be multiple
and optionally different ones either per host-to-foo bridge or shared
amongst host-to-foo bridges in any one machine) along with an interrupt
vector as specific argument for all the interrupt vectors used by a
given host-to-foo bridge (roughly similar to registering interrupt
sources on amd64 and i386), providing functions to enable, clear and
disable the interrupts of the children beneath the bridge.
This also includes:
- No longer entering a critical section in tl0_intr() and tl1_intr()
for executing interrupt handlers but rather let the handlers enter
it themselves so in the case of intr_event_handle() we don't enter
a nested critical section.
- Adding infrastructure for binding delivery of interrupt vectors to
specific CPUs which later on can be interfaced with the code from
amd64/i386 for binding interrupts to specific CPUs.
- Getting rid of the wrapper hack introduced along the lines of the
API changes for INTR_FILTER which as a side-effect caused interrupts
associated with ithread handlers only to get the elevated priority
of those associated with filters ("fast handlers") (this removes the
hack also in the non-INTR_FILTER case).
- Disabling (by not clearing) an interrupt in the interrupt controller
until all associated handlers have been executed, which is crucial
for the typical locking strategy of NIC drivers in order to work
correctly in case of shared interrupts. This was a more or less
theoretical problem on sparc64 though, as shared interrupts are
rather uncommon there except for the on-board SCCs and UARTs.
Note that due to the behavior of at least some of the interrupt
controllers used on sparc64 an enable+EOI instead of a disable+EOI
approach (as implied by the INTR_FILTER MI code and implemented on
other architectures) is used as the latter can cause lost interrupts
or in the worst case interrupt starvation.
o Correct a typo in sbus_alloc_resource() which caused (pass-through)
allocations to only work down to the grandchildren of the bus, which
wasn't a real problem so far as we don't support any devices which are
great-grandchildren or greater of a U2S bridge, yet.
o In fhc(4) use bus_{read,write}_4() instead of bus_space_{read,write}_4()
in order to get rid of sc_bh and sc_bt in the fhc_softc. Also get rid
of some other unneeded members in fhc_softc.
Reviewed by: marcel (earlier version)
Approved by: re (kensmith)
o reset ni_inact when ni_inact_reload is changed so we're
assured a valid setting
o never let ni_inact go negative
o add a knob to disable hostap sta idle handling (e.g. so it can be done
by a user application)
o remove bogus reload on associate
Reviewed by: avatar
Approved by: re (blanket wireless)
o update ic_lastdata to reflect time of last outbound frame
o outbound traffic must preempt/cancel bg scanning to avoid delays
This stuff was somehow missed in the initial import.
Reviewed by: thompsa, avatar, sephe (earlier version)
Approved by: re (blanket wireless)
o add ic_extieee to hold the HT40 extension channel number
o add ic_state to track dynamic channel state for DFS
o add flags to mark regulatory channel requirements
o add state defs for DFS/radar support
Reviewed by: avatar
Approved by: re (blanket wireless)
o update 11n definitions to D2.0 spec
o add IEEE80211_CAPINFO_SPECTRUM_MGMT for DFS support
o add CSA ie definition for DFS support
o purge some unused definitions
o correct 802.11 reason and status codes
o correct reason code returned when a sta tries to associate to an
ap operating with WPA/RSN but without a WPA/RSN ie
Reviewed by: thompsa, avatar
Approved by: re (blanket wireless)
device and have had the crypto bits stripped from the 802.11 header
o strip mbuf flags in the rx path before passing up the stack
Reviewed by: thompsa, sephe, avatar
Approved by: re (blanket wireless)
The first drop was Beta, this code is expected to be the release version.
Note that this driver code will build in either 6.2 or 7. If you
use the code in 6.2 you will not get TSO or MSI/X support but it will
function in a legacy mode.
Approved by: re
used, rather than the one passed via 'req', which may not reflect a
rewrite. This call to useracc() is redundant to validation performed by
later copyin()/copyout() calls, so there isn't a security issue here,
but this could technically lead to excessive validation of addresses if
the length in newlen is shorter than req.newlen.
Approved by: re (kensmith)
Reviewed by: jhb
Submitted by: Constantine A. Murenin <cnst+freebsd@bugmail.mojo.ru>
Sponsored by: Google Summer of Code 2007
When any PnP device exists, isa_release_resource() is called with no
activated resource. So a bushandle is not allocated yet.
Approved by: re (kensmith)
can easily block in bread(), and then there was nothing to prevent the
static buffer (nambuf_{ptr,len,last_id}) being clobbered by another
thread.
The effects of the bug seem to have been limited to failed lookups and
mangled names in readdir(), since Giant locking provides enough
serialization to prevent concurrent calls to the functions that access
the buffer. They were very obvious for multiple concurrent tree walks,
especially with a small cluster size.
The bug was introduced in msdosfs_conv.c 1.34 and associated changes,
and is in all releases starting with 5.2.
The fix is to allocate the buffer as a local variable and pass around
pointers to it like "_r" functions in libc do. Stack use from this
is large but not too large. This also fixes a memory leak on module
unload.
Reviewed by: kib
Approved by: re (kensmith)
the callout_lock spin lock and the sleepqueue spin locks. In the fix,
callout_drain() has to drop the callout_lock so it can acquire the
sleepqueue lock. The state of the callout can change while the
callout_lock is held however (for example, it can be rescheduled via
callout_reset()). The previous code assumed that the only state change
that could happen is that the callout could finish executing. This change
alters callout_drain() to effectively restart and recheck everything
after it acquires the sleepqueue lock thus handling all the possible
states that the callout could be in after any changes while callout_lock
was dropped.
Approved by: re (kensmith)
Tested by: kris
needed at least to convince the BIOS to give us access to CPU freq
control on MacBooks.
Submitted by: Rui Paulo <rpaulo / fnop.net>
Approved by: re
MFC after: 5 days
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.
Found by: Writing the Handbook section
Approved by: re (kensmith)
ktruserret() is invoked, an unlocked check of the per-process queue
is performed inline, thus, we don't lock the ktrace_sx on every userret().
Pointy hat to: jhb
Approved by: re (kensmith)
Pointy hat recovered from: rwatson
ZD1211/ZD1211B USB IEEE 802.11b/g wireless network devices. Not (yet)
connected to the build process (next batch of commits once I've looped
the current back back).
Submitted by: Weongyo Jeong
Reviewed by: sam@
Approved by: re@
import. The PF mbuf-tagging support routines changed to link the
allocated tags into the provided mbuf themselves, so the left-over
m_tag_prepend() was trying to add a bogus (usually NULL) tag.
Reviewed by: mlaier
Approved by: re
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
that the resulting block counts fit into the long-sized (long for the
ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs
VOP did this already. This can lie about the block size to 4.x binaries,
but it presents a more accurate picture of the ratios of free and
available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
values at INT32_MAX rather than losing the upper 32-bits to match the
behavior of the 4.x statfs conversion routine in vfs_syscalls.c
Approved by: re (kensmith)
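
The block-count scaling and INT32_MAX capping described in the statfs entry
above can be sketched in userland C. This is only an illustration under
assumed names: struct wide_statfs, struct narrow_statfs, cap32() and
narrow_statfs_cvt() are hypothetical and are not the kernel's conversion
routines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 64-bit and 4.x-style structures. */
struct wide_statfs   { uint64_t bsize, blocks, bfree, files; };
struct narrow_statfs { int32_t  bsize, blocks, bfree, files; };

/* Cap a counter at INT32_MAX instead of dropping the upper 32 bits. */
static int32_t cap32(uint64_t v)
{
    return v > INT32_MAX ? INT32_MAX : (int32_t)v;
}

static void narrow_statfs_cvt(const struct wide_statfs *in,
    struct narrow_statfs *out)
{
    uint64_t bsize = in->bsize, blocks = in->blocks, bfree = in->bfree;

    assert(in != 0 && out != 0);
    /*
     * Double the reported block size (and halve the counts) until the
     * block counts fit, so the free/total ratio stays accurate.
     */
    while ((blocks > INT32_MAX || bfree > INT32_MAX) &&
        bsize <= INT32_MAX / 2) {
        bsize <<= 1;
        blocks >>= 1;
        bfree >>= 1;
    }
    out->bsize = cap32(bsize);
    out->blocks = cap32(blocks);
    out->bfree = cap32(bfree);
    /* Non-block counters cannot be rescaled, so they are only capped. */
    out->files = cap32(in->files);
}
```

The reported block size is no longer the real one, which is the "lie" the
entry mentions, but bsize * blocks still describes the same amount of space.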
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.
Submitted by: rdivacky
Reviewed by: jkim
Sponsored by: Google Summer of Code 2007
Approved by: re (kensmith)
Both WWNN and WWPN are 64-bit unsigned integers and they are prefixed
with "0x", which requires two more bytes each.
Submitted by: Danny Braniss (danny at cs dot huji dot ac dot il)
via Matthew Jacob (lydianconcepts at gmail dot com)
Approved by: re (bmah)
MFC after: 3 days
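
The buffer arithmetic behind the WWNN/WWPN entry above: a 64-bit value
takes up to 16 hex digits, the "0x" prefix adds two bytes, and the
terminating NUL adds one more. A minimal sketch, assuming a zero-padded
format; format_wwn() and WWN_STR_LEN are hypothetical names, not the
driver's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define WWN_HEX_DIGITS 16                       /* 64 bits / 4 bits per digit */
#define WWN_STR_LEN    (2 + WWN_HEX_DIGITS + 1) /* "0x" + digits + NUL */

/*
 * Format a world wide name as "0x" followed by 16 hex digits.  Returns
 * the number of characters the full string needs (excluding the NUL),
 * as snprintf() does, so a result >= len signals truncation.
 */
static int format_wwn(char *buf, size_t len, uint64_t wwn)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%016" PRIx64, wwn);
}
```

A buffer sized for the digits alone (17 bytes with the NUL) would truncate
the last two characters.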
the last message on the send stream was "null" but still
there, a state we allow, we could get hung and not clean
it up and wait for the shutdown guard timer to clear the
association without a graceful close. Fix this so that
we properly clean up.
- Added support for Multiple ASCONF per new RFC. We only
(so far) accept input of these and cannot yet generate
a multi-asconf.
- Sysctl'd support for experimental Fast Handover feature. Always
disabled unless sysctl or socket option changes to enable.
- Error case in add-ip where the peer supports AUTH and ADD-IP
but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to
ABORT in this case.
- According to the Kyoto summit of socket api developers
(Solaris, Linux, BSD). We need to have:
o non-eeor mode messages be atomic - Fixed
o Allow implicit setup of an assoc in 1-2-1 model if
using the sctp_**() send calls - Fixed
o Get rid of HAVE_XXX declarations - Done
o add a sctp_pr_policy in hole in sndrcvinfo structure - Done
o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch!
- Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize
when we close sending out the data and disabling Nagle.
- Change key concatenation order to match the auth RFC
- When sending OOTB shutdown_complete always do csum.
- Don't send PKT-DROP to a PKT-DROP
- For abort chunks, always compute the checksum, the same as for
shutdown-complete.
- inpcb_free front state had a bug where in queue
data could wedge an assoc. We need to just abandon
ones in front states (free_assoc).
- If a peer sends us a 64k abort, we would try to
assemble a response packet which may be larger than
64k. This then would be dropped by IP. Instead make
a "minimum" size for us 64k-2k (we want at least
2k for our initack). If we receive such an init
discard it early without all the processing.
- When we peel off we must increment the tcb ref count
to keep it from being freed from underneath us.
- handling fwd-tsn had bugs that caused memory overwrites
when given faulty data, fixed so can't happen and we
also stop at the first bad stream no.
- Fixed so comm-up generates the adaption indication.
- peeloff did not get the hmac params copied.
- fix it so we lock the addr list when doing src-addr selection
(in future we need to use a multi-reader/one writer lock here)
- During lowlevel output, we could end up with a _l_addr set
to null if the iterator is calling the output routine. This
means we would possibly crash when we gather the MTU info.
Fix so we only do the gather where we have a src address
cached.
- we need to be sure to set abort flag on conn state when
we receive an abort.
- peeloff could leak a socket. Moved code so the close will
find the socket if the peeloff fails (uipc_syscalls.c)
Approved by: re@freebsd.org(Ken Smith)
pack a set number correctly.
Submitted by: oleg
o Plug a memory leak.
Submitted by: oleg and Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 1 week
- remove cpl->iff panic - we can't know the port number from the rspq on the 4-port
- pick the ifnet based on the interface in the CPL header
- switch to using qset 0 for egress on the 4-port for now - may change
when we start using RSS
- move ether_ifdetach to before the port lock gets deinitialized to avoid
hang in the case where there are BPF peers (cxgb_ioctl is called indirectly
when BPF peers are present)
- don't call t3_mac_reset if multiport is set, this was causing tx errors
by misconfiguring the MAC on the 4-port
- change V_TXPKT_INTF to use txpkt_intf as the interfaces are not contiguous
- free the mbuf immediately in the case where the payload is small enough to be copied
into the rspq
- only update the coalesce timer for a queue if packets were taken off of it
- add in missed 20ms DELAY in vsc8211 initialization
- prompt MFC as this only applies to the 4-port which is currently completely
broken - OK'd by kensmith
Supported by: Chelsio
Approved by: re (blanket)
MFC after: 0 days
when peer acks the add in case the routing table changes.
- Fix sctp_lower_sosend to send shutdown chunk for mbuf send
case when sndlen = 0 and sinfoflag = SCTP_EOF
- Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data,
So that it does not send the "null" data mbuf out and cause
it to get freed twice.
- Fix so auto-asconf sysctl actually affects the socket's asconf state.
- Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets.
- Memset bug in sctp_output.c (arguments were reversed),
found and reported by Dave Jones (davej@codemonkey.org.uk).
- PD-API point needs to be invoked on >=, not just >, to conform to the socket
api draft; this fixes the two places in sctp_indata.c that need to be >=.
- move M_NOTIFICATION to use M_PROTO5.
- PEER_ADDR_PARAMS did not fail properly if you specify an address
that is not in the association with a valid assoc_id. This meant
you got or set the stcb level values instead of the destination
you thought you were going to get/set. Now validate if the
stcb is non-null and the net is NULL that the sa_family is
set and the address is unspecified otherwise return an error.
- The thread based iterator could crash if associations were freed
at the exact time it was running. rework the worker thread to
use the increment/decrement to prevent this and no longer use
the markers that the timer based iterator uses.
- Fix the memleak in sctp_add_addr_to_vrf() for the case when it is
detected that ifa is already pointing to a ifn.
- Fix it so that if someone is so insane that they drop the
send window below the minimal add mark, they still can send.
- Changed all state for associations to use mask safe macro.
- During front states in association freeing in sctp_inpcbfree, we
had a locking problem where locks were not in place where they
should have been.
- Free association calls were not testing the return value in
sctp_inpcb_free() properly... others should be cast void returns
where we don't care about the return value.
- If a reference count is held on an assoc, even from the "force free"
we should not do the actual free.. but instead let the timer
free it.
- When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED
flag is set, we must NOT process the packet but handle it like
ootb. This is because while freeing an assoc we release the
locks to get all the higher order locks so we can purge all
the hash tables. This leaves a hole if a packet comes in
just at that point. Now sctp_common_input_processing() will
call the ootb code in such a case.
- Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes
it so we don't have a conflict (I think this is a Coverity change).
We made this change AFTER some conversation and looking to make sure
that M_PROTO5 does not have a problem between SCTP and the 802.11
stuff (which is the only other place it's used).
- Fixed lock order reversal and missing atomic protection around
locked_tcb during association lookup and the 1-2-1 model.
- Added debug to source address selection.
- V6 output must always do checksum even for loopback.
- Remove more locks around inp that are not needed for an atomically
added/subtracted ref count.
- slight optimization in the way we zero the array in sctp_sack_check()
- It was possible to respond to an ABORT() with bad checksum with
a PKT-DROP. This led to a PKT-DROP/ABORT war. Add code to NOT
send a PKT-DROP to any ABORT().
- Add an option for local logging (useful for macintosh or when
you need better performance during debugging). Note no commands
are here to get the log info, you must just use kgdb.
- The timer code needs to be aware of if it needs to call
sctp_sack_check() to slide the maps and adjust the cum-ack.
This is because it may be out of sync cum-ack wise.
- Added threshold management logging.
- If the user picked just the right size, that just filled the send
window minus one mtu, we would enter a forever loop not copying and
at the same time not blocking. Change from < to <= solves this.
- Sysctl added to control the fragment interleave level which defaults
to 1.
- My rwnd control was not being used to control the rwnd properly (we
did not add and subtract to it :-() this is now fixed so we handle
small messages (1 byte etc) better to bring our rwnd down more
slowly.
Approved by: re@freebsd.org (Bruce Mah)
ICMP error message, do not access th_flags. The field is beyond
the first eight bytes of the header that are required to be present
and were pulled up in the mbuf.
A random value of th_flags can have TH_SYN set, which made the
sequence number comparison not apply the window scaling factor,
which led to legitimate ICMP(v6) packets getting blocked with
"BAD ICMP" debug log messages (if enabled with pfctl -xm), thus
breaking PMTU discovery.
Triggering the bug requires TCP window scaling to be enabled
(sysctl net.inet.tcp.rfc1323, enabled by default) on both end-
points of the TCP connection. Large scaling factors increase
the probability of triggering the bug.
PR: kern/115413: [ipv6] ipv6 pmtu not working
Tested by: Jacek Zapala
Reviewed by: mlaier
Approved by: re (kensmith)
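
To illustrate the pf entry above: only the first eight bytes of the quoted
TCP header are guaranteed to be present, and they hold the two ports and
the sequence number. th_flags sits at byte offset 13, outside that window.
A minimal sketch with hypothetical names (struct quoted_tcp,
parse_quoted_tcp()); it is not pf's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUOTED_L4_BYTES 8   /* bytes of the original header we may rely on */

struct quoted_tcp {
    uint16_t sport;         /* bytes 0-1 */
    uint16_t dport;         /* bytes 2-3 */
    uint32_t seq;           /* bytes 4-7 */
};

/*
 * Parse only what the first eight bytes contain.  The flags byte at
 * offset 13 is deliberately never read, so no decision can depend on
 * whatever happens to follow the pulled-up data.
 * Returns 0 on success, -1 if fewer than eight bytes are available.
 */
static int parse_quoted_tcp(const uint8_t *p, size_t len,
    struct quoted_tcp *out)
{
    assert(p != NULL && out != NULL);
    if (len < QUOTED_L4_BYTES)
        return -1;
    out->sport = (uint16_t)((p[0] << 8) | p[1]);
    out->dport = (uint16_t)((p[2] << 8) | p[3]);
    out->seq = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
        ((uint32_t)p[6] << 8) | (uint32_t)p[7];
    return 0;
}
```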
on a down mxge interface
- Fix a bug where mxge reported the link state as
active when it wasn't (after ifconfig down).
- Prevent spurious watchdog resets when link partner is not consuming
- Add support for CX4 and popular XFP media detection
- Update the firmware and associated header files to 1.4.25
Approved by: re (kensmith)
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.
Approved by: re (kensmith)
ENOENT if the option wasn't provided, instead of setting it to 0.
xfs however didn't catch up on this, so it assumed something went bad if
vfs_getopts() sets the error to non-zero, and just returns the error.
Unbreak xfs mount by just ignoring the error if vfs_getopts() sets the
error to ENOENT, as we should have sane defaults.
Reviewed by: kan
Approved by: re (rwatson)
Tested by: rpaulo
For this, introduce vm_map_fixed() that does that for MAP_FIXED case.
Dropping the lock allowed for parallel thread to occupy the freed space.
Reported by: Tijl Coosemans <tijl ulyssis org>
Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks
aio_proc_rundown.
Do not allow for zero-length read to be passed to the fo_read file method
by aio.
Reported and tested by: Peter Holm
Approved by: re (kensmith)
of the bits we want to ignore on the first pass rather than doing a
linear scan. This puts us within a few instructions of the cost of
runq_findbit() and removes this function from the top of profiling output
for context switch heavy workloads.
Approved by: re
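
The masking idea in the entry above can be sketched as follows: instead of
testing bits one at a time from the starting position, the first pass
masks off the bits below the start and lets ffs() find the next set bit in
a whole word. This is a userland approximation; findbit_from(), RQB_BPW
and RQB_LEN are assumed names and sizes, not the scheduler's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>

#define RQB_BPW 32  /* bits per bitmap word (assumed) */
#define RQB_LEN 2   /* words in the bitmap (assumed) */

/*
 * Return the index of the first set bit at or after 'start', wrapping
 * around to the beginning of the bitmap, or -1 if no bit is set.
 */
static int findbit_from(const uint32_t bits[RQB_LEN], int start)
{
    int word = start / RQB_BPW;
    /* First pass only: ignore the bits below 'start' in its word. */
    uint32_t mask = ~(uint32_t)0 << (start % RQB_BPW);
    int i;

    assert(start >= 0 && start < RQB_BPW * RQB_LEN);
    /* One extra iteration revisits the starting word unmasked. */
    for (i = 0; i <= RQB_LEN; i++) {
        uint32_t w = bits[word] & mask;

        if (w != 0)
            return word * RQB_BPW + ffs((int)w) - 1;
        mask = ~(uint32_t)0;
        word = (word + 1) % RQB_LEN;
    }
    return -1;
}
```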
on 2cpu machines by reducing it to 1 by default. This improves loaded
operation on 8cpu machines by increasing it to 3 where the extra idle
time is not as critical.
Approved by: re
have caused a hang, but we got lucky with the available multi-CPU states
on actual hardware.
Submitted by: Bjorn Koenig <bkoenig / alpha-tierchen.de>
Approved by: re
MFC after: 3 days
sys/vm/device_pager.c:
Protect the creation of the phys pager with non-NULL handle with the
phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now
synchronized with its removal from the list, and phys_pager_mtx is put
before vm object lock in lock order. Dispose the phys_pager_alloc_lock
and tsleep calls, together with acquiring Giant, since phys_pager_mtx
now covers the same block.
Reviewed by: alc
Approved by: re (kensmith)
Without it some errors may be left unnoticed and unhandled
that will lead to hooks left in half-connected state.
Reviewed by: julian@
Approved by: re (kensmith), glebius (mentor)
- Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock.
- Fix panic that may occur when freeing an assoc that has partial
delivery in progress (may dereference null socket pointer when
queuing partial delivery aborted notification)
- Some spacing and comment fixes.
- Fix address add handling to clear cached routes and source addresses
when peer acks the add in case the routing table changes.
Approved by: re@freebsd.org (Bruce Mah)
projected_offset against isn_offset to account for
wrap around.
Reviewed by: gnn, kmacy, silby
Submitted by: yusheng.huang@bluecoat.com
Approved by: re
MFC: 3 days
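
The entry above is truncated, so the exact comparison it changed is not
shown here. As a general illustration of a wrap-around-safe comparison of
32-bit sequence-space values, one common idiom is to compare the signed
difference rather than the raw values. seq_gt() is a hypothetical helper
for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * True if 'a' is ahead of 'b' in 32-bit sequence space.  The unsigned
 * subtraction wraps modulo 2^32, and interpreting the result as signed
 * gives the right answer as long as the two values are less than 2^31
 * apart.
 */
static int seq_gt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}
```

A plain `a > b` would report that 5 is behind 0xfffffff0 even though the
counter has merely wrapped.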
and newer CPUs (including Core 2 and Core / Core 2 based Xeons). The
driver attaches to each cpu device and creates a sysctl node in that
device's sysctl context (dev.cpu.N.temperature). When invoked, the
handler binds to the appropriate CPU to ensure a correct reading.
Submitted by: Rui Paulo <rpaulo@fnop.net>
Sponsored by: Google Summer of Code 2007
Tested by: des, marcus, Constantine A. Murenin, Ian FREISLICH
Approved by: re (kensmith)
MFC after: 3 weeks
% mount | grep home
/dev/ad4s1e on /home (ufs, local, noatime, soft-updates)
% mount -u -o atime /home
% mount | grep home
/dev/ad4s1e on /home (ufs, local, soft-updates)
Restore this behavior on 7.x for the following mount options:
noatime, noclusterr, noclusterw, noexec, nosuid, nosymfollow
In addition, on 7.x, the following are equivalent:
mount -u -o atime /home
mount -u -o nonoatime /home
Ideally, when we introduce new mount options, we should avoid
options starting with "no". :)
Requested by: jhb
Reported by: Karol Kwiat <karol.kwiat gmail com>, Scott Hetzel <swhetzel gmail com>
Approved by: re (bmah)
Proxy commit for: rodrigc
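
The equivalence of "atime" and "nonoatime" in the entry above follows if
an option cancels an active one whenever either name is the other with a
"no" prefix. A minimal sketch of that rule; is_no_of() and opt_cancels()
are hypothetical helpers, not functions from the mount code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if 'a' is exactly "no" followed by 'b'. */
static bool is_no_of(const char *a, const char *b)
{
    return strncmp(a, "no", 2) == 0 && strcmp(a + 2, b) == 0;
}

/*
 * True if passing 'opt' on an update should clear the active option
 * 'active': either name is the "no"-prefixed form of the other.
 */
static bool opt_cancels(const char *opt, const char *active)
{
    assert(opt != NULL && active != NULL);
    return is_no_of(opt, active) || is_no_of(active, opt);
}
```

With "noatime" active, both opt_cancels("atime", "noatime") and
opt_cancels("nonoatime", "noatime") hold, which is why the two mount -u
invocations behave the same.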
Without this the PHY wouldn't work as expected. This should fix
dual-boot Windows XP machine where RealTek Windows drivers put the
PHY in power down mode during shutdown. The magic PHY register
accesses come from RealTek driver. No datasheets mention the magic
PHY registers.
In general, the PHY wakeup code should go into PHY driver. However it
seems that it only applies to RTL8169S single chip and it would be
another hack if we have rgephy(4) check what parent driver/chip model
is attached.
Reported by: lofi, Laurens Timmermans ( laurens AT timkapel DOT nl )
Tested by: lofi
Obtained from: RealTek FreeBSD driver
Approved by: re (Ken Smith)
- Don't leak the config lock if detach() fails due to the controller char
dev being open.
- Close a race between detach() and a process opening the controller char
dev.
MFC after: 1 week
Approved by: re (bmah)
Fix a resource allocation bug (explained by jhb on -acpi)
Thanks to Mike Tancsa for testing and helping track down the bug.
Approved by: re (kensmith)
MFC after: 3 weeks
detailed status on each of the backing subdisks. This allows userland
to see which subdisks are online, failed, missing, or a hot spare.
MFC after: 1 week
Approved by: re (bmah)
Reviewed by: sos
differ in their details with calls to a new function, ehci_hcreset(),
that performs the reset.
The original sequences either had no delay or a 1ms delay between
telling the controller to stop and asserting the controller reset
bit. One instance of the original reset sequence waited for the
controller to indicate that its reset was complete before continuing,
but the other two immediately let the subsequent code execute. The
latter is a problem on some hardware, because a read of the HCCPARAMS
register returns an incorrect value while the reset is in progress,
which triggers an infinite loop in ehci_pci_givecontroller(), which
hangs the system on shutdown.
The reset sequence in ehci_hcreset() starts with the most complete
instance from the original code, which contains a loop to wait for
the controller to indicate that its reset is complete. This appears
to be the correct thing to do according to "Enhanced Host Controller
Interface Specification for Universal Serial Bus" revision 1.0,
section 2.3.1. Add another loop to wait for the controller to
indicate that it has stopped before setting the HCRESET bit. This
is required by the section 2.3.1 in the specification, which says
that setting HCRESET before the controller has halted "will result
in undefined behaviour".
Reviewed by: imp (previous patch version without the extra wait loop)
Tested by: se (previous patch version without the extra wait loop)
Approved by: re (bmah)
MFC after: 1 week
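
The ordering described in the ehci_hcreset() entry above (stop, wait for
the halt, set HCRESET, wait for the reset to finish) can be modeled against
a fake controller. The bit positions follow the EHCI register layout;
struct fake_ehci, the read helpers and hcreset_sketch() are hypothetical,
and the choice to fail on a halt timeout is an assumption of this sketch
rather than the driver's documented behavior.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_RS      0x00000001u /* USBCMD: Run/Stop */
#define CMD_HCRESET 0x00000002u /* USBCMD: Host Controller Reset */
#define STS_HCH     0x00001000u /* USBSTS: HCHalted */
#define MAX_POLLS   100

/* Fake hardware: state changes only after a number of register reads. */
struct fake_ehci {
    uint32_t usbcmd, usbsts;
    int halt_polls;     /* reads before HCHalted shows up */
    int reset_polls;    /* reads before HCRESET self-clears */
};

static uint32_t read_sts(struct fake_ehci *hc)
{
    if (!(hc->usbcmd & CMD_RS) && !(hc->usbsts & STS_HCH)) {
        if (hc->halt_polls > 0)
            hc->halt_polls--;
        else
            hc->usbsts |= STS_HCH;
    }
    return hc->usbsts;
}

static uint32_t read_cmd(struct fake_ehci *hc)
{
    if (hc->usbcmd & CMD_HCRESET) {
        if (hc->reset_polls > 0)
            hc->reset_polls--;
        else
            hc->usbcmd &= ~CMD_HCRESET;
    }
    return hc->usbcmd;
}

/* Returns 0 once the reset has completed, -1 on a timeout. */
static int hcreset_sketch(struct fake_ehci *hc)
{
    int i;

    assert(hc != 0);
    hc->usbcmd &= ~CMD_RS;              /* tell the controller to stop */
    for (i = 0; i < MAX_POLLS; i++)     /* a driver would delay per poll */
        if (read_sts(hc) & STS_HCH)
            break;
    if (i == MAX_POLLS)
        return -1;                      /* never halted: do not reset */
    hc->usbcmd |= CMD_HCRESET;          /* safe only once halted */
    for (i = 0; i < MAX_POLLS; i++)
        if (!(read_cmd(hc) & CMD_HCRESET))
            return 0;                   /* reset finished */
    return -1;
}
```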
o Revamp the PIC I/F to only abstract the PIC hardware. The
resource handling has been moved to nexus, where it belongs.
o Include EOI and MASK+EOI methods to the PIC I/F in support of
INTR_FILTER.
o With the allocation of interrupt resources and setup of
interrupt handlers in the common platform code we can delay
talking to the PIC hardware after enumeration of all devices.
Introduce a call to powerpc_intr_enable() in configure_final()
to achieve that and have powerpc_setup_intr() only program the
PIC when !cold.
o As a consequence of the above, remove all early_attach() glue
from the OpenPIC and Heathrow PIC drivers and have them
register themselves when they're found during enumeration.
o Decouple the interrupt vector from the interrupt request line.
Allocate vectors increasingly so that they can be used for
the intrcnt index as well. Extend the Heathrow PIC driver to
translate between IRQ and vector. The OpenPIC driver already
has the support for vectors in hardware.
Approved by: re (blanket)
- LK_RETRY prohibits vget() and vn_lock() from returning an error.
Remove associated code. [1]
- Properly use vhold() and vdrop() instead of their unlocked
versions, we are guaranteed to have the vnode's interlock
unheld. [1]
- Fix a pseudo-infinite loop caused by 64/32-bit arithmetic
in the same way as modern NetBSD versions do. [2]
- Reorganize tmpfs_readdir to reduce duplicated code.
Submitted by: kib [1]
Obtained from: NetBSD [2]
Approved by: re (tmpfs blanket)
- Respect cnflag and don't lock vnode always as LK_EXCLUSIVE [1]
- Properly lock around tn_vnode to avoid NULL dereference
- Be more careful handling vnodes (*)
(*) This is a WIP
[1] by pjd via howardsu
Thanks kib@ for his valuable VFS related comments.
Tested with: fsx, fstest, tmpfs regression test set
Found by: pho's stress2 suite
Approved by: re (tmpfs blanket)
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.
Loosely based on: OpenBSD
Approved by: re
a test that assumes that char is signed by default and causes a
warning with GCC 4.2 on PowerPC.
A patch has been sent to the maintainer that addresses this.
Approved by: re (blanket)
of device pager in the pagers list by handle is now synchronized with
its removal from the list, and dev_pager_mtx is put before vm object
lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx
now covers the same block.
Noted by: kensmith
Reviewed by: alc
Approved by: re (kensmith)
(uio_offset < 0) since this can't happen. If this happens, then the
general code handles the problem safely (better than before for reading,
returning 0 (EOF) instead of the bogus errno EINVAL, and the same as
before for writing, returning EFBIG).
In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write()
already didn't check.
In msdosfs_read(), document in a comment our assumptions that the caller
passed a valid uio_offset and uio_resid. ffs checks using KASSERT(),
and that is enough sanity checking. In the same comment, partly document
there is no need to check for the EOVERFLOW case, unlike in ffs where this
case can happen at least in theory.
In msdosfs_write(), add a comment about why the checking of
(uio_resid == 0) is explicit, unlike in ffs.
In msdosfs_write(), check for impossibly large final offsets before
checking if the file size rlimit would be exceeded, so that we don't
have an overflow bug in the rlimit check and are consistent with ffs.
We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final
offset would be impossibly large but not so large as to cause overflow.
Overflow normally gave the benign behaviour of no signal.
Approved by: re (kensmith) (blanket)
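
The check ordering described for msdosfs_write() above can be sketched as
follows: the impossibly-large-offset test runs first and is written as a
subtraction, so the sum is only formed once it is known to be in range.
check_write_range(), FAT_FILESIZE_MAX and the sigxfsz out-parameter are
hypothetical names for this illustration; the 4GB-1 limit is the usual FAT
file size bound.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAT_FILESIZE_MAX INT64_C(0xffffffff)    /* assumed FAT limit */

/*
 * Returns 0 if the write may proceed, else EFBIG.  *sigxfsz is set to 1
 * only when the rlimit is what was exceeded, modeling "EFBIG plus a
 * SIGXFSZ signal" versus plain EFBIG.
 */
static int check_write_range(int64_t offset, int64_t resid,
    int64_t rlim_fsize, int *sigxfsz)
{
    /* The caller is trusted to pass valid values, as the entry notes. */
    assert(offset >= 0 && resid >= 0);
    *sigxfsz = 0;
    if (resid == 0)
        return 0;
    /* First: final offset impossible for the file system, no signal. */
    if (offset > FAT_FILESIZE_MAX || resid > FAT_FILESIZE_MAX - offset)
        return EFBIG;
    /* Then the rlimit check; offset + resid cannot overflow here. */
    if (offset + resid > rlim_fsize) {
        *sigxfsz = 1;
        return EFBIG;
    }
    return 0;
}
```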
remove some parentheses; fix some whitespace errors; fix only one case of
a boolean comparison of a non-boolean).
Improve an error message by quoting ".", and by not printing large positive
values as negative ones.
Approved by: re (kensmith) (blanket)
namespace pollution in <sys/vnode.h>.
Sort the include of <sys/mutex.h> instead of unsorting it after
<sys/vnode.h> and depending on the pollution there.
Approved by: re (kensmith) (blanket)
the use of divert sockets to deadlocks. A number of LORs have been reported
between divert and a number of other network subsystems including: IPSEC, Pfil,
multicast, ipfw and others. Other deadlocks could occur because of recursive
entry into the IP stack. This change should take care of most if not all of
these issues.
A summary of the changes follows:
- We disallow multicast operations on divert sockets. It really doesn't make
semantic sense to allow this, since typically you would set multicast
parameters on multicast end points.
NOTE: As a part of this change, we actually dis-allow multicast options on
any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family
- We check to see if there are any socket options that have been specified on
the socket, and if there were (which is very uncommon and also probably
doesn't make sense to support) we duplicate the mbuf carrying the options.
- We then drop the INP/INFO locks over the call to ip_output(). It should be
noted that since we no longer support multicast operations on divert sockets
and we have duplicated any socket options, we no longer need the reference
to the pcb to be coherent.
- Finally, we replaced the call to ip_input() to use netisr queuing. This
should remove the recursive entry into the IP stack from divert.
By dropping the locks over the call to ip_output() we eliminate all the lock
ordering issues above. By switching over to netisr on the inbound path,
we can no longer recursively enter the ip_input() code via divert.
I have tested this change by using the following command:
ipfwpcap -r 8000 - | tcpdump -r - -nn -v
This should exercise the input and re-injection (outbound) path, which is
very similar to the work load performed by natd(8). Additionally, I have
run some ospf daemons which have a heavy reliance on raw sockets and
multicast.
Approved by: re@ (kensmith)
MFC after: 1 month
LOR: 163
LOR: 181
LOR: 202
LOR: 203
Discussed with: julian, andre et al (on freebsd-net)
In collaboration with: bms [1], rwatson [2]
[1] bms helped out with the multicast decisions
[2] rwatson submitted the original netisr patches and came up with some
of the original ideas on how to combat this issue.
for bakeoff.. using the next sequential ones)
- In cookie processing 1-2-1, we did not increment the stcb
refcnt before releasing the tcb lock. We need to do this
to keep the tcb from being freed by an abort or ?? unlikely
but worth doing. Also get rid of unneeded INP_WLOCK.
- extra receive info included the rcvinfo which killed the
padding/alignment. We now redefine all the fields properly
so they both align properly to 128 bytes.
- A peeled off socket would not close without an error due to
its misguided idea that sctp_disconnect() was not supported
on it. This fixes it so it goes through the proper path.
- When an assoc was being deleted after abort (via a timer) a
small race condition exists where we might take a packet for
the old assoc (since we are waiting for a cleanup timer). This
state especially happens in mac. We now add a state in the asoc
so these can properly handle the packet as OOTB.
Approved by: re@freebsd.org(Ken Smith)
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminating
quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.
Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)
Recently the AP in my Merced box seems to have grown a habit
of getting unexpected interrupts, such as redundant wake-ups
and legacy interrupts that require an INTA cycle.
While here, replace DELAY(0) with cpu_spinwait() so that it's
clear what we're doing as well as enable the code to take
advantage of cpu_spinwait() when it gets implemented.
Approved by: re (blanket)
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.
While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.
Further simplify assembly code by dealing with ASTs from C.
Approved by: re (blanket)
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device. The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does. In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object. Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock. Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock. Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock. So, now the
doomed object has a reference count of one! Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc(). Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object. However, the doomed object is still in use by Pa.
Repeating my key point, vm_pager_object_lookup() must not return a
doomed object. Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
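
The key point of the entry above, that the doomed test and the reference
increment must happen as one atomic step, can be illustrated in userland.
Note the substitution: the kernel change relies on locks, while this
sketch packs the doomed flag and the count into one word and uses a
compare-and-swap, purely to keep the example self-contained. All names
(struct obj, obj_tryref(), lookup(), obj_doom()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJ_DEAD_BIT 0x80000000u    /* top bit: object is doomed */

struct obj {
    atomic_uint state;              /* doomed flag + reference count */
    void *handle;
    struct obj *next;
};

/* Take a reference unless the object is doomed, in one atomic step. */
static bool obj_tryref(struct obj *o)
{
    unsigned old = atomic_load(&o->state);

    do {
        if (old & OBJ_DEAD_BIT)
            return false;           /* never resurrect a doomed object */
    } while (!atomic_compare_exchange_weak(&o->state, &old, old + 1));
    return true;
}

/* Mark the object doomed; later lookups will skip it. */
static void obj_doom(struct obj *o)
{
    atomic_fetch_or(&o->state, OBJ_DEAD_BIT);
}

/*
 * Find a live object by handle and return it referenced, or NULL.
 * A real implementation must also protect the list itself.
 */
static struct obj *lookup(struct obj *head, void *handle)
{
    for (struct obj *o = head; o != NULL; o = o->next)
        if (o->handle == handle && obj_tryref(o))
            return o;
    return NULL;
}
```

Because the test and the increment cannot be separated, there is no window
in which a terminating thread can doom the object between the two.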
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().
Approved by: re (blanket)
Also rename the related functions in a similar way.
There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is
to only call into the firewall with the "outer" IP header and
payload.
With this option turned on, in addition to the "outer" parts,
the "inner" IP header and payload are passed to the
firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within
an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007
Best new name suggested by: rwatson
Reviewed by: rwatson
Approved by: re (bmah)
sector, instead of failing the whole mount if it is garbage. Fields
in the fsinfo sector are only advisory, so there are better sanity
checks than this, and we already silently fix up the only other advisory
field in the fsinfo (the free cluster count).
This wasn't handled quite right in rev.1.92, 1.117, or in NetBSD. 1.92
also failed the whole mount for the non-garbage magic value 0xffffffff.
1.117 fixed this well enough in practice since garbage values shouldn't
occur in practice, but left the error handling larger and more convoluted
than necessary. Now we handle the magic value as a special case of
fixing up all out of bounds values.
Also fix up the estimated next free cluster number when there is no
fsinfo sector. We were using 0, but CLUST_FIRST is safer.
Approved by: re (kensmith)
instead of per IOMMU, so we no longer need to program all of them
identically in systems having multiple IOMMUs. This continues the
rototilling of the nexus(4) done about 5 months ago, which amongst
others changed nexus(4) and the drivers for host-to-foo bridges
to provide bus_get_dma_tag methods, allowing to handle DMA tags in
a hierarchical way and to link them with devices.
This still doesn't move the silicon bug workarounds for Sabre (and
in the uncommitted schizo(4) for Tomatillo) bridges into special
bus_dma_tag_create() and bus_dmamap_sync() methods though, as w/o
fully newbus'ified bus_dma_tag_create() and bus_dma_tag_destroy()
this still requires too much hackery, i.e. per-child parent DMA
tags in the parent driver.
- Let the host-to-foo drivers supply the maximum physical address
of the IOMMU accompanying the bridges. Previously iommu(4) hard-
coded an upper limit of 16GB, which actually only applies to the
IOMMUs of the Hummingbird and Sabre bridges. The Psycho variants
as well as the U2S can in fact translate to up to 2TB, i.e.
translate to 41-bit physical addresses. According to the recently
available Tomatillo documentation these bridges even translate to
43-bit physical addresses, and the documentation hints at the Schizo
bridges doing 43 bits as well.
This fixes the issue the FreeBSD 6.0 todo list item "Max RAM on
sparc64" was referring to and pretty much obsoletes the lack of
support for bounce buffers on sparc64.
Thanks to Nathan Whitehorn for pointing me at the Tomatillo manual.
Approved by: re (kensmith)
requiring DC_TX_ALIGN or DC_TX_COALESCE, which was previously done
in dc_start_locked(), into dc_encap().
o In dc_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
dc_start_locked() to not try to prepend the mbuf (chain) if
dc_encap() has freed it.
- Likewise, if bus_dmamap_load_mbuf() fails as dc_dma_map_txbuf()
failed, free the mbuf possibly allocated by the above call to
m_defrag() and drop the packet.
o In dc_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 6 free TX
descriptors. Further down the road dc_encap() will bail if there
are only 5 or fewer free TX descriptors, causing dc_start_locked()
to abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 5 with a macro DC_TX_LIST_RSVD.
- Just always assign idx to sc->dc_cdata.dc_tx_cons; it doesn't
make much sense to exclude the idx == sc->dc_cdata.dc_tx_cons
case.
o In dc_dma_map_txbuf() there's no need to set sc->dc_cdata.dc_tx_err
to error if the latter is != 0, bus_dmamap_load_mbuf() already
returns the same error value in that case anyway.
o For less overhead, convert to use bus_dmamap_load_mbuf_sg() for
loading RX buffers.
o Remove some banal and/or outdated comments.
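The dc_txeof()/dc_encap() relationship described above can be sketched as follows. This is an illustration only: TX_LIST_CNT and both helper names are assumptions, and only the reserve of 5 descriptors comes from the text.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_LIST_CNT     256 /* assumed ring size, for illustration only */
#define DC_TX_LIST_RSVD 5   /* descriptors held in reserve (the old magic 5) */

/* Mirrors the bail-out described for dc_encap(): 5 or fewer free descriptors. */
static bool
encap_would_bail(int tx_cnt)
{
    return (TX_LIST_CNT - tx_cnt <= DC_TX_LIST_RSVD);
}

/* Mirrors the dc_txeof() rule: only report "not busy" if encap can proceed. */
static bool
may_clear_oactive(int tx_cnt)
{
    return (!encap_would_bail(tx_cnt));
}
```

Keeping both tests tied to one macro is what keeps the two functions from disagreeing about when the ring is usable.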
Approved by: re (kensmith)
MFC after: 1 week
to clear RL_TDESC_VLANCTL_TAG). This fixes sending packets in the
native VLAN when running both tagged and an untagged VLAN over the
same trunk and descriptors are recycled.
Approved by: re (kensmith)
MFC after: 1 week
d_mmap methods. prep_cdevsw() already installs the shims that
acquire/drop Giant for the methods of a driver that specified the
D_NEEDGIANT flag.
Reviewed by: alc
Approved by: re (kensmith)
- If the path cost is calculated when the link is down, set a pending flag so
it is calculated again when it comes back up.
- To not use 00:00:00:00:00:00 as the bridge id, all interfaces are scanned and
the lowest number wins. All zeros is too low.
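A small sketch of the bridge id selection rule above, assuming the interface addresses are available as an array. struct mac and lowest_nonzero_mac() are hypothetical names, not the bridge code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

struct mac {
    uint8_t b[MAC_LEN];
};

/*
 * Pick the numerically lowest non-zero address from addrs[0..n-1].
 * Returns the index of the winner, or -1 if every address is all zeros.
 */
static int
lowest_nonzero_mac(const struct mac *addrs, size_t n)
{
    static const uint8_t zero[MAC_LEN];
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (memcmp(addrs[i].b, zero, MAC_LEN) == 0)
            continue;   /* all zeros is "too low" to be a valid id */
        /* memcmp orders bytes most-significant first, i.e. numerically. */
        if (best < 0 || memcmp(addrs[i].b, addrs[best].b, MAC_LEN) < 0)
            best = (int)i;
    }
    return (best);
}
```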
Approved by: re (rwatson)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add instruction-serialization after writing to cr.pta.
Delay enabling interrupts until after we setup the clocks and after
we program the task priority register.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to the region registers and
add instruction-serialization after writing to cr.pta.
Approved by: re (blanket)
ia64_cpu.h. This improves readability and consistency and aids in
auditing the code.
Add data-serialization after writing to cr.tpr.
Approved by: re (blanket)
tdq_group structure. Hyper-threaded cores won't really benefit from
separate locks anyway.
- Separate out the migration case from sched_switch to simplify the main
switch code. We only migrate here if called via sched_bind().
- When preempted place the preempted thread back in the same queue at
the head.
- Improve the cpu group and topology infrastructure.
Tested by: many on current@
Approved by: re
message explained why the size is 1 sector, but the code used a
size of 1 cluster.
I/o sizes larger than necessary may cause serious coherency problems
in the buffer cache. Here I think there were only minor efficiency
problems, since a too-large fsinfo buffer could only get far enough
to overlap buffers for the same vnode (the device vnode), so mappings
are coherent at the page level although not at the buffer level, and
the former is probably enough due to our limited use of the fsinfo
buffer.
Approved by: re (kensmith)
- Copy before testing a pointer. This closes a race window.
- Use msleep with the node interlock instead of tsleep.
- Do proper locking around access to tn_vpstate.
- Assert vnode VOP lock for dir_{atta,de}tach to capture
inconsistent locking.
Suggested by: kib
Submitted by: delphij
Reviewed by: Howard Su
Approved by: re (tmpfs blanket)
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.
This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Opteron cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.
Obtained from: jhb
Approved by: re (kensmith)
64bit counters are needed to simplify traffic accounting and
reduce system load at the big PPP concentrators.
Approved by: re (rwatson), glebius (mentor)
Until now the node's transmit path was completely unprotected
and so wasn't thread safe in multilink mode. Its receive path was
declared as WRITER as the simplest protection method, but that
reduces performance when compression or encryption is enabled.
Approved by: re (rwatson), glebius (mentor)
communicate with another private port.
All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.
An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.
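The forwarding rule in the example above boils down to one test per frame. A minimal sketch, assuming a per-port flag word; PORT_PRIVATE and may_forward() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define PORT_PRIVATE 0x1    /* hypothetical per-port flag */

/* A frame is dropped only when both the source and destination ports are private. */
static bool
may_forward(unsigned int src_flags, unsigned int dst_flags)
{
    return (!((src_flags & PORT_PRIVATE) && (dst_flags & PORT_PRIVATE)));
}
```

With the customer vlans marked private and the server network left unmarked, customer-to-server traffic passes in both directions while customer-to-customer traffic does not.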
Approved by: re (rwatson)
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections. When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.
To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet
and establish a reliable round-trip-time (ie: type a few commands).
Then unplug the ethernet and press a key. Time how long it takes to
drop the connection.
The old behavior (with hz=100) caused the connection to typically drop
between 90 and 110 seconds of getting no response.
Now boot with kern.hz=1000 (default). The same test causes the ssh session
to drop after just 9-10 seconds. This is a big deal on a wifi connection.
With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100. Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.
This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should
always be about 30. If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.
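The arithmetic behind the numbers above can be sketched as follows. As those numbers imply, the sysctl is shown in milliseconds while the timer is kept in ticks, so a fixed 3 ticks is 30 ms at hz=100 but only 3 ms at hz=1000. The helper names and the exact rounding are assumptions for illustration; the text only says the value is scaled with hz and guarded against 0.

```c
#include <assert.h>

#define TARGET_MIN_MS 30    /* "about 30" from the text */

/* Scale the minimum retransmit timeout to ticks, never returning 0. */
static int
rexmit_min_ticks(int hz)
{
    int ticks = (TARGET_MIN_MS * hz) / 1000;

    return (ticks < 1 ? 1 : ticks);
}

/* Convert ticks back to the milliseconds a sysctl would display. */
static int
ticks_to_ms(int ticks, int hz)
{
    return (ticks * 1000 / hz);
}
```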
This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.
As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf. This is safe to run from 6.0 onwards.
Approved by: re (rwatson)
Reviewed by: andre, silby
that could cause panics and corruption under moderate load. Many thanks
to Matt Reimer, Tom McDonald, and the rest of the guys at VPOP.net for
their help in identifying and testing this.
Approved by: re
only USB 1.1 speeds available, but this shouldn't hurt. Now that we have
working usb support for this board, this is a natural followup.
Approved by: re (kensmith)
7 months. You must have JP6 in the 1-2 position to supply power to the
USB devices, but I've used uftdi, uplcom and umass successfully. If you
have it in 2-3, then nothing will show up. Also, if you have the FQPA
packaging for the AT91RM9200 (like the KN9202 boards have), you will get
the following message
uhub0: device problem (IOERROR), disabling port 2
due to a hardware erratum. It is safe to ignore as it is about pins that
aren't brought out on the FQPA package and aren't properly terminated either.
Alas, there's no register to read to tell the FQPA from the BGA versions.
Submitted by: Daan Vreeken
Approved by: re (kensmith)
revision 1.66
date: 2007/07/31 06:23:26; author: marcel; state: Exp; lines: +2 -2
Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (blanket)
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.
Based on a patch from: grehan@
Approved by: re (rwatson)
errors (especially when jumbo frames are enabled or in low memory systems)
because the RX chain was corrupted when an mbuf was mapped to an unexpected
number of buffers.
- Fixed a problem that would cause kernel panics when an excessively
fragmented TX mbuf couldn't be defragmented and was released by
bce_tx_encap().
Approved by: re(hrs)
MFC after: 7 days
bucket pointer. The virtual mapping may not be present in the
translation cache. This will result in a nested TLB fault at
a place we don't handle (and don't want to handle).
o Make sure there's a stop after the rfi instruction, otherwise
its behaviour is undefined.
o Make sure we switch back to virtual addressing before doing
a rfi. Behaviour is undefined otherwise.
Approved by: re (blanket)
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.
Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.
Approved by: re (blanket)
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.
Approved by: re (blanket)
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.
This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.
Approved by: re (kensmith)
MFC After: 3 days
the fast or safe/slow method is in use. Fast remains at 1000, slow is
now at 850 (always preferred to TSC). Since the HPET has proven slower
than ACPI-fast on some systems, drop its quality to 900. In the future,
it is hoped that HPET performance will improve as it is the main
timer Intel supports. HPET may move back to 2000 in -current once RELENG_7
is branched to ensure that it gets tested.
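To see the effect of the new values, here is a minimal sketch of picking the highest-quality counter from a list. struct tc and tc_best() are hypothetical, and the TSC value of 800 used in the usage note is an assumption; the text only says the slow ACPI method is always preferred to TSC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified timecounter descriptor. */
struct tc {
    const char *name;
    int         quality;
};

/* Return the entry with the highest quality, or NULL for an empty list. */
static const struct tc *
tc_best(const struct tc *tcs, size_t n)
{
    const struct tc *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || tcs[i].quality > best->quality)
            best = &tcs[i];
    }
    return (best);
}
```

With ACPI-fast (1000), HPET (900) and TSC (assumed 800) the fast ACPI timer wins; if only ACPI-safe (850) is available, HPET wins instead.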
Approved by: re
<netinet/tcp_fsm.h> is included into any compilation unit that needs
tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG
conditionals. This allows kernels both with and without TCPDEBUG to
build, and unbreaks the tinderbox.
Approved by: re (rwatson)
pc98 motherboards do not provide us with the correct day of week
either. Ignore the day of week when setting the clock here too.
Approved by: re (bmah)
Requested from: nyan
MFC after: 3 weeks
the duration of the function. The device we would otherwise
have left in a useless state may just as well be the low-level
console. When booting verbose, we do need it addressable if we
want to avoid an MCA.
Approved by: re (kensmith)
sys.net.inet.tcp.log_debug = 1
It defaults to enabled for the moment and is to be turned off for
the next release like other diagnostics from development branches.
It is important to note that sysctl sys.net.inet.tcp.log_in_vain
uses the same logging function as log_debug. Enabling of the former
also causes the latter to engage, but not vice versa.
Use consistent terminology in tcp log messages:
"ignored" means a segment contains invalid flags/information and
is dropped without changing state or issuing a reply.
"rejected" means a segment contains invalid flags/information but
is causing a reply (usually RST) and may cause a state change.
Approved by: re (rwatson)
SYNCACHE_TIMEOUT to new function syncache_timeout().
o Fix inverted timeout callout engagement logic to actually
enable the timer for the bucket row. Before SYN|ACK was
not retransmitted.
o Simplify SYN|ACK retransmit timeout backoff calculation.
o Improve logging of retransmit and timeout events.
o Reset timeout when duplicate SYN arrives.
o Add comments.
o Rearrange SYN cookie statistics counting.
Bug found by: silby
Submitted by: silby (different version)
Approved by: re (rwatson)
syncache_rst().
o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before
a RST for a connection in syncache did not properly free the entry.
o Add more detailed logging.
Approved by: re (rwatson)
a proper solution.
- Add a dummy entry point which just calls the C entry points, and try to make
sure it's the first code in the binary.
- Copy a bit more than func_end to try to copy the whole load_kernel()
function. gcc4 puts code behind the func_end symbol.
Approved by: re (blanket)
framework for non-MPSAFE network protocols:
- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associated SYSINITs used by it to force
debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
which keyed off debug_mpsafenet to determine various aspects of their own
locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
no-op's, as their entire behavior was determined by the value in
debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.
Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.
Reviewed by: bz, jhb
Approved by: re (kensmith)
day of week field correctly, or they remember bad values that are
written into the day of week field. For this reason, ignore the day
of week field when reading the clock on i386 rather than bailing if
it is set incorrectly.
Problems were seen on a number of platforms, including VMWare, qemu,
EPIA ME6000, Epox-3PTA and ABIT-SL30T.
This is a slightly different fix to that proposed by Ted in his PR,
but the same basic idea.
PR: 111117
Submitted by: Ted Faber <faber@lunabase.org>
Approved by: re (rwatson)
MFC after: 3 weeks
should call uma_zfree() with various spinlocks held. Rearranging the
code would not help here because we cannot break atomicity with respect
to the process spinlock, so the only choice we have is to defer the operation.
In order to do this use a global queue synchronized through the kse_lock
spinlock which is freed at any thread_alloc() / thread_wait() through a
call to thread_reap().
Note that this approach is not ideal as we should want a per-process
list of zombie upcalls, but it follows initial guidelines of KSE authors.
Tested by: jkim, pav
Approved by: jeff, julian
Approved by: re
scope security check for the UDPv6 socket credential lookup service,
allowing security policies to bound access to credential information.
While not an immediate issue for Jail, which doesn't allow use of UDPv6,
this may be relevant to other security policies that may wish to control
ident lookups.
While here, eliminate a very unlikely panic case, in which a socket in
the process of being freed is inspected by the sysctl.
Approved by: re (kensmith)
Reviewed by: bz
- make NDIS_DEBUG a sysctl
- default to IEEE80211_MODE_11B if the card doesn't tell us the channels
- don't mess with ic_des_chan when we associate
- Allow a directed scan by setting the ESSID before scanning (verified
with wireshark). Hidden APs probably wouldn't have worked before.
- Grab the channel type and use it to look up the correct curchan for
the scan results (mistakenly used 11B before)
- Fix memory leak in the ndis_scan_results
Tested by: matteo
Reviewed by: sam
Approved by: re (rwatson)
stack overflow in complicated traffic filtering setups.
There can be minor performance degradation for the MHLEN < len <= 256 case
due to additional buffer allocation, but it is a rare case.
Approved by: re (rwatson), glebius (mentor)
MFC after: 1 week
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault. Make the 'type' variable holding the
trap number unsigned to avoid this.
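Why the unsigned type helps can be shown with a small sketch. The table contents, MAX_TRAP_MSG value and trap_name() are hypothetical; the point is only that a negative trap number, once held in an unsigned variable, becomes a huge value and fails the upper-bound test instead of indexing before the array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, shortened message table. */
static const char *const trap_msg[] = {
    "",
    "privileged instruction fault",
    "",
    "breakpoint instruction fault",
};
#define MAX_TRAP_MSG 3

/* With an unsigned type, a single upper-bound check covers "negative" input. */
static const char *
trap_name(unsigned int type)
{
    if (type <= MAX_TRAP_MSG)
        return (trap_msg[type]);
    return ("unknown");
}
```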
MFC after: 2 weeks
Approved by: re (rwatson)
to repeat if you had more than two keys down at any given time (which
happened to me all the time with emacs).
This is taken from PR 110681, although what URATAN Shigenobu describes
there is different than the pathology that I have been seeing. I'm
seeing this only in X, while he sees it on his console, yet I think
the two problems are related. I've also reworked the patch slightly
to conform to the coding standards of adjacent code.
It is unclear to me if this merely masks the maddening bug that I have
seen, or if this is a real fix. I typically see the problem when I'm
typing fast in emacs and using lots of motion keys (meta and control).
In either case, my workstation at work again is finally useful with
this patch.
PR: 110681
Submitted by: URATAN Shigenobu
Approved by: re (blanket)
the protocol to be report on each open, but ignore any errors as set
protocol for mice that don't implement the boot protocol can generate
an error. Evidently, the Gyration GyroPoint RF Technology Receiver
(Gyration Ultra Cordless) device has this problem.
Submitted by: Eugene M. Kim
PR: 106565
Approved by: re (blanket)
- Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than
SCTP_SMALL_IOVEC_SIZE
- re-add inpcb_bind local address check bypass capability
- Fix it so sctp_opt_info is independent of assoc_id position.
- Fix cookie life set to use MSEC_TO_TICKS() macro.
- asconf changes
o More comment changes/clarifications related to the old local address
"not" list which is now an explicit restricted list.
o Rename some functions for clarity:
- sctp_add/del_local_addr_assoc to xxx_local_addr_restricted()
- asconf related iterator functions to sctp_asconf_iterator_xxx()
o Fix bug when the same address is deleted and added (and removed from
the asconf queue) where the ifa is "freed" twice refcount wise,
possibly freeing it completely.
o Fix bug in output where the first ASCONF would not go out after the
last address is changed (e.g. only goes out when retransmitted).
o Fix bug where multiple ASCONFs can be bundled in the same packet
and with the same serial numbers.
o Fix asconf stcb iterator to not send ASCONF until after all work
queue entries have been processed.
o Change behavior so that when the last address is deleted (auto asconf
on a bound all endpoint) no action is taken until an address is
added; at that time, an ASCONF add+delete is sent (if the assoc
is still up).
o Fix local address counting so that address scoping is taken into
account.
o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending
of ASCONF (after an RTO). The default now is to send
ASCONF immediately (except for the case of changing/deleting the
last usable address).
Approved by: re(ken smith)@freebsd.org
This fixes tmpfs calculations on 32-bit systems equipped with more than
4GB swap.
Reported by: Craig Boston <craig xfoil gank org>
PR: kern/114870
Approved by: re (tmpfs blanket)
included man pages on how to use it. This code is still somewhat experimental
but has been successfully tested on a number of targets. Many thanks to
Danny for contributing this.
Approved by: re
Ever since switching to adaptive polling re(4) occasionally spews
watchdog timeouts on systems with MSI capability. This change is
a minimal one for supporting MSI; re(4) also needs MSIX support
for RTL8111C in the future. Because softc structure of re(4) is shared
with rl(4), rl(4) was touched to use the modified softc.
Reported by: cnst
Tested by: cnst
Approved by: re (kensmith)
Because nfe(4) hardware doesn't support SG on Rx path, supporting
jumbo frame requires very large contiguous kernel memory (i.e. several
megabytes). When contiguous kernel memory is scarce, that
allocation request may always fail. However nfe(4) can operate on normal
sized MTU frames, so go ahead and just disable jumbo frame support.
While I'm here add a new tunable "hw.nfe.jumbo_disable" to disable
jumbo frame support.
In nfe_poll, make sure to invoke correct Rx handler.
Approved by: re (kensmith)
results unused; with gcc's -Werror option, this raises a warning
that breaks buildkernel.
Fix this by removing upcall_free().
Reported by: various
Approved by: jeff
Approved by: re
Pointy hat to: attilio
dangerous races.
Fix these problems by adding correct locking for the members of 'struct
kse_upcall' and other struct proc/struct thread related members.
For the moment, just leave ku_mflag and ku_flags "lazy" locked.
While here, clean up the code by removing the function kse_GC() (unused),
and merging upcall_link(), upcall_unlink(), upcall_stash() in their
respective callers (static functions, very short and only called in one
place).
Reported by: pav
Tested by: pav (on some pointyhat cluster nodes)
Approved by: jeff
Approved by: re
Sponsored by: NGX Italy (http://www.ngx.it)
vnode label for a check rather than the directory vnode label a second
time.
MFC after: 3 days
Submitted by: Zhouyi ZHOU <zhouzhouyi at FreeBSD dot org>
Reviewed by: csjp
Sponsored by: Google Summer of Code 2007
Approved by: re (bmah)
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).
Approved by: re
MFC after: 3 weeks
udp6_output() from udp6_output.c to udp6_usrreq.c, matching the UDPv4
structure, and allowing us to remove udp6_output.c.
Reviewed by: bz, gnn
Approved by: re (bmah)
o Initialize ownerships and permissions. They were garbage (0) for
root mounts since vfs_mountroot_try() doesn't ask for them to be set
and msdosfs's old incomplete code to set them was removed. The
garbage happened to give the correct ownerships root:wheel, but it
gave permissions 000 so init could not be execed. Use the macros
for root:wheel and 0755. (The removed code gave 0:0 and 0777. 0755
is more normal and secure, though wrong for /tmp.)
o Check the readonly flag for initial (non-MNT_UPDATE) mounts in the
correct place, as in ffs. For root mounts, it is only passed in
mp->mnt_flags, since vfs_mountroot_try() only passes it as a flag
and nothing translates the flag to the "ro" option string. msdosfs
only looked for it in the string, so it gave a rw mount for root
mounts without even clearing the flag in mp->mnt_flags, so the final
state was inconsistent. Checking the flag only in mp->mnt_flags
works for initial userland mounts too. The MNT_UPDATE case is
messier.
The main point that should work but doesn't is fsck of msdosfs root
while it is mounted ro. This needs mainly MNT_RELOAD support to work.
It should be possible to run fsck -p and succeed provided the fs is
consistent, not just for msdosfs, but this fails because fsck -p always
tries to open the device rw. The hack that allows open for writing
in ffs is not implemented in msdosfs, since without MNT_RELOAD support
writing could only be harmful. So fsck must be turned off to use
msdosfs as root. This is quite dangerous, since msdosfs is still missing
actually using its fs-dirty flag internally, so it is happy to mount
dirty filesystems rw.
Unrelated changes:
- Fix missing error handling for MNT_UPDATE from rw to ro.
- Catch up with renaming msdos to msdosfs in a string.
Approved by: re (kensmith)
physical memory pages into account for tm_maxfilesize.
Reported by: Dominique Goncalves <dominique.goncalves gmail.com>
Submitted by: Howard Su
Approved by: re (tmpfs blanket)
consumers.
This patch makes KSE no longer an optional stub for kernel structures,
fixing the breakage.
As a tail note, this bug has broken kqemu for a long period now.
Tested by: Ulf Lilleengen <lulf@FreeBSD.org>
Discussed with: rwatson, jeff
Approved by: jeff (mentor)
Approved by: re
be woken up by kthread_exit. This is racy and in some cases the kthread will
exit before ndis gets around to sleep so it will be stuck indefinitely. This
change reuses the kq_exit variable to indicate that the thread has gone and
will loop on tsleep with a timeout waiting for it. If the kthread has already
exited then it will not sleep at all.
Approved by: re (rwatson)
advancing. Read from the timer before attaching to be sure it advances
in 1 us. Since the slowest rate allowed by the spec is 10 MHz, the
timer is guaranteed to change in this interval if it is working.
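The check above can be sketched in a hardware-independent way: read the counter, wait 1 us, read again. At the slowest rate the text allows (10 MHz) a working counter moves by 10 ticks in that interval. All names below are hypothetical, and the fake counter exists only so the sketch can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*counter_read_fn)(void *ctx);
typedef void (*delay_us_fn)(void *ctx, unsigned int us);

/* Return true if the counter value changes across a 1 us delay. */
static bool
counter_advances(counter_read_fn rd, delay_us_fn delay, void *ctx)
{
    uint32_t before = rd(ctx);

    delay(ctx, 1);
    return (rd(ctx) != before);
}

/* Simulated counter for demonstration: ticks_per_us is 10 at 10 MHz. */
struct fake_counter {
    uint32_t value;
    uint32_t ticks_per_us;
};

static uint32_t
fake_read(void *ctx)
{
    return (((struct fake_counter *)ctx)->value);
}

static void
fake_delay(void *ctx, unsigned int us)
{
    struct fake_counter *fc = ctx;

    fc->value += fc->ticks_per_us * us;
}
```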
Tested by: Rui Paulo
Approved by: re
MFC after: 3 days
- Synchronized audit event list to Solaris, picking up the *at(2) system call
definitions, now required for FreeBSD and Linux. Added additional events
for *at(2) system calls not present in Solaris.
Obtained from: TrustedBSD Project
Approved by: re (hrs)
- remove duplicate #include <sys/priv.h> that is not under
#ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
o some SA_IGNORE comments
o some checks for NULL before unlock.
o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this
we pass a NULL in to bind on implicit assoc setup and crash :-(
Approved by: re@freebsd.org(Ken Smith)
4KB pages as i386, data structures that just fit in one page on i386 (and
on 64 bit architectures with 8KB pages) can be distributed over two pages
on amd64. This is a problem in the case of the Symbios driver, since the
SCRIPTS engine in the SCSI chip operates on physical addresses and needs
physically contiguous memory. Earlier patches used contigmalloc on amd64,
but this version replaces part of a structure by a pointer to that data.
In order to not introduce an extra indirection for other architectures,
the change has been made conditional on __amd64__.
Earlier attempts to repair this problem are removed (i.e. the macros that
made amd64 use contigmalloc). The fix was submitted by Jan Mikkelsen and
modified by me to only affect amd64.
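The page-spanning problem described above is simple arithmetic. A minimal sketch, assuming a structure made of 1000 pointer-sized fields (a made-up size) and a hypothetical helper spans_pages():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if [addr, addr + len) touches more than one page. */
static bool
spans_pages(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return (false);
    return (addr / page_size != (addr + len - 1) / page_size);
}
```

With 4-byte pointers the example structure is 4000 bytes and fits a 4KB page; with 8-byte pointers it is 8000 bytes, which still fits an 8KB page but spans two 4KB pages, and those need not be physically contiguous.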
PR: 89550
Submitted by: janm at transactionware dot com (Jan Mikkelsen)
Approved by: re (Hiroki Sato)
MFC after: 2 weeks
This gives a very large speedup for small block sizes (in my tests,
about 5 times for write and 3 times for read with a block size of 512,
if clustering is possible) and a moderate speedup for the moderately
large block sizes that should be used on non-small media (4K is the
best size in most cases, and the speedup for that is about 1.3 times
for write and 1.2 times for read). mmap() should benefit from clustering
like read()/write(), but the current implementation of vm only supports
clustering (at least for getpages) if the fs block size is >= PAGE_SIZE.
msdosfs is now only slightly slower than ffs with soft updates for
writing and slightly faster for reading when both use their best block
sizes. Writing is slower for msdosfs because of more sync writes.
Reading is faster for msdosfs because indirect blocks interfere with
clustering in ffs.
The changes in msdosfs_read() and msdosfs_write() are simpler merges
of corresponding code in ffs (after fixing some style bugs in ffs).
msdosfs_bmap() needs fs-specific code. This implementation loops
calling a lower level bmap function to do the hard parts. This is a
bit inefficient, but is efficient enough since msdosfs_bmap() is only
called when there is physical i/o to do.
Approved by: re (hrs)
In msdosfs_read(), mainly reorder the main loop to the same order as in
ffs_read().
In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of
clrbuf(). I think this is just a bogus optimization, but ffs always
does it and msdosfs already did it in one place, and it is what I've
tested.
In msdosfs_write(), merge good bits from a comment in ffs_write(), and
fix 1 style bug.
In the main comment for msdosfs_pcbmap(), improve wording and catch
up with 13 years of changes in the function. This comment belongs in
VOP_BMAP.9 but that doesn't exist.
In msdosfs_bmap(), return EFBIG if the requested cluster number is out
of bounds instead of blindly truncating it, and fix many style bugs.
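The difference between truncating and rejecting can be sketched as below. The types are assumptions (a 64-bit logical block number narrowed to a 32-bit cluster number) and lbn_to_cluster() is a hypothetical helper, not the msdosfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Narrow a 64-bit block number to a 32-bit cluster number.
 * Returns 0 on success, or EFBIG if the value does not fit.
 */
static int
lbn_to_cluster(int64_t lbn, uint32_t *cnp)
{
    if (lbn < 0 || lbn > (int64_t)UINT32_MAX)
        return (EFBIG);     /* a blind cast would silently wrap here */
    *cnp = (uint32_t)lbn;
    return (0);
}
```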
Approved by: re (hrs)
11b channel is not found, e.g. Atheros 5211.
Reported by: matteo
Problem outlined by: thompsa
Reviewed by: sam, thompsa
Approved by: re (kensmith), sam (mentor)
Tested by: matteo (an early version)
We allocate coda_ctlvp when /coda is mounted, but never release it.
During the unmount this vnode was marked as UNMOUNTING and when venus
is started a second time the system would hang, possibly waiting for
the old vnode to disappear.
So now we call vrele on the control vnode when file system is unmounted
to drop the reference we got during the mount. I'm pretty sure it is
also necessary to not skip the handling in coda_inactive for the control
vnode, it seems like that is the place we actually get rid of the vnode
once the refcount has dropped to 0.
Submitted by: Jan Harkes <jaharkes at cs dot cmu dot edu>
Approved by: re (kensmith)
filt_ttyrdetach() etc would later attempt to dereference cdev->si_tty,
causing a 0xdeadc0de dereference. Change kn_hook value from cdev to
struct tty to avoid dereferencing freed cdev.
In ttygone(), wake up select(), sigio and kevent() users in addition
to the queue sleepers.
Return EV_EOF from kevent filters if TS_GONE is set.
Submitted by: peter
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 2 weeks
- Adjust the semantics of the lock_profiling stubs in the hard functions so
that they are more accurate and trustworthy.
- As for sx locks, disable shared paths for lock_profiling. lock_profiling
has a subtle race which makes results coming from shared paths not
completely trustworthy. A macro stub (LOCK_PROFILING_SHARED) can be used
to re-enable these paths, but is currently intended for development use
only.
- style(9) fixes
Approved by: jeff, kmacy, jhb[1]
Approved by: re
[1] Had initial reservations not shared by others, conceded
in the end.
1. Rewrite the backward scan. Specifically, reverse the order in which
pages are allocated so that upon failure it is never necessary to
free pages that were just allocated. Moreover, any allocated pages
can be put to use. This makes the backward scan behave just like the
forward scan.
2. Eliminate an explicit, unsynchronized check for low memory before
calling vm_page_alloc(). It serves no useful purpose. It is, in
effect, optimizing the uncommon case at the expense of the common
case.
Approved by: re (hrs)
MFC after: 3 weeks
interrupt that is shared with other devices (e.g. USB) in the system and
provide a new tunable "hw.msk.legacy_intr" to activate the legacy
interrupt handler. Setting the tunable automatically disables MSI
for msk(4). Previously msk(4) used adaptive polling with taskqueue(9)
as all msk(4) hardware I know of supports MSI. However, there are cases
where MSI cannot be used on some hardware due to bugs in MSI
implementations.
Tested by: Li-Lun Wang < llwang AT infor DOT org >
Approved by: re (kensmith)
UDPv4 features to UDPv6:
- Add MAC checks on delivery and MAC labeling on transmit.
- Check for (and reject) datagrams with destination port 0.
- For multicast delivery, check the source port only if the socket being
considered as a destination has been connected.
- Implement UDP blackholing based on net.inet.udp.blackhole.
- Add a new ICMPv6 unreachable reply rate limiting category for failed
delivery attempts and implement rate limiting for UDPv6 (submitted by
bz).
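As a rough illustration of the destination port 0 check listed above, a
minimal userland sketch follows. The structure and function names are
hypothetical and simplified; this is not the udp6_usrreq.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified UDP header; ports are in network byte order. */
struct udp_hdr_sketch {
	uint16_t sport;
	uint16_t dport;
	uint16_t len;
	uint16_t sum;
};

/* Return true if the datagram should be dropped before any socket lookup. */
static bool
udp_dport_invalid(const struct udp_hdr_sketch *uh)
{
	assert(uh != NULL);
	/* Zero is zero in any byte order, so no ntohs() is needed here. */
	return (uh->dport == 0);
}
```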
Approved by: re (kensmith)
Reviewed by: bz
machines.
- Leave the long-term load balancer running by default once per second.
- Enable stealing load from the idle thread only when the remote processor
has more than two transferable tasks. Setting this to one further
improves buildworld. Setting it higher improves mysql.
- Remove the bogus pick_zero option. I had not intended to commit this.
- Entirely disallow migration for threads with SRQ_YIELDING set. This
balances out the extra migration allowed for with the load balancers.
It also makes pick_pri perform better as I had anticipated.
Tested by: Dmitry Morozovsky <marck@rinet.ru>
Approved by: re
properly. We have to temporarily unlock the TDQ lock so we can lock
the thread and add it to the run queue. This is used only for KSE.
- When we add a thread from the tdq_move() via sched_balance() we need to
ipi the target if it's sitting in the idle thread or it'll never run.
Reported by: Rene Landan
Approved by: re
- Add custom .c wrappers for the firmware, rather than the standard
firmware(9) generated firmware objects to work around toolchain
problems on ia64 involving linking objects produced by
ld -b -binary into the kernel.
- Move from using Myricom's ".dat" firmware blobs to using Myricom's
zlib compressed ".h" firmware header files. This is done to
facilitate the custom wrappers, and saves a fair amount of wired
memory in the case where the firmware is built in, or preloaded.
- Fix two compile issues in mxge which only appear on non-i386/amd64.
Reviewed by: mlaier, mav (earlier version with just zlib support)
Glanced at by: sam
Approved by: re (kensmith)
IPV6_IPSEC_POLICY always visible again. This unbreaks some
third party user space applications.
PR: 114491
Reported by: sumikawa
Reviewed by: sumikawa
Approved by: re (hrs)
should finally fix fsx test case.
The printf's added here would be eventually turned into
assertions.
Submitted by: Mingyan Guo (mostly)
Approved by: re (tmpfs blanket)
new code and third party modules which try to depend on it.
- Initialize sched_lock in sched_4bsd.c.
- Declare sched_lock in sparc64 pmap.c and assert that we're compiling
with SCHED_4BSD to prevent accidental crashes from running ULE. This
is the sole remaining file outside of the scheduler that uses the
global sched_lock.
Approved by: re
been in development for over 6 months as SCHED_SMP.
- Implement one spin lock per thread-queue. Threads assigned to a
run-queue point to this lock via td_lock.
- Improve the facility for assigning threads to CPUs now that sched_lock
contention no longer dominates scheduling decisions on larger SMP
machines.
- Re-write idle time stealing in an attempt to make it less damaging to
general performance. This is still disabled by default. See
kern.sched.steal_idle.
- Call the long-term load balancer from a callout rather than sched_clock()
so there are no locks held. This is disabled by default. See
kern.sched.balance.
- Parameterize many scheduling decisions via sysctls. Try to document
these via sysctl descriptions.
- General structural and naming cleanups.
- Document each function with comments.
Tested by: current@ amd64, x86, UP, SMP.
Approved by: re
require fewer blocking loops.
- Don't use atomic ops with 4BSD or on UP.
- Only use the blocking loop if ULE is compiled in.
- Use the correct memory barrier.
Discussed with: attilio, jhb, ssouhlal
Tested by: current@
Approved by: re
- use proper tick gathering macro instead of ticks directly.
- Placed reasonable boundaries on sets that a user can do
that are converted to ticks from ms.
- Fix CMT_PF to always check to be sure CMT is on.
- Fix ticks use of CMT_PF.
- put back code to allow asconfs to be queued while INITs are in flight
and before the assoc is established.
- During window probes, an ack'd packet might be left with the window
probe mark on it causing it to be retransmitted. Change so that
the flight decrease macro clears the window_probe mark.
- Additional logging flight size/reading and ASOC LOG. This
is only enabled if you manually insert things into opt_sctp.h
since it's a set of debug code only.
- Found an interesting SMP race in the way data was appended which
could cause a reader to lose a part of a message, had to
reorder when we marked the message was complete to after
the data was appended.
- bug in ADD-IP for the subset bound socket case when the peer has only
one address
- fix ASCONF implicit success/error handling case
- proper support of jails in FreeBSD 6>
- copy out the timeval for the 64 bit sparc world on cookie-echo
(alignment error crashes without this).
Approved by: re(Ken Smith)
config info. from device.hints. Some machines have ipmi controllers
that do not have attachment info in either PCI, SMBIOS or ACPI.
This idea was hacked together by me and then done properly by
jhb.
Submitted by: jhb
Reviewed by: jhb (man page)
Approved by: re (Ken Smith)
MFC after: 1 week
The SDM states that writing to ar.bspstore invalidates the ar.rnat
register as a side-effect. This was interpreted as "bits in the
ar.rnat register that correspond to registers whose value is on
the stack are undefined". Since we keep the kernel stack
NaT-aligned with the user stack (i.e. the lower 9 bits of the backing
store pointer remain unchanged when we switch to the kernel stack)
bits that need preserving would be preserved.
That interpretation is questionable. So, now, the interpretation
is more absolute: ar.rnat is undefined after writing to ar.bspstore.
As such, we write the saved value of ar.rnat back to ar.rnat after
writing to ar.bspstore.
Discussed with: christian.kandeler@hob.de
Approved by: re (kensmith)
- Keep last transaction label for each destination.
- If the next label is not free, just give up.
- This should reduce CPU load for TX on if_fwip under heavy load.
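The label policy above (remember the last label per destination, try only
the next one, give up if it is busy) might look roughly like the sketch
below. The structure, the function name and the 64-label space (a 6-bit
IEEE 1394 transaction label) are assumptions for illustration, not the
firewire(4) code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NLABELS 64	/* assumption: 6-bit transaction label */

/* Hypothetical per-destination state; not the firewire(4) structures. */
struct dst_labels {
	uint8_t last;			/* last label handed out */
	bool	busy[NLABELS];		/* labels still in flight */
};

/*
 * Try only the label following the last one used.  Return it on success,
 * or -1 if it is still in flight (no scan for another free label).
 */
static int
tlabel_next(struct dst_labels *d)
{
	uint8_t next;

	assert(d != NULL);
	next = (uint8_t)((d->last + 1) % NLABELS);
	if (d->busy[next])
		return (-1);
	d->busy[next] = true;
	d->last = next;
	return (next);
}
```

Giving up immediately costs a retry later, but avoids a scan over all
labels on every transmit, which is where the CPU saving would come from.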
Approved by: re (hrs)
NET_NEEDS_GIANT, which will shortly be removed. This is done in a
way that it may be easily reattached to the build before 7.1 if
appropriate locking is added. Specifics:
- Don't install netatm include files
- Disconnect netatm command line management tools
- Don't build libatm
- Don't include ATM parts in rescue or sysinstall
- Don't install sample configuration files and documents
- Don't build kernel support as a module or in NOTES
- Don't build netgraph wrapper nodes for netatm
This removes the last remaining consumer of NET_NEEDS_GIANT.
Reviewed by: harti
Discussed with: bz, bms
Approved by: re (kensmith)
vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to
vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
vm_phys_free_pages(). Add comments regarding the need for the free page
queues lock to be held by callers to these functions. No functional
changes.
Approved by: re (hrs)
- CMT_PF states added (w/sysctl to turn the PF version on)
- sctp_input.c had a missing incr of cookie case when the
auth was bad. This meant a free was called without an
increment to refcnt, added increment like rest of code.
- There was a case, unlikely, when the scope of the destination
changed (this is a TSNH case). In that case, it would not free
the alloc'ed asoc (in sctp_input.c).
- When listed addresses found a colliding cookie/Init, then
the collided upon tcb was not unlocked in sctp_pcb.c
- Add error checking on arguments of sctp_sendx(3) to prevent it from
referencing a NULL pointer.
- Fix an error return of sctp_sendx(3), it was returning
ENOMEM not -1.
- Get assoc id was changed to use the sanctified socket api
method for getting an assoc id (PEER_ADDR_INFO instead of
PEER_ADDR_PARAMS).
- Fix it so a peeled off socket will get a proper error return
if it tries to send to a different address than it is connected to.
- Fix so that select_a_stream can avoid an endless loop that
could hang a caller.
- time_entered (state set time) was not being set in all cases
to the time we went established.
Approved by: re(ken smith)
This adds a function to agp.c to set the aperture resource ID if it's
not the usual AGP_APBASE. Previously, agp.c had been assuming
AGP_APBASE, which resulted in incorrect agp_info, and contortions by
agp_i810.c to work around it.
This also adds functions to agp.c for default AGP_GET_APERTURE() and
AGP_SET_APERTURE(), which return the aperture resource size and disallow
aperture size changes. Moving to these for our AGP drivers will likely
result in stability improvements. This should fix 855-class aperture
size detection.
Additionally, refuse to attach agp_i810 when some RAM is above 4GB and
the GART can't reference memory that high. This should be very rare.
The correct solution would be bus_dma conversion for agp, which is
beyond the scope of this change. Other AGP drivers could likely use
this change as well.
G33/Q35/Q33 AGP support is also included, but disconnected by default
due to lack of testing.
PR: kern/109724 (855 aperture issue)
Submitted by: FUJIMOTO Kou<fujimoto@j.dendai.ac.jp>
Approved by: re (hrs)
Add support for the CENTIPAD board (http://www.harerod.de/centipad/index.html)
(which is a very cool, very small ARM board)
Add support for KB9202B (it has different memory)
Make BOOT_FLAVOR settable
Minor cleanup nits
Approved by: re@
by removing files from src/sys/coda, and updating include paths in the
new location, kernel configuration, and Makefiles. In one case add
$FreeBSD$.
Discussed with: anderson, Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
Repo-copy madness: simon
- change include style so that builds in the kernel tree OR standalone work.
- Limit HWCSUM - I was led to believe that it would work with RSS,
but our testing had odd issues which suggests this is false.
- A fatfinger error in the ioctl code made ifconfig up not work.
Approved by: re
kernels exposed by the recent fixes to resource limits for 32-bit processes
on 64-bit kernels:
- Let ABIs expose their maximum stack size via a new pointer in sysentvec
and use that in preference to maxssiz during exec() rather than always
using maxssiz for all processes.
- Apply the ABI's limit fixup to the previous stack size when adjusting
RLIMIT_STACK to determine if the existing mapping for the stack needs to
be grown or shrunk (as well as how much it should be grown or shrunk).
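The first item can be pictured with the sketch below: an optional per-ABI
pointer that, when set, overrides the global limit. The structure and
field names are hypothetical stand-ins, not the real sysentvec layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct sysentvec. */
struct abi_sketch {
	const uint64_t *maxssiz;	/* NULL if the ABI has no limit of its own */
};

/* Pick the stack size limit: the ABI's value if present, else the global. */
static uint64_t
exec_max_stack(const struct abi_sketch *abi, uint64_t global_maxssiz)
{
	assert(abi != NULL);
	if (abi->maxssiz != NULL)
		return (*abi->maxssiz);
	return (global_maxssiz);
}
```

Using a pointer rather than a value lets an ABI leave the field unset and
fall back to the global limit.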
Approved by: re (kensmith)
to the FAT is possible.
Make the FAT block size less arbitrary before it is rounded up:
- for FAT12, default to 3*512 instead of to 3 sectors. The magic 3 is
the default number of 512-byte FAT sectors on a floppy drive. That
many sectors is too many if the sector size is larger.
- for !FAT12, default to PAGE_SIZE instead of to 4096. Remove
MSDOSFS_DFLTBSIZE since it only obfuscated this 4096.
For reading the BPB, use a block size of 8192 instead of 2048 so that
sector sizes up to 8192 can work. We should try several sizes, or just
try the maximum supported size (MAXBSIZE = 64K). I use 8192 because
that is enough for DVD-RW's (even 2048 is enough) and 8192 has been
tested a lot in use by ffs.
This completes fixing msdosfs for some large sector sizes (up to 8K
for read and 64K for write). Microsoft documents support for sector
sizes up to 4K in msdosfs. ffs is currently limited to 8K for both
read and write.
Approved by: re (kensmith)
Approved by: nyan (several years ago)
Rev 1.9 introduced another path where machclk_freq would be initialized
before the rest of setup was done (i.e. initializing the callout). Make
the one-time initialization a separate function and make init_machclk()
able to be called multiple times, any time. We depend on tsc_freq first
being updated from the highest priority eventhandler, thus we run last
and call init_machclk() to set machclk_freq. Also, don't initialize
static variables to 0.
Tested by: Eygene Ryabinkin
Approved by: re
part of fixing msdosfs for large sector sizes. One of the fixed bugs
was fatal for large sector sizes.
1. The fsinfo block has size 512, but it was misunderstood and declared
as having size 1024, with nothing in the second 512 bytes except a
signature at the end. The second 512 bytes actually normally (if
the file system was created by Windows) consist of a second boot
sector which is normally (in WinXP) empty except for a signature --
the normal layout is one boot sector, one fsinfo sector, another
boot sector, then these 3 sectors duplicated. However, other
layouts are valid. newfs_msdos produces a valid layout with one
boot sector, one fsinfo sector, then these 2 sectors duplicated.
The signature check for the extra part of the fsinfo was thus
normally checking the signature in either the second boot sector
or the first boot sector in the copy, and thus accidentally
succeeding. The extra signature check would just fail for weirder
layouts with 512-byte sectors, and for normal layouts with any other
sector size.
Remove the extra bytes and the extra signature check.
2. Old versions did i/o to the fsinfo block using size 1024, with the
second half only used for the extra signature check on read. This
was harmless for sector size 512, and worked accidentally for sector
size 1024. The i/o just failed for larger sector sizes.
The version being fixed did i/o to the fsinfo block using size
fsi_size(pmp) = (1024 << ((pmp)->pm_BlkPerSec >> 2)). This
expression makes no sense. It happens to work for small
sector sizes, but for sector size 32K it gives the preposterous
value of 64M and thus causes panics. A sector size of 32768 is
necessary for at least some DVD-RW's (where the minimum write size
is 32768 although the minimum read size is 2048).
Now that the size of the fsinfo block is 512, it always fits in
one sector so there is no need for a macro to express it. Just
use the sector size where the old code uses 1024.
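The arithmetic behind the 64M figure can be checked with a small sketch
of the old expression. It assumes that pm_BlkPerSec is the sector size
divided by 512; the function and macro names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE_SKETCH 512	/* assumption: pm_BlkPerSec counts 512-byte blocks */

/* The old expression, with the sector size passed in bytes. */
static uint64_t
old_fsi_size(uint32_t bytes_per_sec)
{
	uint32_t blk_per_sec = bytes_per_sec / DEV_BSIZE_SKETCH;

	assert(blk_per_sec >= 1);
	return ((uint64_t)1024 << (blk_per_sec >> 2));
}
```

Under that assumption the expression gives 1024 for 512-byte and
1024-byte sectors, 2048 for 2K, 4096 for 4K, 16384 for 8K, and
1024 << 16 = 67108864 (64M) for 32K sectors.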
Approved by: re (kensmith)
Approved by: nyan (several years ago for a different version of (2))
than indirecting through ifaddr_byindex, which makes things easier with
respect to virtualized network stacks.
Submitted by: Marko Zec <zec at icir dot org>
Reviewed by: Leonid Grossman <Leonid dot Grossman at neterion dot com>
Approved by: re (kensmith)
non-sleepable lock held. drm_pci_alloc() calls them, thus drm mutex shall
not be held during the call.
Move the drm_pci_alloc() to the start of the i915_initialize() and drop the
drm mutex around it.
Reported by: Ganbold <ganbold micom mng net>
Reviewed by: anholt
Approved by: re (hrs)
MFC after: 1 week
- use net80211 for scanning and pass the results back to the scan cache
- use ieee80211_init_channels to fill our channel list
- fix up state transitions
- deprecate the old wicontrol ioctls
- add some debugging lines (#define NDIS_DEBUG)
Reviewed by: sam
Approved by: re (kensmith)
ENOTTY. Make the control vnode a regular file so that ioctls are passed
through to our kernel module.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
some previously disabled code which according to the comment caused a
problem during shutdown. But even that is still better than
triggering a kernel panic whenever venus is started.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
we can't open container files by device/inode number pair anymore.
Replace the CODA_OPEN upcall with CODA_OPEN_BY_FD, where venus returns
an open file descriptor for the container file. We can then grab a
reference on the vnode in coda_psdev.c:vc_nb_write and use this vnode for
further accesses to the container file.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
ioctls can be removed. These have been #ifdef'd out and left as a reference in
case any of the RIDs need to be turned into sysctls at a later date.
Reviewed by: sam, avatar
Approved by: re (kensmith)
operations. But we don't have to, if we find the coda_mntinfo structure
for this device in our linked list, we know the device is good.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
need to initialize dev so that we can actually find the allocated
coda_mntinfo structure later on.
Submitted by: Jan Harkes <jaharkes@cs.cmu.edu>
Approved by: re (kensmith)
macros for lock_profiling.
Reported by: Tom McLaughlin <tmclaugh@sdf.lonestar.org>
Tested by: Tom McLaughlin <tmclaugh@sdf.lonestar.org>
Approved by: jeff (mentor)
Approved by: re
ELF files. On ia64 the ELF header contains information about
characteristics of the machine code and ld(1) needs that to
determine whether input files are compatible for linking. To
this end non-ELF files are not supported by binutils on ia64.
However, the resulting ELF file seems to be correct despite the
warnings and the non-supportedness of non-ELF files and it
appears enough to unbreak the build of firmware(9) files on ia64
by simply suppressing the warning.
Ran into by: gallatin@
Approved by: re (hrs)
Looks good to me: mlaier@
vm_page_cowfault(). Initially, if vm_page_cowfault() sleeps, the given
page is wired, preventing it from being recycled. However, when
transmission of the page completes, the page is unwired and returned to
the page queues. At that point, the page is not in any special state
that prevents it from being recycled. Consequently, vm_page_cowfault()
should verify that the page is still held by the same vm object before
retrying the replacement of the page. Note: The containing object is,
however, safe from being recycled by virtue of having a non-zero
paging-in-progress count.
While I'm here, add some assertions and comments.
Approved by: re (rwatson)
MFC After: 3 weeks
of the first cluster in a file (and, if the allocation cannot be
continued contiguously, for subsequent clusters in a file) was randomized
in an attempt to leave space for contiguous allocation of subsequent
clusters in each file when there are multiple writers. This reduced
internal fragmentation by a few percent, but it increased external
fragmentation by up to a few thousand percent.
Use simple sequential allocation instead. Actually maintain the fsinfo
sequence index for this. The read and write of this index from/to
disk still have many non-critical bugs, but we now write an index that
has something to do with our allocations instead of being modified
garbage. If there is no fsinfo on the disk, then we maintain the index
internally and don't go near the bugs for writing it.
Allocating the first free cluster gives a layout that is almost as good
(better in some cases), but takes too much CPU if the FAT is large and
the first free cluster is not near the beginning.
The effect of this change for untar and tar of a slightly reduced copy
of /usr/src on a new file system was:
Before (msdosfs 4K-clusters):
untar: 459.57 real untar from cached file (actually a pipe)
tar: 342.50 real tar from uncached tree to /dev/zero
Before (ffs2 soft updates 4K-blocks 4K-frags)
untar: 39.18 real
tar: 29.94 real
Before (ffs2 soft updates 16K-blocks 2K-frags)
untar: 31.35 real
tar: 18.30 real
After (msdosfs 4K-clusters):
untar: 54.83 real
tar: 16.18 real
All of these times can be improved further.
With multiple concurrent writers or readers (especially readers), the
improvement is smaller, but I couldn't find any case where it is
negative. 342 seconds for tarring up about 342 MB on a ~47MB/S partition
is just hard to unimprove on. (This operation would take about 7.3
seconds with reasonably localized allocation and perfect read-ahead.)
However, for active file systems, 342 seconds is closer to normal than
the 16+ seconds above or the 11 seconds with other changes (best I've
measured -- won easily by msdosfs!). E.g., my active /usr/src on ffs1
is quite old and fragmented, so reading to prepare for the above
benchmark takes about 6 times longer than reading back the fresh copies
of it.
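The figures quoted above can be cross-checked with a few lines of
arithmetic. The helper names are invented for this sketch; the inputs are
the numbers from the tables and text above.

```c
#include <assert.h>

/* Seconds needed to stream `mbytes` at `mbytes_per_sec` with no seeks. */
static double
ideal_seconds(double mbytes, double mbytes_per_sec)
{
	assert(mbytes_per_sec > 0.0);
	return (mbytes / mbytes_per_sec);
}

/* How many times faster `after` is than `before` (both in seconds). */
static double
speedup(double before, double after)
{
	assert(after > 0.0);
	return (before / after);
}
```

ideal_seconds(342, 47) comes to about 7.3 seconds, matching the estimate
in the text. speedup(342.50, 16.18) is roughly 21 for tar and
speedup(459.57, 54.83) is roughly 8.4 for untar.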
Approved by: re (kensmith)
- Move udp_sendspace and udp_recvspace global variables and associated
sysctls to the top of the file where most other such things are present.
- Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize
so that we can add blackhole support for UDPv6 using the same MIB
variable.
- Move udp_append() above udp_input() to match the function order in
udp6_usrreq.c.
Approved by: re (kensmith)
- reduce cpu usage by as much as 25% (40% -> 30%) by doing txq reclaim more efficiently
- use mtx_trylock when trying to grab the lock to avoid spinning during long encap loop
- add per-txq reclaim task
- if mbufs were successfully re-claimed try another pass
- track txq overruns with sysctl
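The trylock idea in the list above, sketched in userland: attempt the
lock once and skip the reclaim if it is busy, rather than spinning. A C11
atomic_flag stands in for the kernel mutex and mtx_trylock(9); the
structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transmit queue; the flag is a userland stand-in for a mutex. */
struct txq_sketch {
	atomic_flag lock;
	int	    reclaimable;	/* descriptors waiting to be reclaimed */
};

/* Reclaim only if the lock is free; never wait for it. */
static bool
txq_try_reclaim(struct txq_sketch *q)
{
	assert(q != NULL);
	if (atomic_flag_test_and_set(&q->lock))
		return (false);		/* someone else holds it; move on */
	q->reclaimable = 0;
	atomic_flag_clear(&q->lock);
	return (true);
}
```

A caller that gets false back simply carries on with its encap loop and
leaves the reclaim to whoever holds the lock or to a later pass.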
Approved by: re (blanket)
- Add controller id for Intel 82801I (ICH9).
PR: kern/114399
Submitted by: Michael Fuckner <michael@fuckner.net>
- MSI support. Disable by default due to various issues with too many
broken hardwares. MSI can be enabled through device.hints(5) or
kenv(8) by setting "hint.pcm.%d.msi=1".
Partially submitted by: kevlo
YAMAMOTO Taku <taku@tackymt.homeip.net>
Tested by: joel, kevlo, YAMAMOTO Taku
Approved by: re (hrs)
MFC after: 3 days
prototypes, don't use register, etc. Synchronize structure and
layout to the IPv4 versions of these functions to a greater extent,
making visual comparison easier.
Remove now stale or incorrect comments.
Enable full lock assertions, and correct one exception handling
case where the wrong label was jumped to.
Tested by: bz
Approved by: re (bmah)
do the heavy lifting of the 'mii_tick' function, rue was left behind.
Implement this in a naive way. Reports from the field show this makes
the driver functional with some locking issues, as opposed to an
instant panic. Those will be addressed in a later version of the
driver.
Approved by: re@ (bmah)
With the in_mcast.c code, if an interface for an IPv4 multicast join was
not specified, and a route did not exist for the specified group in the
unicast forwarding tables, the join would be rejected with the error
EADDRNOTAVAIL.
This change restores the old behaviour whereby if no interface is specified,
and no route exists for the group destination, the IPv4 address list is
walked to find a non-loopback, multicast-capable interface to satisfy
the join request.
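A simplified sketch of that fallback is shown below. It walks a flat
array of interfaces rather than the IPv4 address list, and the flag bits,
structure and function name are all hypothetical, chosen only to show the
selection rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and interface record, for illustration only. */
#define IFF_LOOPBACK_SK  0x1
#define IFF_MULTICAST_SK 0x2

struct ifnet_sketch {
	const char *name;
	unsigned    flags;
};

/* Return the first non-loopback, multicast-capable interface, or NULL. */
static const struct ifnet_sketch *
mcast_fallback_ifp(const struct ifnet_sketch *ifs, size_t n)
{
	size_t i;

	assert(ifs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if ((ifs[i].flags & IFF_LOOPBACK_SK) == 0 &&
		    (ifs[i].flags & IFF_MULTICAST_SK) != 0)
			return (&ifs[i]);
	}
	return (NULL);
}
```

A NULL result corresponds to the case where the join still has to be
rejected because no suitable interface exists.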
This should resolve problems with starting multicast services during
system boot or when a default forwarding entry does not exist.
Approved by: re (rwatson)
Sort NETGEAR list per convention.
Swap QUALCOMM and QUALCOMM2.
Add a few vendor products.
no md5 changes with this file (except when USBVERBOSE is enabled)
Approved by: re@ (blanket)
vm_fault_additional_pages() that was introduced in revision 1.47. Then
as now, it is unnecessary because dev_pager_haspage() returns zero for
both the number of pages to read ahead and read behind, producing the
same exact behavior by vm_fault_additional_pages() as the special case
handling.
Approved by: re (rwatson)
- Plug memory leak.
- Respect underlying vnode's properties rather than assuming that
the user wants root:wheel + 0755. Useful for using tmpfs(5) for
/tmp.
- Use roundup2 and howmany macros instead of rolling our own version.
- Try to fix fsx -W -R foo case.
- Instead of blindly zeroing a page, determine whether we need a pagein in
order to prevent data corruption.
- Fix several bugs reported by Coverity.
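For reference, the two macros mentioned above behave as in the sketch
below. The definitions are reproduced as they are commonly found in
<sys/param.h> and guarded so the example builds in userland; the 4 KB
page size and the helper names are assumptions for the example.

```c
#include <assert.h>

/* Definitions as commonly found in <sys/param.h>; guarded for userland. */
#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif
#ifndef roundup2
#define roundup2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))	/* y: power of 2 */
#endif

#define PAGE_SIZE_SKETCH 4096UL	/* assumption: 4 KB pages */

/* Number of pages needed to hold `bytes` bytes. */
static unsigned long
pages_for(unsigned long bytes)
{
	return (howmany(bytes, PAGE_SIZE_SKETCH));
}

/* `bytes` rounded up to a whole number of pages. */
static unsigned long
page_round(unsigned long bytes)
{
	/* roundup2() is only valid when the alignment is a power of two. */
	assert((PAGE_SIZE_SKETCH & (PAGE_SIZE_SKETCH - 1)) == 0);
	return (roundup2(bytes, PAGE_SIZE_SKETCH));
}
```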
Submitted by: Mingyan Guo <guomingyan gmail com>, Howard Su, delphij
Coverity ID: CID 2550, 2551, 2552, 2557
Approved by: re (tmpfs blanket)
- Handle directories and leaves other than unit directories and text leaves
correctly.
- Now we can retrieve CROM of iSight correctly.
Approved by: re (hrs)
Tested by: flz
MFC after: 3 days
- When an LDT entry changes, the old one is freed while it is still
referenced by gdt and ldtr. This can lead to disruptive behaviours, in
particular on SMP machines.
- When an LDT entry changes, it is assumed that the only entities sharing
the same LDT are threads in the same proc. This doesn't take into account
edge cases where two processes share the same VM (rfork'ed ones, for
example).
This patch addresses these two problems and additionally fixes the
usage of refcount by switching it back to the old manually-grown refcount
(since in this case it would be faster).
Diagnosed by: tegge
Tested by: pho (a former version)
Reviewed by: kib
Approved by: jeff (mentor)
Approved by: re
free to be consistent with other error handling, and release socket buffer
lock before freeing mbufs and statistics updates rather than after.
Approved by: re (kensmith)
tracks the total number of reactivated pages. (We have not been
counting reactivations by vm_fault() since revision 1.46.)
Correct a comment in vm_fault_additional_pages().
Approved by: re (kensmith)
MFC after: 1 week
in. These are exclusively in the name of the company for this round.
No new devices have been added, but the MITEL entry has been
eliminated because nothing uses it. You won't see any difference
unless you have USBVERBOSE defined for the kernel.
Approved by: re@ (blanket)
- Adjust the semantics of the lock_profiling stubs in the hard functions so
that they are more accurate and trustworthy.
- Disable shared paths for lock_profiling. lock_profiling has a subtle race
which makes results coming from shared paths not completely trustworthy.
A macro stub (LOCK_PROFILING_SHARED) can be used to re-enable these
paths, but is currently intended for development use only.
- Use homogeneous names for automatic variables in hard functions regarding
lock_profiling
- Style fixes
- Add a CTASSERT for some flags building
Discussed with: kmacy, kris
Approved by: jeff (mentor)
Approved by: re
sys/i4b/include/ so they will be available to all architectures
once I4B compiles on those.
We no longer need these "glue" files.
Reminded by: nyan
Approved by: re (kensmith)
nxge: cast page size fragments down to (int). If the vm's demand paging
PAGE_SIZE is ever too big for that, we've got far bigger problems.
ofw: move va_start() a little earlier. gcc-4.2 doesn't like us modifying
the last arg before the va_start().
Approved by: re (rwatson)
would be 93C46(1Kbit) or 93C56(2Kbit). One of differences between them
is number of address lines required to access the EEPROM. For example,
93C56 EEPROM needs 8 address lines to read/write data. If a 93C56
sees the serial clock (CLK) end before the required number of cycles to set
the OP code/address of the EEPROM, the result would be unexpected behavior.
Previously it tried to detect 93C46, which requires 6 address lines,
and then assumed it would be 93C56 if the read data was not the expected
value. However, this approach didn't work in some models/situations
as 93C56 requires 8 address lines to access its data. In order to fix
it, change EEPROM probing order such that 93C56 is detected reliably.
While I'm here change hard-coded address line numbers with defined
constant to enhance readability.
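To show why the address width matters, here is a sketch of how a READ
command could be framed for the two parts. It assumes the usual Microwire
framing (a start bit, the two-bit READ opcode 10, then the address bits)
and 16-bit organization; the constant and function names are invented for
the example and are not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define EE_93C46_ADDR_BITS 6	/* 93C46, 16-bit organization */
#define EE_93C56_ADDR_BITS 8	/* 93C56, 16-bit organization */
#define EE_START_READ      0x6u	/* start bit (1) followed by READ opcode (10) */

/*
 * Build the bit pattern clocked out MSB-first for a READ: start bit,
 * opcode, then `addr_bits` address bits.
 */
static uint32_t
eeprom_read_cmd(uint32_t addr, unsigned addr_bits)
{
	assert(addr_bits == EE_93C46_ADDR_BITS ||
	    addr_bits == EE_93C56_ADDR_BITS);
	assert(addr < (1u << addr_bits));
	return ((EE_START_READ << addr_bits) | addr);
}
```

With 6 address bits the command is 9 clocks long, with 8 it is 11, so a
93C56 probed with the shorter form never sees a complete command.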
PR: 112710
Approved by: re (mux)
- Sort copyrights by date.
- Re-wrap, and in some cases, fix comments.
- Fix tabbing, white space, remove extra blank lines.
- Remove commented out debugging printfs.
Approved by: re (kensmith)
it with netipsec now that KAME IPsec is gone.
While here add missing netinet6 directories.
Add comments about the ports needed to be able to run those targets.
Reviewed by: philip
Approved by: re (rwatson)
o Adonics Cable 205
o Aiptek PocketCAM 3Mega
o Belkin USB2SCSI
o Casio QV DigiCam
o CCYU EasyDisk ED1064
o Desknote UCR-61S2B
o Epson Stylus Photo 875DC Card Reader
o Epson Stylus Photo 895 Card Reader
o Feiya 5-in-1 Card Reader
o Hitachi Dvd-CAM DZ-MV100A Camcorder
o HP CD-WRiter+ CD-4e
o Insystem Storage Adapter v2
o Kyocera Finecam S3x
o Kyocera Finecam S4
o Kyocera Finecam S5
o Kyocera Finecam L3
o Lexar USB CF Reader
o MindAtWork Digital Wallet
o Minolta Dimage F300
o Minolta Dimage E223
o Minsumi USB Fdd
o Netac USB-CF-Card
o NetChip USB Clik! 40
o Onspec MDCFE-B USB CF Reader
o Onspec SIIG/Datafab Memory Stick + CF Reader/Writer
o Onspec Datafab-based Reader
o Onspec PNY/Datafab CF+SM Reader
o Onspec SimpleTech/Datafab CF+SM Reader
o Onspec MDSM-b Reader
o Onspec USB To CF + SM Combo (LC1)
o Onspec ImageMate SDDR55
o Panasonic LS-120 Camera
o Samsung Techwin Digimax 410
o Shuttle eUSB SmartMedia / CompactFlash Adapter
o Skanhex MD 7425 Camera
o Skanhex SX 520z Camera
o Sony Memorystick NW-MS7
o Sony Portable USB Hardrive V2
o Sony Memorystick PEG N760c
o Sony Memorystick MSC-U03
o TREK/IBM USB memory key
o Trumpion T33520 USB Flash Card Controller
o Trumpion MP3 Player
o Vivtar Vivicam 35Xx
o WinMaxGroup USB Flash Disk 64M-C
o Zoran Digital Camera EX-20 DSC
and maybe a few others...
Submitted by: Vaidas Damosevicius and flz
PR: 79893
Reviewed by: njl, flz
Approved by: re (blanket)
ftruncate(), but without the pad arg.
There are several reasons for this. Consider 'mmap()'. On AMD64, the
function call (and syscall) ABI allow for 6 register arguments. Additional
arguments go on the stack. mmap(2) has 6 arguments. However, the syscall
definition has an extra 'int pad' argument. This pushes it to 7 arguments,
which means one must spill into the memory stack. Since the kernel API
doesn't match userland API, we have a hack in libc - libc/sys/mmap.c.
This implements the userland API by calling __syscall() with an extra
argument and the pad argument, for a total of 8 args. This is all
unnecessary and inconvenient for several things, including the kernel's
syscall handler code which now has to handle merging stack arguments with
register arguments. It is a big deal for certain 3rd party code.
I'm adding libc glue to make the transition totally painless. I had
intended to mark the old syscalls as COMPAT6, but the potential to shoot
your feet by building a new kernel without COMPAT_FREEBSD6 but with a
slightly older userland was too great. For now, they have manual
"freebsd6_" prefixes rather than being COMPAT6. They will go back to
being marked 'COMPAT6' after 7-stable starts.
Approved by: re (kensmith)
Also, change the visibility of compat syscalls slightly. Compat
syscalls were missing from 'syscalls.h' entirely. This additionally adds
them with their compat prefix. eg: SYS_freebsd6_mmap.
Also, the syscalls.c name strings have different prefixes to differentiate
syscalls. Instead of several "old.mmap" strings, there will now be a
"compat.mmap" and "compat6.mmap" etc. Before, both would have had the
same "old.mmap" label.
Approved by: re
that should be a no-op (for example, requesting SYNC on record path).
The standard does not indicate that such requests are illegal, so
just return it as success instead of EINVAL.
Approved by: re (mux)
shall not be called while holding cdev mutex. devfs_inos unrhdr has cdev as
mutex, thus creating this LOR situation.
Postpone calling free() in kern/subr_unit.c:alloc_unr() and nested functions
until the unrhdr mutex is dropped. Save the freed items on the ppfree list
instead, and provide the clean_unrhdrl() and clean_unrhdr() functions to
clean the list.
Call clean_unrhdrl() after devfs_create() calls immediately before
dropping cdev mutex. devfs_create() is the only user of the alloc_unrl()
in the tree.
Reviewed by: phk
Tested by: Peter Holm
LOR: 80
Approved by: re (kensmith)
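The deferred-free pattern described above can be sketched in userland
as follows. This is a simplified model, not the subr_unit.c code:
struct header, defer_free() and clean_deferred_locked() are
hypothetical names, the "locked" flag merely stands in for the mutex,
and dropping the lock around each free() is an assumption of the
sketch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an item whose free() is postponed. */
struct deferred {
	struct deferred *next;
};

/* Hypothetical stand-in for the unrhdr and its list of postponed frees. */
struct header {
	int locked;			/* models "the mutex is held" */
	struct deferred *ppfree;	/* items waiting to be freed */
	int freed;			/* items actually released so far */
};

/* Called with the lock held: queue the item instead of calling free(). */
static void
defer_free(struct header *h, void *item)
{
	struct deferred *d = item;

	d->next = h->ppfree;
	h->ppfree = d;
}

/* Called with the lock held: release the queue, unlocked around free(). */
static void
clean_deferred_locked(struct header *h)
{
	struct deferred *d;

	while ((d = h->ppfree) != NULL) {
		h->ppfree = d->next;
		h->locked = 0;		/* drop the lock around free() */
		free(d);
		h->freed++;
		h->locked = 1;		/* retake it before touching the list */
	}
}
```

The point of the design is that free() never runs while the lock is
held, so the allocator's own locks cannot be ordered after it.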
the 7.0 timeframe.
This is needed because I4B is not locked and NET_NEEDS_GIANT goes away.
The plan is to lock I4B and bring everything back for 7.1.
Approved by: re (kensmith)
setenv(3) by tracking the size of the memory allocated instead of using
strlen() on the current value.
Convert all calls to POSIX from historic BSD API:
- unsetenv returns an int.
- putenv takes a char * instead of const char *.
- putenv no longer makes a copy of the input string.
- errno is set appropriately for POSIX. Exceptions involve bad environ
variable and internal initialization code. These both set errno to
EFAULT.
Several patches to base utilities to handle the POSIX changes from
Andrey Chernov's previous commit. A few I re-wrote to use setenv()
instead of putenv().
New regression module for tools/regression/environ to test these
functions. It also can be used to test the performance.
Bump __FreeBSD_version to 700050 due to API change.
PR: kern/99826
Approved by: wes
Approved by: re (kensmith)
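A small userland sketch of two of the POSIX behaviours listed above:
putenv() keeps the caller's string rather than a copy, and unsetenv()
returns an int. The variable name GR_DEMO_VAR and the helper
putenv_shares_buffer() are made up for the example.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* putenv() keeps this buffer itself, so it must stay valid and writable. */
static char demo_entry[] = "GR_DEMO_VAR=old";

/*
 * Returns 1 if a later write to the buffer is visible through getenv(),
 * 0 if it is not, and -1 if putenv() or unsetenv() failed.
 */
static int
putenv_shares_buffer(void)
{
	const char *val;
	int shared;

	if (putenv(demo_entry) != 0)
		return (-1);
	/* Overwrite "old" with "new" in place; no second putenv() call. */
	memcpy(demo_entry + strlen("GR_DEMO_VAR="), "new", 3);
	val = getenv("GR_DEMO_VAR");
	shared = (val != NULL && strcmp(val, "new") == 0);
	/* POSIX unsetenv() returns an int: 0 on success, -1 on error. */
	if (unsetenv("GR_DEMO_VAR") != 0)
		return (-1);
	return (shared);
}
```

Callers that pass a stack buffer or a string they later free to
putenv() are the ones most likely to need the setenv() rewrite
mentioned above.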
can acquire shared filedescriptor locks in the appropriate cases.
- Remove Giant from calls that issue ioctls. The ioctl path has been
mpsafe for some time now.
- Only acquire giant for VOP_ADVLOCK when the filesystem requires giant.
advlock is now mpsafe.
Reviewed by: rwatson
Approved by: re
to protect this datastructure instead.
- Preallocate an extra lockf structure in case we want to split a lock
on insert or delete.
- msleep() on the vnode interlock when blocking on a lock.
Reviewed by: rwatson
Approved by: re
- Use cpu_spinwait() in the spin loops in stop_cpus(), restart_cpus(), and
smp_rendezvous_action().
- Remove unneeded acq memory barriers in stop_cpus(), restart_cpus(), and
smp_rendezvous_action().
- Add an additional synch point in smp_rendezvous() to ensure that all the
CPUs will always see an up-to-date value of smp_rv_setup_func.
Reviewed by: attilio
Approved by: re (kensmith)
Tested on: alpha, amd64, i386, sparc64 SMP (for several years)
nfsnode could lead to attrs being stale. One example (that we
ran into) was a READDIR+, WRITE. The responses came back in
order, but the attrs from the WRITE were loaded before the
attrs from the READDIR+, leading to the wrong size being
read on the next stat() call.
MFC after: 1 week
Submitted by: mohans
Approved by: re (kensmith)
recoverable and unrecoverable. For the former, we redirty the
buffer and hang onto it for future retries. For the latter (eg.
ESTALE), we discard the buffer and return the error back to the
user on the next syscall. This fixes a number of vfs panics and
fixes having a large number of dirty buffers (that cannot be
written out and reclaimed) from hanging around. Thanks to ups@
for discussions on this issue.
Reported by: kris, Kai, others
Approved by: re (kensmith)
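The split between the two error classes can be sketched as below. This
is a hypothetical classifier, not the NFS client code: only ESTALE is
named as unrecoverable in the message, so treating every other error as
recoverable is an assumption made for the example.

```c
#include <errno.h>

enum write_action {
	WRITE_REDIRTY,	/* keep the buffer and retry later */
	WRITE_DISCARD	/* drop the buffer and report the error */
};

/*
 * Hypothetical classifier. Only ESTALE is named as unrecoverable in the
 * text; everything else is assumed to be worth retrying.
 */
static enum write_action
classify_write_error(int error)
{
	return (error == ESTALE ? WRITE_DISCARD : WRITE_REDIRTY);
}
```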
Lock cdev mutex too to close the race with tty being freed.
Relock clone_drain_lock to prevent the LOR with proctree lock, thus
add #include <fs/devfs/devfs_int.h>.
Suggested by: tegge
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
Lock Giant in the clone handler.
Use destroy_dev_sched() explicitly from pty_maybecleanup() and postpone
pty_release() until both master and slave cdevs are destroyed by setting
it as callback for destroy_dev_sched().
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
destroy_dev() is called from csw method, and no d_purge driver method is
provided. Transform the direct call to destroy_dev() into destroy_dev_sched().
Reviewed by: njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
destroy_dev() from d_close() cdev method would self-deadlock.
devfs_close() bumps the device thread reference counter, and destroy_dev()
sleeps, waiting for si_threadcount to reach zero for cdev without
d_purge method.
destroy_dev_sched() could be used instead from d_close(), to
schedule execution of destroy_dev() in another context. The
destroy_dev_sched_drain() function can be used to drain the scheduled
calls to destroy_dev_sched(). Similarly, drain_dev_clone_events() drains
the events clone to make sure no lingering devices are left after
dev_clone event handler deregistered.
make_dev_credf(MAKEDEV_REF) function should be used from dev_clone
event handlers instead of make_dev()/make_dev_cred() to ensure that created
device has reference counter bumped before cdev mutex is dropped inside
make_dev().
Reviewed by: tegge (early versions), njl (programming interface)
Debugging help and testing by: Peter Holm
Approved by: re (kensmith)
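The self-deadlock and the scheduled alternative can be modelled in
userland as follows. Every name prefixed with model_ is hypothetical;
this is a toy model of the calling sequence described above, not the
devfs implementation, and where the real destroy_dev() would sleep the
model simply returns -1.

```c
#include <stddef.h>

/* Hypothetical userland model of a cdev; not the kernel structure. */
struct model_dev {
	int threadcount;	/* threads currently inside driver methods */
	int destroyed;
	struct model_dev *next_sched;
};

static struct model_dev *sched_list;	/* devices awaiting destruction */

/* Model of destroy_dev(): it can only finish once no thread is inside. */
static int
model_destroy_dev(struct model_dev *dev)
{
	if (dev->threadcount != 0)
		return (-1);	/* the real call would sleep here forever */
	dev->destroyed = 1;
	return (0);
}

/* Model of destroy_dev_sched(): queue the request and return at once. */
static void
model_destroy_dev_sched(struct model_dev *dev)
{
	dev->next_sched = sched_list;
	sched_list = dev;
}

/* Model of d_close(): the thread count is held for the whole call. */
static void
model_close(struct model_dev *dev)
{
	dev->threadcount++;		/* bumped on entry, as in devfs_close() */
	model_destroy_dev_sched(dev);	/* safe: does not wait for the count */
	dev->threadcount--;
}

/* Model of the other context that runs the scheduled destroys. */
static int
model_drain(void)
{
	struct model_dev *dev;
	int n = 0;

	while ((dev = sched_list) != NULL) {
		sched_list = dev->next_sched;
		if (model_destroy_dev(dev) == 0)
			n++;
	}
	return (n);
}
```

In the model, the device is still alive when model_close() returns and
is only destroyed once model_drain() runs with the thread count back at
zero.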
First, we were never correctly checking for a 24XX Status Type 0
response- that caused us to fall through to evaluate status for
commands as if this were a 2100/2200/2300 Status Type 0 response.
This is *close*, but not quite the same. This has been reported
to be apparent with some weird lun configuration problems with
some arrays. It became glaringly apparent on sparc64 where none
of the correct byte swap things were done.
Fixing this omission then caused a whole universe shifting debug
cycle of endian issues for the 2400. The manual for 24XX f/w turns
out to be wrong about the endianness of a couple of entities. The
lun and cdb fields for the type 7 request are *not* unconditionally
big endian- they happen to be opposite of whatever the endian of
the current machine type is. Same with the sense data for the
24XX type 0 response.
While we're at it investigate and resolve some NVRAM endian
issues.
Approved by: re (ken)
MFC after: 3 days
call the sctp_free_remote_address() function.
- Assure that when we allocate a chunk the whoTo is NULL,
also when we free it and place it into the cache we NULL
it (that way the consolidation code will always work).
- Fix a small race, when an empty data holder is left on the stream
out queue, and both sides do a shutdown, the empty data holder
would prevent us from sending a SHUTDOWN-ACK and at the same time we
never would cleanup the empty holder (since nothing was ever in queue).
We now add a utility function that a) cleans up empty holders and
b) properly determines if there are still pending data chunks on
the stream out wheel.
Approved by: re@freebsd.org (Ken Smith)
of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
the acquisition of the vnode interlock before releasing the vm object's
lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
performed. Unfortunately, this allows the vnode to be recycled between
the release of the vm object's lock and the vget() on the vnode.
In this revision, I prevent the vnode from being recycled by acquiring
another reference to the vm object and underlying vnode before releasing
the vm object's lock.
This change also addresses another preexisting but trivial problem. By
acquiring another reference to the vm object, I also prevent the vm
object from being recycled. Previously, the "vnodes skipped" counter
could be wrong if it examined a recycled vm object.
Reported by: kib
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
and replace with software-testable sysctl node (security.audit) that
can be used to detect kernel audit support.
Obtained from: TrustedBSD Project
Approved by: re (kensmith)
Submitted by: Simon Schubert <corecode@fs.ei.tum.de>
- Defer flushing unsolicited response into taskqueue thread rather
than handle it directly in interrupt handler, since few of its
operations (like measuring/calibrating jack impedance) are quite
expensive.
- Misc. debugging cleanups.
Tested by: joel
Approved by: re (hrs)
MFC after: 3 days
Note: The offending quirk should have been made model/codec specific,
but since there were no records / log which model requires it, the quirk
logic had to be inverted (blacklist instead of whitelist).
Tested by: Arkadiy Dudevitch <dudevitch@englerllc.com>
Approved by: re (hrs)
MFC after: 3 days
This commit includes only the kernel files, the rest of the files
will follow in a second commit.
Reviewed by: bz
Approved by: re
Supported by: Secure Computing
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.
Approved by: re (kensmith)
- provide dummy routines for ic_scan_curchan and ic_scan_mindwell, we do not support those operations.
- add ieee80211_scan_done() to tell the scanning module that all channels have been scanned.
- pass IEEE80211_S_SCAN state off to net80211 so it can initiate scanning
- fix overflow in the rates array
- scale the rate value passed back from the firmware scan to the units that net80211 uses.
Submitted by: Token
Reviewed by: sam, avatar
Approved by: re (kensmith)
operating channel and use this in the scan cache rather than directly using
ic_curchan. Some firmware cards can only do a full scan and so ic_curchan does
not have the correct value.
Also add IEEE80211_CHAN2IEEE to directly dereference ic_ieee from the channel
to be used in the fast path.
Reviewed by: sam, sephe
Approved by: re (kensmith)
to be indexed by IEEE channel number but that is no longer the case and it needs
to be searched for.
Submitted by: avatar
Reviewed by: sam
Approved by: re (kensmith)
(1) Add size parameter to usbd_get_string()
(2) Properly limit speed when a full speed hub is plugged into a high
speed hub.
Submitted by: Hans Petter Selasky
PR: 80773, 79725
Approved by: re@ (kensmith)
yet supported by this driver. Support will be committed soon, or a
filter on all the 'newer' devices will be installed before the
release.
Approved by: re@ (blanket)
Obtained from: NetBSD, OpenBSD
Small Furry Animals by: Pink Floyd
switch (i.e. lid) is set to have an action of NONE. This is not an
invalid state, so silently return. This fixes the warning:
"acpi: request to enter state S6 failed (err 22)"
Approved by: re
the command. Make UFI devices return 'success' when asked to do a
SYNC_CACHE. There's no support for write caching in the UFI spec, so
this is the most appropriate action to undertake.
Reviewed by: scottl
Approved by: re@ (blanket)
Hellmuth with some refinements by myself and flz@. It works for me
with my non-MS mice, so nothing should be broken by it.
Submitted by: Hellmuth Michaelis
PR: 90162
Approved by: re (blanket)
pr, the submitter says:
Found this while running freebsd as guest in qemu with -usb
parameter. The patch implements the missing dynamic size based on
number of ports a hub has.
Submitted by: Lonnie Mendez
PR: 94946
Approved by: re@ (blanket)
- Remove unnecessary NULL checks after M_WAITOK allocations.
- Use VOP_ACCESS instead of hand-rolled suser_cred()
calls. [1]
- Use malloc(9) KPI to allocate memory for string. The
optimization taken from NetBSD is not valid for FreeBSD
because our malloc(9) already acts that way. [2]
Requested by: rwatson [1]
Submitted by: Howard Su [2]
Approved by: re (tmpfs blanket)
properly (un)padded on the arm platform. With this change, FreeBSD/arm
boxes are able to route AppleTalk properly.
Submitted/tested by: Nathan Whitehorn <nathanw at uchicago dot edu>
Tested on: arm, i386, amd64
Approved by: re (kensmith)
patch that converts ms to ticks was used. Another PR states that a
return code of 0 is the right one for libusb.
Submitted by: Lonnie Mendez
PR: 94311
Approved by: re (blanket)
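For reference, a millisecond-to-tick conversion of the kind mentioned
above might look like the sketch below. The helper ms_to_ticks() and
its rounding-up policy are assumptions for illustration; the message
does not show the formula the patch actually uses.

```c
/*
 * Hypothetical helper: convert milliseconds to scheduler ticks, rounding
 * up so that a nonzero timeout never becomes zero ticks. "hz" is the
 * number of ticks per second. Very large inputs can overflow ms * hz.
 */
static unsigned long
ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return ((ms * hz + 999) / 1000);
}
```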
adequate. Increase them to 1k. The referenced PR made this a sysctl,
but that seems like overkill to me. The difference between 320 and
2048 bytes in modern systems, even embedded ones, seems to be too far in the
noise to be worth the extra hair to make it settable.
PR: 74609
Submitted by: Divacky Roman
Approved by: re (blanket)
applied to, but I'd think both), honor the timeout that's been set.
Return 0 bytes to be consistent with what libusb expects. By default,
the timeout will be zero, so only applications that change the default
will see a change. The patch only seems to apply to the interrupt end
points, but it should also apply to isochronous endpoints as well.
Submitted by: Maurice Castro
PR: 110122
Approved by: re (blanket)
- In audit_bsm.c, make sure all the arguments: ARG_AUID, ARG_ASID, ARG_AMASK,
and ARG_TERMID{_ADDR} are valid before auditing their arguments. (This is done
for both setaudit and setaudit_addr.)
- Audit the arguments passed to setaudit_addr(2)
- AF_INET6 does not equate to AU_IPv6. Change this in au_to_in_addr_ex() so the
audit token is created with the correct type. This fixes the processing of the
in_addr_ex token in user space.
- Change the size of the token (as generated by the kernel) from 5*4 bytes to
4*4 bytes (the correct size of an ip6 address)
- Correct regression from ucred work which resulted in getaudit() not returning
E2BIG if the subject had an ip6 termid
- Correct slight regression in getaudit(2) which resulted in the size of a pointer
being passed instead of the size of the structure. (This resulted in invalid
auditinfo data being returned via getaudit(2))
Reviewed by: rwatson
Approved by: re@ (kensmith)
Obtained from: TrustedBSD Project
MFC after: 1 month
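The last item above is the classic sizeof-of-a-pointer mistake. A
minimal sketch, using a made-up structure (struct demo_auditinfo and
both helpers are hypothetical and do not mirror the real auditinfo
layout):

```c
#include <stddef.h>

/* Hypothetical stand-in for an audit info structure. */
struct demo_auditinfo {
	unsigned int id;
	unsigned int mask[2];
	unsigned int termid[2];
	unsigned int session;
};

/* Buggy form: sizeof(dst) is the size of the pointer, not the structure. */
static size_t
copy_len_buggy(struct demo_auditinfo *dst)
{
	return (sizeof(dst));
}

/* Fixed form: sizeof(*dst) is the size of the structure being copied. */
static size_t
copy_len_fixed(struct demo_auditinfo *dst)
{
	return (sizeof(*dst));
}
```

A copy sized by the buggy form moves only the first few bytes of the
structure, which matches the "invalid auditinfo data" symptom described
above.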
some false positives but at this moment it is better to add
support than to not have it at all (comment from Soren).
PR: kern/111516
Submitted by: Thomas Nystrom <thn at saeab dot se>
Approved by: re (kensmith)
Approved by: imp (mentor)
OK'ed by: sos (With the comment noted above about false
positives).
could lead to a deadlock).
- sleepq_set_timeout acquires callout_lock (via callout_reset()) only
with sleepq chain lock held
- msleep_spin in _callout_stop_safe locks the sleepqueue chain with
callout_lock held
In order to solve this don't use msleep_spin in _callout_stop_safe() but
use directly sleepqueues as inline msleep_spin code. Rearrange the
wakeup path in order to have it consistent too.
Reported by: kris (via stress2 test suite)
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
This is very similar to sx_init_flags: it initializes the rwlock using
special flags passed as third argument (RW_DUPOK, RW_NOPROFILE,
RW_NOWITNESS, RW_QUIET, RW_RECURSE).
Among these, the most important new feature is probably that rwlocks
can be acquired recursively now (for both shared and exclusive paths).
Because of the recursion counter, the ABI is changed.
Tested by: Timothy Redaelli <drizzt@gufi.org>
Reviewed by: jhb
Approved by: jeff (mentor)
Approved by: re
put out a ispreqt2e_t structure onto the request queue- not a ispreqt2_t
structure. I forgot that the 23XX can use a t2 structure.
Approved by: re (ken, implicitly)
MFC after: 3 days
mpo_check_proc_setaudit_addr to be used when controlling use of
setaudit_addr(), rather than mpo_check_proc_setaudit(), which takes a
different argument type.
Reviewed by: csjp
Approved by: re (kensmith)
changes for example:
(From Craig Leres):
tip to a rocketport line
run "/etc/rc.d/devfs restart"
exit tip
(wait for the system to reboot)
Thanks to Robert Watson for poking me to fix this.
PR: kern/109152
Approved by: imp (mentor)
Approved by: re (kensmith)
Reviewed by: jhb
Submitted by: Craig Leres <leres@ee dot lbl dot gov>
of the file numerically for vendors and then each product numerically
by vendor (with all the foo2's sorting after the foo's). Someday, all
the usbdevs will be merged, I hope, but until then, we have these
mega-merges.
This also finishes the LINKSYS4 -> CISCOLINKSYS rename.
Approved by: re@ (blanket)
firmware reset. Also zero out struct iwi_rateset although it's not strictly
necessary.
Reported by: Maxim Konovalov
Reviewed by: sam
Approved by: re (bmah)
- Remove tmpfs_zone_xxx KPI, the uma(9) wrapper, since
they do not bring any value now.
- Use |= instead of = when applying VV_ROOT flag.
- Remove tm_avariable_nodes list. Use uma to hold the
released nodes.
- init/destroy interlock mutex of node when init/fini
instead of ctor/dtor.
- Change memory computing using u_int to fix negative
value in 2G mem machine.
- Remove unnecessary bzero's
- Rely on uma logic to make file id allocation harder to
guess.
- Fix some unsigned/signed related things. Make sure
we respect -o size=xxxx
- Use wire instead of hold a page.
- Pass allocate_zero to obtain zeroed pages upon first
use.
Submitted by: Howard Su
Approved by: re (tmpfs blanket, kensmith)
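One way the "negative value in 2G mem machine" item can arise is sketched
below, assuming the byte count was previously held in a signed 32-bit
value. DEMO_PAGE_SIZE and both helpers are hypothetical; the actual
tmpfs expressions are not shown in the message.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Byte count held in 32 unsigned bits: 2 GB (2^31) still fits. */
static uint32_t
pages_to_bytes_u32(uint32_t pages)
{
	return (pages * DEMO_PAGE_SIZE);
}

/*
 * The same bit pattern viewed as a signed 32-bit value. The conversion is
 * implementation-defined in C; on common two's complement targets 2^31
 * comes out negative.
 */
static int32_t
bytes_as_signed(uint32_t bytes)
{
	return ((int32_t)bytes);
}
```

Here 524288 pages of 4096 bytes is exactly 2^31 bytes, which is
representable unsigned but not as a positive signed 32-bit number.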
to put out a ispreqt3e_t structure onto the request queue-
not a ispreqt3_t structure. We weren't. This turns out only
to really matter for big endian machines.
Approved by: re (ken)
MFC after: 3 days
around an output freezing problem (see the CVS log for details). This
is the same approach that sio takes to solve that problem. However,
ucom has a problem that sio doesn't have.
Consider the case where output is pending, and the device is closed.
ttyclose calls tt_close (which indirects to ucomclose) and then calls
ttyflush which calls tt_stop (which indirects to ucomstop). Since
ucomclose removed all the usb transfer points, sc_oxfer will be NULL
when ucomstop calls ucomstart. This results in a null pointer
dereference.
Since calling ucomstart in ucomstop solves other problems, we need to
work with this calling sequence. The easiest way to do that is to
bail early if sc_oxfer is NULL.
Kazuaki ODA-san came up with this patch, and filed a PR. I had seen
this bug at work and this patch does seem to solve it. He had no idea
why it worked, but knew that either this patch, or backing out ucom.c
1.56 fixed his panic. I just did the legwork of chasing down the code
paths that would cause this, and added a comment. This is obscure
enough to warrant a comment, I think.
Submitted by: Kazuaki ODA-san
PR: 113964
Approved by: re (bmah)
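The early bail-out can be modelled as below. struct model_softc and
model_start() are hypothetical, heavily reduced stand-ins for the ucom
softc and ucomstart(); only the sc_oxfer NULL check is taken from the
message.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced model of the driver softc. */
struct model_softc {
	void *sc_oxfer;		/* output transfer; NULL once close tore it down */
	int started;		/* counts transfers actually started */
};

/* Model of the start routine with the early bail-out. */
static void
model_start(struct model_softc *sc)
{
	/*
	 * The stop routine may call us after close has already removed
	 * the transfers, so there is nothing to start: return early.
	 */
	if (sc->sc_oxfer == NULL)
		return;
	sc->started++;
}
```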
around to force the IO port to a fixed address. They were only turned
on in the module build and were present since the original import. This
breaks soft power-off on the Asus A7V since it reprograms the SMBus base
address to a different one than the BIOS expects. A similar issue was
found in the alpm(4) module build.
PR: kern/113986, i386/97468
MFC after: 3 days
Approved by: re
where a device timeout that occurs with a mgt frame on the tx q
will leave the net80211 layer w/o any way to make progress.
Reviewed by: thompsa, sephe
Approved by: re (hrs)
request queues rather than shove it down a word at a time, we have
to remember to put it into little endian format. Use the macros
ISP_IOXPUT_{16,32} for this purpose. Otherwise, on sparc the firmware
is loaded garbled and we get a (not surprisingly) firmware checksum
failure and the card won't start and we don't attach it.
Approved by: re (bruce)
MFC after: 3 days
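What "put it into little endian format" means can be shown with a
byte-wise store that gives the same result on any host. put_le16() and
put_le32() are hypothetical helpers written for this example; the
driver itself uses the ISP_IOXPUT_{16,32} macros, whose definitions are
not reproduced here.

```c
#include <stdint.h>

/* Store a 16-bit value least significant byte first. */
static void
put_le16(uint8_t *dst, uint16_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value least significant byte first. */
static void
put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)(v >> 24);
}
```

Because the bytes are written one at a time, the layout in memory does
not depend on whether the host is sparc64 or i386.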
both 6.x and 7.x. This is based on feedback on this thread
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=81818+0+current/freebsd-stable
and my use of it on 6.x.
MFC after: 3 days
- Update the warning about UNION filesystem. It is now actively maintained,
although there are still some issues being resolved.
Reviewed by: freebsd-stable@, kris, bmah
Approved by: re (bmah)
- Fix fwd-tsn to use proper accessor so it does not overrun mbufs
- Fix stream reset error reporting to actually work (it has always been
broken if the peer rejects a stream reset)
- Some 64 bit friendly changes
Approved by: re(bmah@freebsd.org)
some quota limit was exceeded. Sequence of UFS_VALLOC()/UFS_VFREE()
call there could cause inodeblock to have both freefile and inodedep
dependencies without any inode in the block being marked for write.
Then, softdep_check_suspend() would return EAGAIN forever.
Force write of inodeblock with allocated freefile softdependency by
setting IN_MODIFIED flag in softdep_freefile and unconditionally calling
UFS_UPDATE() in ufs_reclaim.
Reported by: kris
Debug help and tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
Previously it didn't honor parent dma tag's restrictions such that
an invalid dma segment could be passed to device. The driver for the
device may panic in sanity check routine for the dma segment or may
produce unexpected results. I have no idea how it could ever have
worked before.
Reviewed by: grehan
Tested by: gad
Approved by: re (hrs)
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
Reviewed by: scottl (long ago)
Approved by: re (hrs)
Improvements:
* /etc/rc.suspend,rc.resume are always run, no matter the source of the
suspend request (user or kernel, apm or acpi)
* suspend now requires positive user acknowledgement. If a user program
wants to cancel the suspend, they can. If one of the user programs
hangs or doesn't respond within 10 seconds, the system suspends anyway.
* /dev/apm is clonable, allowing multiple listeners for suspend events.
In the future, xorg-server can use this to be informed about suspend
even if there are other listeners (i.e. apmd).
Changes:
* Two new ACPI ioctls: REQSLPSTATE and ACKSLPSTATE. Request begins the
process of suspending by notifying all listeners. acpi is monitored by
devd(8) and /dev/apm listener(s) are also counted. Users register their
approval or disapproval via Ack. If anyone disapproves, suspend is vetoed.
* Old user programs or kernel modules that used SETSLPSTATE continue to
work. A message is printed once that this interface is deprecated.
* acpiconf gains the -k flag to ack the suspend request. This flag is
undocumented on purpose since it's only used by /etc/rc.suspend. It is
not intended to be a permanent change and will be removed once a better
power API is implemented.
* S5 (power off) is no longer supported via acpiconf -s 5 or apm -z/-Z.
This restores previous behavior of halt/shutdown -p being the interface.
* Miscellaneous improvements to error reporting
Approved by: re
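The acknowledgement rule described above (any disapproval vetoes, a
hung or silent listener does not) can be sketched as a small decision
function. The enums and decide_suspend() are hypothetical; this models
only the outcome, not the ioctl interface or the 10 second timer.

```c
#include <stddef.h>

enum reply {
	REPLY_APPROVE,
	REPLY_VETO,
	REPLY_NONE	/* listener hung or did not answer in time */
};

enum outcome { SUSPEND_PROCEEDS, SUSPEND_VETOED };

/* Decide the outcome once every listener has answered or timed out. */
static enum outcome
decide_suspend(const enum reply *replies, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (replies[i] == REPLY_VETO)
			return (SUSPEND_VETOED);
	}
	/* Approvals and missing replies both let the suspend go ahead. */
	return (SUSPEND_PROCEEDS);
}
```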
o Consistently use device_foo_t and bus_foo_t for functions implementing
device_foo and bus_foo respectively. Adjust those routines that were wrong
(we should do this throughout the tree).
o make all the modules depend on usb. Otherwise these modules won't
load.
o ucycom doesn't need usb_port.h
o Minor unifdefing
o uhub, umass, ums, urio, uscanner conversion complete.
o ukbd: Remove the NO_SET_PROTO quirk (fixes PR 77940). NetBSD removed
their check and setting the proto a long time ago.
o umodem panic fixed. UQ_ASSUME_CM_OVER_DATA quirk removed because I've never
seen a umodem that needed this rejection for protection (this gets rid of
~20% of the quirks).
Approved by: re@ (kensmith)
PR: 77940
Older drivers that do not wish to convert to the native API (which
will work with both 6.x and 7.x) can simply include
<dev/usb/usb_port.h>. Drivers in the tree shouldn't use these macros,
unless they actually work on other OSes and are actively maintained.
Approved by: re@
when 'make obj' was done first. I found this when fixing
a problem reported by tinderbox, but forgot to send the
patchset to re@ altogether.
Approved by: re (kensmith)
Postpone call to devfs_free() after cdev mutex is dropped. Reuse
cdp_list link for queuing devices awaiting deletion in the
cdevp_free_list.
Reported by: Hans Petter Selasky <hselasky c2i net>
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 2 weeks
mouse pointer instead of a 8 x 16 one so device drivers don't
need to bring their own one there and in gfb_mouse() (ab)use
the pixel_mask argument of putm() to pass along on/off info as
erasing the mouse cursor image by redrawing the text underneath
doesn't work as we use hardware cursors on sparc64.
allowing the driver for the host-PCI-bridge to indicate that
reenumeration of the PCI busses isn't supported by returning
-1 instead of a valid PCI bus number. This is needed in order to
support both Tomatillo, which don't support reenumeration and
thus are apparently intended to be used for independently
numbered PCI domains only, and Psycho bridges, whose busses
need to be reenumerated on at least some E450, without the
#ifndef currently used for sun4v in order to support multiple
independent PCI domains. The actual allocation/incrementation
of the PCI bus numbers is now done in psycho(4), though it
no longer establish a mapping between bus numbers and device
nodes like ofw_pci_alloc_busno() did as that functionality
wasn't used (but can easily be brought back if really needed).
The now no longer used sys/sparc64/pci/ofw_pci.c is also
removed from sys/conf/files.sun4v as ofw_pci_alloc_busno()
wasn't used there in the first place.
- In ofw_pci_default_{adjust_busrange,intr_pending}() sanity
check that the device has a parent before passing it on.
- Make psycho_softcs static to sys/sparc64/pci/psycho.c as
it's not used outside of that module.
- In sys/sparc64/pci/ofw_pcib_subr.c remove the superfluous
inclusion of opt_global.h and correct the debug output for
adjusting the subordinate bus number.
instead of using the PCI bus number, like it's already done for
sun4v in order to deal properly with independently numbered PCI
domains which can't be reenumerated (in the case of sun4u f.e.
Tomatillo bridges). For machines where we need to reenumerate
all PCI busses this change obviously introduces the theoretical
cosmetic problem that the device number of the PCI bus no longer
equals its PCI bus number. In practice this doesn't happen
as both are assigned linearly and in parallel.
- to show a specific set: ipfw set 3 show
- to delete rules from the set: ipfw set 9 delete 100 200 300
- to flush the set: ipfw set 4 flush
- to reset rules counters in the set: ipfw set 1 zero
PR: kern/113388
Submitted by: Andrey V. Elsukov
Approved by: re (kensmith)
MFC after: 6 weeks
passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because
it came from the inactive queue and PG_UNMANAGED pages are not in any
page queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS
objects, and all pages within a OBJT_PHYS object are PG_UNMANAGED.
So, if the page that is passed to vm_pageout_clean() is not
PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
neighbors from the same object cannot themselves be PG_UNMANAGED.
Reviewed by: tegge
don't have it. Some partitioning schemes, as well as file systems,
operate on the geometry and without it such schemes (e.g. MBR)
and file systems (e.g. FAT) can't be created. This is useful for
memory disks.
will initialize the header length and re-initialize the mbuf pointer
to reference the mbuf that is allocated after moving user supplied packet
data in.
to hold off freeing if there is data pending ... someone
might do send/close. Which means we want the data to
go and then close it after startup. Added comments to
the code as well to note that this is done for a reason.
Remove device_t dv, since it is no longer needed.
Add sizeof(device_t) to replace sizeof dv.
Change device_detach(dev) to device_detach(dev->subdevs[i]) since the type
of dev isn't right! Not sure when this was introduced, but it likely would
lead to a crash on disconnect.
MFC After: 1 week
of the magic string is passed in a 32-bit register, we can't use high
memory in the PAE case. This also eliminates a use of vtophys().
Tested by: Jeff Shimbo <jts767 / gmail.com>
MFC after: 1 week
now takes a device_t to be the parent of the bus that is being created.
Most SIMs have been updated with a reasonable argument, but a few exceptions
just pass NULL for now. This argument isn't used yet and the newbus
integration likely won't be ready until after 7.0-RELEASE.
can be allocated atomically
- add debug macros for printing lock initialization / teardown
- add buffers to port_info and adapter to allow each lock to have a
unique name
- destroy mutexes initialized by cxgb_offload_init
- remove recursive calls to ADAPTER_LOCK
- move callout_drain calls so that they don't occur with the lock held
- ensure that only as many qsets as are needed are initialized and
destroyed
MFC after: 3 days
Sponsored by: Chelsio Inc.
- re-factor the packet drop in sctp_output a bit more, we don't need the
trim after all, but the size calc is now corrected.
- When an assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user
closes, it should not matter if data is queued, the assoc should be
purged.
- In error leg a missing free_chunk when iph comes in NULL (should not
happen but just in case).
to nonzero you fulfill the same function as the variable 'cmp'. So you
might as well zero match and test against it later.
Reviewed by: timeout on review request
of obtaining them over and over again and pretending we could do
anything useful without them (for chosen this includes adding a
declaration and initializing it in OF_init()).
- In OF_init() if obtaining the memory or mmu handle fails just call
OF_exit() instead of panic() as the loader hasn't initialized the
console at these early stages yet and trying to print out something
causes a hang. With OF_exit() one at least has a chance to get back
to the OFW boot monitor and debug the problem.
- Fix OF_call_method() on 64-bit machines (this is a merge of
sys/dev/ofw/openfirm.c rev 1.6).
- Replace OF_alloc_phys(), OF_claim_virt(), OF_map_phys() and
OF_release_phys() in the MI part of the loader with wrappers around
OF_call_method() in the sparc64. Beside the fact that they duplicate
OF_call_method() the former should never have been in the MI part
of the loader as contrary to the OFW spec they use two-cell physical
addresses.
- Remove unused functions which are also MD dupes of OF_call_method().
- In sys/boot/sparc64/loader/main.c add __func__ to panic strings as
different functions use otherwise identical panic strings and make
some of the panic strings a tad more user-friendly instead of just
mentioning the name of the function that returned an unexpected
result.
handlers as filter/"fast" handlers so shutdown_nice() can
acquire the process lock.
- Use bus_{read,write}_8() instead of bus_space_{read,write}_8()
in order to get rid of sc_bushandle and sc_bustag in the softc.
- Remove the banal and outdated comment above sbus_filter_stub().
allowing it to be a filter/"fast" handler. Locking the interrupt
handlers with a spin lock is mainly a requirement in schizo(4)
but as we ought to register the spin lock anyway it should not
hurt to take advantage of it in psycho(4).
- Pass both a driver_filter_t and a driver_intr_t argument to
psycho_set_intr(), allowing to get rid of the FAST interrupt
flag hack.
- Don't register the over-temperature interrupt handler as filter/
"fast" handler so shutdown_nice() can acquire the process lock.
- Use bus_{read,write}_8() instead of bus_space_{read,write}_8()
in order to get rid of sc_bushandle and sc_bustag in the softc.
- Correct the debug output for adjusting the subordinate bus number.
- Remove the banal and outdated comment above psycho_filter_stub().
- Fix some white space nits.
a privilege is checked against the real uid rather than the effective
uid, instead decide which uid to use in priv_check_cred() based on the
privilege passed in. We use the real uid for PRIV_MAXFILES,
PRIV_MAXPROC, and PRIV_PROC_LIMIT. Remove the definition of
SUSER_RUID; there are now no flags defined for priv_check_cred().
Obtained from: TrustedBSD Project
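The idea of choosing the uid from the privilege rather than from a
caller-supplied flag can be sketched as follows. The DEMO_PRIV_*
constants, struct demo_cred and demo_priv_uid() are hypothetical; only
the three privileges that use the real uid are taken from the message.

```c
/* Illustrative privilege numbers; the real PRIV_* values live in the kernel. */
enum demo_priv {
	DEMO_PRIV_MAXFILES,
	DEMO_PRIV_MAXPROC,
	DEMO_PRIV_PROC_LIMIT,
	DEMO_PRIV_OTHER
};

/* Hypothetical credential with just the two uids that matter here. */
struct demo_cred {
	unsigned int cr_uid;	/* effective uid */
	unsigned int cr_ruid;	/* real uid */
};

/* Pick the uid to test based on the privilege, not on a caller flag. */
static unsigned int
demo_priv_uid(const struct demo_cred *cred, enum demo_priv priv)
{
	switch (priv) {
	case DEMO_PRIV_MAXFILES:
	case DEMO_PRIV_MAXPROC:
	case DEMO_PRIV_PROC_LIMIT:
		return (cred->cr_ruid);
	default:
		return (cred->cr_uid);
	}
}
```

Keeping the choice in one place means callers cannot pass the wrong
flag for a given privilege.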
- Move the rtc_mtx spin lock out from under #ifdef SMP as it's just
not SMP-specific.
- Add a new spin lock pcib_mtx for locking "fast" interrupt handlers
of host-to-PCI bridge drivers on sparc64.
of the register rather than in the offset describing the register.
- In gem_reset_rx() let gem_bitwait() check for the Rx reset bit
rather than the Tx reset bit to clear.
Obtained from: OpenBSD (same/similar bugs being fixed)
These CPUs use an enhanced layout of the interrupt vector dispatch
and dispatch status registers in order to allow sending IPIs to
multiple targets simultaneously. Thus support for these CPUs was
put in a newly added cheetah_ipi_selected(). This is intended to
be pointed to by cpu_ipi_selected, which now is a function pointer,
in order to avoid cpu_impl checks once booted. Alternatively it
can point to spitfire_ipi_selected(), which was renamed from
cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed
to spitfire_ipi_send() (there's no need for a cheetah equivalent
of this so far). Initialization of the cpu_ipi_selected pointer
and other requirements is done in mp_init(), which was renamed
from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP
systems while cpu_ipi_selected() is. As a side-effect this allows
to make mp_tramp static to sys/sparc64/sparc64/mp_machdep.c.
For the sake of avoiding #ifdef SMP and for keeping the history in
place cheetah_ipi_selected() and spitfire_ipi_{selected,send}()
were not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c.
- Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't
exceed the data types we use to store the CPU bit fields or the
number of USIII and greater CPUs supported by the current
cheetah_ipi_selected() implementation (which for JBus-CPUs is
only 4; that should be fine though as according to OpenSolaris
there are no sun4u machines with more than 4 JBus-CPUs).
- In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs
as we can't handle more than that.
- In cpu_mp_start() check for upa-portid vs. portid depending on
cpu_impl for consistency with nexus(4).
- In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't
told to IPI itself as sun4u CPUs just can't do that.
- In spitfire_ipi_send() do a MEMBAR #Sync after writing the
interrupt vector data as we want to make sure the payload was
actually written before we trigger the dispatch.
- In spitfire_ipi_send() also verify IDR_BUSY when checking whether
the dispatch was successful as it has to be cleared for this to
be the case.
- Remove some redundant variables.
RTC function of a National Semiconductor PC87317/PC97317. This
consists of using the century register the same way Solaris does
for compatibility reasons. Once there is a MD power(4) we'd also
want to interface the APC (Advanced Power Control) functionality
of the same chip function with it.
- Use a macro for the device description and take advantage of
ISA_PNP_PROBE() setting the device description.
- Use the generated typedefs for the prototypes of the device
interface functions.
reasons outlined in the comment removed along with it, because the
OFW hostid has no real meaning for FreeBSD and mainly so the OFW
hostid is not confused with the FreeBSD hostid.
moving OF_set_mmfsa_traptable() (SUNW,set-trap-table with the two
arguments used here is specific to sun4v) to MD code.
- In sys/dev/ofw/openfirm.h remove prototypes for unimplemented
functions and unused Solaris compatibility macros.
to be compiled into every driver making use of it. Use a const instance
of struct gfb_font for this as the font isn't intended to be changed at
run-time and in order to accompany the font data with height and width
info.
- Add missing prototypes.
- Define global variables not used outside of this module as static.
- Replace some outdated hard-coded function names in panic strings
with __func__.
- Fix some style(9) bugs.
data and remove the array size from the definition as e.g. the gallant
12 x 22 font data is 256 * 44 in size, exceeding the previously hard-
coded size.
- Declare the bold8x16 instance of struct gfb_font as const as it's not
intended to be changed at run-time as a whole either.
- Use __FBSDID in xboxfb.c
Tested by: rink
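The 256 * 44 figure above can be reproduced with a little arithmetic,
assuming each pixel row of a glyph is padded to whole bytes and the font
has 256 glyphs (glyph_bytes() and font_bytes() are hypothetical helpers
for illustration, not part of the gfb code):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one glyph: each pixel row is padded to a whole byte count. */
static size_t
glyph_bytes(unsigned width, unsigned height)
{
	assert(width > 0 && height > 0);
	return ((size_t)((width + 7) / 8) * height);
}

/* Bytes for a complete font of nglyphs glyphs. */
static size_t
font_bytes(unsigned width, unsigned height, unsigned nglyphs)
{
	return (glyph_bytes(width, height) * nglyphs);
}
```

Under these assumptions a 12 x 22 glyph needs 2 bytes per row and 22
rows, i.e. 44 bytes, so the whole font is 256 * 44 = 11264 bytes.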
the passed in auth_type is unacceptable to rpcauth_buildheader-
this avoids a null pointer panic. Clean up allocations if this
happens. This also quiets a gcc 4.2 complaint about using mheadend
without it being initialized.
Reviewed by: alfred
through wpa_supplicant. If a sta is deauth'd (e.g. due to inactivity)
with roaming mode set to manual then a subsequent MLME assoc request
will be incorrectly handled and the station will never reauthenticate.
To fix this interpret a reason code of zero as sufficient to send an
auth request frame.
This allocator uses a binary buddy system with a twist. First and
foremost, this allocator is required to support the implementation of
superpages. As a side effect, it enables a more robust implementation
of contigmalloc(9). Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).
The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages. Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space. The performance benefits vary. In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.
This allocator does not implement page coloring. The reason is that
superpages have much the same effect. The contiguous physical memory
allocation necessary for a superpage is inherently colored.
Finally, the one caveat is that this allocator does not effectively
support prezeroed pages. I hope this is temporary. On i386, this is
a slight pessimization. However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects. I speculate
that this is true in general of machines with a direct map.
Approved by: re
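For readers unfamiliar with binary buddy systems, the core address
arithmetic can be sketched as follows. This is the textbook scheme only,
under the assumption that blocks are identified by their first page
frame number and are naturally aligned; buddy_of() and merged_start()
are hypothetical names, and the "twist" described above is not shown:

```c
#include <assert.h>
#include <stdint.h>

/*
 * A free block of 2^order pages starting at page frame 'pfn' has exactly
 * one buddy of the same size: flip bit 'order' of the frame number.
 */
static uint64_t
buddy_of(uint64_t pfn, unsigned order)
{
	assert(order < 63);
	/* The block must be aligned to its own size. */
	assert((pfn & ((UINT64_C(1) << order) - 1)) == 0);
	return (pfn ^ (UINT64_C(1) << order));
}

/* When a block and its buddy are both free, they merge at this frame. */
static uint64_t
merged_start(uint64_t pfn, unsigned order)
{
	return (pfn & ~(UINT64_C(1) << order));
}
```

For example, the 4-page block at frame 12 has its buddy at frame 8, and
the two merge into one 8-page block starting at frame 8.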
eradication in/from userland path, countless locking fixes, etc.
- General sleep calls through msleep(9) have been converted to condvar(9)
for better consistency.
- Heavily guard every possible "slow path" entry (open(), close(),
a few ioctl()s, sysctls), but once the "fast path" is entered (io,
interrupt started), they are free to fly on their own.
- Rearrange locking sequences, resulting in better concurrency and
serialization. A large part doesn't even need locking at all, and that
locking will be removed in the future. Less clutter, except in a few
places due to lock ordering.
- Anonymous mixer object creation/deletion to simplify mixer handling
beyond typical mixer ioctls.
Submitted by: chibis (with modifications)
- Add a few mix_[get|set|..] functions to avoid calling mixer_ioctl()
directly using cryptic arguments.
- Locking fixes to avoid possible deadlock with (still under Giant) USB.
- Better simplex/duplex device handling.
- Recover mmap() functionality for recording, which has been lost
since 2.2.x - 3.x (the introduction of newpcm). Full-duplex mmap still
doesn't work (due to VM/page design), but people still can mmap
both by opening each direction separately. mmaped playback is guaranteed
to work either way.
- New sysctl: "hw.snd.compat_linux_mmap" to allow PROT_EXEC page
mapping, due to recent changes in linux compatibility layer which
require it. All linux applications that use sound + mmap() (mostly games)
require this to be enabled. Disabled by default.
- Other goodies.. too many, that will increase releng7 shareholder value
and make users of releng6 (and below) cry ;)
* This commit should be atomic. If anything goes wrong (not counting problems
originating from elsewhere), I will not hesitate to revert everything
within 12 hours. These substantial changes are not rocket science in
themselves; the process began almost 2 years ago, and lots of incremental
changes were already put in place during that period of time.
* Some issues do occur in snd_emu10kx (note the 'x') due to various
internal locking issues and it is currently being worked on by chibis.
Tested by: chibis (Yuriy Tsibizov), joel, Alexandre Vieira,
many innocent souls...
Without bus_dma clean up and increment of number of Tx descriptors
it's hard to guarantee correct Tx operation in TSO case. The TSO
support will be enabled again when I get more feedback on the re(4)
patch posted to current.
Please note that this is currently considered an
experimental feature, so there could be some rough
edges. Consult http://wiki.freebsd.org/TMPFS for
more information.
For now, connect tmpfs to build on i386 and amd64
architectures only. Please let us know if you have
success with other platforms.
This work was developed by Julio M. Merino Vidal
for NetBSD as a SoC project; Rohit Jalan ported it
from NetBSD to FreeBSD. Howard Su and Glen Leeder
worked on it to continue this effort.
Obtained from: NetBSD via p4
Submitted by: Howard Su (with some minor changes)
Approved by: re (kensmith)
- Remove unnecessary timestamps.
- Return CAM_RESRC_UNAVAIL for ORB shortage.
- Fix a lock problem when doorbell is used.
- Fix a potential bug for unordered execution.
'result' is still NULL and we do not need to free anything.
That allows us to gc the entire goto parts and a now unused variable.
Found with: Coverity Prevent(tm)
CID: 2519
do not continue with a NULL pointer. [1]
While here change the return of the error handling code path above.
I cannot see why we should always return 0 there. Neither does KAME
nor do we in here for the similar check in all the other functions.
Found with: Coverity Prevent(tm) [1]
CID: 2521
114 bytes of cmos ram in the PC clock chip. The big difference between
this and the Linux version is that we do not recalculate the checksums
for bytes 16..31.
We use this at work when cloning identical machines - we can copy the
bios settings as well. Reading /dev/nvram gives 114 bytes of data but
you can seek/read/write whichever bytes you like.
Yes, this is a "foot, gun, fire!" type of device.
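A hedged sketch of the seek-then-read access pattern described above.
read_bytes_at() is a hypothetical helper that works on any seekable
file; pointing it at /dev/nvram, and how offsets there map onto CMOS
bytes, are assumptions left to the reader:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open 'path' read-only, seek to 'offset' and read up to 'len' bytes.
 * Returns the number of bytes read, or -1 on any error.
 */
static ssize_t
read_bytes_at(const char *path, off_t offset, void *buf, size_t len)
{
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
		close(fd);
		return (-1);
	}
	n = read(fd, buf, len);
	close(fd);
	return (n);
}
```

Only the read side is shown on purpose; a write counterpart would
follow the same open/lseek pattern and deserves the warning above.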
without an mtag in ipsec4_common_input_cb.
So in case of !IPCOMP (AH,ESP) only change the m_tag_id if an mtag
was passed to ipsec4_common_input_cb.
Found with: Coverity Prevent(tm)
CID: 2523
handle, document those sprotos using an IPSEC_ASSERT so that it will
be clear that 'spi' will always be initialized when used the first time.
Found with: Coverity Prevent(tm)
CID: 2533
- In tdq_choose() only assert that a thread does not have too high a
priority (low value) for the queue we removed it from. This will catch
bugs in priority elevation. It's not a serious error for the thread
to have too low a priority as we don't change queues in this case as
an optimization.
Reported by: kris
thinking it had the whole chunk. This could cause a crash if
a large packet drop came in. Fixed by adjusting the trunc length
down to the limit.
- Large sacks with lots of segments could also have the same issue. Changed
duplicate and segment handling to use proper get_m_ptr function to
pull each block from mbuf chains.
ioctl routines if we are running with !mpsafenet
- Change unconditional Giant acquisition around ifpromisc
to occur only if we are running with !mpsafenet
With these locking bits in place, we can now remove the Giant
requirement from BPF, so drop the D_NEEDGIANT device flag.
This change removes Giant acquisitions around BPF device
handlers (read, write, ioctl etc).
MFC after: 1 month
Discussed with: rwatson
or idle priority of another process owned by the same user. This means
that privilege in rtprio(2) (and rtprio_thread(2)) is required indirectly
via p_cansched(9) or directly to set realtime/idle privilege, rather than
directly affecting target process authorization.
- Fix so VRF's will clean themselves up when no references are around.
- Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass
normal validation checks.
- turn auto-asconf off for subset bound sockets
- Moves all logging to use KTR. This gets rid of most
of the logging #ifdef's with a few exceptions reducing
the number of config options for SCTP.
more exposure. The current state of SCTP implementation is
considered to be ready for 32-bit platforms, but still needs some
work/testing on 64-bit platforms.
Approved by: re (kensmith)
Discussed with: rrs
Also, remove usb_malloc_type: it was unused.
Remove METHODS_NONE: it was unused.
Move include of opt_usb.h from usb_port.h to usb.h, since usb_port.h is
going away (there will be a usb_compat.h for out-of-tree drivers that want it).
- Depessimize userret() in kernels where KTRACE is enabled by doing an
unlocked check of the per-process queue of pending events before
acquiring any locks. Previously ktr_userret() unconditionally acquired
the global ktrace_sx lock on every return to userland for every thread,
even if ktrace wasn't enabled for the thread.
- Optimize the locking in exit() to first perform an unlocked read of
p_traceflag to see if ktrace is enabled and only acquire locks and
teardown ktrace if the test succeeds. Also, explicitly disable tracing
before draining any pending events so the pending events actually get
written out. The unlocked read is safe because proc lock is acquired
earlier after single-threading so p_traceflag can't change between then
and this check (well, it can currently due to a bug in ktrace I will fix
next, but that race existed prior to this change as well).
Reviewed by: rwatson
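The userret() change above follows a common pattern: peek at shared
state without the lock and only pay for the lock when there is work.
A userland sketch, assuming a hypothetical struct sketch_proc with a
pending-event counter (none of these names come from the kernel, and a
pthread mutex stands in for the sx lock):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_proc {
	pthread_mutex_t	lock;		/* protects the pending queue */
	atomic_int	npending;	/* queued events, readable unlocked */
	int		nlocked;	/* how often the lock was taken */
	int		nwritten;	/* events drained so far */
};

/* Drain pending events; returns how many were drained. */
static int
sketch_userret(struct sketch_proc *p)
{
	int n;

	assert(p != NULL);
	/* Fast path: nothing queued, so do not touch the lock at all. */
	if (atomic_load_explicit(&p->npending, memory_order_relaxed) == 0)
		return (0);

	/* Slow path: take the lock and drain whatever is queued. */
	pthread_mutex_lock(&p->lock);
	p->nlocked++;
	n = atomic_exchange(&p->npending, 0);
	p->nwritten += n;
	pthread_mutex_unlock(&p->lock);
	return (n);
}
```

A stale unlocked read is harmless in this sketch: an event queued just
after the check is picked up on the next call.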
during execve() when turning off tracing due to executing a setuid binary
as non-root. Previously this could fail to acquire Giant and fail an
assertion if the ktrace file was on a non-MPSAFE filesystem and the
executable was on an MPSAFE filesystem.
MFC after: 3 days
Reported by: kris
bridged, previously legitimate traffic was not passed as the bridge could not
tell that it was on a different Ethernet segment.
All non-tagged traffic is treated as vlan1 as per IEEE 802.1Q-2003
than the 5288.
It is not correctly implemented in earlier silicon, and the BIOS often
lies about AHCI capability on platforms where these chips are deployed.
With this change I am able to boot FreeBSD on the ASUS Vintage AH-1
barebones system.
Approved by: sos
tunnels, and was not MPSAFE. The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.
This removes one of five remaining consumers of NET_NEEDS_GIANT.
Approved by: re (kensmith)
timing loops being optimized away.
Once upon a time, gcc promised not to optimize away timing loops, but
gcc started optimizing away the call to a null function in the timing
loop here some time between gcc-3.3.3 and gcc-3.4.6, and it started
optimizing away the timing loop itself some time between gcc-3.4.6
and gcc-4.2.
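One common way to keep a compiler from deleting an empty timing loop is
to make the loop counter volatile. This is a generic sketch, not
necessarily what this commit does; spin_delay() is a hypothetical name:

```c
/*
 * Busy-wait for 'iterations' loop trips.  Because 'i' is volatile, the
 * compiler must perform every load and store of it and therefore cannot
 * collapse or remove the loop.
 */
static unsigned long
spin_delay(unsigned long iterations)
{
	volatile unsigned long i;

	for (i = 0; i < iterations; i++)
		continue;
	return (i);
}
```

The wall-clock time per iteration still depends on the CPU and compiler,
so such a loop needs calibration before it is used for real delays.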
- update to firmware version 4.1.0
- switch over to standard method for initializing cdevs (contributed by scottl@)
- break out timer_reclaim_task to be per-port
- move msix teardown into separate function
- fix bus_setup_intr for msi-x for the multi-port case so that msi-x resources
are not corrupted on unload
- handle 10/100/1000 base-T media and auto negotiation
- bind qset to cpu even for singleq case
- white space cleanups
- remove recursive PORT_LOCK
- move mtu setting to separate function
- stop and re-init port when changing mtu
- replace all direct references to m_data with calls to mtod
- handle attach failure better by not trying to de-initialize
taskqueues when they have not been allocated
- no longer default to jumbo frames
Sponsored by: Chelsio
MFC after: 3 days
- Add and document the KVM and KVM_SUPPORT options that
are needed for the ifmcstats(3) makefile
- Garbage collect unused variables
- Add missing inclusion of bsd.own.mk where needed
Approved by: kan (mentor)
Reviewed by: ru
it's an INIT collision case.
- Fixed RTO calc to maintain a separate variable to track
if an RTO calc has been done; this allows the RTO var to be
doubled during initial timeouts.
- Reduces the amount of stack used by process control.
- Use a constant for the peer chunk overhead.
- Name change to spell candidate correctly.
- Remove unused kse fields from struct proc.
- Group remaining fields and #ifdef KSE them.
- Move some kern_kse.c only prototypes out of proc and into kern_kse.
Discussed with: Julian
and protocol-independent host mode multicast. The code is written to
accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)
but are a separate call that can be re-used if needed.
- 64 bit issues
o re-arrange cookie so it is better 64 bit aligned
o For wire level things we need the packed attribute.
- Add a count of exiting threads, p_exitthreads, to struct proc.
- Increment p_exitthreads when we set the deadthread in thread_exit().
- When we thread_stash() a deadthread use an atomic to drop the count.
- Spin until the p_exitthreads count reaches 0 in thread_wait().
- Lock the last exiting thread momentarily to be certain that it has
exited cpu_throw().
- Restructure thread_wait(). It does not need a loop as there will only
ever be one thread.
Tested by: moose@opera.com
Reported by: kris, moose@opera.com
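The counting scheme in the list above can be sketched with C11 atomics.
This is a userland approximation under stated assumptions: struct
sketch_proc and the sketch_* functions are hypothetical, and the real
code additionally locks the last exiting thread, which is not modelled:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cut-down process: only the exiting-thread counter. */
struct sketch_proc {
	atomic_int	p_exitthreads;
};

/* An exiting thread announces itself before it goes away. */
static void
sketch_thread_exit(struct sketch_proc *p)
{
	assert(p != NULL);
	atomic_fetch_add_explicit(&p->p_exitthreads, 1,
	    memory_order_relaxed);
}

/* Once the dead thread is stashed, drop the count atomically. */
static void
sketch_thread_stash(struct sketch_proc *p)
{
	assert(p != NULL);
	atomic_fetch_sub_explicit(&p->p_exitthreads, 1,
	    memory_order_release);
}

/* The waiter spins until no thread is still in the middle of exiting. */
static void
sketch_thread_wait(struct sketch_proc *p)
{
	assert(p != NULL);
	while (atomic_load_explicit(&p->p_exitthreads,
	    memory_order_acquire) != 0)
		continue;
}
```

The release/acquire pairing is this sketch's choice; it makes the
stashing side's earlier writes visible to the waiter once the count
reads 0.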
MCLBYTES for the segment size but it used too many Tx descriptors in
TSO case.
While I'm here adjust maximum size of the sum of all segment lengths
in a given DMA mapping to 65535, the maximum size, in bytes, of an IP
packet.
o s/printf/device_printf/g
o Nuke OpenBSDism.
o Nuke NetBSD/OpenBSD specific DMA sync operations.(we don't have a way
to sync a single descriptor within a DMA map.)
o Remove recursive mutex.
o bus_dma(9) clean up.
o 40bit DMA address support.
o Add protection for Rx map load failure.
o Fix a long standing bug for watchdog timeout. [1]
o Add additional protections, missing Tx completion interrupt, losing
start Tx command, for watchdog timeout.
o Switch to taskqueue(9) API to handle interrupts.
o Use our own timer for watchdog instead of if_watchdog/if_timer
interface.
o Advertise VLAN header length/capability correctly to upper layer.
o Remove excessive kernel stack consumption in nfe_encap().
o Handle highly fragmented mbuf chains correctly.
o Enable ethernet address reprogramming with ifconfig(8).
o Add ALTQ/TSO, MSI/MSIX support.
o Increased Rx ring to 256 descriptors from 128.
o Align Tx/Rx descriptor ring on sizeof(struct nfe_desc64) boundary.
o Remove alignment restrictions on Tx/Rx buffers.
o Rewritten jumbo frame support code.
o Add support for hardware assisted VLAN tag insertion/stripping.
o Add support for Tx/Rx flow control based on patches from Peer Chen. [2]
o Add a routine that detects whether the ethernet address swap routine is
required. [3]
o Add a workaround that takes MAC/PHY out of power down mode.
o Add suspend/resume support.
o style(9) and code clean up.
Special thanks to Shigeaki Tagashira, the original porter of nfe(4),
who submitted lots of patches, performed countless regression tests
and maintained nfe(4) for a long time. Without his enthusiastic help
and support I could never have completed this overhaul.
The only weak point of nfe(4) compared to nve(4) is instability of
manual half-duplex media selection on certain hardware (auto sensing
media type should work for all cases, though). This was a long
standing bug of nfe(4) and I still have no idea why it doesn't work
on some hardware.
Obtained from: OpenBSD [1]
Submitted by: Peer Chen < pchen at nvidia dot com > [2], [3]
Reviewed by: Shigeaki Tagashira < shigeaki AT se DOT hiroshima-u DOT ac DOT jp >
Tested by: Shigeaki Tagashira, current
Discussed with: current
Silence from: obrien
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp
Obtained from: TrustedBSD Project
- Allow LRO to be enabled / disabled at runtime
- Fix a double-free at module unload time.
- Only update timestamp in lro merge when it is present in the frame
Sponsored by: Myricom
- Use a separate taskqueue+thread for reset tasks since iwi_ops will
block.
- Return from iwi_ops if the interface has been downed
- The firmware will fail if we are already associated
- Add myself to the copyright
o major overhaul of the way channels are handled: channels are now
fully enumerated and uniquely identify the operating characteristics;
these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
background scanning and roaming
o move scanning support into loadable modules based on the operating
mode to enable different policies and reduce the memory footprint
on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
mode yet)
o significantly speedup sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
we use a simple algorithm to trigger a roam: we threshold the rssi
and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
drivers but is incomplete; it's included now to establish a baseline
for other drivers to be developed and for user applications
o adjust max_linkhdr et. al. to reflect 802.11 requirements; this eliminates
prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
encapsulation (note this can be used with any card that can tx+rx
large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
authentication and association failures
o enable the addition of sta mode power save support for drivers that need
net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
index into the rate set; this needs to be extended to deal with
multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing
Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
dynamic turbo (lightly tested), 11n (sniffing only and needs
new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
rough edges)
o ral, ural, rum: add support for bg scanning, calibrate rssi data
o wi: lightly tested
This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others. Much of the scanning work was supported by
Atheros. The 11n work was supported by Marvell.
making the relevant files standard. This avoids duplication and
makes it easier to override/disable unwanted schemes. Since ARM
doesn't have a DEFAULTS configuration file, leave the source
files for the BSD and MBR partitioning schemes in files.arm for
now.
- Add codec id for AD1988B, along with fixing its line-in and other
issues (with proper quirks). [2]
Submitted by: [1] barbara.xxx1975@libero.it
[2] Oliver Brandmueller ob@e-Gitt.NET
MFC after: 3 days
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments
Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)
be called with an incorrect segment end value. tcp_reass() may
trim segments when they overlap with already existing ones in the
reassembly queue. Instead of saving the segment end value before
the call to tcp_reass() compute it on the fly based on the effective
segment length afterwards.
This bug was not really problematic as no information got lost and
the eventual SACK information computation was correct nonetheless.
MFC after: 1 week
instead of an authentication function. There are a design reason
and a practical reason for that. First, the module belongs in
account management because it checks availability of the account
and does no authentication. Second, there are existing and potential
PAM consumers that skip PAM authentication for good or for bad.
E.g., sshd(8) just prefers internal routines for public key auth;
OTOH, cron(8) and atrun(8) do implicit authentication when running
a job on behalf of its owner, so their inability to use PAM auth
is fundamental, but they can benefit from PAM account management.
Document this change in the manpage.
Modify /etc/pam.d files accordingly, so that pam_nologin.so is listed
under the "account" function class.
Bump __FreeBSD_version (mostly for ports, as this change should be
invisible to C code outside pam_nologin.)
PR: bin/112574
Approved by: des, re
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.
With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports are pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.
builds had been succeeding if run serially but could fail if run in
parallel because the bge module build might start before ofw_bus_if.h
got created as part of the mainline kernel build.
Diagnosis and patch by: ru
to the build.
Approved by: re
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.
This patch fixes places where they should be called atomically, changing
their locking requirements (both assume the per-proc spinlock is held) and
introducing rufetchcalc, which wraps both calls so that they are performed
atomically.
Reviewed by: jeff
Approved by: jeff (mentor)
in tcp_output(). This is currently not strictly necessary but paves
the way to simplify the entire SYN options handling quite a bit.
Clarify comment. No change in effective behaviour with this commit.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
and simplify handling of the send/receive window scaling. No
change in effective behaviour.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or
<SYN,ACK>) segment itself never be scaled.
Noticed by: yar
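The RFC1323 rule quoted above can be sketched as a small helper. The
function name and parameters are hypothetical; the only facts relied on
are that the window field is 16 bits wide, that the shift count is at
most 14, and that the field in a SYN segment is never scaled:

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_TCP_MAXWIN	65535u	/* largest unscaled window */

/*
 * Compute the 16-bit window field to put on the wire.  In a <SYN> or
 * <SYN,ACK> the value is sent unscaled; afterwards the receive space is
 * shifted right by the negotiated scale.
 */
static uint16_t
sketch_window_field(uint32_t recv_space, unsigned rcv_scale, int is_syn)
{
	uint32_t win;

	assert(rcv_scale <= 14);
	win = is_syn ? recv_space : recv_space >> rcv_scale;
	if (win > SKETCH_TCP_MAXWIN)
		win = SKETCH_TCP_MAXWIN;
	return ((uint16_t)win);
}
```

With 262144 bytes of receive space and a scale of 3, an established
connection would advertise 32768 in this sketch, while the SYN itself
is clamped to 65535.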
- Unsafeness on ruadd() in thread_exit()
- Non-atomicity of thread_exit() in the exit1() operations
This patch addresses these problems by allocating p_fd as part of the
process and modifying the way it is accessed.
A small chunk of this patch resolves a race on p_state in kern_wait(),
since we have to be sure about the zombifying process.
Submitted by: jeff
Approved by: jeff (mentor)
a timer issues a shutdown and a simultaneous close on the socket
happens. This race condition is inherent in the current socket/
inpcb life cycle system but can be handled well.
Reported by: kris
Tested by: kris (on 8-core machine)
- Reorder send failed to be in correct order.
- Fixed calculation of init-ack to be right off
mbuf lengths instead of the precalculated value. This
will fix one 64 bit platform issue.
error doing so. It seems an increasing number of phones have this
quirk, and we're not keeping up. There appears to be nothing bad that
happens for non-quirked phones.
Minor cleanups:
o prefer device_printf over printf
o kill devinfo stuff
o minor other preening.
need to do it at all anymore. Remove it from here. Expand
USB_ATTACH_SETUP inline now that it is one line and we're moving away
from the compat macros. Remove some bzero calls that turn out not to
be necessary.
what we print, don't print it anymore. And don't compute it anymore.
And don't malloc/free memory for it anymore. While I'm here, prefer
device_printf where appropriate.
the value of ph_nhooks to zero, not the address. This removes
extraneous calls to pfil_run_hooks (and an rw lock) from the
network stack's critical path when no pfil hooks are active.
Reviewed by: csjp
Sponsored by: Myricom Inc.
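A sketch of the class of bug described above, testing an address where a value was meant. The struct and both functions are hypothetical stand-ins, not the real pfil code; only the member name ph_nhooks comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the pfil head. */
struct pfil_head_sketch {
	int ph_nhooks;
};

/* Buggy shape: tests the address of the counter, which is never NULL. */
static int
pfil_hooked_buggy(struct pfil_head_sketch *ph)
{
	int *nhooks = &ph->ph_nhooks;

	return (nhooks != NULL);
}

/* Fixed shape: tests the value of the counter. */
static int
pfil_hooked_fixed(const struct pfil_head_sketch *ph)
{
	return (ph->ph_nhooks > 0);
}
```

With the buggy test every packet takes the hook path, and the lock that goes with it, even when no hooks are registered.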
implementing some of them using existing ones.
- Allow to compile ZFS on all archs and use atomic operations surrounded
by global mutex on archs we don't have or can't have all atomic
operations needed by ZFS.
algorithm would not go through the proper initialization.
- The initialization was incorrect as well, causing problems in
sat networks with > 1sec RTT
- Get rid of magic numbers in RTT calculations.
A change to dconschat(8) will follow so that it can bomb
this address over FireWire to reset a wedged system.
Though this method is just a hack and far from perfect,
it should be useful if you don't want to go to the machine room
just to reset or power-cycle a machine without a
remote-managed power supply. And much better than doing:
# fwcontrol -m target-eui64
# dd if=/dev/zero of=/dev/fwmem0.2 bs=1m
Now, it's safe to call the fwohci interrupt(polling) routine while ddb/gdb
is active. After this change, a dcons connection over FireWire can survive
bus resets even in kernel debugger.
This means that it is not too late to plug a FireWire cable after a panic
to investigate the problem.
Actually there is a small window (between the jump to the kernel from the
loader and initialization of dcons_crom) in which no one can take care of
a bus reset. Outside that window, the FireWire console should keep working
from loader to reboot even with a panic and a bus reset
(as long as you enable LOADER_FIREWIRE_SUPPORT).
embedded storage in struct ucred. This allows audit state to be cached
with the thread, avoiding locking operations with each system call, and
makes it available in asynchronous execution contexts, such as deep in
the network stack or VFS.
Reviewed by: csjp
Approved by: re (kensmith)
Obtained from: TrustedBSD Project
cache size limit but this bucket row is empty. Normally we want to
recycle the oldest entry in the bucket row. If there isn't any the
TAILQ_REMOVE leads to a panic by trying to remove a non-existing
element. Fix this by just returning NULL and failing the insert.
This is not a problem as the TCP hostcache is only advisory.
Submitted by: jhb
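The empty-row case above can be sketched with the queue(3) macros. The names hc_entry, hc_bucket and hc_bucket_recycle() are made up for this illustration; the point is only that the tail is looked up and checked before TAILQ_REMOVE is called.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical cache entry and bucket row. */
struct hc_entry {
	TAILQ_ENTRY(hc_entry) hc_link;
	int hc_id;
};
TAILQ_HEAD(hc_qhead, hc_entry);

struct hc_bucket {
	struct hc_qhead hb_head;
	int hb_length;
};

static void
hc_bucket_init(struct hc_bucket *b)
{
	TAILQ_INIT(&b->hb_head);
	b->hb_length = 0;
}

/* New entries go to the head, so the oldest entry is at the tail. */
static void
hc_bucket_insert(struct hc_bucket *b, struct hc_entry *e)
{
	TAILQ_INSERT_HEAD(&b->hb_head, e, hc_link);
	b->hb_length++;
}

/* Unlink and return the oldest entry, or NULL if the row is empty. */
static struct hc_entry *
hc_bucket_recycle(struct hc_bucket *b)
{
	struct hc_entry *oldest;

	oldest = TAILQ_LAST(&b->hb_head, hc_qhead);
	if (oldest == NULL)
		return (NULL);	/* nothing to recycle; caller fails the insert */
	TAILQ_REMOVE(&b->hb_head, oldest, hc_link);
	b->hb_length--;
	return (oldest);
}
```

A caller that gets NULL simply gives up on the insert, which is acceptable for an advisory cache.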
grab sched_lock. This would serialize calls to pmap_switch from
cpu_switch(). With the introduction of thread_lock, this is not
possible anymore, because thread_lock is not a single lock. It
varies. Secondly and most importantly, it's not needed at all. The
only requirement for pmap_switch() is that it's not preempted
while in the middle of updating the CPU and PCPU. In other words,
it's a critical region. No locking required.
This is enabled by default. It should be disabled for
those who are uneasy with peeking/poking from FireWire.
Please note sbp(4) and dcons(4) over FireWire need
this feature.
- Moved BCM5706S/5708S SerDes support to brgphy (since they are not technically
TBI interfaces)
- Added 2.5G support for BCM5708S
Comments:
Since this driver is shared with bge I tested several available controllers
supported by bge and all worked as expected, however the list was not
exhaustive. Need wider testing.
MFC after: 4 weeks
In Rx path it allocates a new mbuf with m_getcl(9) so the length of
the mbuf is MCLBYTES which is greater than a segment size specified by
the dma tag. This segment size mismatch caused a voluntary panic.
Fix the panic by setting the mbuf length to TULIP_DATA_PER_DESC.
Reported by: Arne H Juul <arnej AT yahoo-inc DOT com>
Tested by: Arne H Juul <arnej AT yahoo-inc DOT com>
- lock its own locks and drop Giant.
- create its own taskqueue thread.
- split interrupt routine
- use interrupt filter as a fast interrupt.
- run watchdog timer in taskqueue so that it should be
serialized with the bottom half.
- add extra sanity check for transaction labels.
disable ad-hoc workaround for unknown tlabels.
- add sleep/wakeup synchronization primitives
- don't reset OHCI in fwohci_stop()
when copying out association data.
- Fixes a small bug that prevented the SCTP_UNORDERED indication
from going up to the app on a recv in the sinfo_flags field.
seems not enough to verify its consistencies.
- Define AC97_MIXER_SIZE as SOUND_MIXER_NRDEVICES (25), since we
don't need more than that. Stop making wild and random guesses about
its size since we're strictly bound to it.
application specific SEND_OP_COND (CMD55 + ACMD41), go ahead and allow
100 tries. This gives a timeout of a second rather than the ~100ms
the old style produces.
I've had one old 16MB SD card which needs the extra time. I've now
had reports from the field that other cards need this too.
Originally done at BSDcan 2007 while waiting to give my embedding
madness minitalk.
Add the machine-specific definitions for configuring the new physical
memory allocator.
Set the size of phys_avail[] and dump_avail[] using one of these
definitions.
which support a 2.5Gbps mode over fiber using next page extensions during
autonegotiation. Typically only found in blade systems which also include
a Broadcom 2.5Gbps capable switch.
MFC after: 2 weeks
oldthread should point at before we return.
- When cpu_switch() is called the td_lock pointer in the old thread may
point at the blocked lock. This prevents other processors from
switching into this thread while we're still switching out. Wait
until we're done deactivating the vmspace before we release the
thread by assigning to td_lock.
- Before we can activate the new vmspace we must make sure that the new
thread is not assigned to the blocked lock. It may be in the process
of switching out on another cpu. Spin until the new thread is
available.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add a new parameter to cpu_switch() that is used to release the lock on
the outgoing thread and properly acquire the lock on the incoming
thread. This parameter is not required for schedulers that don't do
per-cpu locking and architectures which do not support it may continue
to use the 4BSD scheduler. This feature is presently not supported
on ia64
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- There is no globally visible scheduler lock any longer. For now the
watchdog can only check Giant. This model of checking particular locks
is flawed and should be revisited. Other metrics should be considered.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add new spinlocks to support thread_lock() and adjust ordering.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Attempt to return the ttyinfo() selection algorithm to something sane
as it has been broken and disabled for some time. Adapt this algorithm
in such a way that it does not conflict with per-cpu scheduler locking.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use a global umtx spinlock to protect the sleep queues now that there
is no global scheduler lock.
- Use thread_lock() to protect thread state.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Use a global kse spinlock to protect upcall and thread assignment. The
per-process spinlock can not be used because this lock must be acquired
via mi_switch() where we already hold a thread lock. The kse spinlock
is a leaf lock ordered after the process and thread spinlocks.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Replace the tail-end of fork_exit() with a scheduler specific routine
which can do the appropriate lock manipulations.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Protect the cp_time tick counts with atomics instead of a global lock.
There will only be one atomic per tick and this allows all processors
to execute softclock concurrently.
- In softclock, protect access to rusage and td_*tick data with the
thread_lock(), expanding the scope of the thread lock over the whole
function.
- Do some creative re-arranging in hardclock() to avoid excess locking.
- Protect the p_timer fields with the per-process spinlock.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.
- Move some common code into thread_suspend_switch() to handle the
mechanics of suspending a thread. The locking here is incredibly
convoluted and should be simplified.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Add a per-turnstile spinlock to solve potential priority propagation
deadlocks that are possible with thread_lock().
- The turnstile lock order is defined as the exact opposite of the
lock order used with the sleep locks they represent. This allows us
to walk in reverse order in priority_propagate and this is the only
place we wish to multiply acquire turnstile locks.
- Use the turnstile_chain lock to protect assigning mutexes to turnstiles.
- Change the turnstile interface to pass back turnstile pointers to the
consumers. This allows us to reduce some locking and makes it easier
to cancel turnstile assignment while the turnstile chain lock is held.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Adapt sleepqueues to the new thread_lock() mechanism.
- Delay assigning the sleep queue spinlock as the thread lock until after
we've checked for signals. It is illegal for a thread to return in
mi_switch() with any lock assigned to td_lock other than the scheduler
locks.
- Change sleepq_catch_signals() to do the switch if necessary to simplify
the callers.
- Simplify timeout handling now that locking a sleeping thread has the
side-effect of locking the sleepqueue. Some previous races are no
longer possible.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
- Move all scheduler locking into the schedulers utilizing a technique
similar to solaris's container locking.
- A per-process spinlock is now used to protect the queue of threads,
thread count, suspension count, p_sflags, and other process
related scheduling fields.
- The new thread lock is actually a pointer to a spinlock for the
container that the thread is currently owned by. The container may
be a turnstile, sleepqueue, or run queue.
- thread_lock() is now used to protect access to thread related scheduling
fields. thread_unlock() unlocks the lock and thread_set_lock()
implements the transition from one lock to another.
- A new "blocked_lock" is used in cases where it is not safe to hold the
actual thread's lock yet we must prevent access to the thread.
- sched_throw() and sched_fork_exit() are introduced to allow the
schedulers to fix-up locking at these points.
- Add some minor infrastructure for optionally exporting scheduler
statistics that were invaluable in solving performance problems with
this patch. Generally these statistics allow you to differentiate
between different causes of context switches.
Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
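The "thread lock is a pointer to the container's spinlock" idea can be sketched in userland with C11 atomics. Everything here is a simplified assumption (the *_sketch names, the toy spinlock, no interrupt handling); it only shows why the locker must re-check the pointer after acquiring the lock, and how ownership moves from one container to another.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy spinlock standing in for a container (run queue, sleepqueue) lock. */
struct spin_sketch {
	atomic_flag held;
};

struct thread_sketch {
	_Atomic(struct spin_sketch *) td_lock;	/* lock of current container */
};

static void
spin_lock_sketch(struct spin_sketch *s)
{
	while (atomic_flag_test_and_set_explicit(&s->held,
	    memory_order_acquire))
		;	/* spin */
}

static void
spin_unlock_sketch(struct spin_sketch *s)
{
	atomic_flag_clear_explicit(&s->held, memory_order_release);
}

/* Lock whichever container currently owns the thread. */
static struct spin_sketch *
thread_lock_sketch(struct thread_sketch *td)
{
	struct spin_sketch *l;

	for (;;) {
		l = atomic_load(&td->td_lock);
		spin_lock_sketch(l);
		if (l == atomic_load(&td->td_lock))
			return (l);	/* still the right lock */
		spin_unlock_sketch(l);	/* thread moved meanwhile; retry */
	}
}

/* Hand the thread to a new container; caller holds both old and new locks. */
static void
thread_set_lock_sketch(struct thread_sketch *td, struct spin_sketch *nl)
{
	struct spin_sketch *old;

	old = atomic_load(&td->td_lock);
	atomic_store(&td->td_lock, nl);
	spin_unlock_sketch(old);
}
```

After the hand-off the old container lock is free again and any later thread_lock_sketch() call ends up on the new one.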
Now, we assume no more sched_lock protection for some of them and use the
distributed loads method for vmmeter (distributed through CPUs).
Reviewed by: alc, bde
Approved by: jeff (mentor)
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.
Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.
Reviewed by: alc, bde
Approved by: jeff (mentor)
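A rough userland sketch of the per-CPU counter idea behind PCPU_INC and PCPU_ADD: each CPU bumps its own slot and a reader sums all slots. The array, the CPU count and the function names are assumptions for illustration; the real macros take a pcpu member name, and, as noted above, a plain increment like this is not safe on most architectures.

```c
#include <assert.h>

#define NCPU_SKETCH 4	/* assumed CPU count for the example */

/* Hypothetical per-CPU area holding a single counter. */
struct pcpu_sketch {
	unsigned long pc_cnt;
};

static struct pcpu_sketch pcpu_slots[NCPU_SKETCH];

/* Counterpart of PCPU_INC: bump the counter of the given CPU. */
static void
pcpu_inc_sketch(int cpu)
{
	pcpu_slots[cpu].pc_cnt++;	/* plain read-modify-write, not atomic */
}

/* Counterpart of PCPU_ADD: add a specific value on the given CPU. */
static void
pcpu_add_sketch(int cpu, unsigned long val)
{
	pcpu_slots[cpu].pc_cnt += val;
}

/* A reader gets the global figure by summing every CPU's slot. */
static unsigned long
vmmeter_total_sketch(void)
{
	unsigned long total = 0;
	int i;

	for (i = 0; i < NCPU_SKETCH; i++)
		total += pcpu_slots[i].pc_cnt;
	return (total);
}
```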
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.
Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.
In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
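The truncation can be illustrated outside the kernel. The two functions below are hypothetical stand-ins that copy out sizeof(int) and sizeof(int64_t) bytes respectively; they are not the real sysctl handlers and ignore the request machinery entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an int-sized handler: always copies sizeof(int) bytes. */
static void
handle_int_sketch(const void *var, void *out)
{
	memcpy(out, var, sizeof(int));
}

/* Stand-in for a quad-sized handler: copies all 64 bits. */
static void
handle_quad_sketch(const void *var, void *out)
{
	memcpy(out, var, sizeof(int64_t));
}
```

Exporting a 64 bit counter through the int-sized path silently drops half of it; which half depends on byte order.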
- In the ioctl path let the command get queued up and return
when complete _without_ blocking the driver waiting for
the response. This way the driver doesn't "lock up" for
~30s during a flash command. Submitted by scottl.
- Add a guard so that if a DCMD of 0 is sent down the ioctl
path don't send it to the controller. Return with a
status of OK. This is a little strange since MegaCli
doesn't seem to like something and will issue some DCMD
of 0. This doesn't happen under Linux. So the emulation
needs to be improved but I'm not sure what. Another strange
thing is that when a DCMD of 0 gets issued under i386 the
controller returns OK but in amd64 the context is messed
up.
- Add a guard so the context has to be within the legal
limit so we get a reasonable error assertion versus random
panic.
It's going to be a challenge to figure out why MegaCli is not totally
happy and then sends some bogus commands. This means that flashing
firmware via the Linux tool won't work since it generates a DCMD of
0 when it should be opening the firmware for a flash update. Without
this problem flashing works fine. This means there is no publicly
available tool to upgrade the RAID firmware under FreeBSD right now.
I plan to MFC all of the mfi changes to 6.X shortly. This might not
include the SCSI pass-through changes.
Submitted by: scottl
Reviewed by: scottl
MFC after: 3 days
1. Pass locking flags to VFS_ROOT().
2. Check v_mountedhere while the vnode is locked.
3. Always return locked vnode on success.
Change 1 fixes problem reported by Stephen M. Rumble - after
zfs_vfsops.c,1.9 change, zfs_root() no longer locks the vnode
unconditionally and traverse() didn't pass right lock type to
VFS_ROOT(). The result was that the kernel panicked when the .zfs/ directory
was accessed via NFS.
playtone() so that it uses times of 1/100ths of a second.
Now 'time echo T60ABC >/dev/speaker' takes ~3 seconds.
MFC after: 2 weeks
Problem noted by: dwmalone
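The "~3 seconds" figure can be checked with a little arithmetic. This sketch assumes, as documented for spkr(4), that T sets quarter notes per minute and that the default note value is a quarter note (L4); the function name and constants are illustrative, not the driver's code.

```c
#include <assert.h>

#define SECS_PER_MIN_SKETCH 60
#define WHOLE_NOTE_SKETCH 4	/* quarter notes per whole note */

/* Duration of one note in 1/100ths of a second. */
static int
note_centisecs(int tempo, int value)
{
	int whole;

	/* A whole note lasts 4 beats; one beat lasts 60/tempo seconds. */
	whole = (100 * SECS_PER_MIN_SKETCH * WHOLE_NOTE_SKETCH) / tempo;
	return (whole / value);
}
```

At T60 a quarter note lasts 100 hundredths of a second, so the three notes A, B and C add up to about 3 seconds.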
- removed unused structure members
- fixed a minor bug that the ECN code point may not be restored correctly
Approved by: ume (mentor)
MFC after: 1 week
GEMs is unable to discriminate UDP from TCP packets such that
it can generate 0x0000 checksum value for the UDP datagram. So the
UDP checksum offload was disabled by default. You can enable it
by setting link0 flag with ifconfig(8).
o bus_dma(9) clean up. It now correctly sets the number of required DMA
segments/size and removed incorrect use of BUS_DMA_ALLOCNOW flag
in static allocations done via bus_dmamem_alloc(9).
o Implemented ALTQ(9) support.
o Implemented Tx side bus_dmamap_load_mbuf_sg(9) which can remove
several book keeping chores originated from call-back mechanism.
Therefore gem_txdma_callback() was removed and its functionality
was reimplemented in gem_load_txmbuf().
o Don't set GEM_TD_START_OF_PACKET flag until all remaining mbuf
chains are set. I think it was a long standing bug and it caused
fluctuating interrupts/CPU usage patterns while netperf test
is in progress. Previously it seems that we race with the device.
Because I don't have documentation for GEM I'm not sure this is
correct, but almost all other documentation I have states this
implication of setting the SOP mark in the descriptor ring (e.g. hme(4)).
o Borrowed gem_defrag() from ath(4) which is supposed to be much
faster than m_defrag(9) since it does not need to defrag all
mbuf chains.
o gem_load_txmbuf() was changed to allow passed mbuf chains to free.
Caller of gem_load_txmbuf() correctly handles freed mbuf chains.
o In gem_start_locked(), added checks for availability of Tx
descriptors before trying to load DMA maps which could save CPU
cycles when number of available descriptors are low. Also, simplify
IFF_DRV_OACTIVE detection logic.
o Removed hard-coded function names in CTR macros and replaced it
with __func__.
o Moved statistics counter register access to gem_tick() to reduce
number of PCI bus accesses. There is no reason to update statistics
counters in interrupt handler.
o Removed unnecessary call of gem_start_locked() in gem_ioctl().
Reviewed by: grehan (initial version), marius (with improvements and suggestions)
Tested by: grehan (ppc), marius(sparc64)
own entry in the softc. This should allow more of cbb_pci_intr() to
migrate to a new cbb_pci_filt() so that we don't have to run cbb's ISR
in almost every case we get an interrupt. We can't just move
cbb_pci_intr into cbb_pci_filt because it does things that aren't safe
to do from a fast interrupt handler, err I mean from a filter. This is
an important first step.
# I wonder if I need to make cardok volatile or not.
mpt.h:
Add support for reading extended configuration pages.
mpt_cam.c:
Do a top level topology scan on the SAS controller. If any SATA
device are discovered in this scan, send a passthrough FIS to set
the write cache. This is controllable through the following
tunable at boot:
hw.mpt.enable_sata_wc:
-1 = Do not configure, use the controller default
0 = Disable the write cache
1 = Enable the write cache
The default is -1. This tunable is just a hack and may be
deprecated in the future.
Turning on the write cache alleviates the write performance problems with
SATA that many people have observed. It is not recommended for those who
value data reliability! I cannot stress this strongly enough. However,
it is useful in certain circumstances, and it brings the performance in line
with what a generic SATA controller running under the FreeBSD ATA driver
provides (and the ATA driver has had the WC enabled by default for years).
Things can get ugly without it due to uninitialized class. RELENG_6 needs
a similar, but different treatment as well.
err.. perhaps we should teach devclass_get_maxunit() to return -1 ?
MFC after: 1 day
o If we don't have a filter, also check to make sure the card is there before
calling the scheduled ISR. This is necessary to help old drivers whose
ISRs can't cope with being called with the hardware missing, which sadly
still exist in the tree. This is the main reason why we have an extra
layer of indirection for cardbus interrupts.
o If the card is no longer present, mark the interrupt as 'handled' rather
than 'stray' because this accounts for why the interrupt happened. Stray
isn't all bad, since there are other filters that would claim it...
o Fix some comments
+ Add comment about why we check for CARD_OK and touch the hardware in both
the filter and ISR.
+ add a note about why we don't care about Giant
+ also note that giant can't be taken out in a filter...
+ Some minor formatting nits on very long comments.
While in the suspend path, this means the idle thread will just return
immediately rather than trying to enter C1-n. This helps in the case where
the chipset is powered down before the rest of the system and reads from
the cpu sleep registers begin returning immediately, causing the logic that
catches bad C2/C3 behavior to kick in. Observed on my Panasonic Y4.
MFC after: 3 days
1.50 to help out with the GCC 2 to GCC 3 transition and it became
obsolete when C flags compatible with GCC 3.x became the default.
With GCC 4 in the tree this variable (i.e. GCC3) is beyond bogus
because it causes confusion when looking for the newly introduced
WITH_GCC3 option that helps the GCC 3 -> GCC 4 bump.
(j/i) was being used and it was being incremented, not decremented as before.
Factor out this code into a common function and call it from both the common
and per-CPU case.
MFC after: 1 day
The global lock is a memory region shared with the BIOS and thus
has some strange behavior like the fact that the sleep is 1 ms max.
We use standard mutexes to synchronize with the SCI so acquiring
the global lock after locking the mutex resulted in a witness
warning.
To deal with this for now, acquire the global lock before all other
locks, similar to Giant. This should fix the witness "sleeping
with mutex held" issue on boot that occurred after the last ACPI-CA
import. In the future, we hope to move to the new mutex interface
in ACPI-CA instead of the pseudo-semaphore version we have now.
Reviewed by: jkim
default_vrf_id
- Missing lock/unlock of inp added as well in the v6 side.
- IFN hash table moves to sctppcbinfo since indexes are
unique across systems (including different VRFs) this makes it easier
to do ifn lookups.
argument from being file descriptor index into the pointer to struct file:
part 2. Convert calls missed in the first big commit.
Noted by: rwatson
Pointy hat to: kib
remove associated comments.
Slip audit_file_rotate_wait assignment in audit_rotate_vnode() before
the drop of the global audit mutex.
Obtained from: TrustedBSD Project
concept that is NOT well thought out for a multi-homed transport
protocol. So the useless table-id entries passed around need to
be removed.
- Add an event timer for the zero copy api.
- Fix a bug in sctp_timer.c when searching for an alternate
with the largest ssthresh (the compare was wrong).
td_ru. This removes the requirement for per-process synchronization in
statclock() and mi_switch(). This was previously supported by
sched_lock which is going away. All modifications to rusage are now
done in the context of the owning thread. Reads proceed without locks.
- Aggregate exiting threads rusage in thread_exit() such that the exiting
thread's rusage is not lost.
- Provide a new routine, rufetch() to fetch an aggregate of all rusage
structures from all threads in a process. This routine must be used
in any place requiring a rusage from a process prior to its exit. The
exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread
exits. Tick statistics are kept in the thread and protected by sched_lock
until it exits.
Initial patch by: attilio
Reviewed by: attilio, bde (some objections), arch (mostly silent)
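A cut-down model of the aggregation scheme above: per-thread usage is summed on demand, and an exiting thread folds its usage into the process before going away. The structures and *_sketch functions are hypothetical and carry only two fields; no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

struct rusage_sketch {
	long ru_nvcsw;
	long ru_nivcsw;
};

struct thread_ru_sketch {
	struct rusage_sketch td_ru;
	struct thread_ru_sketch *td_next;
};

struct proc_ru_sketch {
	struct rusage_sketch p_ru;	/* usage of already exited threads */
	struct thread_ru_sketch *p_threads;
};

static void
ruadd_sketch(struct rusage_sketch *dst, const struct rusage_sketch *src)
{
	dst->ru_nvcsw += src->ru_nvcsw;
	dst->ru_nivcsw += src->ru_nivcsw;
}

/* Aggregate the usage of exited threads and of every live thread. */
static void
rufetch_sketch(const struct proc_ru_sketch *p, struct rusage_sketch *out)
{
	const struct thread_ru_sketch *td;

	*out = p->p_ru;
	for (td = p->p_threads; td != NULL; td = td->td_next)
		ruadd_sketch(out, &td->td_ru);
}

/* On exit, fold the thread's usage into the process and unlink it. */
static void
thread_exit_sketch(struct proc_ru_sketch *p, struct thread_ru_sketch *td)
{
	struct thread_ru_sketch **tdp;

	ruadd_sketch(&p->p_ru, &td->td_ru);
	for (tdp = &p->p_threads; *tdp != NULL; tdp = &(*tdp)->td_next) {
		if (*tdp == td) {
			*tdp = td->td_next;
			break;
		}
	}
}
```

The total reported for the process is the same before and after a thread exits, which is the property the change is after.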
this change both simplifies the code and plugs a hole where the device
was reset without keeping the management controller at bay :) Second,
the 82571 LAA reset problem was incomplete, this addition is necessary.
Just one of those days :)
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.
Discussed with: jhb
- Rework the entire pcm_channel structure:
* Remove rarely used link placeholder, instead, make each pcm_channel
as head/link of each own/each other. Unlock - Lock sequence due to
sleep malloc has been reduced.
* Implement "busy" queue which will contain list of busy/active
channels. This greatly reduces locking contention for example while
servicing interrupt for hardware with many channels or when virtual
channels reach its 256 peak channels.
- So I heard you like v chan ... O RLY?
Welcome to Virtual **Record** Channels (vrec, rec vchans, vchans for
recording, Rec-Chan, you decide), the ultimate solutions for your
nagging O_RDWR full-duplex wannabe (note: flash plugins) monopolizing
single record channel causing EBUSY. Vrec works exactly like Vchans
(or, should I rename it to "Vplay" :) , except that it operates on the
opposite direction (recording). Up to 256 vrecs (like vchans) are
possible.
Notes:
* Relocate dev.pcm.%d.{vchans,vchanformat,vchanrate} to each of its
respective node/direction:
dev.pcm.%d.play.* for "play" (cdev = dsp%d.vp%d)
dev.pcm.%d.rec.* for "record" (cdev = dsp%d.vr%d)
* Don't expect that it will magically give you ability to split
"recording source" (eg: 1 channel for cdrom, 1 channel for mic,
etc). Just admit that you only have a *single* recording source /
channel. Please bug your hardware vendor instead :)
- Bump maxautovchans from 4 to 16. For a full-fledged multimedia
desktop/workstation with too many soundservers installed (esound,
artsd, jackd, pulse/polypaudio, ding-dong pling plong mudkip fuh fuh,
etc), 4 seems inadequate. There will be no memory penalty here, since
virtual channels are allocated only on demand.
- Nuke/Rework the entire statically created cdev entries. Everything is
clonable through snd's own clone manager, which is designed to withstand
many kinds of abusive devfs droids such as:
* while : ; do /bin/test -e /dev/dsp ; done
* jot 16777216 0 | while read x ; do ls /dev/dsp0.$x ; done
* hundreds (could be thousands) concurrent threads/process opening
"/dev/dsp" (previously, this might result EBUSY even with just
3 contesting threads/procs).
o Reusable clone objects (instead of creating new one like there's no
tomorrow) after certain expiration deadline. The clone allocator will
decide whether to reuse, share, or creating new clone.
o Automatic garbage collector.
- Dynamic unit magic allocator. Maximum attached soundcards can be tuned
using tunable "hw.snd.maxunit" (Default to 512). Minimum is 16, and
maximum is 2048.
- ..other fixes, mostly related to concurrency issues.
joel@ will do the manpage updates on sound(4).
Have fun.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)
properly observe the SB_NOINTR flag in sblock. This restores the
required behavior that lock acquisition be interruptible on the socket
buffer I/O serialization lock to allow threads waiting for I/O to be
signaled even if they aren't the thread currently holding the I/O lock.
With this change, the sblock regression test is again passed.
Reported by: alfred
sx(9) handiwork: attilio
These functions are intended to perform the same actions as sx_xlock() and
sx_slock(), except that they perform an interruptible sleep, so
that the sleep can be interrupted by external events.
In order to support these new features, some code restructuring is needed,
but the external API won't be affected at all.
Note: use "void" casts for "int"-returning functions in order to keep tools
like Coverity Prevent from whining.
Requested by: rwatson
Tested by: rwatson
Reviewed by: jhb
Approved by: jeff (mentor)
- Coverity Prevent(tm) CID 1906 a bogus use of bzero where unneeded.
- ICH8 systems autoneg to 100 rather than 1000, this can also be
seen in 82573, the logic was backwards.
- On new 82575 quadports half duplex tx speed is slow... this was due
to overwriting TCTL reg rather than adding bits.
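The TCTL fix above is the usual read-modify-write pattern. In this sketch the register is modelled as a plain value and the bit masks are made-up illustration values, not the real TCTL layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real register layout is not shown here. */
#define TCTL_EN_SKETCH		0x00000002u
#define TCTL_PSP_SKETCH		0x00000008u
#define TCTL_EXTRA_SKETCH	0x10000000u

/* Buggy shape: the new bits replace whatever was already configured. */
static uint32_t
tctl_overwrite(uint32_t cur, uint32_t bits)
{
	(void)cur;
	return (bits);
}

/* Fixed shape: the new bits are OR-ed into the current contents. */
static uint32_t
tctl_add_bits(uint32_t cur, uint32_t bits)
{
	return (cur | bits);
}
```

Overwriting loses the enable and padding bits set earlier, while OR-ing keeps them and adds the new one.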
- Fixed a LOR in handling a cookie. Turns out create lock is applied.
And if we abort processing, this causes LOR. Changed to force the
timer to clean up, that way create lock is released.
is expanded, size of expansion was not taken into consideration.
- Fix so vtag hash is 1 bigger so that it modulo's out
correctly, avoids a panic when restart with right modulo happens.
- do not dereference stcb when control->do_not_ref_stcb is set
- Fix up packet logging to not often use a lock and also to
add to options.
- Fix some logging option duplication in the sctputil.h
OpenBSD's if_ral.c.
I didn't make the LINKSYS4 -> CISCOLINKSYS name change, nor did I
include the RALINK RT2573 that's supported by the rum(4) driver. I
didn't merge any code changes either.
"0" cannot be a correct value since when the function is entered at least
one shared holder must be present and since we want the last one "1" is
the correct value.
Note that lock_profiling for sx locks is far from being perfect.
Expect further fixes for that.
Approved by: jeff (mentor)
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before calling kmem_free (since it acquires blocking
locks inside)
- Solve a deadlock with smp_rendezvous() where other CPU will wait
indefinitely for dt_lock acquisition.
- Add dt_lock in the WITNESS list of spinlocks
While applying these modifications, change the requirement for
user_ldt_free() so that it returns without dt_lock held.
Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)
race seen on smp laptops when suspending where the rx task can be
entered after the interface is detach'd.
NB: use of taskqueue_drain while holding the softc mutex is problematic
Submitted by: ambrisko
MFC after: 1 month
It is disabled by default. You need to put
LOADER_FIREWIRE_SUPPORT=yes in /etc/make.conf
and rebuild loader to enable it.
(cd /sys/boot/i386 && make clean && make && make install)
You can find a short introduction of dcons at
http://wiki.freebsd.org/DebugWithDcons
as to the type of the command argument: int -> u_long.
These types have different widths in the 64-bit world.
Add a note to UPDATING because the change breaks KBI
on 64-bit platforms.
Discussed on: -net, -current
Reviewed by: bms, ru
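Why the width matters on 64-bit platforms can be shown with a small experiment. The function below is hypothetical; it only models a command value being squeezed through an int on its way to code that expects a u_long. The conversion to int is implementation-defined in C, and the comments assume the usual two's complement wrap-around.

```c
#include <assert.h>

/* Pass a command through an int, as an int-typed prototype would. */
static unsigned long
cmd_via_int(unsigned long cmd)
{
	int narrow = (int)cmd;		/* keeps only the low 32 bits */

	/* Widening back sign-extends when bit 31 was set. */
	return ((unsigned long)narrow);
}
```

On LP64 a command with bit 31 set comes back with the upper 32 bits all ones, so it no longer compares equal to the original constant; on 32-bit platforms nothing changes.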
hold a wq lock for the iterator. Panda uses a
silly recursive lock they hold through the timer.
- Add poor man's wireshark compile option.
- Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls.
- sysctl now will get back the refcnt for viewing by onlookers.
Reviewed by: gnn
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.
Reviewed by: scottl
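One way to picture the fix: the size of the next chunk is bounded by the end of the page, by the tag's maximum segment size and by what is left of the buffer, rather than being PAGE_SIZE unconditionally. The structure, field and function names below are assumptions for illustration and do not mirror the real busdma internals.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096ul

/* Hypothetical DMA tag carrying only the restriction used here. */
struct dma_tag_sketch {
	unsigned long maxsegsz;
};

/* Size of the next segment starting at addr with resid bytes left. */
static unsigned long
next_seg_size(const struct dma_tag_sketch *tag, unsigned long addr,
    unsigned long resid)
{
	unsigned long sgsize;

	/* Never cross a page boundary within one chunk. */
	sgsize = PAGE_SIZE_SKETCH - (addr & (PAGE_SIZE_SKETCH - 1));
	/* Honour the tag's restriction instead of assuming a full page. */
	if (sgsize > tag->maxsegsz)
		sgsize = tag->maxsegsz;
	if (sgsize > resid)
		sgsize = resid;
	return (sgsize);
}
```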
arm and powerpc have 64KB as the maximum argument size, so one
cannot run "make delete-old" on arm or powerpc anymore. Stop
special-casing powerpc and give it 256KB of arguments like all
other platforms, but keep arm on 64KB for now. There may be a
purpose to it that doesn't exist for powerpc.
produced incorrect behaviour with the KDB_UNATTENDED option) and call
panic in both the KDB and non-KDB cases. This change is consistent
with rwatson's current kdb/ddb work.
by the subsequent mix_setdevs() and friends.
- Minor style(9) declaration arrangement nit.
Requested by: joeld
Submitted by: pluknet <pluknet@gmail.com>
Note on dcons:
To enable dcons in kernel, put the following lines in /boot/loader.conf.
You may also want to enable dcons in /etc/ttys.
boot_multicons="YES"
#Force dcons to be the high-level console if a firewire bus is present.
#hw.firewire.dcons_crom.force_console=1
FireWire/dcons support in loader will come shortly.
(i386/amd64 only)
- bounded cookie-life to 1 second minimum in socket option set.
- Delayed_ack_time becomes delayed_ack per new socket api document.
- Improve port number selection, we now use low/high bounds and
no chance of an endless loop. Only one call to random per bind
as well.
- fixes so set_peer_primary pre-screens addresses to be
valid to this host.
- maxseg did not allow setting on an assoc basis. We needed
to thus track and use an association value instead of an inp value.
- Fixed ep get of HB status to report back properly.
- use settings flag to tell if assoc level hb is on or off, not
the timer, since the timer may still run if unconfirmed addresses
are present.
- check for crazy ENABLE/DISABLE conditions.
- set and get of pmtud (fixed path mtu) not always taking into account ovh.
- Getting PMTU info on stcb only needs to return PMTUD_ENABLED if
any net is doing PMTU discovery.
- Panic or warning fixed to not do so when a valid ip frag is
taking place.
- sndrcvinfo appearing in both inp and stcb was full size, instead
of the non-pad version. This saves about 92 bytes from each struct
by carefully converting to use the smaller version.
- one-2-one model get(maxseg) would always get ep value, never the
tcb's value.
- The delayed ack time could be under a tick, this fixes so
it bounds it to at least 1 tick for platforms whose tick
is more than a ms.
- Fragment interleave level set to wrong default value.
- Fragment interleave could not set level 0.
- Deferred stream reset was broken due to a guard check and ntohl issue.
- Found two lock order reversals and fixed.
- Tighten up address checking, if the user gives an address the sa_len
had better be set properly.
- Get asoc by assoc-id would return a locked tcb when it was asked
not to if the tcb was in the restart hash.
- sysctl to dig down and get more association details
Reviewed by: gnn
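A minimal sketch of the bounded port selection described in the list above,
assuming a single random value is drawn per bind and the range is then scanned
at most once. The function name, the seed parameter and the used[] array are
hypothetical stand-ins, not the SCTP code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick a free port in [low, high].  used[i] says whether port (low + i) is
 * already bound.  'seed' stands in for the one random value drawn per bind.
 * Returns 0 when every port in the range is taken.
 */
static uint16_t
pick_port(uint16_t low, uint16_t high, uint32_t seed, const bool *used)
{
	uint32_t span, start, i, off;

	assert(low != 0 && low <= high);
	span = (uint32_t)high - low + 1;
	start = seed % span;		/* random starting offset */
	for (i = 0; i < span; i++) {	/* at most one pass over the range */
		off = (start + i) % span;
		if (!used[off])
			return ((uint16_t)(low + off));
	}
	return (0);			/* range exhausted: fail, never spin */
}
```

Because the loop is bounded by the size of the range, a full range produces a
failure instead of an endless retry.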
in tcp_input():
o tighten the checks on allowed TCP flags to be RFC793 and
tcp-secure conformant
o log check failures to syslog at LOG_DEBUG level
o rearrange the code flow to be easier to follow
o add KASSERTs to validate assumptions of the code flow
Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enabled
that controls the behavior on socket creation failure for an otherwise
successful 3-way handshake. The socket creation can fail due to global
memory shortage, listen queue limits and file descriptor limits. The
sysctl allows choosing between two options to deal with this. One is
to send a reset to the other endpoint to notify it about the failure
(default). The other one is to ignore and treat the failure as a
transient error and have the other endpoint retransmit for another try.
Reviewed by: rwatson (in general)
based on individual fields being set. This doesn't work for setattr replay,
because va_type is set there, so we add AT_TYPE flag to va_mask, which won't
be accepted by zfs_setattr().
Reported by: kris
- Double the number of descriptors that a single call to send can use
- Quadruple the number of descriptors that can be reclaimed per pass
- only run reclaim twice per second
- increase coalesce timer from 3.5us to 5us
fix printf warning on 64-bit platforms
Neither Ariff nor I have access to any of this hardware, so all tests
have been made by Konstantin and Artem. Commit message mostly written
by Konstantin.
envy24:
- Add test code to support rear line-in input on 'Terratec DMX 6fire'
audio card. This code is also intended to be used in the future for
support of cards that have I2C-to-GPIO expanders wired between the
control line of the audio codec and the Envy24; however, such cards
are too complex and I can't add that support without a hardware sample
of such a board. I've already tried and failed.
envy24ht:
- Add support for 'AudioTrak Prodigy HD2'.
- Add support for 'AudioTrak Prodigy 7.1 XT'.
- Add support for 'ESI Juli@' (Works ok, DAC volume is hard-coded for
the time being, so 'mixer vol ...' doesn't work, only 'mixer pcm
...' works). [1]
- Fix bug in the init data for M-Audio Revolution 5.1, that
results in distorted sound.
- Add software volume control (now 'mixer pcm' works, thanks to Ariff).
- Add support for more sample rates - 176.4kHz and 192kHz.
- Fix problem with the 192kHz sample rate playback when 24.576MHz
crystal is used on the board instead of 49.152MHz crystal.
spicds:
- Add support for Asahi Kasei flagship DAC - AK4396 (used in AudioTrak
Prodigy HD2).
Submitted by: Konstantin Dimitrov <kosio.dimitrov@gmail.com>
Tested by: Artem Antonov [1]
Reviewed by: ariff
debugger is quite capable of handling Giant-free execution at this
point. Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.
actually works. mbp_count() turns out only to be used in debugging code
in if_patm_intr.c, so this bug did not affect much in practice.
Found with: Coverity Prevent(tm)
CID: 1943
existing UMA statistics for pipes, and allows us to get rid of both the
per-pipe dtor and two atomic operations per pipe required to maintain
the counter.
at the credential to be used by the connection. However, the pointer's
value was ignored when actually setting hcp->nc_owner.
(1) Do set nc_owner to the owner pointer value so that the credential is
not discarded after being carefully configured.
(2) In the case where we create a new credential with modified uid, copy
the existing credential to initialize non-uid fields to existing
values, which will lead to a fully initialized MAC label, groups, etc.
Found with: Coverity Prevent(tm)
CID: 2226
the card, panic explicitly if EN_DEBUG is enabled. In the (default)
case of !EN_DEBUG, the driver resets the card. Probably this case
shouldn't exist at all.
debug is turned off, initialize locks with NOWITNESS flag.
At some point I'll get back to them, we would probably need BLESSING
functionality, which is currently turned off by default.
SD Simplified specification, as well as other SD and SDIO
implementations I've examined, suggest this disclaimer may be required.
It is unclear to me exactly what the license would be for, or why it
might be required. Err on the side of caution and include this
disclaimer so anybody deploying this code can judge for themselves. I
have no further information about the details.
parent vnode and relock it after locking child vnode. The problem was that
we always relock it exclusively, even when it was share-locked.
Discussed with: jeff
clusters. This helps quite a bit on my low end machines (improves
performance by about 300Kpps when being blasted by a hardware
packet generator).
- Include one extended f/w counter forgotten in earlier commit
Sponsored by: Myricom Inc.
- upgrade to reflect state of 1.0.0.86
- move from firmware rev 3.2 to 4.0.0
- import driver bits for offload functionality
- remove binary distribution clause from top level files as it
runs counter to the intent of purely supporting the hardware
MFC after: 3 days
back in a simulated resume instead of entering the requested suspend state.
This helps in testing drivers separately from the acpi suspend code. To
test your drivers, set debug.acpi.suspend_bounce=1 and then run
acpiconf -s3 (or 4).
MFC after: 1 day
compiler invocation. This is just to help get over the hump of people
tracking down bugs that may cross the GCC 4.2 upgrade.
It is envisioned that this option goes away after a suitable amount
of time.
release number up to the max. This should eliminate the need to
tweak the default imageid define for later releases that are found
on the Intel web site.
MFC after: 1 month
quirky code: uarts, led, cf/ide, ixpqmgr, npe are now specified with hints.
May want to put some of these devices back in the code and just use hints
to override/specify configuration.
MFC after: 1 month
should setup is the class. This corrects an issue where enabling
uart1 on the avila board caused uart0 to stop working during boot
(no msgs generated by rc scripts were displayed).
Reviewed by: imp
MFC after: 3 weeks
initialization is complete. This fixes some root-on-ZFS
configurations.
Reported by: Bruno Damour <freebsd.ruomad@free.fr>
Tested by: Bruno Damour <freebsd.ruomad@free.fr>
o add the hex output of the th_flags field to the example log
line in comments
o simplify the log line length calculation and make it less
evil
o correct the test for the length panic; the line isn't on
the stack but malloc'ed
This was just wasteful when this was always called before lock_init()
(which overwrote both fields each time), but when
lock_profile_object_init() was moved into lock_init() the clearing of
lo_flags proved fatal (all locks became spin locks to _sleep(), etc.)
Reported by: kris
device's, not the bridge's, softc to be used to check the
PCIB_DISABLE_MSI flag. This resulted in randomly allowing
or denying MSI interrupts based on whatever value the driver
happened to store at sizeof(device_t) bytes into its softc.
I noticed this when I stopped getting MSI interrupts
after slightly re-arranging mxge's softc yesterday.
It's hard to measure performance improvement on my test machine, but the
change won't degrade performance for sure. I can measure slight improvement
for debugging kernel and it can also be a win for machines where atomic
operation is more expensive.
Reviewed by: kib
inline it when needed already, and the symbol is also required outside of
audit.c. This silences a new gcc warning on the topic of using __inline__
instead of __inline.
MFC after: 3 days
Implement all futex atomic operations in assembler to not depend on the
fuword() that does not allow to distinguish between -1 and failure return.
Correctly return 0 from atomic operations on success.
In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007
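A minimal userland sketch of the ambiguity mentioned above, assuming a NULL
pointer stands in for a faulting user address. Both function names are
hypothetical; the commit itself uses assembler routines, which this does not
attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/*
 * In-band error style: return the word, or -1 on failure.  A stored value
 * of -1 looks exactly like a failed fetch.
 */
static long
fetch_inband(const long *addr)
{
	return (addr == NULL ? -1 : *addr);
}

/*
 * Out-of-band style: return 0 on success and -1 on failure, and hand the
 * fetched word back through *out so any value can be represented.
 */
static int
fetch_checked(const long *addr, long *out)
{
	assert(out != NULL);
	if (addr == NULL)
		return (-1);
	*out = *addr;
	return (0);
}
```

With the second form, success is reported as 0 regardless of what the word
contains.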
- In rt_check() remove the senderr() macro and the "bad" label. They
used to simplify code, but no longer do.
- Remove extra RT_LOCK_ASSERT() in rt_setgate(). The RT_REMREF macro
does this.
- In rtfree() convert panics to KASSERTs.
- Make the routing API stricter: rtfree() should be called only in a case
when we are completely sure we've got the last reference on the
rtentry. In all other cases RTFREE_LOCKED() macro should be used.
If the reference isn't the last one spit out a warning printf.
Correct the only(?) case for this in rt_check().
- Fix typos in comments.
- Remove code to use the special wc_fifo. It has been disabled by default
in our other drivers as it actually slows down transmit by a small amount
- Dynamically determine the amount of space required for the rx_done
ring rather than hardcoding it.
- Compute the number of tx descriptors we are willing to transmit per
frame as the minimum of 128 or 1/4 the tx ring size.
- Fix a typo in the tx dma tag setup which could lead to unnecessary
defragging of TSO packets (and potentially even dropping TSO packets
due to EFBIG being returned).
- Add a counter to keep track of how many times we've needed to
defragment a frame. It should always be zero.
- Export new extended f/w counters via sysctl
Sponsored by: Myricom, Inc.
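The per-frame descriptor limit from the list above can be sketched as follows.
The function name is hypothetical; only the "minimum of 128 or 1/4 the tx ring
size" rule comes from the text.

```c
#include <assert.h>

/* Cap on tx descriptors used by one frame: min(128, ring size / 4). */
static unsigned int
max_tx_desc_per_frame(unsigned int tx_ring_size)
{
	unsigned int quarter;

	assert(tx_ring_size >= 4);
	quarter = tx_ring_size / 4;
	return (quarter < 128 ? quarter : 128);
}
```

A 256-entry ring is therefore limited to 64 descriptors per frame, while rings
of 512 entries or more hit the fixed cap of 128.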
vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With
the exception of calls to vm_map_pmap_enter() from
madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter()
are both used to create speculative mappings. Thus, always
reactivating cached pages is a mistake. In principle, cached pages
should only be reactivated by an actual access. Otherwise, the
following misbehavior can occur. On a hard fault for a text page the
clustering algorithm fetches not only the required page but also
several of the adjacent pages. Now, suppose that one or more of the
adjacent pages are never accessed. Ultimately, these unused pages
become cached pages through the efforts of the page daemon. However,
the next activation of the executable reactivates and maps these
unused pages. Consequently, they are never replaced. In effect, they
become pinned in memory.
same way it was enabled for Linux binaries in linuxulator.
This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.
Deal with IPv6 routing headers (see FreeBSD-SA-07:03.ipv6 for background)
Block IPv6 packets with routing headers by default, unless 'allow-opts'
is specified. Block RH0 unconditionally. Deal with ip6_plen 0.
MFC after: 1 week
Discussed with: mlaier
- Update to the latest (1.4.18) f/w. This f/w introduces a new
receive mode which allows us to use FreeBSD's physically discontinuous
MJUM9BYTES clusters.
- Switch the driver from chaining MJUMPAGESIZE clusters to using
MJUM9BYTES clusters to avoid mbuf chaining overheads. Due to this
change, people running obsolete f/w images will be limited to an MTU of
PAGE_SIZE - 16.
- Add (disabled by default) support for Large Receive Offload.
Sponsored by: Myricom, Inc.
processor is to jump to recovery code. This branching behaviour
may not be implemented by the processor and a Speculative Operation
fault is raised. The OS is responsible for emulating the branch.
Implement this, because GCC 4.2 uses advanced loads regularly.
scheduler lock is not involved. sched_lock still protects the sched_clock
call. Another patch will remedy this.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
referenced outside of mp_machdep.c
- Replace a magic 14 with the newly added IDC_ITID_SHIFT macro.
- Remove the global mp_boot_mid variable as it's not really necessary
and just replacing it with PCPU_GET(mid) doesn't have any impact on
performance once booted.
- Replace PCPU_GET(cpuid) with the curcpu shortcut.
- Replace hardcoded function names in panic strings etc with __func__
so they don't need to be updated when renaming the function.
- Use register_t instead of u_long for variables used to hold the
return value of intr_disable() so we don't need to apply any
knowledge about the actual width of that value here.
- Improve the wording of some comments.
- Fix several style(9) bugs.
- Use __FBSDID in identcpu.c.
- Remove #ifndef SUN4V around global cpu_impl variable; it doesn't
hurt on sun4v for now and once setPQL2() is gone sun4v can stop
sharing identcpu.c with sparc64, making the remainder of this file
also sparc64-only again. [1]
Submitted by: kmacy [1]
in the sun4v source in order to be able to compile the source which
is shared between sparc64 and sun4v just #include the sparc64
version here instead of duplicating it.
This is based on the approach taken by pc98 headers in order to
compile the source shared between i386 and pc98.
iommureg.h (which already began to bitrot) and iommuvar.h from the
sun4v source and adjust some of the source which is shared between
sparc64 and sun4v as appropriate.
ignore the size of any headers that were passed with the sendfile(2)
system call. Otherwise the file sent will be truncated by the header
size if the nbytes parameter was provided. The bug doesn't show up
when either nbytes is zero, meaning send the whole file, or no header
iovec is provided.
Resolve a potential aliasing of errors from the VM and sf_buf
parts and the protocol send parts where an error of the latter
overwrites one of the former.
Update comments.
The byte accounting bug wasn't seen earlier because none of the popular
sendfile(2) consumers, Apache, lighttpd and our ftpd(8) use it in modes
that trigger it. The varnish HTTP proxy makes full use of it and exposed
the problem.
Bug found by: phk
Tested by: phk
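One reading of the byte accounting bug, as a minimal sketch: if header bytes
are counted against nbytes, the file portion ends early by the header size.
The function names and the way the counters are split are assumptions made
for illustration, not the kernel's actual bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy accounting: header bytes are counted against nbytes. */
static size_t
file_bytes_left_buggy(size_t nbytes, size_t hdr_sent, size_t file_sent)
{
	size_t sent = hdr_sent + file_sent;

	return (sent >= nbytes ? 0 : nbytes - sent);
}

/* Fixed accounting: only file data counts against nbytes. */
static size_t
file_bytes_left_fixed(size_t nbytes, size_t file_sent)
{
	assert(file_sent <= nbytes);
	return (nbytes - file_sent);
}
```

With nbytes = 1000, a 100-byte header and 900 file bytes already sent, the
buggy version reports nothing left while the fixed version still has 100 file
bytes to send.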
scheme allowed for 1024 PTE pages, each containing 256 PTEs.
This yielded 2GB of KVA. This is not enough to boot a kernel
on a 16GB box and in general too low for a 64-bit machine.
By adding a level of indirection we now have 1024 2nd-level
directory pages, each capable of supporting 2GB of KVA. This
brings the grand total to 2TB of KVA.
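The figures above can be checked with a short calculation. The 8KB page size
is an assumption (it is the value that makes 1024 x 256 PTEs come out to 2GB);
the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	PAGE_SIZE_ASSUMED	(8ULL * 1024)	/* assumption: 8KB pages */
#define	PTES_PER_PAGE		256ULL
#define	PTE_PAGES		1024ULL
#define	DIR_PAGES		1024ULL

/* Old scheme: 1024 PTE pages, each mapping 256 pages. */
static uint64_t
kva_one_level(void)
{
	return (PTE_PAGES * PTES_PER_PAGE * PAGE_SIZE_ASSUMED);
}

/* New scheme: 1024 directory pages, each covering the old 2GB. */
static uint64_t
kva_two_level(void)
{
	return (DIR_PAGES * kva_one_level());
}
```

Under that assumption the old scheme covers 2^31 bytes (2GB) and the new one
2^41 bytes (2TB).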
Fix the flags argument: M_WAITOK is not a valid flag. Its presence
leaves the indication that contigmalloc(9) will not return a NULL
pointer.
The use of contigmalloc(9) in this place is probably not a good idea
given the constraints. It's probably better to lift the constraints
and instead add a permanent mapping to the ITR. It's possible that
the first 256MB of memory is exhausted when we get here.
This fixes a kernel panic on a 16GB rx3600.
of each port and any further packets are blocked. When all the marker frames
have been returned to us from the remote network device then we can be sure
that all interface queues are empty.
This is needed when a port is added or removed from the aggregation since it
will affect the hash based distribution, if the queues are not empty then a
packet from an existing connection may be placed on a different interface and
arrive out of order. This was previously achieved by suppressing transmission for
1 second; now that there is active feedback, this timeout has been increased
to 3 seconds and used as a fallback.
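A minimal sketch of why changing the port count can reorder packets, assuming
the distribution is a plain modulo of a flow hash over the number of ports.
The real hash and mapping may differ; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a flow hash to a port index, assuming a simple modulo scheme. */
static unsigned int
port_for_flow(uint32_t flow_hash, unsigned int nports)
{
	assert(nports > 0);
	return (flow_hash % nports);
}
```

A flow with hash 5 maps to port 1 with two ports but to port 2 once a third
port is added, so packets still queued on the old port can be overtaken by
packets sent on the new one.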
Switch ia64 kernels to -fpic. This is likely wrong, but at least gets
ia64 kernels to compile and link with GCC 4.2. The previous -mno-sdata
trick is not working anymore.
for use throughout the tcp subsystem.
It is IPv4 and IPv6 aware and creates a line in the following format:
"TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"
A "\n" is not included at the end. The caller is supposed to add
further information after the standard tcp log header.
The function returns a NUL terminated string which the caller has
to free(s, M_TCPLOG) after use. All memory allocation is done
with M_NOWAIT and the return value may be NULL in memory shortage
situations.
Either struct in_conninfo || (struct tcphdr && (struct ip || struct
ip6_hdr)) have to be supplied.
Due to ip[6].h header inclusion limitations and ordering issues the
struct ip and struct ip6_hdr parameters have to be cast and passed
as void * pointers.
tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr,
void *ip6hdr)
Usage example:
struct ip *ip;
char *tcplog;
if ((tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) != NULL) {
log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
tcplog, __func__);
free(tcplog, M_TCPLOG);
}
lock and unlock conditionally, not just set the flag on it conditionally.
In practice, this bug couldn't manifest, as in the current revision of
the code, no callers pass a NULL rep.
CID: 1416
Found with: Coverity Prevent(tm)
While ng_fec called the ioctl to let interfaces in the bundle know
the list of multicast addresses had changed, it never actually
updated that list on the interfaces in the bundle. Consequently,
the multicast filters could be programmed incorrectly.
if_lagg does this correctly, by maintaining a list of addresses
that it has added to interfaces in the bundle. This commit basically
takes the if_lagg code and adds it to ng_fec.
A version of this patch for RELENG_6 has fixed some problems with
IPv6 ND over ng_fec. This is probably the problem in PR 107523.
PR: 107523
Tested by: Rob Gallagher <robert.gallagher@heanet.ie>
Obtained from: if_lagg
MFC after: 3 weeks
function calls are no longer generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
speedup and will be more useful after each gains a spinlock in the
impending thread_lock() commit.
- Move initialization and asserts into init/fini routines. fini routines
are only needed in the INVARIANTS case for now.
Submitted by: Attilio Rao <attilio@FreeBSD.org>
Tested by: kris, jeff
specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the
feature.
- Also cleaned up the code so that the semantics of the icmp6_nodeinfo
flags is clearer (i.e., defined specific macro names instead of using
hard-coded values).
Approved by: gnn (mentor)
MFC after: 1 week
- Fixed RTOinfo for bounding.
- Fixed connect() to return ECONNREFUSED when an ABORT is received.
- Added comments to direct Static Analysis not to look at some things
it does not understand (comments are /* sa_ignore XXXXX */)
- Bind when colliding was broken, missing not_found = 1 before
checking to see if the port was in use caused endless bind loop.
- Cookie life needs to be in milliseconds to conform to socket api.
- Cookie life is not supposed to change if it's 0. On the assoc
level set we changed it to 0, oops.
- Two more static analysis issues identified by the cisco
tool. Null checks needed.
- An issue for sendfile(). Need to validate the correct
input argument.
- When sending failed due to a no route to host, we leaked
the mbuf chain failing to call m_freem().
- Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined
Reviewed by: gnn
defined. This restores the old behavior, and eliminates the
dependency on the kernconf.tmpl when INCLUDE_CONFIG_FILE isn't
included in the kernel config. There were many people in the terminal
room that had almost, but not quite, up-to-date config files that this
helps. I don't know if this is the result of skew among the cvsup
servers, or some other more subtle problem. However, this fix should
work for any config of recent vintage (I tested with the latest, and
one before the recent changes, and eye-balled the intermediate
versions).
Reviewed by: the terminal room crew
adapter list still capable, but only PCI-E adapters are now enabled.
The user can enable older PCI-X or PCI adapters using ifconfig.
Secondly, Arthur Hartwig pointed out my MSI change was not working
correctly, changed to something that now does. Thanks Arthur.
There was also a fundamental bug in the 82575 MSIX code, the MSIX
registers had to be mapped, oops :)
Rubber-stamped by: Pdeuskar
the power_nodriver tunable is off. pci_cfg_save() already checks the
tunable internally, and no other callers of pci_cfg_save() check the
tunable.
Reviewed by: imp
- Updated firmware to latest release (v3.4.8) to fix TSO + jumbo frame lockup
- Added MSI (hw.bce.msi_enable) and TSO (hw.bce.tso_enable) sysctls
- Fixed kernel panic when MSI is used and module is unloaded
- Added several new debug routines
- Removed slack space for RX/TX chains since it only covers sloppy coding
- Fixed a potential problem when programming jumbo MTU size in hardware
- Various other comment changes
MFC after: 4 weeks
because on at least my dc based cards there's garbage in there. The
recent changes in the resource code appears to have unmasked this
problem... At least dc now probes/attaches better than it did before.
Also, we no longer need to write to the cfg for the other registers.
different versions of FreeBSD source tree.
Old config(8) can now be used unless you want to use INCLUDE_CONFIG_FILE
option.
Approved by: imp
Reviewed by: imp
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:
tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()
tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()
This is a mechanical move with appropriate renames and making
them static if used only locally.
The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.
value in the mbuf with the result of the calculation. Previously,
if we chose to return an ICMP message, the quoted UDP checksum bytes
would be different to what was sent.
PR: 112471
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 3 weeks
legacy codepath match the 82575, without this we were seeing bridging
fail on 82546 adapters. Secondly, I have limited TSO to PCI Express
adapters, I meant to do this and it got dropped in the earlier delta.
Next, I am dropping in the latest shared code from our development
team, consensus was that this should be done frequently, so I am :)
Approved by: pdeuskar
exists and contains the 'C' flag.
o The partition label can be the empty string. It's how labels are
cleared.
o When an action fails, lower permissions when they were raised
in order to allow the action. A failed action will not result
in any uncommitted changes.
o Allow the flags parameter to be present but empty. It's the
equivalent of not being present.
processes under 64-bit kernels). Previously, each 32-bit process overwrote
its resource limits at exec() time. The problem with this approach is that
the new limits affect all child processes of the 32-bit process, including
if the child process forks and execs a 64-bit process. To fix this, don't
overwrite the resource limits during exec(). Instead, sv_fixlimits() is
now replaced with a different function sv_fixlimit() which asks the ABI to
sanitize a single resource limit. We then use this when querying and
setting resource limits. Thus, if a 32-bit process sets a limit, then
that new limit will be inherited by future children. However, if the
32-bit process doesn't change a limit, then a future 64-bit child will
see the "full" 64-bit limit rather than the 32-bit limit.
MFC is tentative since it will break the ABI of old linux.ko modules (no
other modules are affected).
MFC after: 1 week
SIGCHLD/kevent(2) notification of process termination and wait(). Now
we no longer drop locks between sending the notification and marking
the process as a zombie. Previously, if another process attempted to do
a wait() with WNOHANG after receiving a SIGCHLD or kevent and locked
the process while the exiting thread was in cpu_exit(), then wait() would
fail to find the process, which is quite astonishing to the process
calling wait().
MFC after: 3 days
option value so that unrecognized options are ignored as specified in RFC2711.
(packets containing an MLD router alert option are passed to the upper layer
as before).
Approved by: gnn (mentor), ume (mentor)
functions from their original place to their own files.
TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait from tcp_subr.c -> tcp_timewait.c
is caused by my latest changes to config(8). You're supposed to install new
config(8) in order to prevent yourself from seeing a warning about old
version of that tool.
You should configure the kernel with a new config(8) then.
Oked by: rwatson, cognet (mentor)
This change lets us have the full configuration of a running kernel
available in sysctl:
sysctl -b kern.conftxt
The same configuration is also contained within the kernel image. It can be
obtained with:
config -x <kernelfile>
Current functionality lets you quickly recover the kernel configuration by
simply redirecting output from the commands presented above and starting kernel
build procedure. "include" statements are also honored, which means options
and devices from included files are also included.
Please note that comments from configuration files are not preserved by
default. In order to preserve them, you can use -C flag for config(8). This
will bring configuration file and included files literally; however,
redirection to a file no longer works directly.
This commit was followed by discussion, that took place on freebsd-current@.
For more details, look here:
http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.html
http://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html
Development of this patch took place in Perforce, hierarchy:
//depot/user/wkoszek/wkoszek_kconftxt/
Support from: freebsd-current@ (links above)
Reviewed by: imp@
Approved by: imp@
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
passed zero as exit signal.
GCC 4.2 changes the kernel data segment layout not to have 0
in that memory location. This code ran by luck before and now
the luck has run out.
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to
a macro that does all of this. This removes all printfs from
the code and makes the code more portable and easier to
read.
- Static Analysis (cisco) - found a few bugs, but mostly we
add checks for NULL pointers and such to make the tool
happy. We now pass the Cisco SA tools checks except for
where it does not understand tailq/lists. We still need
to look at the coverity tools output too (this is like
the cisco SA tool) and see if it wants us to fix any other
items. Hopefully this will be the last major churn in the
code other than bug fixes.
This patch does the following:
- Remove un-necessary code that is not even compiling into the driver
under TW_OSL_NON_DMA_MEM_ALLOC_PER_REQUEST defines.
- Remove bundled firmware image and associated "files" entry for tw_cl_fwimg.c
- Remove bundled firmware flashing routines. We now have tw_update userspace
FreeBSD controller flash utility.
- Fix driver crash on load due to shared interrupt.
- Fix 2 lock leaks for Giant lock.
- Fix CCB leak.
- Add support for 9650SE controllers.
Many thanks to 3Ware/AMCC for continuing to support FreeBSD.
time workaround for problems with 82571 adapters and LAAs, one port
getting reset can cause the other to have its RAR[0] also reset,
thus overwriting an LAA. This fix works around it by also keeping
the address in the last array member.
The other bug is specific to the new 575 adapter, its transmit code
logic in handling hwassists was too crude, it broke when doing
bridging. I am much happier with the new logic, we may want to change
the legacy path at some point to something similar.
Reviewed by: pdeuskar
Approved by: pdeuskar
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.
MFC after: 1 week
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.
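The is_handlers change in the list above can be pictured with a small
user-space sketch. The structure layout and every function name below are
simplified stand-ins invented for illustration; they are not the kernel's
actual interfaces:

```c
/* Hypothetical, simplified stand-in for an interrupt source. */
struct intsrc_sketch {
	int	is_handlers;	/* number of handlers attached to the source */
	int	pic_enabled;	/* what the simulated PIC currently believes */
};

static void
sim_pic_enable(struct intsrc_sketch *isrc)
{
	isrc->pic_enabled = 1;
}

static void
sim_pic_disable(struct intsrc_sketch *isrc)
{
	isrc->pic_enabled = 0;
}

static void
source_add_handler(struct intsrc_sketch *isrc)
{
	/* First handler: the source goes from unused to in use. */
	if (isrc->is_handlers++ == 0)
		sim_pic_enable(isrc);
}

static void
source_remove_handler(struct intsrc_sketch *isrc)
{
	/* Last handler gone: tell the PIC so it can release the vector. */
	if (--isrc->is_handlers == 0)
		sim_pic_disable(isrc);
}
```

A plain boolean cannot tell "one of two handlers was removed" apart from
"the last handler was removed"; the count can.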
119373: o Remove the query verb, along with the request and response
parameters.
o Add the version and output parameters.
119390: [APM,GPT] Properly clear deleted entries.
119394: o Make the alias the standard and use the '!' to prefix
literal partition types.
o Treat schemes and partition types as case insensitive.
119462: [GPT] Fix a page fault caused when modifying a partition entry
without a new partition type.
stack will process from 50 to 15. As this is a sysctl variable it
can be tuned up or down at the user/administrator's whim.
Submitted by: itojun
MFC after: 1 day
to the coverity tool.. may even be the same one.. not sure).
- A bug in the way sctp_abort() and friends were
setting the IP_CLOSE flag.. and NOT passing the
last argument as a (,1)... so that things would
get freed..
- Update to latest (1.4.17) firmware.
- Use the new MXGEFW_CMD_UNALIGNED_TEST (added in firmware 1.4.16) to
have the firmware tell us if the PCIe chipset supports aligned PCIe
completions.
- Hard to maintain, and frequently out of date whitelist of PCIe
chipsets known to produce aligned completions removed, as it has been
replaced in its role of selecting the correct firmware to run by the
use of MXGEFW_CMD_UNALIGNED_TEST.
- Break the dma test out of mxge_reset() and into its own function
(mxge_dma_test()) so it can be used by both the normal DMA test, and
to run the unaligned test.
- Improved support for enabling ECRCs
Sponsored by: Myricom Inc.
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth
of TSN's (1 byte msgs).. this left the peer hopelessly
out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now
that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default
as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for
BSD, but may be other things later when BSD
fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open
side false notifications.
- Fix so sendfile and SENDALL will work properly (missing
flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can
reuse the helpers.
- When an address is added during system init (non-dynamic mode) make
sure that the "defer use" flag is not set.
** it's compiling on XR now :-D **
Reviewed by: gnn
and in_setsockaddr(), containing only stale comments on why they
exist, remove them and initialize the protosw for UDP to directly
reference in_setpeeraddr() and in_setsockaddr().
The entire code is wrapped in #ifdef ... #endif so it won't harm
the actual implementation, but developers are encouraged to test it.
For arm, ia64, ppc, sparc64 and sun4v some work is still
needed, thus arch maintainers are encouraged to bring their arch on par
with respect to i386 and amd64.
Approved by: re (implicit?)
o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.
Approved by: re (implicit?)
and change it to a void function.
We use a compressed structure for TCPS_TIME_WAIT to save memory. Any
late segments arriving for such a connection are handled directly in the TW
code.
and show up with different names: first try to open provider using
remembered name and compare its ident, if equal, this is our provider,
if not equal or there is no provider with such name, find provider with
remembered ident and don't care about the name.
- Locks were not being unlocked when an invalid size chunk is
sent in.
- When a notification comes in, we cannot use it to look up
the fragment interleave stream information since it's not
on a stream.
Seems to work on RELENG_4 through -current and also on sparc64
now. There may still be some issues with the auto attach/detach
code to sort out.
MFC after: 3 days
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194
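To make the time-for-space trade-off concrete, here is a minimal user-space
sketch of the two lookup styles. The structure phys_seg, both function names,
and the 4 KB page size are assumptions made for illustration; the kernel's
real PHYS_TO_VM_PAGE() implementations may differ:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KB pages for the sketch */

struct vm_page { uint64_t phys_addr; };

/* Hypothetical segment descriptor: [start, end) backed by first_page[]. */
struct phys_seg {
	uint64_t	 start;
	uint64_t	 end;
	struct vm_page	*first_page;
};

/* Dense style: one array covering everything from first_pa upward. */
static struct vm_page *
phys_to_vm_page_dense(struct vm_page *array, uint64_t first_pa, uint64_t pa)
{
	return (&array[(pa - first_pa) >> PAGE_SHIFT]);
}

/* Sparse style: search the segments, so holes need no vm_page entries. */
static struct vm_page *
phys_to_vm_page_sparse(const struct phys_seg *segs, int nsegs, uint64_t pa)
{
	int i;

	for (i = 0; i < nsegs; i++) {
		if (pa >= segs[i].start && pa < segs[i].end)
			return (&segs[i].first_page[(pa - segs[i].start) >>
			    PAGE_SHIFT]);
	}
	return (NULL);
}
```

The dense lookup is a single index computation but needs array entries for
any holes in the address space; the sparse lookup pays for a search and in
return only needs entries for memory that exists.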
DIOCGFLUSH - Flush write cache (sends BIO_FLUSH).
DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE).
DIOCGIDENT - Get provider's unique and fixed identifier (asks for
GEOM::ident attribute).
First two are self-explanatory, but the last one might not be. Here are
properties of provider's ident:
- ident value is preserved between reboots,
- provider can be detached/attached and ident is preserved,
- provider's name can change - ident can't,
- ident value should not be based on on-disk metadata; in other words
copying whole data from one disk to another should not yield the same
ident for the other disk,
- there could be more than one provider with the same ident, but only if
they point at exactly the same physical storage, this is the case for
multipathing for example,
- GEOM classes that consume single providers and provide single providers,
like geli, gbde, should just attach class name to the ident of the
underlying provider,
- ident is an ASCII string (is printable),
- ident is optional and applications can't rely on its presence.
The main purpose for this is that an application can remember a provider's ident
and once it tries to open provider by its name again, it may compare idents
to be sure this is the right provider. If it is not (idents don't match),
then it can open provider by its ident.
OK'ed by: phk
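The name-then-ident lookup described above might look roughly like this in an
application. The provider table, struct provider, and both function names are
hypothetical; a real application would obtain the ident through DIOCGIDENT
and would also have to cope with providers that have no ident at all:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of a provider; not a GEOM structure. */
struct provider {
	const char *name;
	const char *ident;	/* assumed non-NULL in this sketch */
};

static const struct provider *
find_by(const struct provider *p, size_t n, const char *key, int by_ident)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(by_ident ? p[i].ident : p[i].name, key) == 0)
			return (&p[i]);
	}
	return (NULL);
}

/*
 * Try the remembered name first and accept it only if the ident matches;
 * otherwise fall back to a search by ident alone.
 */
static const struct provider *
open_remembered(const struct provider *p, size_t n, const char *name,
    const char *ident)
{
	const struct provider *pp;

	pp = find_by(p, n, name, 0);
	if (pp != NULL && strcmp(pp->ident, ident) == 0)
		return (pp);
	return (find_by(p, n, ident, 1));
}
```

If a disk remembered under one name shows up under another after a reboot,
the name lookup fails or finds a different ident, and the ident search
locates the right provider.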
- Make wlan_amrr depend on wlan, so that it can find various symbols in
wlan module if wlan is not compiled into kernel.
Approved by: sam (mentor)
Tested by: kevlo
- http://www.intel.com/design/chipsets/specupdt/245051.htm
AC97 Soft Audio and Soft Modem Master Abort Errata
Issue:
Use of either soft audio or soft modem on an Intel® 82443MX PCISet
based platform running a 100 MHz Processor System Bus and an AC97 codec
may result in failures. The system continues to function normally while
the AC97 hardware may not resume and may require a cold-boot to
recover. As a result of the failure, the Master Abort Status bit will
be set in the audio or modem function PCI header space.
Workaround:
Force uncacheable DMA on both BDL and pcm buffers.
Tested by: Emil Holmstr|m <emil@linux.se>
- Remove explicit call to pmap_change_attr(), since we now have proper
and functional definition of BUS_DMA_NOCACHE.
- Enable PCI(e) bus snooping for non i386/amd64 as an alternative for
uncacheable DMA.
- Codecs changes:
* Analag Device -> Analog Devices, AD1988.
* New codec: VIA VT1708 and VT1709, Realtek ALC262, ALC861-VD and
ALC885.
* Various fixups for Conexant Waikiki, fix recording (read: microphone)
on various Analog Devices codecs due to vendor BIOS mess, various
quirks for several ASUS laptops/boards.
- Fix connection list handling, closely following the specification to
handle range of nids.
- Basic Jack sense polling infrastructure for hardware with
broken unsolicited response interrupts.
Ideas/Submitted/Tested by: Andriy Gapon <avg@icyb.net.ua>,
#freebsd-azalia, many.
state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace().
Move to ANSI C function header, correct prototype types so that short TCP
state is no longer promoted to int unnecessarily.
Add comments.
MFC after: 3 weeks
Updated copyright date to 2007.
Tested with BCM5706 A3.
Added ID for BCM5708 B2.
Removed unused driver version string.
Modified BCE_PRINTF macro to automatically fill-in the sc pointer.
Fixed a kernel panic when the driver was loaded as a module from the
command-line because the MII bus pointer was null (i.e. the MII bus
hadn't been enumerated yet).
Added fix proposed by Vladimir Ivanov <wawa@yandex-team.ru> to prevent
driver state corruption when releasing the lock during the ISR in
bce_rx_intr() to send packets up the stack.
Added new TX chain and register read sysctl interfaces for debugging.
Cleaned up formatting for various other debug routines.
Added a new statistic maintained by firmware which tracks the number
of received packets dropped because no receive buffers are available.
correct network drivers with respect to busmaster DMA, go over it
with a duster to make other aspects of it a role model:
Eliminate the pci specific softc, it serves no rational purpose.
Use convenience resource allocation/deallocation functions to save
code and error handling.
Switch from bus_space_{read|write}_%u() to bus_{read|write}_%u()
functions and forget about tags and handles, the resource will know
about those, should they be needed. This also eliminates a number
of inconsistently named local variables.
it was full and a collision occurred, then we would leave
an inp locked. Also fixes a missing inp unlock if IPSEC was
on and it failed during the attach. Bug found by Weongyo Jeong.
as UF_OPENING. Disable closing of those entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index, while the filedescriptor is closed by parallel thread.
Idea by: tegge
Reviewed by: tegge (previous version of patch)
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
in comments for .c and .h files respectively. Jack may want to clean up
style or other aspects once he's up and about again, but this gets the
kernel compiling.
shared code infrastructure that is family specific and
modular. There is also support for our latest gigabit
nic, the 82575 that is MSI/X and multiqueue capable.
The new shared code changes some interfaces to the core
code but testing at Intel has been going on for months,
it is fairly stable.
I have attempted to be careful in retaining any fixes that
CURRENT had and we did not. I apologize in advance if anything
gets clobbered; I'm sure I'll hear about it :)
Approved by: pdeuskar
on each socket buffer with the socket buffer's mutex. This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O
operations (discovered by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer
I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
- Simplify the amount of work that has to be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.
The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.
MFC after: 1 month
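The remap-table example above (a driver that wants 4 messages but whose table
only uses the first two vectors) can be checked with a small sketch. The
helper name is hypothetical, and treating table values as 1-based vector
numbers is inferred from 0 meaning "no IRQ":

```c
#include <stddef.h>

/*
 * Hypothetical helper: given a remap table (entry i names the 1-based
 * message vector used by MSI-X table entry i, 0 meaning "no IRQ") and the
 * number of vectors actually allocated, return how many trailing vectors
 * go unused. Returns -1 if the table names a vector that was not allocated.
 */
static int
unused_trailing_vectors(const unsigned int *table, size_t count,
    unsigned int allocated)
{
	unsigned int highest = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (table[i] > allocated)
			return (-1);
		if (table[i] > highest)
			highest = table[i];
	}
	return ((int)(allocated - highest));
}
```

With the table { 1, 2, 1, 2 }, an allocation of 2 vectors leaves nothing
unused, while an allocation of 3 leaves one trailing vector, which is the
3rd IRQ that would be released back to the system.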
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.
Obtained from: (some of the Listen changes from Weongyo Jeong)
pointers. A structure is more readable and less error-prone. It
also avoids problems when a function pointer doesn't have the
same width as a void pointer.
functions with CPUs they apply to only, otherwise default to the
plain C functions. This is modeled in a way so that f.e. a Cheetah
version of these functions can be inserted easily.
Not because I admit they are technically wrong and not because of bug
reports (I received none), but because I surprisingly met such
strong opposition and resistance that I lost any desire to continue.
Anyone who is interested in POSIX can dig out what changed and how
through cvs diffs.
the UPA_IMR2 resource is also shared with/a subset of the Schizo PCI
bus B CSR bank. I'm not entirely sure how this previously managed to
escape testing...
consistent with the naming of other structure field members, and
reducing improper grep matches. Clean up and comment structure
fields in structure definition.
sc->mii_anegticks according to whether the respective BGE chip
supports Fast Ethernet only or also Gigabit Ethernet.
- At least the BGE chips I've tested with wedge when isolating them
so document this as the reason for setting MIIF_NOISOLATE and
remove the unused (and partially even #ifdef'ed out) isolation
related code. Add code that panics if we encounter a non-zero MII
instance as generally there's no way a PHY requiring MIIF_NOISOLATE
can be handled gracefully in a multi-PHY configuration (it's ok for
the internal PHY of single-PHY-only-NIC to not support isolation
though).
- Additionally set MIIF_NOLOOP as loopback doesn't seem to work
either and remove the #ifdef'ed out code for adding respective
media. The MIIF_NOLOOP flag currently triggers nothing but
hopefully will be respected by mii_phy_setmedia() later on.
Reviewed by: jkim, yongari
MFC after: 1 month
Blade 2500, Fire V210 and probably some other sparc64 machines.
These chips are typically not fitted with an EEPROM which means
that we have to obtain the MAC address via OFW and that some chip
tests will just always fail.
These changes are based on the respective code found in OpenBSD
with some additional info obtained from OpenSolaris and some style
suggestions by jkim@. They also have the desired side-effect of
respecting the 'local-mac-address?' system configuration variable
for the affected BGEs.
- In bge_attach() factor out calling bge_release_resources() before
going to the fail label into the fail label as well as replace a
magic 6 with ETHER_ADDR_LEN.
Reviewed by: yongari (before style changes), jkim
- Wake up DMA engine after adding a new receive buffer.
- Skip buffers which have unknown state after error.
- More rigid error detection.
MFC after: 1 week
as some combinations of chipset, controller and target do not behave
correctly when DMA is enabled for other commands.
PR: kern/103602
MFC after: 2 weeks
were never freed, but the big ring was freed twice.
-Don't supply rx hw csums for frames which are padded beyond the
length specified in the ip header. If the padding is non-zero,
the hw csum will be incorrect for such frames.
Sponsored by: Myricom
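One way to express the padding test described above, as a sketch only. The
function name is hypothetical, the frame is assumed to be untagged Ethernet
carrying IPv4, and the driver's actual check may be written differently:

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14	/* untagged Ethernet header */

/*
 * Hypothetical check: trust the hardware checksum only when the frame
 * carries no bytes beyond the length claimed in the IPv4 header.
 * ip_len is the IPv4 total length in host byte order.
 */
static int
rx_hw_csum_usable(size_t frame_len, uint16_t ip_len)
{
	return (frame_len == (size_t)ETHER_HDR_LEN + ip_len);
}
```

A 40-byte IP packet in a minimum-size 60-byte frame carries 6 bytes of
padding, so the sketch would decline the hardware checksum for it.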
non-mapped data as possible at once and not page-by-page. With this change we
combine I/Os and also save many VM_OBJECT_UNLOCK()/VM_OBJECT_LOCK()
operations.
Simple 'fsx -l 33554432 -o 524288 -N 10000 /tank/fsx' test shows ~23%
performance increase.
This works around the problem in Parallels/VMWare where the emulated drivers are
slower, especially with ATA_FLUSHCACHE. The problem appears much more
frequently with ZFS, which uses it a lot more.
Approved by: sos, pjd
- vm_page_undirty() is enough (instead of vm_page_set_validclean()), but it has
to be called before we write the data in case someone makes page dirty after
our write, but before our vm_page_undirty() call.
- Always dmu_write, no matter whether uiomove() succeeded, because it could
partially be ok and we would lose some changes.
All good ideas from: ups
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successful. In the second case, the unmount shall be allowed.
Submitted by: sobomax
MFC after: 2 weeks
- Fix for a bug where a close would not wait for all (directio)
dirty buffers to drain. The nfsnode was not marked NMODIFIED
when there were directio dirtied buffers pending, causing this.
- No reason to vhold/vrele the vp when enqueueing DirectIO requests
for the nfsiods. The vnode can't really go away since the close
has to wait for these requests to drain.
MFC after: 1 week
Submitted by: mohans
specific request and thus should first try to be allocated from the
sys_resource pool. This avoids using the sys_resource pool for wildcard
requests that have bounded ranges coming from cbb(4) and Host-PCI pcib(4)
drivers.
Tested by: Andrea Bittau <a.bittau of cs.ucl.ac.uk fame>
Sleuthing by: Andrea Bittau as well
that the MSI mapping window is fixed at 0xfee00000 and the capability
does not include two more dwords used to program the address. Supporting
this mostly results in quieting spurious warnings during boot about
non-default MSI mapping windows.
- HT 2.00b also added a new HT capability type, so support that in pciconf.
MFC after: 3 days
Tested by: jmg
It seems that valid pause frames(Tx flow control) cause GMAC to hang
such that it resulted in watchdog timeout. As a work around don't
flush Rx MAC FIFO if we've received pause frames.
Tested by: Harald Schmalzbauer (h DOT schmalzbauer AT omnisec DOT de)
Under certain circumstances, if TSO is active, Yukon II generates
corrupted IP packets. All corrupted IP packets I noticed were the
last segmented packet in a TSO request. The corrupted packet resulted
in retransmission of the damaged packet which in turn decreased network
performance dramatically.
Unfortunately it seems that there is no way to work around this bug
as TSO is completely handled in hardware. Disable TSO until we find a
working workaround or a new silicon revision that doesn't have this
hardware bug.
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.
The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger an int 3 breakpoint to force an interrupt and
kick off a triple fault.
MFC after: 3 days
in all other file systems on FreeBSD (instead of from the inactive() method).
A nice side-effect of this change, besides speeding up the file system
when mmaped files are often opened/closed, is that it makes FreeBSD's
namecache work:)
This fixes slow operations on mmaped files, because without this fix,
pages were written to disk multiple times.
If one is looking for even greater speed up for such operation, he should
disable ZIL (by setting vfs.zfs.zil_disable to 1 in /boot/loader.conf).
Disabling ZIL makes fsx run ~9 times faster.
supports software encrypt/decrypt.
The nuked code itself is quite problematic, as pointed out by sam@ ---
wk->wk_keyix should be replaced by the loop count.
Tested with WEP/TKIP/CCMP/no-protection.
Approved by: sam@ (mentor)
Noticed by: Hans Petter Selasky <hselasky@c2i.net>
o Fix linewrap issues.
o Fix two typos (s/Recomended/Recommended/ and s/tunning/tuning/)
o Remove a couple of extra instances of the word "of".
o Update names of kmem_size variables.
Approved by: pjd
where similar data structures exist to support devfs and the MAC
Framework, but are named differently.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
by Philippe Biondi and Arnaud Ebalard. This is a temporary fix
until more discussion can be had on the exact risks involved in
allowing source routing in IPv6
Submitted by: itojun
Reviewed by: jinmei
MFC after: 1 day
- Move FreeBSD-specific code to zfs_freebsd_*() functions in zfs_vnops.c
and keep original functions as similar to vendor's code as possible.
- Add various includes back, now that we have them.
macro, as za_first_integer field also contains type. This should be fixed in
ZFS itself, but this bug is not visible on Solaris, because there, type is
not stored in za_first_integer. On the other hand it will be visible on
MacOS X.
Reported by: Barry Pederson <bp@barryp.org>
variable name conventions for arguments passed into the framework --
for example, name network interfaces 'ifp', sockets 'so', mounts 'mp',
mbufs 'm', processes 'p', etc, wherever possible. Previously there
was significant variation in this regard.
Normalize copyright lists to ranges where sensible.
labels: the mount label (label of the mountpoint) and the fs label (label
of the file system). In practice, policies appear to only ever use one,
and the distinction is not helpful.
Combine mnt_mntlabel and mnt_fslabel into a single mnt_label, and
eliminate extra machinery required to maintain the additional label.
Update policies to reflect removal of extra entry points and label.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting. These entry points exactly aligned with privileges and
provided no additional security context:
- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()
Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies. These mostly, but not
entirely, align with the set of privileges granted in jails.
Obtained from: TrustedBSD Project
- Redistribute counter declarations to where they are used, rather than at
the file header, so it's more clear where we do (and don't) have
counters.
- Add many more counters, one per policy entry point, so that many
individual access controls and object life cycle events are tracked.
- Perform counter increments for label destruction explicitly in entry
point functions rather than in LABEL_DESTROY().
- Use LABEL_INIT() instead of SLOT_SET() directly in label init functions
to be symmetric with destruction.
- Align counter names more carefully with entry point names.
- More constant and variable name normalization.
Obtained from: TrustedBSD Project
- Add a more detailed comment describing the mac_test policy.
- Add COUNTER_DECL() and COUNTER_INC() macros to declare and manage
various test counters, reducing the verbosity of the test policy
quite a bit.
- Add LABEL_CHECK() macro to abbreviate normal validation of labels.
Unlike the previous check macros, this checks for a NULL label and
doesn't test NULL labels. This means that optionally passed labels
will now be handled automatically, although in the case of optional
credentials, NULL-checks are still required.
- Add LABEL_DESTROY() macro to abbreviate the handling of label
validation and tear-down.
- Add LABEL_NOTFREE() macro to abbreviate check for non-free labels.
- Normalize the names of counters, magic values.
- Remove unused policy "enabled" flag.
Obtained from: TrustedBSD Project
set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave.
calls. Add MAC Framework entry points and MAC policy entry points for
audit(), auditctl(), auditon(), setaudit(), and setauid().
MAC Framework entry points are only added for audit system calls where
additional argument context may be useful for policy decision-making; other
audit system calls without arguments may be controlled via the priv(9)
entry points.
Update various policy modules to implement audit-related checks, and in
some cases, other missing system-related checks.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.
- Use PRIV_NFS_DAEMON in the NFS server.
- In the NFS client, move the privilege check from nfslockdans(), which
occurs every time a write is performed on /dev/nfslock, and instead do it
in nfslock_open() just once. This allows us to avoid checking the saved
uid for root, and just use the effective uid on open. Use PRIV_NFS_LOCKD.
@118370 Correct typo.
@118371 Integrate changes from vendor.
@118491 Show backtrace on unexpected code paths.
@118494 Integrate changes from vendor.
@118504 Fix sendfile(2). I had two ways of fixing it:
1. Fixing sendfile(2) itself to use VOP_GETPAGES() instead of
hacking around with vn_rdwr(UIO_NOCOPY), which was suggested
by ups.
2. Modify ZFS behaviour to handle this special case.
Although 1 is more correct, I've chosen 2, because the hack from 1
has a side-effect of being faster - it reads ahead MAXBSIZE
bytes instead of reading page by page. This is not easy to implement
with VOP_GETPAGES(), at least not for me in this very moment.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
@118525 Reorganize the code to reduce diff.
@118526 This code path is expected. It is simply when file is opened with
O_FSYNC flag.
Reported by: kris
Reported by: Michal Suszko <dry@dry.pl>
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.
Approved by: njl (mentor)
Reviewed by: alc
from the incoming SYN handling section of tcp_input().
Enforcement of the accept queue limits is done by sonewconn() after the
3WHS is completed. It is not necessary to have an earlier check before a
connection request enters the SYN cache awaiting the full handshake. It
rather limits the effectiveness of the syncache by preventing legit and
illegit connections from entering it and having them shaken out before we
hit the real limit which may have vanished by then.
Change return value of syncache_add() to void. No status communication
is required.
when the ACK is invalid and doesn't belong to any registered connection,
either in syncache or through SYN cookies. True but a NULL struct socket
is returned when the 3WHS completed but the socket could not be created
due to insufficient resources or limits reached.
For both cases an RST is sent back in tcp_input().
A logic error leading to a panic is fixed where syncache_expand() would
free the mbuf on socket allocation failure but tcp_input() later supplies
it to tcp_dropwithreset() to issue a RST to the peer.
Reported by: kris (the panic)
when one of the links is inactive and has a stale sequence number. To avoid
this, the sequence numbers of all links are updated on every
successful packet reassembly.
- ng_ppp_bump_mseq function created to simplify code.
- ng_ppp_frag_drop function separated from ng_ppp_frag_process to
simplify code.
Reviewed by: archie
Approved by: glebius (mentor)
which lead to ineffective multilink packet distribution plans.
- Changed bytesInQueue calculation math to have more precise information
about links utilization.
- Taken rough account of the link overhead. Better way to do it could be to
get exact overhead from user-level, but I have not done it to keep
binary compatibility.
Reviewed by: archie
Approved by: glebius (mentor)
be applied to dev entries. This leaves us with file times like "Jan 1 1970."
Work around this problem by replacing the tv_sec == 0 check with a
<= 3600 check. It's doubtful anyone will be booting within an hour of the
Epoch, let alone care about a few seconds worth of nonzero timestamps. It's
a hackish work around, but it does work and I have not experienced any
negatives in my testing.
Discussed with: bde
Ok with me: phk
and new SCBs were allocated on demand later if needed. This has two
problems. First, allocating SCBs involves allocating contiguous memory,
and if memory is exhausted then the VM will try to page out to satisfy
the request, leading to recursion and deadlock. The second problem is
that it can cause lock order reversals due to parts of the VM still being
under Giant.
Fix the problem by allocating the full pool at driver attach, when it is
safe to do so.
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.
MFC-After: 1 month
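The range check from item 2 could look roughly like the sketch below. The
struct and function names (rth_model, rth_model_init) are hypothetical and
are not the inet6_rth_* API; only the 0..127 bound comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing header under construction. */
struct rth_model {
	int segments;	/* capacity requested by the caller */
	int used;	/* segments added so far */
};

/*
 * Reject segment counts outside 0..127 inclusive, as the commit
 * message describes.  Returns 0 on success, -1 on bad input.
 */
static int
rth_model_init(struct rth_model *r, int segments)
{
	assert(r != NULL);
	if (segments < 0 || segments > 127)
		return (-1);
	r->segments = segments;
	r->used = 0;
	return (0);
}
```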
and should only be applied on certain specific card / vendor, hence the
addition of ac97_getsubvendor().
- Fix low volume issue on several MSI laptops through ALC655 quirk.
Reported/Tested by: Christian Mueller
<raptor-freebsd-multimedia@xpls.de>
MFC after: 1 week
- For ural(4):
o Fix node leakage in ural_start(), if ural_tx_mgt() fails.
o Fix mbuf leakage in ural_tx_{mgt,data}(), if usbd_transfer() fails.
o In ural_tx_{mgt,data}(), set ural_tx_data.{m,ni} to NULL, if
usbd_transfer() fails, so they will not be freed again in ural_stop().
Approved by: sam (mentor)
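The double-free avoidance in the last ural(4) item can be illustrated in
userland. This is only a sketch: tx_data_model, release() and the two
cleanup functions are made-up names, and plain malloc/free stand in for
mbuf and node handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct tx_data_model {
	char *m;	/* stands in for the mbuf pointer */
	char *ni;	/* stands in for the node reference */
};

static int nfreed;	/* counts real frees, for demonstration only */

/* Free *p once and clear it so a later cleanup pass skips it. */
static void
release(char **p)
{
	assert(p != NULL);
	if (*p != NULL) {
		free(*p);
		*p = NULL;
		nfreed++;
	}
}

/* Error path after a failed transfer. */
static void
tx_fail_cleanup(struct tx_data_model *d)
{
	release(&d->m);
	release(&d->ni);
}

/* Later teardown; must not free the same pointers again. */
static void
stop_cleanup(struct tx_data_model *d)
{
	release(&d->m);
	release(&d->ni);
}
```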
- Removed free-oqueue cache.
- Fix counter for sq entries
- Increased the amount of information retained
on ASOC_TSN logging on the association.
- Made it so that, with ASOC_TSN logging on, we dump the log when
sending or receiving an abort.
- Went through and added invariants around some
panics that needed them.
- Decrements now use atomic_subtract_int instead of adding -1.
- Removed residual count increment that threw off a
strm oq count.
- Track and complain if we don't have a LAST fragment, and
clean up the sp structure.
- Track a new stat that counts number of abandoned msgs that
happen if you close without reading.
- Fix lookup of frag point to be aware of a 0 assoc-id.
Reviewed by: gnn
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):
- leaf spin mutexes---only one of these should be held at a time,
so these mutexes are specified as belonging to a single witness
type "pmc-leaf".
- `struct pmc_owner' descriptors are protected by a spin mutex of
witness type "pmc-owner-proc". Since we call wakeup_one() while
holding these mutexes, the witness type of these mutexes needs
to dominate that of "sleepq chain" mutexes.
- logger threads use a sleep mutex, of type "pmc-sleep".
Submitted by: wkoszek (earlier patch)
When nbytes=0, sendfile(2) should use file size. Because of the bug, it
was sending half of a file. The bug is that 'off' variable can't be used
for size calculation, because it changes inside the loop, so we should
use uap->offset instead.
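A small userland model of the sendfile(2) size calculation above. It is a
sketch only: send_model() and its parameters are invented for illustration
and no real I/O happens.

```c
#include <assert.h>

/*
 * Model of the send loop: returns the number of bytes "sent".
 * nbytes == 0 means "send from start_offset to end of file".
 */
static long
send_model(long file_size, long start_offset, long nbytes, long chunk)
{
	long off = start_offset;	/* advances inside the loop */
	long sent = 0;

	assert(chunk > 0 && start_offset >= 0 && start_offset <= file_size);
	for (;;) {
		/*
		 * Size from the caller's original offset.  Using
		 * (file_size - off) here would shrink the target on
		 * every pass and stop at the halfway point.
		 */
		long total = (nbytes != 0) ? nbytes : file_size - start_offset;
		long n;

		if (sent >= total)
			break;
		n = (total - sent < chunk) ? total - sent : chunk;
		off += n;
		sent += n;
	}
	assert(off == start_offset + sent);
	return (sent);
}
```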
contigmalloc2() was always testing the first physical page for PG_ZERO,
not the current page of interest.
Submitted by: Michael Plass
PR: 81301
MFC after: 1 week
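The contigmalloc2() mistake is an indexing slip that is easy to reproduce.
In this sketch page_model and PG_ZERO_MODEL are hypothetical stand-ins for
the kernel's page structure and flag.

```c
#include <assert.h>
#include <stddef.h>

#define PG_ZERO_MODEL	0x1u	/* hypothetical flag value */

struct page_model {
	unsigned int flags;
};

/* Count the pages in the run that still have to be zeroed. */
static size_t
pages_needing_zero(const struct page_model *pages, size_t npages)
{
	size_t i, n = 0;

	assert(pages != NULL || npages == 0);
	for (i = 0; i < npages; i++) {
		/* Test the page of interest, pages[i], not pages[0]. */
		if ((pages[i].flags & PG_ZERO_MODEL) == 0)
			n++;
	}
	return (n);
}
```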
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk. Change the print logic to print once per second
when the storm is occurring instead of only once. Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.
Reviewed by: jhb
MFC after: 1 week
on a snapshot directory:
- Remove PRIV_VFS_MOUNT check - regular users can mount snapshots
via lookups on snapshot directory.
- Reset mount credential to kcred, so user won't be able to unmount
the snapshot.
- Reset owner uid.
- Unlock vnode in case of a failure.
Reported by: simokawa
Previously, whenever PROMISC mode was turned on or off, link renegotiation
occurred, which could result in network unavailability for several
seconds. (Depending on switch STP settings it could last several tens
of seconds.)
Reported by: Prokofiev S.P. < proks AT logos DOT uptel DOT net >
Tested by: Prokofiev S.P. < proks AT logos DOT uptel DOT net >
This fixes strange panics when listing .zfs/snapshot/ directory for me.
Reported by: simokawa
Reported by: Johan Hendriks <Johan@double-l.nl>
- Hide cache_purge() under FREEBSD_NAMECACHE like in other files.
- Protect mnt_flag with mount interlock.
to free the oldest entry in the current bucket row. The global
entry limit may be smaller than the bucket rows and their limit
combined however. Thus only try to free a syncache entry if we
found one in this bucket row.
Reported by: kris
to move up the start address until the allocation succeeds. If the
alignment of the resource was 0, then the code would keep trying the same
request in an infinite loop and hang. Force the request to always move
start up by at least 1 byte each time through the loop.
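The forward-progress guarantee can be sketched in isolation. The busy-map
allocator below is purely illustrative (find_free and bump_start are
invented names); only the rule "advance by at least 1 byte even when the
alignment is 0" is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Advance the candidate start; an alignment of 0 still moves by 1. */
static unsigned long
bump_start(unsigned long start, unsigned long align)
{
	unsigned long step = (align > 0) ? align : 1;

	return (start + step);
}

/*
 * Return the first non-busy position at or after start, or -1.
 * Terminates for every alignment because start always grows.
 */
static long
find_free(const unsigned char *busy, unsigned long size,
    unsigned long start, unsigned long align)
{
	assert(busy != NULL);
	while (start < size) {
		if (!busy[start])
			return ((long)start);
		start = bump_start(start, align);
	}
	return (-1);
}
```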
The 6105M and 6102 do not have the DWORD alignment problem, so
don't m_defrag() every packet in the transmit path for those.
More stringent usage of tx-descriptor ring and its flags.
Tested on 6102 and 6105M, other chips may also be able to run
without the m_defrag() but I have neither hardware nor docs to
find out.
Sponsored by: Soekris Engineering
"zone", which is generally not present in zone names. This reduces the
incidence of line-wrapping in "vmstat -z" on 80-column displays.
MFC after: 3 days
The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.
The lagg(4) driver provides link aggregation, failover and fault tolerance.
Discussed on: current@
unload instead of returning EBUSY. This check tells if there are mounted
ZFS file systems or not. We can't unload if there are mounted file systems.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
requests where uio_offset is not 0 to begin with. This fixes a long-
standing bug where e.g. 'cat /proc/$$/regs' would loop forever.
MFC after: 3 weeks
- Fix bug that prevented EEOR mode from working
and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did
this especially for EEOR mode, still need to look at
why I need a lock when removing from the tailq and the
->next is NOT null. A lock fixes it but it implies a
bug yet exists.
- Activated Andre's proposed changes to better use the mbuf
infrastructure.
- Fixed places that were not using the alloc macros to take
advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to
get the right data structures in place (suggested by Max Laier).
use to synchronize and protect all data objects that are used for that
SIM. Drivers that are not yet MPSAFE register Giant and operate as
usual. Right now, no drivers are MPSAFE, though a few will be changed
in the coming week as this work settles down.
The driver API has changed, so all CAM drivers will need to be recompiled.
The userland API has not changed, so tools like camcontrol do not need to
be recompiled.
implement robust version of m_collapse
add support for sf_buf
add fix for m_iovappend
add calls to m_sanity under INVARIANTS
fix m_freem_vec to correctly traverse the mbuf iovec chain
The pfs_info mutex is only needed to lock pi_unrhdr. Everything else
in struct pfs_info is modified only while Giant is held (during
vfs_init() / vfs_uninit()); add assertions to that effect.
Simplify pfs_destroy somewhat.
Remove superfluous arguments from pfs_fileno_{alloc,free}(), and the
assertions which were added in the previous commit to ensure they were
consistent.
Assert that Giant is held while the vnode cache is initialized and
destroyed. Also assert that the cache is empty when it is destroyed.
Rename the vnode cache mutex for consistency.
Fix a long-standing bug in pfs_getattr(): it would uncritically return
the node's pn_fileno as st_ino. This would result in st_ino being 0
if the node had not previously been visited by readdir(), and also in
an incorrect st_ino for process directories and any files contained
therein. Correct this by abstracting the fileno manipulations
previously done in pfs_readdir() into a new function, pfs_fileno(),
which is used by both pfs_getattr() and pfs_readdir().
- Reduce default number of spa_zio_* threads to N*spa_zio_issue
plus N*spa_zio_intr threads per ZIO type, where N is the number
of CPUs.
- Put ZIO type number in thread's name.
sendmsg() while using a 0-length msg_controllen. This isn't allowed in
the FreeBSD system call ABI, so detect this case and set msg_control to
NULL. This allows Linux ping to work.
Submitted by: rdivacky
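The normalization described above amounts to one check on the message
header. A minimal userland sketch, using the POSIX struct msghdr; the
function name normalize_control() is made up and this is not the
linuxolator code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * If the caller passed a control buffer pointer with a zero length,
 * drop the pointer so the header is self-consistent.
 */
static void
normalize_control(struct msghdr *msg)
{
	assert(msg != NULL);
	if (msg->msg_controllen == 0)
		msg->msg_control = NULL;
}
```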
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api
if we failed to get init/or/cookie to bring up an assoc. Change
so we don't just give a generic "comm lost" but look at actual
states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy
compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truly portable
for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran
during window probes on the sent queue. This would then
cause an out-of-order problem and assure that the flight
size "problem" would occur.
- Solves a flight size logging issue that caused rwnd
overruns, flight size off as well as false retransmissions.
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing
was active, fix to make strict sacks a bit smarter about what
the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his
commit of at least m_last().. need to adjust for 6.2 as
well.. since m_last won't exist.
Reviewed by: gnn
which has already been freed by in_ifdetach(). With this cumulative change,
the removal of a member interface will not cause a panic in pfsync(4).
Requested by: yar
PR: 86848
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
for jailed root from different jails, etc.
OK'ed by: rwatson
than 2GB of RAM. This was because our physmem is long and 'physmem*PAGESIZE'
can be negative for more than 2GB of memory.
Reported by: Andrey V. Elsukov <bu7cher@yandex.ru>
It is not yet tested by Andrey, so there can be other problems, but this
was definitely a bug, so I'm committing a fix now.
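The overflow is easy to see with numbers. The sketch below assumes the
affected platform has a 32-bit long, so the product of a page count and the
page size no longer fits once RAM exceeds 2GB; widening to 64 bits before
multiplying avoids that. phys_bytes() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Widen before multiplying so the product cannot wrap at 32 bits. */
static uint64_t
phys_bytes(uint32_t npages, uint32_t page_size)
{
	assert(page_size != 0);
	return ((uint64_t)npages * page_size);
}
```

With 786432 pages of 4096 bytes (3GB) the result is 3221225472, which is
larger than the biggest value a signed 32-bit integer can hold.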
tokens. Currently, we do not support the set{get}audit_addr(2) system
calls, which allow processes like sshd to set extended or ip6
information for subject tokens.
The approach that was taken was to change the process audit state
slightly to use an extended terminal ID in the kernel. This allows
us to store both IPv4 and IPv6 addresses. In the case that an IPv4 address
is in use, we convert the terminal ID from a struct auditinfo_addr to
a struct auditinfo.
If getaudit(2) is called when the subject is bound to an ip6 address,
we return E2BIG.
- Change the internal audit record to store an extended terminal ID
- Introduce ARG_TERMID_ADDR
- Change the kaudit <-> BSM conversion process so that we are using
the appropriate subject token. If the address associated with the
subject is IPv4, we use the standard subject32 token. If the subject
has an IPv6 address associated with them, we use an extended subject32
token.
- Fix a couple of endian issues where we do a couple of byte swaps when
we shouldn't be. IP addresses are already in the correct byte order,
so reading the ip6 address 4 bytes at a time and swapping them results
in incorrect address data. It should be noted that the same issue was
found in the openbsm library and it has been changed there too on the
vendor branch
- Change A_GETPINFO to use the appropriate structures
- Implement A_GETPINFO_ADDR which basically does what A_GETPINFO does,
but can also handle ip6 addresses
- Adjust get{set}audit(2) syscalls to convert the data
auditinfo <-> auditinfo_addr
- Fully implement set{get}audit_addr(2)
NOTE: This adds the ability for processes to correctly set extended subject
information. The appropriate userspace utilities still need to be updated.
MFC after: 1 month
Reviewed by: rwatson
Obtained from: TrustedBSD
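The byte-order item in the list above can be demonstrated without any audit
code. This sketch uses invented helper names; it only shows that an address
already stored in network order must be copied verbatim, and that swapping
each 4-byte group scrambles it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADDR6_LEN	16

/* Correct: the bytes are already in network order, copy them as-is. */
static void
addr6_copy(uint8_t *dst, const uint8_t *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ADDR6_LEN);
}

/* The mistake: byte-swap every 4-byte group while copying. */
static void
addr6_copy_swapped(uint8_t *dst, const uint8_t *src)
{
	int i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < ADDR6_LEN; i += 4) {
		dst[i] = src[i + 3];
		dst[i + 1] = src[i + 2];
		dst[i + 2] = src[i + 1];
		dst[i + 3] = src[i];
	}
}
```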
- Tune number of namecache entries better (based on desiredvnodes).
- Handle vfs_lowvnodes event by releasing requested number of name cache
entries, but no less than 5%.
Reported by: simokawa
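The "requested number, but no less than 5%" rule reduces to a small
calculation. The function name below is hypothetical, and capping the
result at the total number of entries is an assumption of this sketch, not
something the message states.

```c
#include <assert.h>

/* How many name cache entries to release on a low-vnodes event. */
static unsigned long
entries_to_release(unsigned long requested, unsigned long total)
{
	unsigned long floor5 = total / 20;	/* 5% of the cache */
	unsigned long n = (requested > floor5) ? requested : floor5;

	if (n > total)		/* assumed cap, see lead-in */
		n = total;
	assert(n <= total);
	return (n);
}
```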
specific nodes when the process exits)
Move the vnode-cache-walking loop which was duplicated in pfs_exit() and
pfs_disable() into its own function, pfs_purge(), which looks for vnodes
marked as dead and / or belonging to the specified pfs_node and reclaims
them. Note that this loop is still extremely inefficient.
Add a comment in pfs_vncache_alloc() explaining why we have to purge the
vnode from the vnode cache before returning, in case anyone should be
tempted to remove the call to cache_purge().
Move the special handling for pfstype_root nodes into pfs_fileno_alloc()
and pfs_fileno_free() (the root node's fileno must always be 2). This
also fixes a bug where pfs_fileno_free() would reclaim the root node's
fileno, triggering a panic in the unr code, as that fileno was never
allocated from unr to begin with.
When destroying a pfs_node, release its fileno and purge it from the
vnode cache. I wish we could put off the call to pfs_purge() until
after the entire tree had been destroyed, but then we'd have vnodes
referencing freed pfs nodes. This probably doesn't matter while we're
still under Giant, but might become an issue later.
When destroying a pseudofs instance, destroy the tree before tearing
down the fileno allocator.
In pfs_mount(), acquire the mountpoint interlock when required.
MFC after: 3 weeks
a single conditional. The two operations are linked, but since the link
is not very direct, Coverity can't see it. Humans might also miss the
link as well. So, this isn't fixing any actual bugs, just improving
readability.
CID: 1787 (likely others as well)
Found by: Coverity Prevent (tm)
directly to a merged model where only one callout, the next to fire,
is registered.
Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.
The single new callout is a mutex callout on inpcb simplifying the
locking a bit.
tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.
Reviewed by: rwatson (earlier version)
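The merged model can be pictured as a table of deadlines from which only the
earliest is armed. The sketch below is a userland illustration with invented
names (timer_set, timer_activate, timer_next); treating a delta of 0 as
"stop this timer" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum timer_kind { T_DELACK, T_REXMT, T_PERSIST, T_KEEP, T_2MSL, T_COUNT };

struct timer_set {
	long deadline[T_COUNT];	/* absolute tick; 0 means inactive */
};

/* Arm (delta > 0) or disarm (delta == 0) one logical timer. */
static void
timer_activate(struct timer_set *ts, enum timer_kind which, long now,
    long delta)
{
	assert(ts != NULL && which < T_COUNT && now > 0 && delta >= 0);
	ts->deadline[which] = (delta == 0) ? 0 : now + delta;
}

/* Index of the timer that fires next, or -1 if none is active. */
static int
timer_next(const struct timer_set *ts)
{
	int i, best = -1;

	assert(ts != NULL);
	for (i = 0; i < T_COUNT; i++) {
		if (ts->deadline[i] == 0)
			continue;
		if (best < 0 || ts->deadline[i] < ts->deadline[best])
			best = i;
	}
	return (best);
}
```

Only the timer returned by timer_next() would need a registered callout; the
single handler then dispatches to the individual timer function.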
Yukon II generated corrupted TCP checksums for short TCP packets
that are less than 60 bytes in size (e.g. window probe packets, pure ACK
packets, etc). Padding the frame with zeros up to the minimum
ethernet frame size didn't work at all. Instead of dropping Tx
checksum offload support we calculate TCP checksum with S/W method
when we encounter short TCP frames.
Fortunately, short UDP datagrams appear to be handled
correctly by Yukon II.
While I'm here simplify ethernet/VLAN header size calculation logic.
PR: 111384
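For reference, a software checksum of the kind mentioned above is the
16-bit ones'-complement sum described in RFC 1071. The routine below is a
generic sketch, not the driver's code; a real TCP checksum additionally
covers a pseudo-header, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over a byte buffer (big-endian words). */
static uint16_t
cksum_sw(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}
```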
popular names. Hence:
- comment current index() and rindex() functions, as these serve the same
functionality as, respectively, strchr() and strrchr() from userland;
- add inlined version of strchr() and strrchr(), as we tend to use them more
often;
- remove str[r]chr() definitions from ZFS code;
Reviewed by: pjd
Approved by: cognet (mentor)
front-end if the dpt(4) module is built along with a kernel that
includes eisa(4) or when compiling it stand-alone (logic based on
the corresponding ISA logic in sys/modules/sound/sound/Makefile).
As a side-effect this fixes the stand-alone build of the dpt(4)
module after dpt.h 1.17, dpt_eisa.c 1.22 and dpt_scsi.c 1.55.
Breakage reported by: n_hibma
does not prevent handle_workitem_remove() from recursing into a blocking
version. Add the dirrem to worklist instead of processing it now if this
is the case.
Reported and tested by: kris
Submitted by: tegge
MFC after: 2 weeks
- Allow to shrink ARC down to 16MB (instead of 64MB).
- Set arc_max to 1/2 of kmem_map by default.
- Start freeing things earlier when low memory situation is detected.
- Serialize execution of arc_lowmem().
I decided to set the minimum ZFS memory requirements to 512MB of RAM and 256MB of
kmem_map size. If there is less RAM or kmem_map, a warning will be printed.
World is cruel, be no better. In other words: modern file system requires
modern hardware:)
From ZFS administration guide:
"Currently the minimum amount of memory recommended to install a Solaris
system is 512 Mbytes. However, for good ZFS performance, at least one
Gbyte or more of memory is recommended."
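The sizing rules listed above (16MB floor, half of kmem_map as the default
ceiling, warning below 512MB RAM or 256MB kmem_map) can be written down as
a sketch. All names here are illustrative and are not the ZFS tunables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB	(1024ULL * 1024ULL)

/* Clamp a desired ARC size between 16MB and half of kmem_map. */
static uint64_t
arc_clamp(uint64_t want, uint64_t kmem_size)
{
	uint64_t arc_min = 16 * MIB;
	uint64_t arc_max = kmem_size / 2;

	assert(arc_max >= arc_min);
	if (want < arc_min)
		return (arc_min);
	if (want > arc_max)
		return (arc_max);
	return (want);
}

/* True when the message says a warning would be printed. */
static bool
zfs_memory_warning(uint64_t ram, uint64_t kmem_size)
{
	return (ram < 512 * MIB || kmem_size < 256 * MIB);
}
```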
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.
failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.
This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.
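As an illustration of the roundrobin protocol described above, the sketch
below picks the next active port in turn. The types and names (port_model,
rr_pick) are hypothetical and are not taken from the lagg(4) sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port_model {
	bool active;
};

/*
 * Return the index of the next active port starting at *cursor and
 * advance the cursor past it, or return -1 if no port is active.
 */
static int
rr_pick(const struct port_model *ports, int nports, int *cursor)
{
	int i, idx;

	assert(ports != NULL && cursor != NULL && nports > 0);
	for (i = 0; i < nports; i++) {
		idx = (*cursor + i) % nports;
		if (ports[idx].active) {
			*cursor = (idx + 1) % nports;
			return (idx);
		}
	}
	return (-1);
}
```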
the args for hash32_stre and hash32_strne but there are no consumers in the
base system and openbgpd does not use it which the initial import was for.
Silence on: hackers
I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that no one
will reference this prison before it is removed from the list, skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref is
greater than 0 in prison_hold().
Reported by: kris
OK'ed by: rwatson
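The two rules above (skip entries whose reference count is zero, and never
take a hold on such an entry) look like this in a lock-free userland model.
The struct and function names are invented; the real code runs under
allprison_lock.

```c
#include <assert.h>
#include <stddef.h>

struct prison_model {
	int id;
	int ref;	/* 0 means the entry is being torn down */
};

/* Look up by id, ignoring entries that are already dying. */
static struct prison_model *
prison_find_model(struct prison_model *list, size_t n, int id)
{
	size_t i;

	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (list[i].ref == 0)
			continue;
		if (list[i].id == id)
			return (&list[i]);
	}
	return (NULL);
}

/* Taking a hold is only legal on a live entry. */
static void
prison_hold_model(struct prison_model *pr)
{
	assert(pr != NULL && pr->ref > 0);
	pr->ref++;
}
```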
Linux SCSI SG passthrough device API. The intention is to allow for both
running of Linux apps that want to talk to /dev/sg* nodes, and to facilitate
porting of apps from Linux to FreeBSD. As such, both native and linuxolator
entry points and definitions are provided.
Caveats:
- This does not support the procfs and sysfs nodes that the Linux SG
driver provides. Some Linux apps may rely on these for operation,
others may only use them for informational purposes.
- More ioctls need to be implemented.
- Linux uses a naming scheme of "sg[a-z]" for devices, while FreeBSD uses a
scheme of "sg[0-9]". Devfs aliases (symlinks) are automatically created
to link the two together. However, tools like camcontrol only see the
native names.
- Some operations were originally designed to return byte counts or other
data directly as the syscall return value. The linuxolator doesn't appear
to support this well, so this driver just punts for these cases.
Now that the driver is in place, others are welcome to add missing
functionality. Thanks to Roman Divacky for pushing this work along.
StartMediaTx message before an OpnRcvChnAck message was received.
Reviewed by: glebius
Approved by: glebius (mentor)
MFC after: 3 days
Found with: Coverity Prevent(tm)
CID: 498
- Added magic numbers to pretend the NEC original program version
2.70.
- Added string display routine with Shift-JIS code support.
- Added three nop instructions at start1 in start.s since the
installer of the IPLware puts a 'call $0x09ab' instruction.
- Put the near return instruction at 0x9ab in selector.s.
Since the Shift-JIS display routine must be located at 0x1243, the
linker script file (ldscript) is applied.
bootinfo variable declaration visible. It conflicts with static
declaration in this file. Declare variable as globally visible in
order to resolve the conflict.
anymore. Previously it tried to access interrupt register to disable
interrupts which could result in hang if the hardware was not
properly initialized by system BIOS/ACPI.
Tested by: Benjamin Hansmann (benjamin.hansmann AT rub dot de)
MFC after: 3 days
ZFS file system was ported from the OpenSolaris operating system. The code is under
the CDDL license.
I'd like to thank all SUN developers that created this great piece of software.
Supported by: Wheel LTD (http://www.wheel.pl/)
Supported by: The FreeBSD Foundation (http://www.freebsdfoundation.org/)
Supported by: Sentex (http://www.sentex.net/)
It may be used for external modules to attach some data to jail's in-kernel
structure.
- Change allprison_mtx mutex to allprison_sx sx(9) lock.
We will need to call external functions while holding this lock, which may
want to allocate memory.
Make use of the fact that this is shared-exclusive lock and use shared
version when possible.
- Implement the following functions:
prison_service_register() - registers a service that wants to be noticed
when a jail is created and destroyed
prison_service_deregister() - deregisters service
prison_service_data_add() - adds service-specific data to the jail structure
prison_service_data_get() - takes service-specific data from the jail
structure
prison_service_data_del() - removes service-specific data from the jail
structure
Reviewed by: rwatson
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.
A jail-friendly file system is a file system whose driver registers
itself with the VFCF_JAIL flag via the VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.
There are currently no jail-friendly file systems; ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.
Reviewed by: rwatson
The problem is this: vm_fault_additional_pages() calls vm_pager_has_page(),
which calls vnode_pager_haspage(). Now when VOP_BMAP() returns an error (eg.
EOPNOTSUPP), vnode_pager_haspage() returns TRUE without initializing 'before'
and 'after' arguments, so we have some accidental values there. This basically
caused this condition to be met:
	if ((rahead + rbehind) >
	    ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
		pagedaemon_wakeup();
		[...]
	}
(we have some random values in rahead and rbehind variables)
I'm not entirely sure this is the right fix, maybe we should just return FALSE
in vnode_pager_haspage() when VOP_BMAP() fails?
alc@ knows about this problem, maybe he will be able to come up with a better
fix if this is not the right one.
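The surviving text does not spell out the fix, so the following is only one
plausible shape for it: whenever the function reports the page as present,
it also writes defined values into both out-parameters. haspage_model() and
its arguments are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * bmap_error models the VOP_BMAP() result; run_before/run_after
 * model the contiguous run it would report on success.
 */
static bool
haspage_model(int bmap_error, int run_before, int run_after,
    int *before, int *after)
{
	assert(run_before >= 0 && run_after >= 0);
	if (bmap_error != 0) {
		/* Never leave the out-parameters uninitialized. */
		if (before != NULL)
			*before = 0;
		if (after != NULL)
			*after = 0;
		return (true);
	}
	if (before != NULL)
		*before = run_before;
	if (after != NULL)
		*after = run_after;
	return (true);
}
```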
tcp_input() to syncache_socket() where it belongs and the majority
of it already happens.
The "tp->snd_up = tp->snd_una" is removed as it is done with the
tcp_sendseqinit() macro a few lines earlier.
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are important for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
Defer mbuf allocation and initialization until after data has already been
received in a cluster
This reduces cpu utilization somewhat, but it only improves the rx path.
Recent changes to TCP appear to make us rate limited by the TX path.
This is the first step in reducing mbuf management overhead for manipulating
clusters.
MFC after: 3 days
they have been reported back to the userland as being in 1970.
Add boot time to the timestamp to give the time in the scale of the 'current'
real timescale. Not perfect if you change the time a lot, but good enough
to keep all the rules correct relative to each other in terms
of time relative to "now".
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented
packet (1000 addresses).
- flight size correcting updated to include one message only
and to handle case where the peer does not cumack the
next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic
since we tried to unlock the destroyed mutex. Fixes
so we properly exit when we need to destroy an assoc.
(Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms
and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by: gnn
been defragged and had their headers in the same cluster as their
payload would be fed to the NIC in header-sized chunks, and would
likely exceed the number of available transmit descriptors.
- If a TSO frame exceeds the number of available transmit descriptors,
don't leak busdma resources when freeing it.
Sponsored by: Myricom Inc.
in the putc() method. Likewise, in the getc() method, don't check for
received characters with an interval defined in terms of the baudrate.
In both cases it works equally well to implement a fixed delay. More
importantly, it avoids calculating a delay that's roughly 1/10th the
time it takes to send/receive a character. The calculation is costly
and happens for every character sent or received, affecting low-level
console or debug port performance significantly. Secondly, when the
RCLK is not available or unreliable, the delays could disrupt normal
operation.
The fixed delay is 1/10th the time it takes to send a character at
230400 bps.
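As a rough check of that figure: assuming a 10-bit character frame (one
start bit, eight data bits, one stop bit, which is an assumption and not
stated above), one character at 230400 bps takes about 43.4 microseconds,
so a tenth of that is a little over 4.3 microseconds. The helper name is
illustrative.

```c
#include <assert.h>

/* One tenth of the time to send one character, in nanoseconds. */
static unsigned long
fixed_delay_ns(unsigned long bps, unsigned int bits_per_char)
{
	unsigned long long char_ns;

	assert(bps > 0);
	char_ns = 1000000000ULL * bits_per_char / bps;
	return ((unsigned long)(char_ns / 10));
}
```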
it obtained through the uart_class structure. This allows us
to declare the uart_class structure as weak and as such allows
us to reference it even when it's not compiled-in.
It also allows us to get the uart_ops structure by name, which
makes it possible to implement the dt tag handling in uart_getenv().
The side-effect of all this is that we're using the uart_class
structure more consistently which means that we now also have
access to the size of the bus space block needed by the hardware
when we map the bus space, eliminating any hardcoding.
- Close the new file objects created during socketpair() if the copyout of
the new file descriptors fails.
- Add a test to the socketpair regression test for this edge case.
sysent' for a new system call into a new MAKE_SYSENT() macro.
- Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in
the NFS server and use syscall_register() and syscall_deregister() to
manage the nfssvc system call entry instead of manually frobbing the
sysent[] array.
file descriptor is closed out from under us in kern_open(). This race
is already handled and the file will be closed when kern_open() does an
fdrop just before returning.
execution should help us avoid potential deadlock and illegal locking
while sleeping in various mixer -> usb calls. To enable it, use
hint.uaudio.%d.async="1" or sysctl dev.uaudio.%d.async=1. The default is
disabled, to remain compatible with the old behaviour (with a slight risk of
potential deadlock).
When the linux port changes were imported which split the
target command list to be separate from the initiator command
list and the handle format changed to encode a type in the handle
the implications to the function isp_handle_index (which only
the NetBSD/OpenBSD/FreeBSD ports use) were overlooked.
The fault is twofold: first, the index into the DMA maps
in isp_pci is wrong because a target command handle with
the type bit left in place caused a bad index (and panic)
into the DMA map. Secondly, the assumption of the array
of DMA maps in either PCI or SBUS attachment structures is
that there is a linear mapping between handle index and
DMA map index. This can no longer be true if there are
overlapping index spaces for initiator mode and target
mode commands.
These changes bandaid around the problem by forcing us
to not have simultaneous dual roles and doing the appropriate
masking to make sure things are indexed correctly. A longer
term fix is being developed.
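The masking part of the workaround can be sketched as follows. The bit
layout is hypothetical (a type bit above a 14-bit index); the real handle
format is not given in the text above.

```c
#include <assert.h>
#include <stdint.h>

#define HDL_TYPE_SHIFT	14	/* hypothetical position of the type bits */
#define HDL_INDEX_MASK	((1u << HDL_TYPE_SHIFT) - 1u)
#define HDL_TYPE_TARGET	(1u << HDL_TYPE_SHIFT)

/* Strip the type bits before using a handle as a DMA map index. */
static uint32_t
handle_to_map_index(uint32_t handle, uint32_t nmaps)
{
	uint32_t idx = handle & HDL_INDEX_MASK;

	assert(idx < nmaps);
	return (idx);
}
```

Note that an initiator handle and a target handle with the same low bits
map to the same index, which is why the message rules out running both
roles at once until a longer term fix exists.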
vfs_flags field is used for VFCF_* flags which are given at file system
driver creation time (via the VFS_SET(9) macro).
What this code did was basically this:
If file system registers itself with VFCF_UNICODE flag (stores file names
as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates).
If file system registers itself with VFCF_LOOPBACK flag (aliases some other
mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on
dirs).
The latter will be quite dangerous, but those flags are reset later in
vfs_domount().
MFC after: 1 month
read the same register back. It can cause hangs or machine
checks in certain cases. One particular case is with bge(4)
when a reset is initiated for the controller.
MFC after: 1 month
file system code (mostly *_reclaim()) which look like this:
	VOP_LOCK(vp);
	/* examine vp */
	VOP_UNLOCK(vp);
	vdrop(vp);
This can now be rewritten to:
	VOP_LOCK(vp);
	/* examine vp */
	vdropl(vp); /* will unlock vp */
MFC after: 1 week
obtaining and releasing shared and exclusive locks. The algorithms for
manipulating the lock cookie are very similar to those of rwlocks. This patch
also adds support for exclusive locks using the same algorithm as mutexes.
A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given lock's behavior. The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET which are all identical in nature
to the similar flags for mutexes.
Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.
The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels. As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.
The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.
The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.
MFC after: 1 month
Submitted by: attilio
Tested by: kris, pjd
explicitly test and panic. This should not ever happen, but if it does,
this is a preferred failure mode to a NULL pointer dereference in kernel.
Coverity CID: 1716
Found with: Coverity Prevent(tm)
incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for
both how big a msg must be as well as how much needs
to be left over.
- With our new algo in place, we need to implicitly
set "end of msg" on the sp-> structure otherwise we
end up with "hung" associations.
- Room reserved up front in IP header by pushing IP
header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across
will kill the assoc via the kill timer and send an
abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented
us from fragmenting a small complete message when
we needed to.
- Window probes were not marked back to unsent and
flight adjusted when a sack came in with no
window change or accepting of the probe data.
We now fix this with having a mark on the net and
the chunk so we can clear it out when the sack arrives
forcing it to retran just like it was "new" this
improves the handling of window probes, which were
dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
We can now use LOCK_CLASS() as a stronger check in lockmgr_chain() as a
result. This required putting back lk_flags as lockmgr's use of flags
conflicted with other flags in lo_flags otherwise.
- Tweak 'show lock' output for lockmgr to match sx, rw, and mtx.
seminfo because kernel_sysctlbyname() is slow. There is no dependency
problem since linux module depends on both sysvmsg and sysvsem and linprocfs
depends on it in turn.
Pointed out by: des
Reviewed by: des
Don't "return" in linux_clone() after we have forked the new process in case
of problems. Move the copyout of p2->p_pid outside the emul_lock coverage.
Submitted by: Roman Divacky
for doing this job. This change will make it easy to migrate from using
spinning locks to adaptive ones.
Reviewed by: glebius, julian
Approved by: cognet (mentor)
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only;
that is, when an IPv4 address is first configured on an interface.
* Do not join it for subsequent IPv4 addresses as this violates IGMP.
* Be sure to leave the group when all IPv4 addresses have been removed
from the interface.
* Add two DIAGNOSTIC printfs related to the issue.
Further care and attention is needed in this area; it is suggested that
netinet's attachment to the ifnet structure be compartmentalized and
non-implicit.
Bug found by: andre
MFC after: 1 month
defined with VFS_LOCK_GIANT(NULL) call.
This shall fix softdep operation when mpsafe_vfs = 0.
Reported and tested by: kris
Submitted by: tegge
MFC after: 1 week
GetSeconds(). Instead, use CRTR register shifted right 15. This
gives us a range of 32 seconds we can do for timeout.
Shift to using == rather than < or > for calculating the timeout,
since if we can't read the ST_CRTR register twice in a second we have
even bigger problems to worry about, and == deals with the 'wrap'
issue.
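The wrap handling can be sketched as follows. This is a minimal illustration, not the boot loader's code: it assumes the counter ticks at 32768 Hz, so that shifting right by 15 leaves a 5-bit seconds field (the 32-second range mentioned above), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	SECONDS_SHIFT	15	/* assumed: counter ticks at 32768 Hz */
#define	SECONDS_MASK	0x1fu	/* assumed: 5 bits of seconds remain */

/* Seconds field of a raw counter value (hypothetical helper). */
static unsigned
crtr_seconds(uint32_t crtr)
{
	return ((crtr >> SECONDS_SHIFT) & SECONDS_MASK);
}

/* Deadline, in wrapped seconds, 'timeout' seconds after 'start'. */
static unsigned
deadline_after(uint32_t start, unsigned timeout)
{
	assert(timeout > 0 && timeout <= SECONDS_MASK);
	return ((crtr_seconds(start) + timeout) & SECONDS_MASK);
}

/* Equality survives the wrap; a '>' comparison would not. */
static int
timed_out(uint32_t now, unsigned deadline)
{
	return (crtr_seconds(now) == deadline);
}
```

Starting at second 30 with a 5 second timeout, the deadline wraps to second 3, and the equality test still fires there as long as the register is sampled at least once per second.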
This lets me type at the boot2 prompt again! Woo Hoo!
Bogusness noticed by: tisco
Pointy Hat to: That silly imp guy
CSD is usually 512 (well, 9), but for 2GB (and the rogue 4GB SD cards)
it is 1024 (or 2048 for 4GB). This value doesn't work for the block
read commands (which really want 512). Hardcode 512 for those. This
may break really old MMC cards that don't have a 512 block size (I've
never seen one: make my day and send me one :-), but since the MMC
side of the house is currently broken, it should only have the effect
that 2GB (and non-conforming 4GB) SD cards will work.
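A rough sketch of the arithmetic involved, with hypothetical helper names: the CSD field is an exponent (9 means 512 bytes), while block reads always use 512. Converting the capacity to 512-byte sectors is an assumption added here for illustration and is not stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

#define	READ_BLOCK_SIZE	512u	/* fixed size used for block read commands */

/* Block length in bytes encoded by the CSD field (9 -> 512, 10 -> 1024). */
static uint32_t
csd_block_len(unsigned read_bl_len)
{
	assert(read_bl_len >= 9 && read_bl_len <= 11);
	return (1u << read_bl_len);
}

/* Card capacity in 512-byte sectors, whatever block size the CSD reports. */
static uint64_t
capacity_in_sectors(uint64_t csd_blocks, unsigned read_bl_len)
{
	return (csd_blocks * (csd_block_len(read_bl_len) / READ_BLOCK_SIZE));
}
```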
My 'non-conforming' 4GB SD card also works now too. The
non-conforming 4GB SD cards were sold for a while before the SD
association was worried they would be (a) incompatible (different FAT
flavor on them) and (b) confusing for the new SDHC standard and
cracked down on suppliers' bogus use of the SD trademark...
The change to getstr() is so that the character that is
passed in to it is also processed just like the rest. I also
removed one of the getc() calls; otherwise you lose every
second character.
I also changed the strcpy of kname, so that it only happens if
kname is '\0'. This is so that one can pass a kernel in
through /boot.config.
The last change to boot2.c is in parse(). If you tried to type
a kernel name to boot, the first character was lost, the arg--
fix that.
Submitted by: jhay
that the driver clock is identical to the processor or bus clock.
This is the case for the PowerQUICC processor. When the clock is
high enough, overflows happen in the calculation of the time it
takes to send 1/10 of a character, used in delay loops. Fix the
overflows so as to fix bugs in the delay loops that can cause either
insufficient delays or excessive delays.
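The kind of overflow described can be illustrated like this. The formula below is a hypothetical stand-in (clock ticks per 1/10 of a character), not the driver's actual expression; it only shows why widening the intermediate product matters once the clock is in the hundreds of MHz.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: rclk * bits wraps once the product exceeds 2^32 - 1. */
static uint32_t
tenth_char_ticks_narrow(uint32_t rclk, uint32_t bits, uint32_t baud)
{
	assert(baud != 0);
	return ((uint32_t)(rclk * bits) / (baud * 10u));
}

/* Widen before multiplying so the intermediate product cannot wrap. */
static uint32_t
tenth_char_ticks_wide(uint32_t rclk, uint32_t bits, uint32_t baud)
{
	assert(baud != 0);
	return ((uint32_t)(((uint64_t)rclk * bits) / ((uint64_t)baud * 10u)));
}
```

With a 500 MHz clock, 10 bits per character and 115200 baud, the narrow version returns 612 where the wide one returns 4340, so a delay loop built on the narrow result would be far too short.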
system devices (i.e. console, debug port or keyboard), don't stop
after the first match. Find them all and keep track of the last.
The reason for this change is that the low-level console is always
added to the list of system devices first, with other devices added
later. Since new devices are added to the list at the head, we have
the console always at the end. When a debug port is using the same
UART as the console, we would previously mark the "newbus" UART as
a debug port instead of as a console. This would later result in a
panic because no "newbus" device was associated with the console.
By matching all possible system devices we would mark the "newbus"
UART as a console and not as a debug port.
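A minimal sketch of "find them all and keep track of the last", using a hypothetical list layout rather than the uart(4) structures. Because devices are inserted at the head, the console (added first) is the last match.

```c
#include <assert.h>
#include <stddef.h>

enum sysdev_type { SYSDEV_CONSOLE, SYSDEV_DBGPORT, SYSDEV_KEYBOARD };

struct sysdev {
	enum sysdev_type type;
	unsigned long	 base;	/* hypothetical: address identifying the UART */
	struct sysdev	*next;
};

/* Insert at the head, so the first device added ends up last. */
static void
sysdev_add(struct sysdev **head, struct sysdev *sd)
{
	assert(sd != NULL);
	sd->next = *head;
	*head = sd;
}

/* Walk the whole list and remember the last match instead of the first. */
static struct sysdev *
sysdev_match_last(struct sysdev *head, unsigned long base)
{
	struct sysdev *sd, *last = NULL;

	for (sd = head; sd != NULL; sd = sd->next)
		if (sd->base == base)
			last = sd;
	return (last);
}
```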
While it is arguably better to be able to mark a "newbus" UART as
both console and debug port, this fix is lightweight and allows
a single UART to be used as the console as well as a debug port
with only the aesthetic bug of not telling the user about it also
being a debug port.
Now that we match all possible system devices, update the rclk of
the system devices with the rclk that was obtained through the
bus attachment. It is generally true that clock information is
more reliable when obtained from the parent bus than by means of
some hardcoded or assumed value used early in the boot. This by
virtue of having more context information.
MFC after: 1 month
by driver backends to mark individual channels as enabled or not.
The default implementation of this method always marks channels as
enabled.
This method is currently not used, but is added with the PowerQUICC
in mind where the 2nd SCC channel can be disabled.
This will increase the memory consumption by more than 1 Mb, but this
is required for operation on multiinterface access concentrators running
mpd.
Requested by: Alexander Motin
watchdog might hide the successful arming of an earlier one. Accept that on
failing to arm any watchdog (because of non-supported timeouts) EOPNOTSUPP is
returned instead of the more appropriate EINVAL.
MFC after: 3 days
always 0. Previously we aligned threads on a minimum of 8-byte boundaries.
Note: This changes the uma zone to no longer cache align threads. We
really want the uma zone to align threads to MAX(16, cache line size)
but there currently isn't a good way to express that to uma.
Submitted by: attilio
When submitting rx buffers and not using WC fifo, always replace the
invalid DMA address with the real one, otherwise allocation failures
could lead to the invalid DMA address being given to the NIC, and
that would cause the receive side to lock up.
causing a crash.
Suppose that we have two objects, obj and backing_obj, where
backing_obj is obj's backing object. Further, suppose that
backing_obj has a reference count of two. One being the reference
held by obj and the other by a map entry. Now, suppose that the map
entry is deallocated and its reference removed by
vm_object_deallocate(). vm_object_deallocate() recognizes that the
only remaining reference is from a shadow object, obj, and calls
vm_object_collapse() on obj. vm_object_collapse() executes
if (backing_object->ref_count == 1) {
/*
* If there is exactly one reference to the backing
* object, we can collapse it into the parent.
*/
vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT);
vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes
if (op & OBSC_COLLAPSE_WAIT) {
vm_object_set_flag(backing_object, OBJ_DEAD);
}
Finally, suppose that either vm_object_backing_scan() or
vm_object_collapse() sleeps releasing its locks. At this instant,
another thread executes vm_object_split(). It crashes in
vm_object_reference_locked() on the assertion that the object is not
dead. If, however, assertions are not enabled, it crashes much later,
after the object has been recycled, in vm_object_deallocate() because
the shadow count and shadow list are inconsistent.
Reviewed by: tegge
Reported by: jhb
MFC after: 1 week
bioscom is called to set up serial port parameters because COMSPEED
was treated as an address instead of an immediate value, causing
serial port parameters to never be set.
PR: i386/110828
Reviewed by: jhb
MFC after: 2 weeks
code.
# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.
Reviewed by: jhb
it via pci_get_vpd_*() rather than always reading it for each device during
boot. I've left the tunable so that it can still be turned off if a device
driver causes a lockup via a query to a broken device, but devices whose
drivers do not use VPD (the vast majority) should no longer result in
lockups during boot, and most folks should not need to tweak the tunable
now.
Tested on: bge(4)
Silence from: jmg
one (hardware & global lock). This should address witness complaints that
a duplicate mutex is being acquired. Be sure to free the mutex to fix a
potential memory leak.
MFC after: 3 days
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change. cpufreq_post_change provides the results of the
change (success or failure). cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed. Hook
in all the drivers I could find that needed it.
* TSC: update TSC frequency value. When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq. This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.
Reviewed by: bde, phk
MFC after: 1 month
other C files:
- Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c. While
sbcreatecontrol() is really an mbuf allocation routine, it does its work
with awareness of the layout of socket buffer memory.
- Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub
versions of several of these functions live. Likewise, move socket state
transition calls (soisconnecting(), etc) to uipc_socket.c. Move
sodupsockaddr() and sotoxsocket().
doesn't need to be first in softc now. (It was the whole
ifnet structure itself that needed to be first in the good
old days.) Fix the respective comment accordingly.
Add xrefs to ifnet(9) in some other comments while I'm here.
Pointed out by: thompsa
imitating an Ethernet device, so vlan(4) and if_bridge(4) can be
attached to it for testing and benchmarking purposes. Its source
can be an introduction to the anatomy of a network interface driver
due to its simplicity as well as to a bunch of comments in it.
(The rest of needed changes were in my previous commit, which got
interrupted in the middle. Alas, CVS commits are not atomic.)
imitating an Ethernet device, so vlan(4) and if_bridge(4) can be
attached to it for testing and benchmarking purposes. Its source
can be an introduction to the anatomy of a network interface driver
due to its simplicity as well as to a bunch of comments in it.
function may be called without any TCP SACK option blocks present. Protect
iteration over SACK option blocks by checking for SACK options present flag
first.
Bug reported by: wkoszek, keramida, Nicolas Blais
explaining that some more locking is needed. The routing pieces are done,
but there is an interlocking issue between optionally compiled code and
mandatory code.
Spotted by: kris
1) Eliminate an unnecessary check for fictitious pages. Specifically,
only device-backed objects contain fictitious pages and the object is
not device-backed.
2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to
prevent possible wrap around with extremely large maps and objects,
respectively. Observed by: tegge (last summer)
temporary mapping created by locore so that the lowest two to four
megabytes can become a permanent identity mapping. This implementation
avoids any use of a large page mapping.
FreeBSD/arm installworld install is only 170MB. The smallest SD card
I could find at the store today was 512MB (and it was only $10 after
rebate), with a 2GB card for as low as $25.00...
Now that the IIC stuff has been sorted out, include that as well.
Include hints for the icee 16kb 16-bit i2c device. It should include
info about the temperature sensor as well, but that driver isn't quite
ready.
Add bpf for dhclient happiness.
MFC After: 1 week
some devices (and not others). To get instances onto the iicbus, one
now needs hints or an identify routine. We also do not probe the bus
for devices because many iic devices cannot be safely probed (and when
they can, the probe order turns out to be somewhat difficult to get
right).
# I'm not 100% sure that the iicsmb removal is right. Please contact me if
# this causes difficulty.
robustness of IIC transactions when parts aren't present. This also
removes a bunch of debug. This also moves this driver to 7-1
addressing rather than 6-0 addressing, which is more in line with all
the other iic drivers in the tree. I've tested this for about a
million years on the systems at work.
The relevant changes for FreeBSD (excerpt from the release note):
* Newly implemented CORE EXT words: CASE, OF, ENDOF, and ENDCASE. Also
added FALLTHROUGH, which works like ENDOF but jumps to the instruction
just after the next OF.
* Bugfix: John-Hopkins locals syntax now accepts | and -- in the comment
(between the first -- and the }.)
* Bugfix: Changed vmGetWord0() to make Purify happier. The resulting
code is no slower, no larger, and slightly more robust.
o tcp_input() now handles TCP segment sanity checks and preparations
including the INPCB lookup and syncache.
o tcp_do_segment() handles all data and ACK processing and is IPv4/v6
agnostic.
Change all KASSERT() messages to ("%s: ", __func__).
The changes in this commit are primarily of mechanical nature and no
functional changes besides the function split are made.
Discussed with: rwatson
- Change exca_activate_resource() to call BUS_ACTIVATE_RESOURCE() before
calling exca_(io|mem)_map() since the latter use rman_get_bus(tag|handle)
and the recent changes to nexus(4) mean that you need to activate a
resource before reading the bus tag and handle. This was true before,
but now the nexus(4) drivers on x86 and ia64 are more forceful about it.
Reviewed by: imp
calling pru_detach we can be absolutely sure, that we don't have any
references to the socket in the stack.
This closes race between lockless sbdestroy() and data arriving on socket.
Reviewed by: rwatson
sequence. First, if rt_ifa is going to be changed, then call
ifa_rtrequest(RTM_DELETE). Second, if gateway is going to be changed,
then call rt_setgate(). Third, change rt_ifa.
With this change we are able to change a link level route to a
gateway one, that wasn't possible before:
# ifconfig em0 192.168.22.1/24
# arp -s 192.168.22.99 00:11:22:33:44:55
# route change 192.168.22.99 192.168.22.199
# ping 192.168.22.99
db>
Reported by: avatar
instance expiry of the ARP entries. Since we no longer abuse the IPv4
radix head lock, we can now enter arp_rtrequest() with a lock held on
an arbitrary rt_entry.
Reviewed by: bms
argument from a mutex to a lock_object. Add cv_*wait*() wrapper macros
that accept either a mutex, rwlock, or sx lock as the second argument and
convert it to a lock_object and then call _cv_*wait*(). Basically, the
visible difference is that you can now use rwlocks and sx locks with
condition variables using the same API as with mutexes.
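The wrapper-macro idea can be sketched in userland as below. The structures are mock types for illustration only, and cv_wait_sketch() and cv_wait_lo_sketch() are hypothetical names; the point is that any lock type embedding a lock_object member can be passed through the same macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock types for illustration only; not the kernel definitions. */
struct lock_object { const char *lo_name; };
struct mtx    { struct lock_object lock_object; int owner; };
struct rwlock { struct lock_object lock_object; int readers; };
struct sx     { struct lock_object lock_object; int sharers; };
struct cv     { const char *last_lock; };

/* Generic worker: sees only the embedded lock_object. */
static void
cv_wait_lo_sketch(struct cv *cvp, struct lock_object *lo)
{
	assert(lo != NULL);
	/* A real implementation would drop the lock, sleep, and retake it. */
	cvp->last_lock = lo->lo_name;
}

/* Wrapper macro: works for any lock type with a 'lock_object' member. */
#define	cv_wait_sketch(cvp, lock) \
	cv_wait_lo_sketch((cvp), &(lock)->lock_object)
```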
macros.
- witness_check() replaces witness_check_mtx() and
witness_check_exclusive_sx() and checks for an exclusive acquire of
either a mutex, rwlock, or sx lock.
- witness_check_shared() replaces witness_check_shared_sx() and checks for
a shared acquire of either a rwlock or sx lock.
until after the call to fdclose(). This closes an obscure race that
could result in the later call to fdclose() actually closing a different
file descriptor if another thread close()'s the file descriptor being
opened before fdrop() is called, so the fdrop() in kern_open() frees the
file object, then the second thread (or a third) creates a new file
descriptor which reuses both the same index and the same file pointer
thus tricking fdclose() in the first thread into thinking that the
original file was still open.
MFC after: 1 week
DMA memory for a firmware load if it was the exact size needed, thus in the
common case the driver was constantly free'ing and reallocating the DMA
buffer and it would eventually begin to fail. With this fix, iwi0 reuses
the same buffer the entire time and no longer fails to load the firmware
after the machine has been up for a while.
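A sketch of the reuse policy, assuming "large enough" is the test for keeping the existing buffer (the commit does not spell out the exact condition). Plain malloc() stands in for the DMA allocation, and struct fw_buf is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct fw_buf {
	void	*mem;
	size_t	 cap;		/* bytes currently allocated */
	unsigned allocs;	/* how many times we had to allocate */
};

/* Reuse the buffer when it is large enough; reallocate only to grow. */
static void *
fw_buf_get(struct fw_buf *b, size_t need)
{
	assert(need > 0);
	if (b->mem != NULL && b->cap >= need)
		return (b->mem);
	free(b->mem);
	b->mem = malloc(need);
	b->cap = (b->mem != NULL) ? need : 0;
	if (b->mem != NULL)
		b->allocs++;
	return (b->mem);
}
```

With an exact-size test instead of `>=`, loading images of different sizes in turn would free and reallocate on every load, which is the churn the commit describes.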
MFC after: 1 week
simpler. It now can just use rman_is_region_manager() during
acpi_release_resource() to see if the resource is suballocated from
a system resource. Also, the driver no longer needs MD knowledge about
how to setup bus space tags and handles when doing a suballocation, but
can simply rely on bus_activate_resource() in the parent setting all that
up.
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().
One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98
drivers.
o make all crypto drivers have a device_t; pseudo drivers like the s/w
crypto driver synthesize one
o change the api between the crypto subsystem and drivers to use kobj;
cryptodev_if.m defines this api
o use the fact that all crypto drivers now have a device_t to add support
for specifying which of several potential devices to use when doing
crypto operations
o add new ioctls that allow user apps to select a specific crypto device
to use (previous ioctls maintained for compatibility)
o overhaul crypto subsystem code to eliminate lots of cruft and hide
implementation details from drivers
o bring in numerous fixes from Michael Richardson/hifn; mostly for
795x parts
o add an optional mechanism for mmap'ing the hifn 795x public key h/w
to user space for use by openssl (not enabled by default)
o update crypto test tools to use new ioctl's and add cmd line options
to specify a device to use for tests
These changes will also enable much future work on improving the core
crypto subsystem; including proper load balancing and interposing code
between the core and drivers to dispatch small operations to the s/w
driver as appropriate.
These changes were instigated by the work of Michael Richardson.
Reviewed by: pjd
Approved by: re
address ranges used by local and I/O APICs in the system. Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.
(with the notable exception of improvements for using multiple TX queues)
This adds support for the T3B2 ASIC rev
Obtained from: Chelsio
MFC after: 3 days
addresses corresponding to system RAM. On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions. On i386 ram0 uses the
dump_avail[] array. Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.
udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB)
are unnecessarily fragmented.
A common use case for this is OSPF link-state database synchronization
during adjacency bringup on a high speed network with a large MTU.
It is not possible to auto-tune this setting until a socket is bound to
a given interface, and because the laddr part of the inpcb tuple may be
overridden, it makes no sense to do so. Applications may request a larger
socket buffer size by using the SO_SNDBUF and SO_RCVBUF socket options.
Certain applications such as Quagga ospfd do not probe for interface MTU
and therefore do not increase SO_SNDBUF in this use case.
XORP is not affected by this problem as it preemptively uses SO_SNDBUF
and SO_RCVBUF to account for any possible additional latency in XRL IPC.
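For reference, an application-side request for a larger send buffer might look like the sketch below. The standard option names are SO_SNDBUF and SO_RCVBUF; the helper names are hypothetical, and the size the kernel actually grants is system dependent.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Ask for 'bytes' of send buffer; returns 0 on success, -1 on error. */
static int
request_sndbuf(int s, int bytes)
{
	assert(bytes > 0);
	return (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)));
}

/* Read back what the kernel currently reports, or -1 on error. */
static int
current_sndbuf(int s)
{
	int bytes;
	socklen_t len = sizeof(bytes);

	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes, &len) == -1)
		return (-1);
	return (bytes);
}
```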
PR: kern/108375
Requested by: Vladimir Ivanov
MFC after: 1 week
on amd64 and i386) until we gain proper BUS_DMA_NOCACHE support.
(in progress).
Tested by: rafan, infofarmer, Nguyen Tam Chinh <unixvn@gmail.com>
Tested on: amd64, i386
shutdown which caused extra abort from peer.
- RTT time calculation was not being done in
express sack handling since it referred to an unused
variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
- don't acquire port lock, already held in ioctl
- rename to cxgb_stop_locked
- switch callout_drain to callout_stop to avoid a hang from having the port lock held
cause the EC to stop handling future events because the GPE stayed masked.
Set a flag when queueing a GPE handler since it will ultimately re-enable
the GPE. In all other cases, re-enable it ourselves. I reworked the
patch from the submitter.
Submitted by: Rong-en Fan <grafan@gmail.com>
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.
This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.
With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.
Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month
is okay for most of the chipsets but BCM5701 PHY does not seem to like it.
Set media to IFM_NONE if link is not up instead of the previous value.
Reported by: Goran Lowkrantz (goran dot lowkrantz at ismobile dot com)
already been deleted. The assertion is important to show that
we won't end up accounting for extended attribute blocks (using
fs_pendingblocks) in our subsequent call to fs_alloc().
Agreed verbally by: mckusick
MFC after: 3 weeks
Main points of this change:
* Drop frames immediately if the interface is not marked IFF_UP.
* Always trim off the frame checksum if present.
* Always use M_VLANTAG in preference to passing 802.1Q frames
to consumers.
* Use __func__ consistently for KASSERT().
* Use the M_PROMISC flag to detect situations where ether_input()
may reenter itself on the same call graph with the same mbuf which
was promiscuously received on behalf of subsystems such as
netgraph, carp, and vlan.
* 802.1P frames (that is, VLAN frames with an ID of 0) will now be
passed to layer 3 input paths.
* Deal with the special case for CARP in a sane way.
This is a significant rewrite of code on the critical path. Please report
any issues to me if they arise. Frames will now only pass through dummynet
if M_PROMISC is cleared, to avoid problems with re-entry.
The handling of CARP needs to be revisited architecturally. The M_PROMISC
flag may potentially be demoted to a link-layer flag only as it is in
NetBSD, where the idea originated.
Discussed on: net
Idea from: NetBSD
Reviewed by: yar
MFC after: 1 month
This change partially resolves the issue in the PR. Further architectural
fixes, in the form of reference counting, are needed.
PR: 86848
Reviewed by: yar
MFC after: 1 month
on a per VRF basis (BSD has only one VRF currently).
Hash table is sized to 16 but may need to be adjusted
for machines with large numbers of addresses.
Reviewed by: gnn
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper
pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
in case of multiple interfaces with the same MAC in the same bridge.
This commit does not solve the entire problem, only the case where a packet
arrived from such an interface.
PR: kern/109815
MFC after: 7 days
Submitted by: Eygene Ryabinkin and rik@
Discussed with: bms@, thompsa@, yar@
prison_priv_check() to decide what to do.
This change is not supposed to change current (security) behaviour
in any way.
This change is similar to the change of PRIV_VFS_MOUNT in previous revision.
most systems, it causes the EC not to respond for some Acer and Compaq/HP
laptops. This is the default value for Linux also. For systems that need
it, burst mode can be enabled via the tunable/sysctl:
debug.acpi.ec.burst="1"
Only ops which used namei still remained.
- Implement a scheme for reducing the overhead of tracking which vops
require giant by constantly reducing the number of recursive giant
acquires to one, leaving us with only one vfslocked variable.
- Remove all NFSD lock acquisition and release from the individual nfs
ops. Careful examination has shown that they are not required. This
greatly simplifies the code.
Sponsored by: Isilon Systems, Inc.
Discussed with: rwatson
Tested by: kkenn
Approved by: re
unsigned char. Weirdly, casting the 1 constant to u_char still produces
a signed integer result that is then used in the % computation. This
avoids that mess altogether and causes a 0 pri to turn into 255 % 64
as we expect.
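The promotion behaviour can be reproduced in a few lines. This is an illustration with hypothetical function names and an assumed queue count of 64 (taken from "255 % 64"), not the scheduler code itself.

```c
#include <assert.h>

#define	NQUEUES	64	/* assumed from "255 % 64" above */

/* Both operands promote to int, so 0 - 1 is -1 and -1 % 64 is -1. */
static int
prev_index_promoted(unsigned char pri)
{
	return ((pri - (unsigned char)1) % NQUEUES);
}

/* Store the difference in an unsigned char first: 0 - 1 wraps to 255. */
static int
prev_index_wrapped(unsigned char pri)
{
	unsigned char prev = pri - 1;

	assert(NQUEUES > 0);
	return (prev % NQUEUES);
}
```

For a pri of 0 the first form yields -1, while the second yields 255 % 64 = 63.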
Reported by: kkenn (about 4 times, thanks)
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is
being changed anyway
Now the question is, why does it think an ipv4 connection is an ipv6 connection?
xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
- SWAPLR quirk for (unknown, luckily it is mine) broken uaudio stick.
Fixing by rewiring is impossible without damaging it. Luckily,
we can fix it using "other" methods :) .
- Add uaudio_get_vendor(), _product() and _release() in uaudio.c
(currently used by uaudio_pcm quirk).
- Implement CHANNEL_SETFRAGMENTS().
- Drop channel locking in few places where it is about to sleep
somewhere. This should help eliminating illegal locking acquisition
where the current thread is about to sleep, and also few deadlock
cases. Dropping it right here is quite safe since it is already
protected by CHN_F_BUSY flag and other threads won't bother to touch it.
Solving the other illegal locking issues is quite tricky without converting
most usbd_do_request() calls to their equivalent _async() calls,
which I intend to do later after getting full test reports from
other people with different uaudio hardware.
- Fix memory leak issues during detach. This seems common to any drivers
(notably emu10kx, csapcm?) with bridge functions.
Implement CHANNEL_SETFRAGMENTS() for snd_atiixp, snd_es137x, snd_hda
and snd_via8233. CHANNEL_SETBLOCKSIZE() will basically call
CHANNEL_SETFRAGMENTS() internally using conservative blocksize /
blockcount hints. Other drivers will be converted later.
- Disable stray buffer management, since sample size aligned buffering
are pretty much guaranteed through out the entire feeder_* chain
processes.
- Few style(9) cleanups.
channel.c/channel_if.m:
- Macros cleanups, prefer inlined min() over MIN().
- Rework chn_read()/chn_write() for better dead interrupt detection
policy. Reduce scheduling overhead by doing pure 5 seconds sleep
before giving up, instead of several cycle of brute micro sleeping.
- Avoid calling wakeup_one() for non-sleeping channel (for example,
vchan parent channel).
- EWOULDBLOCK -> EAGAIN.
- Fix possible divide-by-zero panic on chn_sync().
- Re-enforce ^2 blocksize policy, since there are too many broken
userland apps that blindly assume it without even trying to do
serious calculations.
- New channel method - CHANNEL_SETFRAGMENTS(), a refined version of
CHANNEL_SETBLOCKSIZE(). It accepts _both_ blocksize and blockcount
arguments, so the driver internals will have better hints for
buffering and timing calculations.
- Hook FEEDER_SWAPLR into feederchain building process.
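As an illustration of the ^2 blocksize policy combined with a blocksize/blockcount pair, here is a small sketch. Rounding down (rather than up) and the struct frag_hint type are assumptions made for the example; the driver method's real signature is not shown in this log.

```c
#include <assert.h>

struct frag_hint {
	unsigned blksz;		/* bytes per block, forced to a power of two */
	unsigned blkcnt;	/* number of blocks */
};

/* Round down to the nearest power of two (returns 0 for 0). */
static unsigned
round_pow2_down(unsigned v)
{
	unsigned p = 1;

	if (v == 0)
		return (0);
	while (p <= v / 2)
		p <<= 1;
	return (p);
}

/* Take both hints at once and enforce the power-of-two block size. */
static struct frag_hint
set_fragments_sketch(unsigned blksz, unsigned blkcnt)
{
	struct frag_hint h;

	assert(blksz > 0 && blkcnt > 0);
	h.blksz = round_pow2_down(blksz);
	h.blkcnt = blkcnt;
	return (h);
}
```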
feeder_fmt.c:
- Unified version of various filters, avoiding duplications.
- malloc()less feeder_fmt. Information can be retrieved dynamically
by doing table lookup on static data. For cases such as converting
from stereo to mono or reducing bit depth where input data is larger
than output, cycle remaining available free space until it has been
exhausted and start kicking 8 bytes reservoir space from there to
complete the remaining requested count.
- Introduce FEEDER_SWAPLR. Few super broken hardwares (found on several
extremely cheap uaudio stick, possibly others) mistakenly wired left
and right channels wrongly, screwing output or input.
- Rearrange FEEDER_* constants starting from 0 to 31, so the future
additions will be much easier and consistent.
- Introduce FEEDER_SWAPLR. Few super broken hardwares (found on several
extremely cheap uaudio stick, possibly others) mistakenly wired left
and right channels wrongly, screwing output or input.
malloc()less feeder_vchan. Information can be retrieved dynamically
by doing table lookup on static data. Reduce mixing overhead by
doing direct copy on first channel. Mixing process will begin starting
from second channel onwards.
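The "copy the first channel, mix from the second" idea, sketched for signed 16-bit samples. Saturating addition is an assumption made for the example, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with saturation so loud channels clip instead of wrapping. */
static int16_t
mix_s16(int16_t a, int16_t b)
{
	int32_t sum = (int32_t)a + b;

	if (sum > INT16_MAX)
		return (INT16_MAX);
	if (sum < INT16_MIN)
		return (INT16_MIN);
	return ((int16_t)sum);
}

/* The first channel is copied straight in; mixing starts at the second. */
static void
mix_channels(int16_t *dst, const int16_t *const *src, size_t nch, size_t n)
{
	size_t c, i;

	assert(nch > 0);
	memcpy(dst, src[0], n * sizeof(*dst));
	for (c = 1; c < nch; c++)
		for (i = 0; i < n; i++)
			dst[i] = mix_s16(dst[i], src[c][i]);
}
```

With a single channel the loop body never runs, so the cost is one memcpy() and no per-sample arithmetic.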
malloc()less feeder_volume. Information can be retrieved dynamically
by doing table lookup on static data. Increase resolution from 6bit
to PCM_FXSHIFT (8bit) for better resolution and finer volume changes.
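Fixed-point volume scaling with 8 fractional bits can be sketched as follows. FX_SHIFT and apply_volume() are hypothetical names standing in for the feeder's internals; moving from 6 to 8 bits means 256 gain steps instead of 64.

```c
#include <assert.h>
#include <stdint.h>

#define	FX_SHIFT	8			/* 8 fractional bits */
#define	FX_ONE		(1 << FX_SHIFT)		/* 256 == unity gain */

/* Scale a sample by vol/256; vol ranges from 0 (mute) to FX_ONE (unity). */
static int16_t
apply_volume(int16_t sample, unsigned vol)
{
	assert(vol <= FX_ONE);
	return ((int16_t)(((int32_t)sample * (int32_t)vol) / FX_ONE));
}
```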
- Convert sx lock to plain mutex. Since the access of /dev/sndstat
is pretty much exclusive and protected by toggling sndstat_isopen,
plain mutex is more than enough.
- Enable SBUF_AUTOEXTEND to avoid buffer truncation.
- We need at least two OCBs with indirect pointers allocated in a 4KB page.
- SBP_MAXPHYS can increase to 1MB once we separate management OCB/ORB
which usually does not need indirect pointers.
- We have to increase SBP_DMA_SIZE for MAXPHYS larger than 1MB.
MFC after: 3 days
cache coherency, besides of causing train wreck in other places
(especially on amd64, possibly on i386).
Discussed with: kib@, rafan@
Tested by: rafan@
confusions and panic provided that the following conditions are met:
1) WITNESS is enabled (watch/trace).
2) Using modules, instead of statically linked (Not a strict
requirement, but easier to reproduce this way).
3) 2 or more modules share the same mtx type ("sound softc").
- They might share the same name (strcmp() == 0), but it always
point to different address.
4) Repetitive kldunload/load on any module that shares the same mtx
type (Not a strict requirement, but easier to reproduce this way).
Consider module A and module B:
- From enroll() - subr_witness.c:
* Load module A. Everything seems fine right now.
wA->w_refcount == 1 ; wA->w_name = "sound softc"
* Load module B.
* w->w_name == description will always fail.
("sound softc" from A and B point to different address).
* wA->w_refcount > 0 && strcmp(description, wA->w_name) == 0
* enroll() will return wA instead of returning (possibly unique)
wB.
wA->w_refcount++ , == 2.
* Unload module A, mtx_destroy(), wA->w_name becomes invalid,
but wA->w_refcount-- yields 1 instead of 0. wA will not be
removed from the witness list.
* Some other place calls mtx_init(), iterates over the witness list,
finds wA, and fails on wA->w_name == description
* wA->w_refcount > 0 && strcmp(description, wA->w_name)
* Panic on strcmp() since wA->w_name no longer points to a valid
address.
Note that this could happen in other places as well, not just sound
(e.g. consider lots of drivers that share a similar MTX_NETWORK_LOCK).
Solutions (for sound case):
1) Provide unique mtx type string for each mutex creation (chosen)
or
2) Put "sound softc" global variable somewhere and use it.
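The failure sequence above can be reproduced with a small user-space model. This is only a sketch: the names sketch_witness, sketch_enroll() and sketch_unenroll() are hypothetical, the table is a fixed array rather than the real witness list, and the lookup only mimics the pointer-then-strcmp() matching described in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXW	8	/* hypothetical fixed-size witness table */

struct sketch_witness {
	const char	*w_name;	/* borrowed pointer, owned by the "module" */
	int		 w_refcount;	/* 0 means the slot is free */
};

static struct sketch_witness sketch_list[SKETCH_MAXW];

/*
 * Match by pointer first, then by strcmp() on entries that are still
 * referenced; otherwise claim the first free slot.
 */
static struct sketch_witness *
sketch_enroll(const char *description)
{
	struct sketch_witness *w, *free_slot = NULL;
	int i;

	for (i = 0; i < SKETCH_MAXW; i++) {
		w = &sketch_list[i];
		if (w->w_refcount == 0) {
			if (free_slot == NULL)
				free_slot = w;
			continue;
		}
		if (w->w_name == description ||
		    strcmp(description, w->w_name) == 0) {
			w->w_refcount++;
			return (w);
		}
	}
	if (free_slot != NULL) {
		free_slot->w_name = description;
		free_slot->w_refcount = 1;
	}
	return (free_slot);
}

/* Drop one reference; the entry stays listed while the count is non-zero. */
static void
sketch_unenroll(struct sketch_witness *w)
{
	assert(w->w_refcount > 0);
	w->w_refcount--;
}
```

Enrolling two separate copies of "sound softc" returns the same entry with a count of 2, so dropping one reference leaves an entry whose w_name still points into the first owner's storage. Distinct strings per mutex (solution 1) give each caller its own entry.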
and syncache_respond() into its own generic function tcp_addoptions().
tcp_addoptions() is alignment agnostic and does optimal packing in all cases.
In struct tcpopt rename to_requested_s_scale to just to_wscale.
Add a comment with quote from RFC1323: "The Window field in a SYN (i.e.,
a <SYN> or <SYN,ACK>) segment itself is never scaled."
Reviewed by: silby, mohans, julian
Sponsored by: TCP/IP Optimization Fundraise 2005
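The RFC1323 rule quoted above can be illustrated with a minimal sketch. The function name sketch_window_field() and its parameters are hypothetical and are not taken from tcp_addoptions(); the sketch only shows that the 16-bit window field is shifted by the scale factor on ordinary segments and left unscaled (but clamped) on a SYN.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the 16-bit window field for an outgoing segment.
 * recv_space: receive space in bytes; wscale: our window scale shift;
 * is_syn: non-zero for <SYN> and <SYN,ACK> segments.
 */
static uint16_t
sketch_window_field(uint32_t recv_space, unsigned int wscale, int is_syn)
{
	uint32_t win;

	/* The window in a SYN segment itself is never scaled. */
	win = is_syn ? recv_space : (recv_space >> wscale);
	if (win > 65535)
		win = 65535;
	return ((uint16_t)win);
}
```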
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn
built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also
made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as
well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and losing potential to
send when we inflate the flight size. We now inflate
the cwnd too and deflate it later when the revoked
chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vrf
via compile switch that is scrubbed from BSD but we won't
need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to
a #def (30).
Reviewed by: gnn
boot. Then, just switch to the kernel pmap when suspending instead of
allocating/freeing our own mapping every time. This should solve a panic
of pmap_remove() being called with interrupts disabled. Thanks to Alan
Cox for developing this patch.
Note: this means that ACPI requires super page (PG_PS) support in the CPU.
This has been present since the Pentium and first documented in the
Pentium Pro. However, it may need to be revisited later.
Submitted by: alc
MFC after: 1 month
acpi module. Also clean up print of args a little.
This was accidentally committed as 1.9.2.3 in the stable branch. Since it
is harmless, I will let the "insta-MFC" stand unless there is a problem.
the alternate status and the control registers. Remove the local
version of ata_reset.
Add support for the ADI Pronghorn Metro boards. They use CS3 and CS4
instead of Avila's CS1 and CS2.
OKed by: sam, cognet
Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock
with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global
dqhlock mutex.
The i_dquot array of an inode is protected by the lockmgr vnode lock; a
corresponding assert was added to dqget(). Access to the quota-related
fields of struct ufsmount
(um_quotas and um_qflags) is protected by um_lock.
Tested by: Peter Holm
Reviewed by: tegge
Approved by: re (kensmith)
This work would not have been possible without the enormous amount of help given by
Tor Egge and Peter Holm. Tor reviewed each version of patch, pointed out
numerous errors and provided invaluable suggestions. Peter did tireless
testing of the patch as it was developed.
from ATAPI requests. If CAM debugging is enabled, also mark ATAPI
requests with ATA_R_DEBUG flag.
(atapi_cb): Report ATAPI timeouts to the CAM layer.
Fix incorrect debugging traces in the presence of ATAPI errors.
PR: kern/103602
MFC after: 2 weeks
CAM rescan if the ATAPI device entries have not changed.
The ATAPI bus may be reset for a variety of reasons, including any time an
ATAPI request times out. It is not necessary to rescan at the CAM level
in such a case, unless a device has appeared or disappeared, or has
otherwise changed.
PR: kern/103602
MFC after: 2 weeks
ATAPI request, do not clear the ATA_R_DEBUG flag. This allows a request
marked as requiring debug traces to produce these traces also during
the completion of the autosense processing.
Reviewed by: sos
MFC after: 2 weeks
tokens into the common isp_osinfo structure instead of being
in bus specific structures. This allows us to implement
a SYNC_REG MEMORYBARRIER call (using bus_space_barrier)
and also reduce the amount of bus specific wrapper structure
usages in isp_pci && isp_sbus.
MFC after: 3 days
late stages of unmount). On failure, the vnode is recycled.
Add insmntque1(), to allow for file system specific cleanup when
recycling vnode on failure.
Change getnewvnode() to no longer call insmntque(). Previously,
embryonic vnodes were put onto the list of vnodes belonging to a file
system, which is unsafe for a file system marked MPSAFE.
Change vfs_hash_insert() to no longer lock the vnode. The caller now
has that responsibility.
Change most file systems to lock the vnode and call insmntque() or
insmntque1() after a new vnode has been sufficiently setup. Handle
failed insmntque*() calls by propagating errors to callers, possibly
after some file system specific cleanup.
Approved by: re (kensmith)
Reviewed by: kib
In collaboration with: kib
sosend_copyin().
- Use M_WAITOK instead of M_TRYWAIT in sosend_copyin().
- Don't check for NULL from M_WAITOK and return ENOBUFS.
M_WAITOK/M_TRYWAIT allocations don't fail with NULL.
Reviewed by: andre
Requested by: andre (2)
This can help to spot bugs (which it did for me)
and let people know which mode the vlan module is
actually using if they suspect it isn't picking its
options from the main kernel config file.
- ifv_list member of struct ifvlan is unneeded in array mode,
it's used only in hash mode to resolve hash collisions.
- We don't need the list of trunks at all. (The initial reason for
having it was to be able to destroy all trunks in the MOD_UNLOAD
handler, but a trunk is not to be destroyed forcibly -- it will
go away when all vlan interfaces on it have been deleted.
Note that if_clone_detach() called first of all under MOD_UNLOAD
will delete all vlan interfaces and thus make all trunks go away
quietly.)
- It's enough to use a single [S]LIST_FIRST() in a typical list
destruction loop.
function which is called from pfs_destroy() before the node is reclaimed.
Modify pfs_create_{dir,file,link}() to accept a pointer to a destructor
function in addition to the usual attr / fill / vis pointers.
This breaks both the programming and binary interfaces between pseudofs
and its consumers. It is believed that there are no pseudofs consumers
outside the source tree, so that the impact of this change is minimal.
Submitted by: Aniruddha Bohra <bohra@cs.rutgers.edu>
their latest Compaq V3000 BIOS (revision F.22). As a result, analog CD
connectivity has gone into oblivion. Even if they decide to fix it in
future revisions, the damage has been done.
o leave IEEE80211_RADIOTAP_HDRLEN for portability to other systems but
correct comment about radiotap headers being padded to 64-bytes
(hasn't been true for many years)
o remove reference to IEEE80211_RADIOTAP_FCS; it was never used, instead
the flags are marked with IEEE80211_RADIOTAP_F_FCS to indicate whether
or not FCS is present
Might be better to just remove IEEE80211_RADIOTAP_HDRLEN so drivers
don't bogusly pad.
MFC after: 2 weeks
event. Locking primitives that support this (mtx, rw, and sx) now each
include their own foo_sleep() routine.
- Rename msleep() to _sleep() and change its 'struct mtx' object to a
'struct lock_object' pointer. _sleep() uses the recently added
lc_unlock() and lc_lock() function pointers for the lock class of the
specified lock to release the lock while the thread is suspended.
- Add wrappers around _sleep() for mutexes (mtx_sleep()), rw locks
(rw_sleep()), and sx locks (sx_sleep()). msleep() still exists and
is now identical to mtx_sleep(), but it is deprecated.
- Rename SLEEPQ_MSLEEP to SLEEPQ_SLEEP.
- Rewrite much of sleep.9 to not be msleep(9) centric.
- Flesh out the 'RETURN VALUES' section in sleep.9 and add an 'ERRORS'
section.
- Add __nonnull(1) to _sleep() and msleep_spin() so that the compiler will
warn if you try to pass a NULL wait channel. The functions already have
a KASSERT to that effect.
from whoever has dequeued the item from the queue. Generally they have
no interest in the result, and even if it is called by the queuer, it
should still pretend that it was queued. The queuer should be assuming
that the call was queued and giving them the false confidence that they
are getting status leads to hard to find bugs.
Make it a void and remove all the code that tried to return status through it.
These functions are intended to be used to drop a lock and then reacquire
it when doing a sleep such as msleep(9). Both functions accept a
'struct lock_object *' as their first parameter. The 'lc_unlock' function
returns an integer that is then passed as the second parameter to the
subsequent 'lc_lock' function. This can be used to communicate state.
For example, sx locks and rwlocks use this to indicate if the lock was
share/read locked vs exclusive/write locked.
Currently, spin mutexes and lockmgr locks do not provide working lc_lock
and lc_unlock functions.
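The handshake between 'lc_unlock' and 'lc_lock' can be sketched in user space as follows. Everything prefixed sketch_ is hypothetical: the structures only model the two function pointers described above, the rw-style lock is a pair of counters rather than a real lock, and sketch_sleep() stands in for a generic sleep routine that knows nothing about the lock type.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_lock_object;

/* Per-class operations, modelled on the lc_lock/lc_unlock pair. */
struct sketch_lock_class {
	void	(*lc_lock)(struct sketch_lock_object *, int);
	int	(*lc_unlock)(struct sketch_lock_object *);
};

struct sketch_lock_object {
	const struct sketch_lock_class *lo_class;
	int	readers;	/* number of shared holders */
	int	writer;		/* 1 if held exclusively */
};

/* Release the lock; return 1 if it was held exclusively, 0 if shared. */
static int
sketch_rw_unlock(struct sketch_lock_object *lo)
{
	if (lo->writer) {
		lo->writer = 0;
		return (1);
	}
	lo->readers--;
	return (0);
}

/* Reacquire the lock in the mode reported by sketch_rw_unlock(). */
static void
sketch_rw_lock(struct sketch_lock_object *lo, int how)
{
	if (how)
		lo->writer = 1;
	else
		lo->readers++;
}

static const struct sketch_lock_class sketch_rw_class = {
	.lc_lock = sketch_rw_lock,
	.lc_unlock = sketch_rw_unlock,
};

/* Generic sleep: drop the lock, wait, then restore the previous mode. */
static void
sketch_sleep(struct sketch_lock_object *lo, void (*wait)(void *), void *arg)
{
	int how;

	how = lo->lo_class->lc_unlock(lo);
	if (wait != NULL)
		wait(arg);
	lo->lo_class->lc_lock(lo, how);
}

static int sketch_was_free;

/* Test helper: record whether the lock is unheld while "sleeping". */
static void
sketch_probe(void *arg)
{
	struct sketch_lock_object *lo = arg;

	sketch_was_free = (lo->writer == 0 && lo->readers == 0);
}
```

The integer returned by the unlock hook is opaque to sketch_sleep(); only the lock class interprets it, which is what lets one sleep routine serve several lock types.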
excessive interrupt clock timer reset, screwing interrupt generation
for already active channels. Track moving DMA pointer and call buffer
interrupt on each blocksize boundary.
PR: kern/109791
MFC after: 3 days
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
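A rough model of the timestamp idea, under stated assumptions: the structure and function names below (sketch_stamp, sketch_attr_loaded(), sketch_open_needs_getattr()) are invented for illustration and do not correspond to the NFS client's actual fields. The sketch only shows how a <pid, tid, thread syscalls> triple lets open detect that attributes were already loaded during the same system call.

```c
#include <assert.h>

/* <pid, tid, thread syscalls> triple used as the attribute-load timestamp. */
struct sketch_stamp {
	long		pid;
	long		tid;
	unsigned long	syscalls;
};

struct sketch_thread {
	struct sketch_stamp now;	/* identity plus per-thread syscall count */
};

struct sketch_nfsnode {
	struct sketch_stamp attrload;	/* stamp of the last attribute load */
	int		valid;		/* non-zero once attrload has been set */
};

/* Called on syscall entry: every syscall gets a fresh stamp. */
static void
sketch_syscall_enter(struct sketch_thread *td)
{
	td->now.syscalls++;
}

/* Called whenever attributes are loaded over the wire. */
static void
sketch_attr_loaded(struct sketch_nfsnode *np, const struct sketch_thread *td)
{
	np->attrload = td->now;
	np->valid = 1;
}

/* Open needs its own GETATTR unless one was done in this same syscall. */
static int
sketch_open_needs_getattr(const struct sketch_nfsnode *np,
    const struct sketch_thread *td)
{
	return (!(np->valid &&
	    np->attrload.pid == td->now.pid &&
	    np->attrload.tid == td->now.tid &&
	    np->attrload.syscalls == td->now.syscalls));
}
```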
* To use this option with a UDP socket, it must be bound to a local port,
and INADDR_ANY, to disallow possible collisions with existing udp inpcbs
bound to the same port on other interfaces at send time.
* If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with
INADDR_ANY will be rejected as it is ambiguous.
* If the socket is bound to an address other than INADDR_ANY, specifying
IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup().
Reviewed by: silence on -net
Tested with: src/tools/regression/netinet/ipbroadcast
MFC after: 4 days
a thread is an idle thread, just see if it has the IDLETD
flag set. That flag will probably move to the pflags word
as it's permanent and never changes for the life of the
system so it doesn't need locking.
- Remove some excessive parentheses around shift operators.
- Use macro instead of magic number where it is applicable.
- Change lower-case hexadecimals to upper case to match wpaul's style.
- Revert some unnecessary line wraps and changes from the previous commit.
Pointed out by: bde
in the field. In one situation, one end of the TCP connection sends
a back-to-back RST packet, with delayed ack, the last_ack_sent variable
has not been updated yet. When tcp_insecure_rst is turned off, the code
treats the RST as invalid because last_ack_sent instead of rcv_nxt is
compared against th_seq. Apparently there is some kind of firewall that
sits in between the two ends and that RST packet is the only RST
packet received. With short lived HTTP connections, the symptom is
a large accumulation of connections over a short period of time.
The +/-(1) factor is to take care of implementations out there that
generate RST packets with these types of sequence numbers. This
behavior has also been observed in live environments.
Reviewed by: silby, Mike Karels
MFC after: 1 week
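The sequence check described above might look roughly like the following sketch. It is an assumption, not the committed code, that the RST is accepted when th_seq lies within +/-1 of either rcv_nxt or last_ack_sent; the names sketch_near() and sketch_rst_acceptable() are hypothetical, and the comparison macros merely imitate the usual modular sequence-number arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Modular sequence-number comparisons (wrap-around safe). */
#define SKETCH_SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)
#define SKETCH_SEQ_LEQ(a, b)	((int32_t)((a) - (b)) <= 0)

/* Is seq within one of ref, in either direction? */
static int
sketch_near(uint32_t seq, uint32_t ref)
{
	return (SKETCH_SEQ_GEQ(seq, ref - 1) && SKETCH_SEQ_LEQ(seq, ref + 1));
}

/*
 * Strict RST validation: with delayed ACKs, last_ack_sent may lag behind
 * rcv_nxt, so both reference points are considered.
 */
static int
sketch_rst_acceptable(uint32_t th_seq, uint32_t rcv_nxt,
    uint32_t last_ack_sent)
{
	return (sketch_near(th_seq, rcv_nxt) ||
	    sketch_near(th_seq, last_ack_sent));
}
```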
sun4v nexus(4) in turn is based on):
o Change nexus(4) to manage the resources of its children so the
respective device drivers don't need to figure them out of OFW
themselves.
o Change nexus(4) to provide the ofw_bus KOBJ interface instead of
using IVARs for supplying the OFW node and the subset of standard
properties of its children. Together with the previous change this
also allows to fully take advantage of newbus in that drivers like
fhc(4), which attach on multiple parent busses, no longer require
different bus front-ends as obtaining the OFW node and properties
as well as resource allocation works the same for all supported
busses. As such this change also is part 4/4 of allowing creator(4)
to work in USIII-based machines as it allows this driver to attach
on both nexus(4) and upa(4). On the other hand removing these IVARs
breaks API compatibility with the powerpc nexus(4), which isn't
that bad as a) sparc64 currently doesn't share any device driver
hanging off of nexus(4) with powerpc and b) they have not been
compatible regarding OFW-related extensions at the pci(4) level
for quite some time.
o Provide bus_get_dma_tag methods in nexus(4) and its children in
order to handle DMA tags in a hierarchical way and get rid of the
sparc64_root_dma_tag kludge. Together with the previous two items
this change also allows completely getting rid of the nexus(4)
IVAR interface. It also includes:
- pushing the constraints previously specified by the nexus_dmatag
down into the DMA tags of psycho(4) and sbus(4) as it's their
IOMMUs which induce these restrictions (and nothing at the
nexus(4) or anything that would warrant specifying them there),
- fixing some obviously wrong constraints of the psycho(4) and
sbus(4) DMA tags, which happened to not actually be used with
the sparc64_root_dma_tag kludge in place and therefore didn't
cause problems so far,
- replacing magic constants for constraints with macros as far
as it is obvious as to where they come from.
This doesn't include taking advantage of the newbus way to get
the parent DMA tags implemented by this change in order to divorce
the IOTSBs of the PCI and SBus IOMMUs or for implementing the
workaround for the DMA sync bug in Sabre (and Tomatillo) bridges,
yet, though.
o Get rid of the notion that nexus(4) (mostly) reflects an UPA bus
by replacing ofw_upa.h with ofw_nexus.h (which was repo-copied
from ofw_upa.h) and renaming its content, which actually applies to
all of Fireplane/Safari, JBus and UPA (in the host bus case), as
appropriate.
o Just use M_DEVBUF instead of a separate M_NEXUS malloc type for
allocating the device info for the children of nexus(4). This is
done in order to not need to export M_NEXUS when deriving drivers
for subordinate busses from the nexus(4) class.
o Use the DEFINE_CLASS_0() macro to declare the nexus(4) driver so
we can derive subclasses from it.
o Const'ify the nexus_excl_name and nexus_excl_type arrays as well
as add 'associations' and 'rsc', which are pseudo-devices without
resources and therefore of no real interest for nexus(4), to the
former.
o Let the nexus(4) device memory rman manage the entire 64-bit address
space instead of just the UPA_MEMSTART to UPA_MEMEND subregion as
Fireplane/Safari- and JBus-based machines use multiple ranges,
which can't be as easily divided as in the case of UPA (limiting
the address space only served for sanity checking anyway).
o Use M_WAITOK instead of M_NOWAIT when allocating the device info
for children of nexus(4) in order to give one less opportunity
for adding devices to nexus(4) to fail.
o While adapting the drivers affected by the above nexus(4) changes,
change them to take advantage of rman_get_rid() instead of caching
the RIDs assigned to allocated resources, now that the RIDs of
resources are correctly set.
o In iommu(4) and nexus(4) replace hard-coded function names, which
actually became outdated in several places, in panic strings and
status messages with __func__. [1]
o Use driver_filter_t in prototypes where appropriate.
o Add my copyright to creator(4), fhc(4), nexus(4), psycho(4) and
sbus(4) as I changed considerable amounts of these drivers as well
as added a bunch of new features, workarounds for silicon bugs etc.
o Fix some white space nits.
Due to lack of access to Exx00 hardware, these changes, i.e. central(4)
and fhc(4), couldn't be runtime tested on such a machine. Exx00 are
currently reported to panic before trying to attach nexus(4) anyway
though.
PR: 76052 [1]
Approved by: re (kensmith)
- Properly note when a read lock is released.
- Always note when we contest on a read lock.
- Only note success of obtaining read locks for the first reader to match
the behavior of sx(9).
Reviewed by: kmacy
station exiting power save mode prepend them to the driver's
send q instead of appending them. This ensures the packets
are not misordered wrt any packets already q'd for the station.
This corrects a problem noticed when using a VoIP phone talking
to an ath card in ap mode; the misordered packets caused noise.
Submitted by: "J.R. Oldroyd" <jr@opal.com>
MFC after: 2 weeks
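The ordering argument can be shown with a small sketch. The array-backed queue and the names sketch_queue, sketch_enqueue() and sketch_prepend_all() are hypothetical and unrelated to the driver's real queue macros; frames are just integers whose values reflect the order in which they were generated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_QMAX	16

struct sketch_queue {
	int	frames[SKETCH_QMAX];	/* index 0 is sent first */
	int	len;
};

/* Append one frame to the tail; returns -1 if the queue is full. */
static int
sketch_enqueue(struct sketch_queue *q, int frame)
{
	if (q->len >= SKETCH_QMAX)
		return (-1);
	q->frames[q->len++] = frame;
	return (0);
}

/*
 * Move every frame from the power save queue to the front of the send
 * queue, preserving their relative order, so that older buffered frames
 * go out before anything already queued for the station.
 */
static int
sketch_prepend_all(struct sketch_queue *sendq, struct sketch_queue *psq)
{
	if (sendq->len + psq->len > SKETCH_QMAX)
		return (-1);
	memmove(sendq->frames + psq->len, sendq->frames,
	    (size_t)sendq->len * sizeof(sendq->frames[0]));
	memcpy(sendq->frames, psq->frames,
	    (size_t)psq->len * sizeof(psq->frames[0]));
	sendq->len += psq->len;
	psq->len = 0;
	return (0);
}
```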
<sys/extattr.h> to <ufs/ufs/extattr.h>. Move description
of extended attributes in UFS from man9/extattr.9 to
man5/fs.5.
Note that restore will not compile until <sys/extattr.h>
and <ufs/ufs/extattr.h> have been updated.
Suggested by: Robert Watson
never correct as CAM has no real understanding of it, and will just immediately
retry the command. This leads to undesirable cycling of the camisr as well as
a high possibility for the command to exhaust its retries before the driver
can get around to servicing it.
The better fix, as demonstrated here, is to freeze the simq and mark the
command as needing to be tried. Then, when the driver can service the command,
the simq gets unfrozen. This is correct, and documented here to help reduce
the mystery. However, it also points out a shortcoming in CAM error handling
that makes writing drivers harder.
Submitted by: Erich Chen
for processing frames from the power save queue when operating
in ap mode. This is especially noticeable for realtime data going
to devices like voip phones.
Submitted by: "J.R. Oldroyd" <jr@opal.com>
MFC after: 2 weeks
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.
Reviewed by: jhb
MFC after: 3 weeks
EC occasionally times out and provides bogus values (3000C). This change
prevents those systems from prematurely shutting down while we work on the
underlying problem. Also, bump the sanity value to 0...200C from 0...150C.
paper over catching an error as the case was already handled, albeit
in a somewhat surprising way (the caller received zero'd data)
Submitted by: sephe
MFC after: 2 weeks
by any code in the tree[1] and are close enough for common values
that this change is a noop
[1] ath uses one macro to calculate a value that is not used
Submitted by: sephe
MFC after: 1 week
- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.
to problems when the geli device is used with file system or as a swap.
Hopefully will prevent problems like kern/98742 in the future.
MFC after: 1 week
RTC state, then it may clobber the RTC index register, so the index
register must be restored before using it to restore control registers
in rtc_restore().
The following problems remain:
- rtc_restore() is only called if pmtimer is configured. Buggy
suspend/resumes are more likely to clobber the index register than
a control register, so pmtimer is more needed than it used to be.
- pmtimer doesn't exist for amd64.
- Restoring of the RTC state may race with rtcintr(). If an RTC
interrupt is handled before the state is restored, then rtcin(RTC_INTR)
in rtcintr() may read from the wrong register, so rtcintr() may spin
forever. This may be mitigated by the most common state clobbering
being to turn off RTC interrupts.
system calls now enter without Giant held, and then in some cases, acquire
Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and
tweak one or two other adjacent comments.
Add some comments to explain how 10 was picked. 20 was completely
arbitrary, at least 10 has some reasoning behind it.
Also, update the comments about how long we sleep to reflect the new,
shorter timeout we use.
- Remove unnecessary findcpuspeed() function.
- Initialize the timer_freq in i8254_init().
- Fix inittodr() and resettodr(). These are broken by rev.1.154.
If these drivers are setting M_VLANTAG because they are stripping the
layer 2 802.1Q headers, then they need to be re-inserting them so any
bpf(4) peers can properly decode them.
It should be noted that this is compile tested only.
MFC after: 3 weeks
uuencoded format along with their respective LICENSE files.
- Add new share/doc/legal directory to BSD.usr.dist mtree file. This is the
place we install LICENSE files for restricted firmwares.
- Teach firmware(9) and kmod.mk about licensed firmwares. Restricted firmwares
won't load properly unless legal.<name>.license_ack is set to 1, either
via kenv(1) or /boot/loader.conf.
Reviewed by: mlaier, sam
Permitted by: Intel (via Andrew Wilson)
MFC after: 1 month
to embed up to four counters in outgoing packets. The message specifies
the offset at which the counter should be inserted as well as the
parameters of the counter.
Example usage:
ngctl msg src0: setcounter \
'{ index=0 offset=0x40 flags=1 width=4 increment=1 max_val=12345 }'
Sponsored by: Sandvine Incorporated
to embed a timestamp (struct timeval) in outgoing packets. The message
specifies the offset at which the timestamp should be inserted.
NG_SOURCE(4) gives an example usage that queues an ICMP packet. Using that
example, the following command will insert a timestamp in the ICMP's data
payload:
ngctl msg src0: settimestamp '{ offset=0x2a flags=1 }'
Sponsored by: Sandvine Incorporated
this patch the code behaves according to the comment on the line above.
Without this patch, a socket could cause SIGPIPE to be delivered to its
process, once with SO_NOSIGPIPE set, and twice without.
With this patch, the kernel now passes the sigpipe regression test.
Tested by: Anton Yuzhaninov
MFC after: 1 week
An mbuf packet chain with the M_PROMISC flag set contains a unicast packet
received by the link layer, which does not correspond to any configured
link layer address in the local system.
It is copied when copying m_pkthdr. It is not cleared when crossing layers.
As such, it is defined to have a flag value which is outside of the
M_PROTO* range, like M_VLANTAG has.
Reviewed by: andre
Obtained from: NetBSD
been set at the socket layer, in our somewhat convoluted IPv4 source
selection logic in ip_output().
IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255
must always be delivered on a local link with a TTL of 1.
If IP_ONESBCAST has been set at the socket layer, also perform destination
interface lookup for point-to-point interfaces based on the destination
address of the link; previously it was not possible to use the option with
such interfaces; also, the destination/broadcast address fields map to the
same field within struct ifnet, which doesn't help matters.
One more valid fix going forward for these issues is to treat 255.255.255.255
as a destination in its own right in the forwarding trie. Other
implementations do this. It fits with the use of multiple paths, though
it then becomes necessary to specify interface preference.
This hack will eventually go away when that comes to pass.
Reviewed by: andre
MFC after: 1 week
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently has a measurable performance impact on certain workloads.
uipc_send in cases where only a global read lock is held by breaking
them out and avoiding the unpcb lock acquire in the common case. This
avoids deadlocks which manifested with X11, and should also marginally
further improve performance.
Reported by: sepotvin, brooks
Add macro EVL_APPLY_VLID() which may be used to apply an 802.1q VLAN ID
to the M_VLANTAG field in an mbuf packet header non-destructively.
This will be used by net80211 to begin with.
Add macro EVL_APPLY_PRI() which may be used to apply an 802.1p priority
class to the M_VLANTAG field in an mbuf packet header non-destructively.
Add other macros for manipulating tags and the CFI bit.
Submitted by: Boris Kovalenko (EVL_CFIOFTAG(), EVL_MAKETAG())
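For readers unfamiliar with the tag layout, here is a sketch of non-destructive tag updates. It assumes the standard 802.1Q layout (3 priority bits, 1 CFI bit, 12 VLAN ID bits); the sketch_ functions are hypothetical stand-ins and are not the macro definitions added by this commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLID_MASK	0x0FFFu	/* bits 0-11: VLAN ID */
#define SKETCH_CFI_MASK		0x1000u	/* bit 12: CFI */
#define SKETCH_PRI_MASK		0xE000u	/* bits 13-15: 802.1p priority */

/* Build a tag from its three fields. */
static uint16_t
sketch_make_tag(unsigned int vlid, unsigned int pri, unsigned int cfi)
{
	return ((uint16_t)(((pri & 7u) << 13) | ((cfi & 1u) << 12) |
	    (vlid & SKETCH_VLID_MASK)));
}

/* Replace only the VLAN ID, keeping priority and CFI. */
static uint16_t
sketch_apply_vlid(uint16_t tag, unsigned int vlid)
{
	return ((uint16_t)((tag & ~SKETCH_VLID_MASK) |
	    (vlid & SKETCH_VLID_MASK)));
}

/* Replace only the priority, keeping VLAN ID and CFI. */
static uint16_t
sketch_apply_pri(uint16_t tag, unsigned int pri)
{
	return ((uint16_t)((tag & ~SKETCH_PRI_MASK) | ((pri & 7u) << 13)));
}

/* Extract the CFI bit. */
static unsigned int
sketch_cfi_of_tag(uint16_t tag)
{
	return ((tag & SKETCH_CFI_MASK) >> 12);
}
```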
to a READ_CAPACITY request rather than the maximum sector (off by one
problem). This causes a huge cascade of errors as the geom tasting
code tries to read the last sector (which isn't really there in the
face of this error). Automated tools that manipulate disk labels and
such also have issues.
Create a new quirk READ_CAPACITY_OFFBY1 and add a quirk for the
SanDISK ImageMate that I have that suffers from this problem (the
SDDR-31). It intercepts the READ_CAPACITY response and adjusts it
from number of sectors to max sector for devices with this quirk.
Reading the Linux source suggests that there are a host of
other devices with this issue, including iPods and some popular
cameras. I've not added quirks for them, since I don't have the
devices in front of me to test.
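The adjustment itself is tiny; a sketch under stated assumptions follows. The function names and the boolean quirk argument are hypothetical and do not reflect how the quirk table or the response interception is actually wired up; the sketch only shows the arithmetic of turning a reported sector count back into a last-sector address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * READ_CAPACITY is expected to return the address of the last sector.
 * Devices with the off-by-one problem return the sector count instead,
 * so the value is decremented when the quirk is set.
 */
static uint32_t
sketch_fix_read_capacity(uint32_t reported, int offby1_quirk)
{
	if (offby1_quirk && reported > 0)
		return (reported - 1);
	return (reported);
}

/* Number of addressable sectors given the last sector address. */
static uint64_t
sketch_sector_count(uint32_t last_sector)
{
	return ((uint64_t)last_sector + 1);
}
```

For a 1000-sector device, a well-behaved unit reports 999 and a quirky one reports 1000; after the fix both yield 1000 sectors and the last readable sector is 999.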
it is initialized; use path instead.
This change fixes a panic when using atapicam in conjunction with CAMDEBUG,
which has been described under kern/103602.
Thanks to Josh Carroll <josh.carroll@gmail.com> for providing the traces
that allowed identifying this problem.
PR: kern/103602
MFC after: 1 week
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to a 20%-25% increase in system time, which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this to enabling lock profiling without
LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguisher at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
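The active/passive behaviour described in this message can be modelled in a few lines. This is a user-space sketch only: the names sketch_multipath, sketch_add_path(), sketch_active() and sketch_io_done() are hypothetical, paths are plain integers, and nothing here reflects the real GEOM data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAXPATHS	4

struct sketch_multipath {
	int	paths[SKETCH_MAXPATHS];	/* path ids; index 0 is the active one */
	int	npaths;
};

/* A newly tasted matching disk is added to the tail of the list. */
static int
sketch_add_path(struct sketch_multipath *mp, int id)
{
	if (mp->npaths >= SKETCH_MAXPATHS)
		return (-1);
	mp->paths[mp->npaths++] = id;
	return (0);
}

/* Return the active path id, or -1 if no path is left. */
static int
sketch_active(const struct sketch_multipath *mp)
{
	return (mp->npaths > 0 ? mp->paths[0] : -1);
}

/*
 * Handle a completed request. On EIO or ENXIO the active path is dropped
 * and the next one in the list takes over. Returns 1 if the request
 * should be retried on a new active path, 0 otherwise.
 */
static int
sketch_io_done(struct sketch_multipath *mp, int error)
{
	int i;

	if (error != EIO && error != ENXIO)
		return (0);
	if (mp->npaths == 0)
		return (0);
	for (i = 1; i < mp->npaths; i++)
		mp->paths[i - 1] = mp->paths[i];
	mp->npaths--;
	return (mp->npaths > 0);
}
```

A path that went away and later reappears is simply added at the tail again, which matches the "go away and come back" case mentioned above.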
Linux does not check file descriptor when MAP_ANONYMOUS is set.
This should fix recent LTP test regressions.
Reported by: Scot Hetzel (swhetzel at gmail dot com)
netchild
case where it asynchronously exits burst mode on its own. Handle different
values of hz in sleep loop. Provide more debugging options to tune EC
behavior. These tunables/sysctls may be temporary and are not for user
access if the EC is working properly. Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.
Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working. Testing shows an
increase of responsiveness by ~75% on one system.
PR: kern/98171
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.
Reviewed by: gnn, silby.
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF. Set to BPF_D_IN to see only incoming packets on the
interface. Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface. Set to BPF_D_OUT to see only outgoing
packets on the interface. This setting is initialized to BPF_D_INOUT
by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.
- BIOCFEEDBACK sets packet feedback mode. This allows injected packets
to be fed back as input to the interface when output via the interface is
successful. When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication. This flag is initialized to
zero by default.
Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.
Reviewed by: rwatson
concurrency:
- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.
- Replace global UNP mutex with a global UNP rwlock, which will protect the
UNIX domain socket connection topology, v_socket, and be acquired
exclusively before acquiring more than per-unpcb at a time in order to
avoid lock order issues.
In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.
Much testing by: kris
Approved by: re (kensmith)
determine if it holds an exclusive rwlock reference or not. This is
non-ideal, but recursion scenarios in the network stack currently
require it.
Approved by: jhb
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.
Reviewed by: rwatson
MFC after: 2 weeks
- only collect timestamps when a lock is contested - this reduces the overhead
of collecting profiles from 20x to 5x
- remove unused function from subr_lock.c
- generalize cnt_hold and cnt_lock statistics to be kept for all locks
- NOTE: rwlock profiling generates invalid statistics (and most likely always has);
someone familiar with that code should review it
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.
The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eap, u_char **eac)
{
u_char *p, *pe, *pn, *p0;
int eapad1, eapad2, ealength, ealen, nlen;
uint32_t ul;
pe = ptr + length;
nlen = strlen(name);
for (p = ptr; p < pe; p = pn) {
p0 = p;
bcopy(p, &ul, sizeof(ul));
pn = p + ul;
/* make sure this entry is complete */
if (pn > pe)
break;
p += sizeof(uint32_t);
if (*p != nspace)
continue;
p++;
eapad2 = *p++;
if (*p != nlen)
continue;
p++;
if (bcmp(p, name, nlen))
continue;
ealength = sizeof(uint32_t) + 3 + nlen;
eapad1 = 8 - (ealength % 8);
if (eapad1 == 8)
eapad1 = 0;
ealength += eapad1;
ealen = ul - ealength - eapad2;
p += nlen + eapad1;
if (eap != NULL)
*eap = p0;
if (eac != NULL)
*eac = p;
return (ealen);
}
return(-1);
}
After applying the structure and macros, it would look like this:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eapp, u_char **eac)
{
struct extattr *eap, *eaend;
int nlen;
eaend = (struct extattr *)(ptr + length);
nlen = strlen(name);
for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
/* make sure this entry is complete */
if (EXTATTR_NEXT(eap) > eaend)
break;
/* compare against the length of the name, not the length of the area */
if (eap->ea_namespace != nspace ||
eap->ea_namelength != nlen ||
bcmp(eap->ea_name, name, nlen))
continue;
if (eapp != NULL)
*eapp = (u_char *)eap;
if (eac != NULL)
*eac = EXTATTR_CONTENT(eap);
return (EXTATTR_CONTENT_SIZE(eap));
}
return(-1);
}
Not only is it considerably shorter, but it is hopefully more readable :-)
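To make the idea concrete outside the kernel, here is a minimal userland
sketch of a record header plus accessor macros. The names (struct xrec,
XREC_NEXT, XREC_CONTENT, XREC_CONTENT_SIZE, xrec_append, xrec_find) are
hypothetical and are NOT the kernel's struct extattr or EXTATTR_* macros;
the layout is only inferred from the old pointer code above (32-bit record
length, namespace byte, content pad byte, name length byte, name, header
padded to 8 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record header; NOT the kernel's struct extattr. */
struct xrec {
	uint32_t len;		/* whole record: header, content, padding */
	uint8_t	 nspace;	/* attribute namespace */
	uint8_t	 contentpad;	/* pad bytes following the content */
	uint8_t	 namelen;	/* length of name[] */
	char	 name[];	/* name, not NUL-terminated */
};

/* Header size for a name of length nl, rounded up to 8 bytes. */
#define	XREC_HDR(nl)	((offsetof(struct xrec, name) + (nl) + 7) & ~(size_t)7)
#define	XREC_NEXT(r)	((struct xrec *)((unsigned char *)(r) + (r)->len))
#define	XREC_CONTENT(r)	((unsigned char *)(r) + XREC_HDR((r)->namelen))
#define	XREC_CONTENT_SIZE(r) \
	((int)((r)->len - XREC_HDR((r)->namelen) - (r)->contentpad))

/* Append one record at dst (8-byte aligned); returns the bytes written. */
static size_t
xrec_append(unsigned char *dst, int nspace, const char *name,
    const void *data, size_t datalen)
{
	struct xrec *r = (struct xrec *)dst;
	size_t nlen = strlen(name);
	size_t pad = (8 - datalen % 8) % 8;

	assert(nlen <= UINT8_MAX);
	r->nspace = (uint8_t)nspace;
	r->contentpad = (uint8_t)pad;
	r->namelen = (uint8_t)nlen;
	r->len = (uint32_t)(XREC_HDR(nlen) + datalen + pad);
	memcpy(r->name, name, nlen);
	memcpy(XREC_CONTENT(r), data, datalen);
	return (r->len);
}

/* Find (nspace, name); returns the content size or -1 if not found. */
static int
xrec_find(unsigned char *buf, size_t length, int nspace, const char *name,
    struct xrec **recp, unsigned char **contentp)
{
	struct xrec *r, *end;
	size_t nlen = strlen(name);

	end = (struct xrec *)(buf + length);
	for (r = (struct xrec *)buf; r < end; r = XREC_NEXT(r)) {
		/* Stop on an empty or truncated record. */
		if (r->len == 0 || XREC_NEXT(r) > end)
			break;
		if (r->nspace != nspace || r->namelen != nlen ||
		    memcmp(r->name, name, nlen) != 0)
			continue;
		if (recp != NULL)
			*recp = r;
		if (contentp != NULL)
			*contentp = XREC_CONTENT(r);
		return (XREC_CONTENT_SIZE(r));
	}
	return (-1);
}
```

The sketch assumes the buffer comes from malloc()/calloc() so that it is
suitably aligned, and that its length is a multiple of 8.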
PRIO_USER case, possibly also other places that dereference
p_ucred.
In the past, we inserted a new process into the allproc list right
after PID allocation, and released the allproc_lock sx. Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.
The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process. By definition, after PRIO_USER setpriority(), all
processes that belong to the given user should have their nice value
set to the specified value. Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area. On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that are inherited
from the uninitialized area of the process structure.
This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock. This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential information initialized. In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.
Other potential solutions are still under evaluation.
Discussed with: davidxu, jhb, rwatson
PR: kern/108071
MFC after: 2 weeks
semi-automatic style(9)
The futex stuff already differs a lot (only a small part does not differ)
from NetBSD, so we are already way off and can't apply changes from NetBSD
automatically. As we need to merge everything by hand already, we can even
make the files comply to our world order.
(external) microphone pin tend to screw it. Internal microphones (found
on several laptops) still need high VRef.
Tested by: Pietro Cerutti <pietro.cerutti@gmail.com>
lenix <irc.freenode.net>
immediately flag any page that is allocated to an OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage(). This allows for the elimination of some uses of
the page queues lock.
Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS. This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().
Remove vm_page_unmanage(). It is no longer used.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
- Don't "return" in linux_clone() after we have forked the new process in
case of problems.
- Move the copyout of p2->p_pid outside the emul_lock coverage in
linux_clone().
- Cache the em->pdeath_signal in a local variable and move the copyout
out of the emul_lock coverage.
- Move the free() out of the emul_shared_lock coverage in a preparation
to switch emul_lock to non-sleepable lock (mutex).
Submitted by: rdivacky
an ICB. This shows up on card restarts, and usually for
2200-2300 cards. What happens is that we start up,
attempting to acquire a hard address. We end up instead
being an F-port topology, which reports out a loop id
of 0xff (or 0xffff for 2K Login f/w). Then, if we restart,
we end up telling the card to go off and acquire this loop
address, which the card then rejects. Bah.
Compilation fixes from Solaris port.
I created and tested this with a custom FreeSBIE cd-image.
PR: i386/96452
Submitted by: Yuichiro Goto <y7goto at gmail dot com>
MFC after: 3 days
Approved by: imp (mentor)
inode's i_flag.
It's possible that after ufs_inactive() calls softdep_releasefile(),
i_nlink stays >0 for a considerable amount of time (> 60 seconds here).
During this period, any ffs allocation routines that alter di_blocks
must also account for the blocks in the filesystem's fs_pendingblocks
value.
This change fixes an eventual df/du discrepancy that will happen as
the result of fs_pendingblocks being reduced to <0.
The only manifestation of this that people may recognise is the
following message on boot:
/somefs: update error: blocks -N files M
at which point the negative pending block count is adjusted to zero.
Reviewed by: tegge
MFC after: 3 weeks
freshly-loaded kernel module. To avoid various unload races, hide linker
files whose sysinit's are being run from userland so that they can't be
kldunloaded until after all the sysinit's have finished.
Tested by: gallatin
changes. This should ease the job of maintaining codebase since much
of the regression tests are done across os versions.
- bus_setup_intr() -> snd_setup_intr().
triggers a KASSERT) or local variables. In the case of kern_ndis, the
tsleep() actually used a common sleep address (curproc) making it
susceptible to a premature wakeup.
want an equivalent of DELAY(9) that sleeps instead of spins. It accepts
a wmesg and a timeout and is not interrupted by signals. It uses a private
wait channel that should never be woken up by wakeup(9) or wakeup_one(9).
Glanced at by: phk
Use bus_get_dma_tag() to obtain the parent DMA tag to make the drivers
a little bit more non-ia32/amd64 friendly.
There is no man page for bus_get_dma_tag, so this is modelled after
rev. 1.62 of src/sys/dev/sound/pci/es137x.c by marius.
Inspired by: commit by marius
attachment of new devices that arrive (and we notice them
via async Fibre Channel events). We've always had the
right thing (of sorts) happen when devices go away- this
is the corollary function that makes multipath failover
actually work.
MFC after: 2 weeks
rescan requests. The purpose of this is to allow a SIM
(or other entities) to request a bus rescan and have it
then fielded in a different (process) context from the
caller.
There are probably better ways to accomplish this, but
it's a very small change that helps solve a number of
problems.
Reviewed by: Justin, Ken and Scott.
MFC after: 2 weeks
early, we haven't set board type, so we can't correctly check for
some options. Fix this by splitting option setting/getting into
generic, pci and then later board specific, option setting/getting.
This was noticed when setting 'iid' (or 'hard loop id') didn't work
all of a sudden.
Noticed by: Mike Drangula (thanks!) via Jung-uk Kim (thanks!)
incoming packets have had their 802.1Q tags processed by the
hardware, resulting in them being stripped from the packets, and
placed on the mbuf. This fixes the processing of 802.1Q tags when
hardware offload of 802.1Q tags is enabled.
check that the subject has read/write access to the vnode using the
vnode MAC check.
MFC after: 3 weeks
Submitted by: Spencer Minear <spencer_minear at securecomputing dot com>
Obtained from: TrustedBSD Project
a link-layer multicast group membership.
Such memberships are needed in order to support protocols such as
IS-IS without putting the interface into PROMISC or ALLMULTI modes.
sa_equal() is not OK for comparing sockaddr_dl as it has deeper structure
than a simple byte array, so add sa_dl_equal() and use that instead.
Reviewed by: rwatson
Verified with: /usr/sbin/mtest
Bug found by: Jouke Witteveen
MFC after: 2 weeks
addition of SerDes support. According to the docs, the 5706C and 5708C
phys are supposed to use the same MII model that is separate from the
SerDes parts, but the 5706C actually uses the MII model of the SerDes
parts. To fix this, re-add the old 5706C entry to miidevs and add a
special check in brgphy_probe() for phys that match the 5706C ID. If
the phy is supported by the gentbi(4) driver, then it's a SerDes phy, so
we fail the probe and let gentbi(4) grab it. Otherwise, it's a 5706C phy,
so we let brgphy(4) grab it.
In coordination with: dwhite
treated as multicast frames and filtered, but only when "adopting"
running firmware. By "adopting", I mean using pre-existing firmware
loaded from eeprom at PCI reset, rather than firmware loaded by the
driver.
non consecutively numbered ports.
This should fix current SATA problems.
Support AHCI chips where the ports are not consecutively numbered as in
some incarnations of the ICH8 chip.
flash card reader.
Also remove an 'Opened da0 -> <random number>' which is not needed on a daily
basis (available through bootverbose).
Reviewed by: phk, ken
MFC after: 1 week
and make it print under debug.iwi control same as other debugging stuff.
Remove the device_printf() in iwi_ioctl() and replace with this:
/*
* wait until pending iwi_cmd() are completed, to avoid races
* that could cause problems.
*/
while (sc->flags & IWI_FLAG_BUSY)
msleep(sc, &sc->sc_mtx, 0, "iwiioctl", hz);
This at least prevents what has become an almost systematic failure for my
system, presumably due to a previous iwi_cmd() not being complete by the
time iwi_ioctl() is called.
It has been brought to my attention that the real problem could be
calling ieee80211_ioctl() with the lock held. If that is true,
there might still be a possibility for a race condition e.g. an
interrupt coming while the ioctl is sleeping.
Need to investigate further on what changes are required to release
the lock before calling ieee80211_ioctl().
+ do not release the dma-ble region used for downloading firmware.
This should fix the problems that some people were seeing, due to
memory becoming too fragmented which prevented subsequent allocations
of a suitable contiguous region of memory;
+ document the firmware format and usage in if_iwivar.h
+ use a loop to allocate the four tx rings, instead of replicating
the body of the loop.
+ add debugging code IWI_LOCK_ASSERT() to detect missing locks.
These only do a printf, and should go away once we figure out why
the driver sometimes freezes the system due to a (yet unidentified)
race condition.
+ add a device_printf() in iwi_ioctl() in certain conditions
(see comment in the code). This helps prevent the race condition
mentioned above, and makes the system survive. This printf will
also go away once fixing this bug is completed.
+ change iwi_getfw() to return 0 on success, 1 on error, consistently
with other functions.
+ fix the argument of a sizeof() in iwi_get_firmware()
+ use le32toh() to access little-endian fields
+ simplify error handling in iwi_load_firmware() and iwi_init_locked()
The bugs fixed by this commit (the freezing one especially) are serious
enough to call for a quick MFC
MFC after: 3 days
garbage collection complications from general discussion of UNIX domain
sockets.
Staticize unp_addsockcred().
Remove XXX comment regarding Giant and v_socket -- v_socket is protected
by the global UNIX domain socket lock.
System V shared memory, now believed fixed in sysv_shm.c:1.109:
date: 2006/11/06 13:42:01; author: rwatson; state: Exp; lines: +65 -37
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
This restores fine-grained privilege support to System V IPC.
PR: 106078
ignored on other systems I investigated when accessing an existing
memory segment rather than creating a new one. This call to ipcperm()
is the only one to pass in a complete mode flag to the permission
checks rather than a simple access request mask, and caused problems
for the revised ipcperm() based on the priv(9) interface, which can
now be restored.
PR: 106078
VFS privilege namespace: exceedquota, getquota, and setquota. Leave
UFS-specific quota configuration privileges in the UFS name space.
This renumbers VFS and UFS privileges, so requires rebuilding modules
if you are using security policies aware of privilege identifiers.
This is likely no one at this point since none of the committed MAC
policies use the privilege checks.
set real-time priority on a thread. It looks like this suser(9)
call was introduced after my first pass through replacing superuser
checks with named privilege checks.
As a consequence, getdirentries() no longer needs to drop/reacquire the
directory vnode lock, which would allow it to be reclaimed in between.
Reported and tested by: Peter Holm
Approved by: rodrigc (unionfs)
MFC after: 1 week
Rounding addr upwards to next 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.
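The wrap-around described above can be reproduced with plain unsigned
arithmetic. This is a sketch assuming a 64-bit address type; SEG_SIZE
stands in for the 2M superpage size, and round_up_2m() and
round_up_2m_clamped() are hypothetical helper names, not functions from
the pmap code. The clamp shows one way to keep the rounded address from
becoming 0.

```c
#include <assert.h>
#include <stdint.h>

#define	SEG_SIZE	((uint64_t)1 << 21)	/* 2M boundary */

/* Round addr up to a 2M boundary; wraps to 0 near the top of the space. */
static uint64_t
round_up_2m(uint64_t addr)
{
	return ((addr + (SEG_SIZE - 1)) & ~(SEG_SIZE - 1));
}

/* Same rounding, but fall back to 'limit' when the result wrapped. */
static uint64_t
round_up_2m_clamped(uint64_t addr, uint64_t limit)
{
	uint64_t next = round_up_2m(addr);

	/* next == 0 makes next - 1 the largest value, so this catches it. */
	if (next - 1 >= limit)
		next = limit;
	return (next);
}
```

With an address in the last 2M of the space, round_up_2m() yields 0, so a
loop of the form "while (cur < addr)" would exit immediately; the clamped
variant returns the limit instead.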
Reported and tested by: kris
Suggested by: alc
MFC after: 1 week
mapped at, and LOADERRAMADDR, the address at which the loader maps the RAM
at the time the kernel is booted.
They are used to detect if the kernel is booted from the onboard flash.
Define those for the IQ31244
(1) change debounce period from 1s to 250ms. This appears to be fine and
speeds things up a little.
(2) In the middle of cbb_pcic_power_disable_socket we write 0 to the EXCA_INTR
register to put the card into reset. However, this turns off CSC
interrupts for TI bridges (and maybe others). So no further card
insertion events would be noticed. To compensate, after we've gone
through the entire power down sequence, turn on EXCA_INTR_ENABLE so
that CSC events happen.
#2 should fix the 'dead slot' problem that has been reported after
card ejection (but only 16-bit cards).
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.
Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.
VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.
Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs
a version that i posted earlier on the -current mailing list,
and subsequent feedback received.
The core of the change is just in sys/firmware.h and kern/subr_firmware.c,
while other files are just adaptation of the clients to the ABI change
(const-ification of some parameters and hiding of internal info,
so this is fully compatible at the binary level).
In detail:
- reduce the amount of information exported to clients in struct firmware,
and constify the pointer;
- internally, document and simplify the implementation of the various
functions, and make sure error conditions are dealt with properly.
The diffs are large, but the code is really straightforward now (i hope).
Note also that there is a subtle issue with the implementation of
firmware_register(): currently, as in the previous version, we just
store a reference to the 'imagename' argument, but we should rather
copy it because there is no guarantee that this is a static string.
I realised this while testing this code, but i prefer to fix it in
a later commit -- there is no regression with respect to the past.
Note, too, that the version in RELENG_6 has various bugs including
missing locks around the module release calls, mishandling of modules
loaded by /boot/loader, and so on, so an MFC is absolutely necessary
there. I was just postponing it until this cleanup to avoid doing
things twice.
MFC after: 1 week
of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP()
for a filesystem root vnode. Handle this case inside lookup().
Submitted by: tegge
PR: 92785
MFC after: 1 week
device pointers. They don't change as the children device drivers
come and go. Rather, check to see if the device is attached where we
would have checked ! NULL. This solves many asymmetries in the code
that likely could lead to crashes when loading/unloading cbb with
one or more of the expected children's drivers not present.
o When detaching all children, try really hard to get all the children
list before giving up. This is based on an observation by hans petter
selasky in his usb p4 branch.
o When rescanning devices after a driver is added, abort if we can't get
the child list with a message.
o when rescanning devices, if the reprobe/attach is successful, save the
device for cardbus/pccard.
Unlike other GigEs, Yukon II always sets the VLAN bit when it detects a VLAN
tagged packet, regardless of the H/W VLAN processing configuration state.
So the driver needs to check the IFCAP_VLAN_HWTAGGING bit to know whether it
is configured to take advantage of H/W VLAN processing. If H/W VLAN
processing is disabled, don't adjust the received packet length, so that
subsequent validation logic works for software VLAN processing.
Reported by: bms
Tested by: bms
vm_page_free_toq() to account for recent changes that allow
vm_page_free_toq() to be called on some pages without the page queues lock
being held, specifically, pages that are not contained in a vm object and
not a member of a page queue. (Examples of such pages include page table
pages, pv entry pages, and uma small alloc pages.)
- PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
Linux/ia64's i386 emulation layer does this and it complies with Linux
header files. This fixes mmap05 LTP test case on amd64.
- Do not adjust stack size when failure has occurred.
- Synchronize i386 mmap/mprotect with amd64.
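The protection rule in the first item above can be sketched as a small
helper. The function name linux_prot_fixup() is hypothetical; only the
PROT_* constants from <sys/mman.h> are real, and the sketch shows the
stated rule rather than the emulation layer's actual code.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * If any of PROT_READ, PROT_WRITE or PROT_EXEC is requested, also grant
 * PROT_READ and PROT_EXEC. PROT_NONE is left untouched.
 */
static int
linux_prot_fixup(int prot)
{
	if (prot & (PROT_READ | PROT_WRITE | PROT_EXEC))
		prot |= PROT_READ | PROT_EXEC;
	return (prot);
}
```

A request for PROT_WRITE alone therefore becomes a readable, writable and
executable mapping, while PROT_NONE stays PROT_NONE.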
blacklist a bunch of old chipsets. If a system contains a PCI-PCI bridge
that supports PCI-X, assume the chipset supports PCI-X. If a system
contains a PCI-express root port, assume the chipset supports PCI-express.
If the chipset doesn't support either PCI-X or PCI-express, then blacklist
it by default. We should now only need to explicitly blacklist PCI-X or
PCI-express chipsets that don't properly handle MSI.
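The default decision described above amounts to a small predicate. The
following is only a sketch: struct chipset_info and msi_blacklisted() are
hypothetical names, and the fields are assumed to be filled in by some
earlier scan of the bus.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a bus scan found. */
struct chipset_info {
	bool has_pcix_bridge;		/* a PCI-PCI bridge supporting PCI-X */
	bool has_pcie_root_port;	/* a PCI-express root port */
	bool explicitly_blacklisted;	/* known not to handle MSI properly */
};

/* Returns true if MSI should be disabled by default for this chipset. */
static bool
msi_blacklisted(const struct chipset_info *ci)
{
	/* Explicit entries still win for broken PCI-X/PCI-express parts. */
	if (ci->explicitly_blacklisted)
		return (true);
	/* Neither PCI-X nor PCI-express: assume an old chipset. */
	return (!(ci->has_pcix_bridge || ci->has_pcie_root_port));
}
```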
broke the method as all the MSI-X table indices were off by one in
the backend MD code.
- Fix a cosmetic nit in the bootverbose printf in pci_alloc_msix_method().
sonewconn() in unp_connect(). This avoids a race that occurs due to
v_socket being an uncounted reference, as the lock was being released in
order to call sonewconn(), which otherwise recurses into the UNIX domain
socket code via pru_attach, as well as holding the lock over a sleeping
memory allocation in uipc_attach(). Switch to a non-sleeping memory
allocation during UNIX domain socket attach.
This fix is non-ideal in that it requires enabling recursion, but is a much
smaller change than moving to using true references for v_socket. The
reported panic occurs in unp_connect() following the return of
sonewconn().
Update copyright year.
Panic reported by: jhb
is actually being added to the hold queue, not the free queue. At the same
time, avoid unnecessary tests to wake up threads waiting for free memory
and the idle thread that zeroes free pages. (These tests will be performed
later when the page finally moves from the hold queue to the free queue.)
observation here is that it doesn't matter what garbage accumulates in
bits which we're going to end up masking away anyway, as long as the
garbage doesn't overflow into bits which we care about.
This improved version may not be the fastest possible on all systems,
but it's certainly going to be better than what was here before.
- Add sigacts locking.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Create and log events via the CTRx macros.
Reviewed by: cognet
doing a CLEARFILE option. Do a vrele instead. This prevents
a panic later due to v_writecount being negative when the vnode
is taken off the freelist.
Submitted by: jhb
- ZONE get now also takes a type cast so it does the
cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just
LIST_EMPTY
- Removal of const in some of the static hmac functions
(not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the
RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered
count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear
after my testing completes. It allows us to keep a
small log on each assoc of the last 40 TSN's in/out and
stream assignment. It is NOT in options and so is only
good for private builds.)
- Some CMT changes in prep for Jana fixing his problem
with reneging when CMT is enabled (Concurrent Multipath
Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code
with the old format (in case the peer does not yet support
the latest auth code).
- Nonce sum was incorrectly being set when ecn_nonce was
NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using
just data pointers. The mbufs were used to harmonize
NetBSD code since both Net and Open used this method. We
have decided to move away from that and more conform to
FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The
cookie_how collision case tracking had an endless loop in
it if you got a second retransmission of a cookie collision
case. This would lock up a CPU .. ugly..
- auth function goes to using size_t instead of int which
conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you
get the data chunk, deliver it and due to the reference to a ch->
that every now and then has been deleted (depending on the position
in the mbuf) you have an invalid ch->ch.flags.. and thus you don't
advance the stream sequence number.. so you block the stream
permanently. The fix is to make local variables of these guys
and set them up before you have any chance of trimming the
mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in
the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where
we would get an error on receiving with this. That's because
it was NOT padded to the same size as the snd_rcv info. We
increase (add the pad) so the two structs are the same size
in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for
socket options was to cast the pointer and validate the size.
This has been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing
flag type (the next msg is a notification) and a missing
scope recovery was also fixed.
Reviewed by: gnn
to become negative. This will detect the underflow when it
happens, instead of having it discovered when the vnode is
taken off the freelist, long after the offending process is long
gone.
boot by MD code to indicate the detected alignment preference. Rather than
cache alignment being encoded in UMA consumers by defining a global
alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now
a special value (-1) that causes UMA to look at registered alignment. If
no preferred alignment has been selected by MD code, a default alignment
of (16 - 1) will be used.
Currently, no hardware platforms specify alignment; architecture
maintainers will need to modify MD startup code to specify an alignment
if desired. This must occur before initialization of UMA so that all UMA
zones pick up the requested alignment.
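The sentinel scheme can be modelled in a few lines of userland C. This is
a sketch only: ALIGN_CACHE, ALIGN_DEFAULT, set_cache_align() and
resolve_align() are hypothetical stand-ins for illustration, not the UMA
interface itself. Alignments are expressed as masks (2^n - 1), matching
the (16 - 1) convention above.

```c
#include <assert.h>

#define	ALIGN_CACHE	(-1)		/* sentinel: use the registered alignment */
#define	ALIGN_DEFAULT	(16 - 1)	/* used if MD code registers nothing */

static int registered_align = ALIGN_DEFAULT;

/* Called by (hypothetical) MD startup code before any zone is created. */
static void
set_cache_align(int mask)
{
	/* The mask must have the form 2^n - 1. */
	assert(mask >= 0 && ((mask + 1) & mask) == 0);
	registered_align = mask;
}

/* Resolve the alignment a consumer asked for at zone creation time. */
static int
resolve_align(int requested)
{
	return (requested == ALIGN_CACHE ? registered_align : requested);
}
```

Consumers that pass an explicit mask are unaffected; only those passing
the sentinel pick up whatever was registered, which is why registration
has to happen before the zones are created.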
Reviewed by: jeff, alc
Submitted by: attilio
vm_page_alloc() from within a critical section in pmap_growkernel().
Since the need for a critical section may never have existed in the
first place, simply get rid of it.
Discussed with: alc@
IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon,
as an optimization to mitigate the effects of high multicast
forwarding load.
This is an experimental change, therefore it must be explicitly enabled by
setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value.
The tunable may be set from the loader or from within the kernel environment
when loading ip_mroute.ko as a module.
Submitted by: edrt <edrt at citiz.net>
See also: http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html
Make PIM dynamically loadable by using encap_attach_func().
PIM may now be loaded into a GENERIC kernel.
Tested with: ports/net/pimdd && tcpreplay && wireshark
Reviewed by: Pavlin Radoslavov
it isn't used in the access control decision. This became visible to
Coverity with the change to a function call retrieving label values.
Coverity CID: 1723
tries to drop the reference count after our close routine returns.
A more correct fix is to defer the destroy_dev() to a taskqueue (either
in devfs or locally).
Reminded by: jhb
addressing if a packet is later re-encapsulated and sent to a
non-broadcast, non-multicast destination after being received on the
ng_ksocket input hook.
PR: 106999
Submitted by: Kevin Lahey
MFC after: 4 weeks
device specific d_close(), which causes a subsequent destroy_dev() to be
blocked in the "devdrn" loop.
This bandaid should fix the smbfs hang/crashing observed on -CURRENT since
the introduction of sys/kern/kern_conf.c:1.199:
# mount_smbfs -I server //server/share /mnt
Password:
[hang]
Reviewed by: bp
See also: http://lists.freebsd.org/pipermail/cvs-src/2006-November/071379.html
by the token bucket filter will result in EINVAL being returned.
If you want to rate-limit traffic in future, use ALTQ or dummynet; this
isn't a general purpose QoS engine.
Preserve the now unused fields in struct vif so as to avoid having to
recompile netstat(1) and other tools.
Reviewed by: Pavlin Radoslavov, Bill Fenner
Bugfix for the Realtek PHY driver... an RTL8201L standalone PHY
needs different handling than the integrated ones in terms of
speed detection. There was a bogus test based on the parent
device driver name string controlling which speed register to
query. That test began failing when the rl driver was split into
separate rl and re drivers some time ago. Apparently nobody ever
noticed because the buggy code only executes if NWAY negotiation
failed. Since we happen to be testing with an ancient dumb hub
rather than a modern switch, we found it.
To fix it all, have the attach() routine notice whether we're
dealing with an integrated PHY or an RTL8201L and store that info
in a struct accessible to the status() routine that needs to know
which register to query.
I touched up the fixes because they were relative to RELENG_6 and to
bring a few nits into line with style(9).
MFC After: 2 weeks
Submitted by: Ian Lepore
tunable allowing automatic parsing of VPD data to be disabled. The
default is left as-is; if you are having problems with hard hangs at boot
due to VPD, try setting hw.pci.enable_vpd=0. A proper architectural
solution has been under discussion for some time, but this allows me to
boot my test machines in the mean time.
Submitted by: bz
Head nod: jmg
- Fix these types in ULE as well. This fixes bugs in priority index
calculations in certain edge cases. (int)-1 % 64 != (uint)-1 % 64.
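The inequality quoted above follows from C's rules for the remainder
operator. A minimal sketch, with hypothetical helper names and 64 standing
in for the number of priority slots:

```c
#include <assert.h>

#define	NSLOTS	64	/* hypothetical number of priority slots */

/* Signed remainder truncates toward zero, so a negative input stays negative. */
static int
slot_signed(int pri)
{
	return (pri % NSLOTS);
}

/* Unsigned remainder first wraps the value, so the result is a valid index. */
static unsigned int
slot_unsigned(unsigned int pri)
{
	return (pri % NSLOTS);
}
```

Here slot_signed(-1) is -1, which is not a valid array index, while
slot_unsigned((unsigned int)-1) is 63.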
Reported by: kkenn using pho's stress2.
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE and GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).
The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
- Restore support for fetching swap information from crash dumps via
kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps.
Reviewed by: arch@, phk
MFC after: 1 week
never used them; with mrouted, their functionality may be replaced by
explicitly configuring gif(4) instances and specifying them with the
'phyint' keyword.
Bump __FreeBSD_version to 700030, and update UPDATING.
A doc update is forthcoming.
Discussed on: net
Reviewed by: fenner
MFC after: 3 months
the value of p_textvp. This way, we always unlock the locked vnode.
While there, vhold() the vnode around the vn_lock().
Reported and tested by: Guy Helmer (ghelmer palisadesys com)
Approved by: des (procfs maintainer)
MFC after: 1 week
variable to avoid invalid constraints in dead code. Use an array of
u_char's (inside a struct) instead of a char/short/int/long variable so
that the variable and its accesses can be spelled in the same way in all
cases and code doesn't need to be cloned just to hold the spelling
differences.
Fixed strict-aliasing errors in PCPU_SET() and in the amd64 PCPU_GET().
Cast to (void *) as in rev.1.37 of the i386 version where the errors
were fixed for the i386 PCPU_GET() only. It would be more correct to
copy to and from the temp. variable using memcpy(), but then an
ifdef tangle would be required to ensure using the builtin memcpy().
We depend on fairly aggressive optimization to put the temp. variable
only in a register despite it being copied using
*(type *)(void *)&anothertype and could depend on this when using
memcpy() too. This seems to work right even for -O0, but the -O0 case
has not been completely tested.
This change gives identical object code for all object files in LINT
on amd64 (except for one file with a __TIME__ stamp). For LINT on
i386 it gives unimportant differences in instruction order and padding
in a few object files. This was only tested for -O.
This change (actually a previous version of it) gives the following
reductions in the number of object files in LINT that fail to compile
with -O2 but without the -fno-strict-aliasing kludge:
- amd64: 29 (down from 211)
- i386: 36 (down from 47)
gcc-3.4.6 actually allows the invalid constraints that result from not
using the temp. variable, at least with -O[1-2], but gcc-3.3.3 crashes
on them and I don't want to depend on compiler bugs.
avoid holding the UNIX domain socket subsystem lock over sooptcopyin()
and sooptcopyout(). This problem was introduced when LOCAL_CREDS, and
LOCAL_CONNWAIT support were added.
Reviewed by: mdodd
LABEL_TO_SLOT() macro used by policy modules to query and set label data
in struct label. Instead of using a union, store an intptr_t, simplifying
the API.
Update policies: in most cases this required only small tweaks to current
wrapper macros. In two cases, a single wrapper macro had to be split into
separate get and set macros.
Move struct label definition from _label.h to mac_internal.h and remove
_label.h. With this change, policies may now treat struct label * as
opaque, allowing us to change the layout of struct label without breaking
the policy module ABI. For example, we could make the maximum number of
policies with labels modifiable at boot-time rather than just at
compile-time.
Obtained from: TrustedBSD Project
Don't perform a nested include of _label.h in mac.h, as mac.h now
describes only the user API to MAC, and _label.h defines the in-kernel
representation of MAC labels.
Remove mac.h includes from policies and MAC framework components that do
not use userspace MAC API definitions.
Add _KERNEL inclusion checks to mac_internal.h and mac_policy.h, as these
are kernel-only include files
Obtained from: TrustedBSD Project
sleep lock missed the witness code, and the system will panic
immediately on boot if WITNESS is enabled.
Changed the witness definition to the new type.
register takes 16 characters (64-bit register in hex). In practice this
is a slight bit of overkill as 7 of the 56 registers are only 32-bit, but
having the buffer too small results in remote kgdb trashing kernel memory
when it connects.
PR: amd64/108673
Submitted by: Ravi Murty, Nikhil Rao @ Intel
MFC after: 3 days
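The sizing argument above can be checked with a little arithmetic. This is
only a sketch: the register counts come from the message, while the constant
and function names are made up for illustration and do not refer to the
actual kgdb stub code.

```python
# Hypothetical names; only the counts are taken from the commit message.
NUM_REGS = 56            # registers sent to remote kgdb
NUM_32BIT_REGS = 7       # registers that are only 32 bits wide
HEX_CHARS_PER_BYTE = 2   # each byte is transmitted as two hex digits


def hex_chars(bits):
    """Hex characters needed to encode a register of the given width."""
    return bits // 8 * HEX_CHARS_PER_BYTE


# Conservative size: treat every register as 64-bit (16 characters each).
conservative = NUM_REGS * hex_chars(64)

# Exact size: account for the narrower 32-bit registers.
exact = ((NUM_REGS - NUM_32BIT_REGS) * hex_chars(64)
         + NUM_32BIT_REGS * hex_chars(32))

print(conservative, exact, conservative - exact)
```

The conservative figure wastes only a few dozen characters, which is cheap
compared with a buffer that is too small.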
that of the tun instance even for the !AF_INET case, and properly
remove configured addresses by calling if_purgeaddrs().
Maintain the TUN_DSTADDR behaviour for compatibility with the OS/390
emulator.
MFC after: 3 weeks
PR: 100080
Reviewed by: bz
The patch from the PR was a little outdated w/regards to the
Vodafone vendor string.
PR: kern/106033
Submitted by: Volker Werth <volker_AT_vwsoft.com>
MFC in: 3 days
Make devfs cloning a sysctl/tunable which defaults to on.
If devfs cloning is enabled, only the super-user may create
tun(4)/tap(4)/vmnet(4) instances. Devfs cloning is still enabled by
default; it may be disabled from the loader or via sysctl with
"net.link.tap.devfs_cloning" and "net.link.tun.devfs_cloning".
Disabling its use affects potentially all tun(4)/tap(4) consumers
including OpenSSH, OpenVPN and VMware.
PR: 105228 (potentially also 90413, 105570)
Submitted by: Landon Fuller
Tested by: Andrej Tobola
Approved by: core (rwatson)
MFC after: 4 weeks
to set_controller_command_byte() call; by issuing a Read Mode Byte
command, the touchpad is in Absolute Mode again.
This problem occurred at least on Asus V6V laptops.
approval, change the copyright statement to point at him instead of
"FreeBSD, Inc".
Encouraged by: rwatson
Reviewed by: imp
Discussed with and approved by: orion
a user or group, when the kernel first sees this, it will update
the grace time value. However, it never flags the quota as modified
and the updated value never makes it to the quota data file unless
the user actually makes some other change that would write the
data out.
Fixed to flag the quota as modified if the soft limit has actually
been reached and should be now enforced.
description here. The fix in the PR isn't necessary at all for memory
leaks, but we weren't setting the device description.
While I'm here, remove some of the obfuscating macros in attach.
PR: 108719
PR/108719, but there's a simpler fix: free it after it is used, and
then get rid of the redundant frees this causes. Other leaks in this
PR not yet fixed.
While I'm here, remove NetBSD/OpenBSD code and some of the portability
#defines that were getting in the way of understanding this code. The
devinfo bug was harder to spot because one needed to know that
device_set_desc_copy() was used inside of one of them (one that didn't
take an argument!).
Prefer device_printf(sc->sc_dev, "...") to printf("%s:...",
device_get_nameunit(sc->sc_dev)). This saves almost 300 bytes.
PR: 108719
Submitted by: Antoine Brodin
rest of file.
This has the additional side-effect of removing a C++ reserved keyword
from this file, which prevents the Click Modular Router's FreeBSD
kernel support from building.
Reviewed by: silence on -current
of a tap(4) instance, if IFF_PROMISC is not set.
In tap(4), we should emulate the effect IFF_PROMISC would have on
hardware, otherwise we risk introducing layer 2 loops if tap(4) is
used with bridges. This means not even bpf(4) gets to see them.
This patch has been tested in a variety of situations. Multicast and
broadcast frames are correctly allowed through. I have observed this
behaviour causing problems with multiple QEMU instances hosted on
the same FreeBSD machine.
The checks in ether_demux() [if_ethersubr.c, rev 1.222, line 638]
are insufficient to prevent this bug from occurring, as ifp->if_vlantrunk
will always be NULL for the non-vlan case.
MFC after: 3 weeks
PR: 86429
Submitted by: Pieter de Boer (with changes)
socket option TCP_INFO.
Note that the units used in the original Linux API are in microseconds,
so use a 64-bit mantissa to convert FreeBSD's internal measurements
from struct tcpcb from ticks.
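One way to read the note about a 64-bit value is that the intermediate
product ticks * 1000000 does not fit in 32 bits for anything but very small
tick counts. The sketch below emulates both widths; the function names and
the tick rate of 1000 Hz are assumptions for illustration, not taken from
the kernel source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
USEC_PER_SEC = 1_000_000


def ticks_to_usec_32(ticks, hz):
    """Conversion with a 32-bit intermediate product (wraps on overflow)."""
    return ((ticks * USEC_PER_SEC) & MASK32) // hz


def ticks_to_usec_64(ticks, hz):
    """Conversion with a 64-bit intermediate product."""
    return ((ticks * USEC_PER_SEC) & MASK64) // hz


HZ = 1000          # assumed tick rate
rtt_ticks = 5000   # a 5 second value, e.g. a backed-off timer

print(ticks_to_usec_32(rtt_ticks, HZ))  # wrapped, wrong
print(ticks_to_usec_64(rtt_ticks, HZ))  # correct
```

With these assumptions the 32-bit product already wraps once the value
exceeds about 4294 ticks.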
/usr/share/examples/etc/bsd-style-copyright. I've fixed a
few minor wording and formatting differences.
Approved by: luigi, Hannu Savolainen <hannu@opensound.com>
The formulas described in the RFC require high floating-point precision.
The integer-math formulas implemented in ng_pptpgre give an error in the
range of +0-7ms on RTT and +0-3ms on deviation. This leads to significant
underestimation of the real packet RTT.
I have made a very simple patch to reduce the error to +4-3ms on RTT and
+2-1ms on deviation. The error in RTT is not good, but it is covered by
the deviation. To cover the worst possible negative error in deviation I
have added 2ms to it. These 2ms also cover the case when the measured
deviation is so small (about zero) that it can interfere with process
scheduling delays or weather on Mars.
My tests show packet losses on a 20ms RTT link decreasing from 2.5% to
0.3% while speed increased by 1/3.
Reviewed by: archie
multicast memberships, when interface is detached. Thus, when
an underlying interface is detached, we do not need to free
our multicast memberships.
Reviewed by: bms
Normally the socket buffers are static (either derived from global
defaults or set with setsockopt) and do not adapt to real network
conditions. Two things happen: a) your socket buffers are too small
and you can't reach the full potential of the network between both
hosts; b) your socket buffers are too big and you waste a lot of
kernel memory for data just sitting around.
With automatic TCP send and receive socket buffers we can start with a
small buffer and quickly grow it in parallel with the TCP congestion
window to match real network conditions.
FreeBSD has a default 32K send socket buffer. This supports a maximal
transfer rate of only slightly more than 2Mbit/s on a 100ms RTT
trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send
buffer auto scaling and the default values below it supports 20Mbit/s
at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or
1000%. For the receive side it looks slightly better with a default of
64K buffer size.
New sysctls are:
net.inet.tcp.sendbuf_auto=1 (enabled)
net.inet.tcp.sendbuf_inc=8192 (8K, step size)
net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
net.inet.tcp.recvbuf_auto=1 (enabled)
net.inet.tcp.recvbuf_inc=16384 (16K, step size)
net.inet.tcp.recvbuf_max=262144 (256K, growth limit)
Tested by: many (on HEAD and RELENG_6)
Approved by: re
MFC after: 1 month
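The throughput figures quoted above follow from the usual bound of one
buffer's worth of data in flight per round trip. A minimal sketch of that
arithmetic, using the 32K default and the 256K growth limit from the
message (the function name is made up for illustration):

```python
def max_throughput_mbit(buf_bytes, rtt_ms):
    """Upper bound on throughput when at most buf_bytes can be in flight
    during one round trip of rtt_ms milliseconds."""
    bits_per_rtt = buf_bytes * 8
    return bits_per_rtt / (rtt_ms / 1000) / 1_000_000


for buf in (32 * 1024, 262144):
    for rtt in (100, 200):
        rate = max_throughput_mbit(buf, rtt)
        print(f"{buf:>7} bytes @ {rtt} ms RTT: {rate:.2f} Mbit/s")
```

Note that the two buffer sizes differ by a factor of 8, so the "factor 10"
in the message appears to be a rounded figure.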
upper-bounding it to the size of the initial socket buffer lower-bound it
to the smallest MSS we accept. Ideally we'd use the actual MSS information
here but it is not available yet.
For socket buffer auto sizing to be effective we need room to grow the
receive window. The window scale shift is determined at connection setup
and can't be changed afterwards. The previous, original, method effectively
just did a power of two roundup of the socket buffer size at connection
setup severely limiting the headroom for larger socket buffers.
Tested by: many (as part of the socket buffer auto sizing patch)
MFC after: 1 month
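To see why the choice of window scale shift limits later growth, here is a
small sketch that picks the smallest shift able to advertise a given buffer
size. It assumes the standard limits of a 16-bit window field and a maximum
shift of 14; which target size the kernel actually scales for is not stated
in this message, so the 256K value below is only an example.

```python
TCP_MAXWIN = 65535        # largest value of the 16-bit window field
TCP_MAX_WINSHIFT = 14     # largest shift allowed by the window scale option


def window_scale_for(size):
    """Smallest shift such that (TCP_MAXWIN << shift) can cover size."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < size:
        shift += 1
    return shift


initial = 65535           # example initial receive buffer
limit = 262144            # example growth limit (256K)

for size in (initial, limit):
    shift = window_scale_for(size)
    print(size, shift, TCP_MAXWIN << shift)
```

A shift derived from the initial buffer alone leaves no headroom: the
largest window that can ever be advertised stays at 65535 bytes, however
much the buffer grows afterwards.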
cannot change (because it's referenced by curthread). This fixes
a LOR caused by acquiring emul_shared_lock while holding emul_lock.
Fix typo in comment.
Submitted by: rdivacky
p->p_emuldata is properly initialized in the time when the child can run.
Do not set p->p_emuldata to NULL when the process is exiting.
It does not make any sense and only costs 2 mutex operations.
Do not lock emul_data to unlock it on the very next line.
Comment on possible race while there.
Reparent all procs that are part of a threading group but not its leaders
to init and SIGCHLD init to finish the zombies off. This fixes zombies
left after opera's exit. [1]
There is no need to lock p_em in the linux_proc_init CLONE_THREAD
case because the process cannot change the address of p_em->shared
while it is running this code path.
Move assigning of em->shared outside emul_shared_lock.
Noticed by: Scott Robbins <scottro@nyc.rr.com> [1]
Submitted by: rdivacky
buffer resizing, etc.) that has been here for eons. Free all (unmanaged)
allocated buffers through sndbuf_destroy() in case we forgot to call
sndbuf_free(). For a managed buffer (mostly hw specific managed buffer),
either provide a CHANNEL_FREE() method with an appropriate return value to
invoke semi-automatic sndbuf_free() or simply do it on their own. If
everything fails, sndbuf_destroy() will come to the rescue as a
final measure.
MFC after: 3 days
This should fix the run time bustage observed on recent -CURRENT whilst
mounting a MSDOS filesystem with non-default locale/code page:
link_elf: symbol msdosfs_fileno_free undefined
KLD msdosfs_iconv.ko: depends on msdosfs - not available
using the caller's UID instead of the GID when performing group
operations. This could allow users to determine group quota
information for groups they are not a member of in some cases.
Rename the "uid" parameter in ufs_quotactl to "id" to better show
that it is used for more than just the uid, and to be more in line
with the naming conventions in the other quota routines.
PR: kern/33940
for pci_cfg_restore() to be exported. It was tested using a
hackily accessed pci_cfg_restore().
- Add ifmedia_removeall() to mxge_detach() in order to stop leaking
an ifaddr
- Fix a small accounting bug introduced by the locking code shuffle
which could cause spurious watchdog resets now that we have a
watchdog.
Sponsored by: Myricom
locking in preparation for adding a watchdog handler (callouts must
not use sleepable locks). This required shuffling memory and
interrupt allocation to the attach routine rather than if_ioctl so as
to avoid potential sleeps while bringing up the interface.
This is not a functional change.
IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix.
IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix.
IN_LOCAL_GROUP() tests if an address falls within the statically assigned
link-local multicast scope specified in RFC 2365.
IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP().
As with the existing macros in the FreeBSD netinet stack, comparisons
are performed in host-byte order.
See also: RFC 1918, RFC 2365, RFC 3927
Obtained from: NetBSD (dyoung@)
MFC after: 2 weeks
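The prefix tests described above amount to mask-and-compare operations on
the address in host byte order. The following sketch models them in Python;
the function names mirror the macros but are not the kernel definitions, and
the 224.0.0.0/24 range used for the local group test is an assumption about
which block the macro covers.

```python
import ipaddress


def _host(addr):
    """Return a dotted-quad IPv4 address as a host-byte-order integer."""
    return int(ipaddress.IPv4Address(addr))


def in_linklocal(addr):
    # 169.254.0.0/16, the IPv4 link-local prefix (RFC 3927)
    return (_host(addr) & 0xFFFF0000) == 0xA9FE0000


def in_private(addr):
    i = _host(addr)
    return ((i & 0xFF000000) == 0x0A000000 or    # 10.0.0.0/8
            (i & 0xFFF00000) == 0xAC100000 or    # 172.16.0.0/12
            (i & 0xFFFF0000) == 0xC0A80000)      # 192.168.0.0/16


def in_local_group(addr):
    # Assumed range: 224.0.0.0/24, link-local multicast
    return (_host(addr) & 0xFFFFFF00) == 0xE0000000


def in_any_local(addr):
    return in_linklocal(addr) or in_local_group(addr)


for a in ("169.254.10.1", "172.20.0.1", "224.0.0.251", "8.8.8.8"):
    print(a, in_linklocal(a), in_private(a), in_local_group(a), in_any_local(a))
```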
- initialize ifq_drv_maxlen correctly
- mark the interface as jumbo capable
- keep stats on the number of times the hw transmit queue filled and
was restarted.
#ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.
Test case provided by Oliver Fromme:
truncate -s 200G test.img
mdconfig -a -t vnode -f test.img -u 9
newfs_msdos -s 419430400 -n 1 /dev/md9 zip250
mount -t msdosfs /dev/md9 /mnt # should fail
mount -t msdosfs -o large /dev/md9 /mnt # should succeed
PR: 105964
Requested by: Oliver Fromme <olli lurza secnetix de>
Tested by: trhodes
MFC after: 2 weeks
SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows
TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable.
In this case some of the connection parameters like the retry timer were
previously set appropriately for TCP but inappropriately for the UDP
socket that was actually used, leading to e.g. extremely long recovery
times (O(hours)) after a nfs server reboot.
Reviewed by: mohans
MFC After: 2 weeks
/usr/share/examples/etc/bsd-style-copyright. I've fixed a
few minor wording and formatting differences.
Approved by: matk, Hannu Savolainen <hannu@opensound.com>
Reviewed by: imp
We can't bind to a CPU which is not yet on-line, so add code that waits for
CPUs to go on-line before binding to them.
Reported by: Alin-Adrian Anton <aanton@spintech.ro>
MFC after: 2 weeks
configured and that in turn controls the descriptor layout; the rate
control module has no business peeking inside the descriptor but until
we can change the api so the driver records the tx rates and passes
them deal with it
unsolicited pin sense event and need manual control to turn off speaker
volume while attaching headphone.
Tested by: Ingeborg Hellemo <Ingeborg.Hellemo@cc.uit.no>
Disable global Acer + ALC883 headphone automute settings since there are
a few models that do not respect them, causing broken behaviour.
Reported/Tested by: Pavel Argentov <argentoff@rtelekom.ru>
When the disk has an error, it will now print SMART
instead of 'Unknown CMD'.
PR: kern/93368
Submitted by: Garry Belka <garry at NetworkPhysics dot COM>
Approved by: sos
firmware in that module (even though this is a programming error) - drop the
reference to the module again.
Submitted by: Benjamin Close
MFC after: 3 days
the space allocated for the double fault handler since this space
is otherwise unused till the time a double fault occurs.
This change should have been committed alongside r1.127 of
"exception.S", but I somehow missed doing so.
Problem reported by: jeff
Pointy hat to: jkoshy
not used in any of our code. Also remove explicit padding variable that
kept the bpf_d structure the same size before and after the change in
select implementation, since binary compatibility is not required for this
data structure on 7-CURRENT.
IPv6 over point-to-point gif(4) tunnels.
These revisions caused a host route to the destination of a
point-to-point gif(4) interface to not get installed when the interface
and destination addresses were assigned. This caused
"no route to host" errors when trying to send traffic over the
interface. The first packet arriving inbound over the tunnel,
however, would cause the correct route to get installed, allowing
subsequent outbound traffic to be routed correctly.
gif(4) interfaces with prefix lengths of less than 128 bits
(i.e. no explicit destination address assigned) were not affected
by this bug.
This bug fix is a possible candidate for a 6.2-RELEASE errata note.
Approved by: jhay (original committer)
Discussed with: jhay, JINMEI Tatuya
MFC after: 3 days
blade systems, such as the Dell 1955 and the Intel SBXD132.
Development hardware for this work was provided by Broadcom and iXsystems.
A SBXD132 blade for testing was provided by Iron Systems.
is replaced with BSD gzip, let's make it possible to
distinguish between the two with a __FreeBSDversion bump,
just in case some developers want it.
Suggested by: linimon
minimize IPIs and rescheduling when scheduling like tasks while keeping
latency low for important threads.
1) An idle thread is running.
2) The current thread is worse than realtime and the new thread is
better than realtime. Realtime to realtime doesn't preempt.
3) The new thread's priority is less than the threshold.
with bypass header, to send it out to userland.
- Use ng_ppp_bypass() in ng_ppp_proto_recv().
- Use ng_ppp_bypass() in ng_ppp_comp_recv() and in
ng_ppp_crypt_recv() if compression or encryption is
disabled, respectively.
- Any LCP packet goes directly to ng_ppp_bypass(), instead
of passing through PPP stack.
- Any non-LCP packet on disabled link is discarded. This
is behavior defined in RFC.
Submitted by: Alexander Motin <mav alkar.net>
support sched_4bsd.
- Rename the KTR level for non schedgraph parsed events. They take event
space from things we'd like to graph.
- Reset our slice value after we sleep. The slice is simply there to
prevent starvation among equal priorities. A thread which had almost
exhausted its slice and then slept doesn't need to be rescheduled a
tick after it wakes up.
- Set the maximum slice value to a more conservative 100ms now that it is
more accurately enforced.
carp_clone_destroy() we are on the safe side, we don't need to
unlock the cif, which may already be non-existent at this point.
Reported by: Anton Yuzhaninov <citrin rambler-co.ru>
apparently be confused by short TCP segments that have been manually
padded to the minimum ethernet frame size. The driver does short frame
padding in software as a workaround for a bug in the 8169 PCI devices
that causes short IP fragments to be corrupted due to an apparent
conflict between the hardware autopadding and hardware IP checksumming.
To fix this, we avoid software padding for short TCP segments, since
the hardware seems to autopad and checksum these correctly (even the
older 8169 NICs get these right). Short UDP packets appear to be
handled correctly in all cases. This should work around the IP header
checksum bug in the 8169 while not tripping the TCP checksum bug in
the 8111B/8168B and 8101E.
collisions with nfsclient's names. Even static names should have a
unique prefix so that they can be debugged easily.
Hide the unused colliding variable nfsv3_commit_on_close in "#if 0"
together with other unused sysctl variables. Duplicating the nfs sysctl
under nfs4 is probably just a bug.
Fix some nearby style bugs.
Remove duplicate $FreeBSD$.
nfs_* to nfs4_* to avoid collisions with nfsclient's names. Even
static names should have a unique prefix so that they can be debugged
easily.
Most of the renamed functions can probably be shared. nfs4_cmount()
and nfs4_sync() are identical to the nfs_* versions, and all the others
except nfs4_vfsops() seem to be identical except for style bugs,
missing support for mountroot, and bugs.
Fix some nearby style bugs.
Remove duplicate $FreeBSD$.
of duplicating it except for larger style bugs in the copy.
Fix some nearby style bugs (including a harmless type mismatch)
in and near the remaining copy.
This is part of fixing collisions of the 2 nfs*client's names. Even
static names should have a unique prefix so that they can be debugged
easily.
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.
Reviewed by: andre@
maxpages on a zone is woken up, with the rest never being woken up as
a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked
processes instead. This change introduces a thundering herd, but since
this should be relatively infrequent, optimizing this (by introducing
a count of blocked processes, for example) may be premature.
Reviewed by: ups@
negative. Use unsigned integers for sleep and run time so this doesn't
disturb sched_interact_score(). This should fix the invalid interactive
priority panics reported by several users.
o remove errata_a0 and introduce the corresponding flags into 'errata'.
o introduce a new erratum for K8: some platforms might set the
PENDING_BIT but are unable to clear it. Also, don't loop forever
waiting for PENDING_BIT to be cleared.
o try to introduce a workaround for the stuck PENDING_BIT problem.
o now support half multipliers for K8.
Tested by: Abdullah Al-Marrie
Approved by: njl
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquisition.
Avoid dealing with buffers in bdwrite() that are on the other side of the
snaplock divisor in the lock order than the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.
Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by: Peter Holm
X-MFC after: 3 weeks (if ever: it changes ABI)
- Define our own maybe_preempt() as sched_preempt(). We want to be able
to preempt idlethread in all cases.
- Define our idlethread to require preemption to exit.
- Get the cpu estimation tick from sched_tick() so we don't have to worry
about errors from a sampling interval that differs from the time
domain. This was the source of sched_priority prints/panics and
inaccurate pctcpu display in top.
for clock.h, so changing the i386 clock.h broke it. MFi386 (not tested):
Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.
Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.
This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
setrunqueue() was mostly empty. The few asserts and thread state
setting were moved to the individual schedulers. sched_add() was
chosen to displace it for naming consistency reasons.
- Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
different on all three schedulers where it was only called in one place
each.
- Remove the long ifdef'd out remrunqueue code.
- Remove the now redundant ts_state. Inspect the thread state directly.
- Don't set TSF_* flags from kern_switch.c, we were only doing this to
support a feature in one scheduler.
- Change sched_choose() to return a thread rather than a td_sched. Also,
rely on the schedulers to return the idlethread. This simplifies the
logic in choosethread(). Aside from the run queue links kern_switch.c
mostly does not care about the contents of td_sched.
Discussed with: julian
- Move the idle thread loop into the per scheduler area. ULE wants to
do something different from the other schedulers.
Suggested by: jhb
Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.
Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.
This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.
the mount options list with vfs_deleteopt(). At this point, the export
information is saved in mp->mnt_export, so we can delete
the "export" mount option from mp->mnt_optnew and mp->mnt_opt.
This fixes read-write/read-only update mounts (mount -u -o rw, mount -u -o ro)
of NFS exported directories.
For some reason, I could only reproduce the problem with a configuration
supplied by Andre:
- "options QUOTA" enabled in kernel config
- "/ -maproot=root 10.0.1.105" in /etc/exports
Reported by: kris, Andre Guibert de Bruet <andy siliconlandmark com>,
Andrzej Tobola <ato iem pw edu pl>
Tested by: Andre Guibert de Bruet
addresses would access invalid descriptor DMA addresses on PCIe
hardware and then panic the system.
To fix it set descriptor DMA addresses before enabling Tx and Rx
such that hardware can see valid descriptor DMA addresses. Also
set RL_EARLY_TX_THRESH before starting Tx and Rx.
Reported by: steve.tell AT crashmail DOT de
Tested by: steve.tell AT crashmail DOT de
Obtained from: NetBSD
MFC after: 1 week
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than the default of using the first N slots (or indices) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.
Tested by: scottl
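The slot and rid relationship described above can be modelled in a few
lines. This is only a toy model of the bookkeeping, assuming the semantics
given in the message (1-based slots, rid equal to slot index); the function
names are hypothetical and do not mirror the signatures of pci_alloc_msix()
or pci_remap_msix().

```python
def powerof2(n):
    """True if n is a power of two, as MSI message counts must be."""
    return n > 0 and (n & (n - 1)) == 0


def default_rids(count):
    """After allocation the messages occupy the first `count` slots,
    numbered from 1 (rid 0 is reserved for the legacy INTx interrupt)."""
    return list(range(1, count + 1))


def remap(allocated, indices):
    """Assign the allocated messages to new 1-based slots.

    Returns a dict mapping SYS_RES_IRQ rid (== slot index) to the ordinal
    of the allocated message that now lives there."""
    if len(indices) != allocated:
        raise ValueError("need exactly one slot per allocated message")
    if any(i < 1 for i in indices) or len(set(indices)) != len(indices):
        raise ValueError("slots must be unique and 1-based")
    return {slot: n for n, slot in enumerate(indices, start=1)}


# The example from the text: 2 messages allocated on an 8-slot device.
print(default_rids(2))      # slots used before remapping
print(remap(2, [1, 4]))     # first message at rid 1, second at rid 4
```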
control data but no payload data is passed.
Change m_uiotombuf() to return at least one empty mbuf if the requested
length was zero. Add comment to sosend_dgram and sosend_generic().
Diagnosed by: jhb
Regression test by: rwatson
Pointy hat to: andre
--------------------------
[Deadlock] is caused by a lock order reversal in vfs_lookup(), where
[some] process is trying to lock a directory vnode (that is the parent
directory of covered vnode) while holding an exclusive vnode lock on
covering vnode.
A simplified scenario:
    root fs               var fs
    /         A           / (/var)          D
    /var      B           /log (/var/log)   E
    vfs lock  C           vfs lock          F
Within each file system, the lock order is clear: C->A->B and F->D->E
When traversing across mounts, the system can choose between two lock orders,
but everything must then follow that lock order:
    L1: C->A->B
              |
              +->F->D->E
    L2: F->D->E
              |
              +->C->A->B
The lookup() process for namei("/var") mixes those two lock orders:
VOP_LOOKUP() obtains B while A is held
vfs_busy() obtains a shared lock on F while A and B are held (follows L1,
violates L2)
vput() releases lock on B
VOP_UNLOCK() releases lock on A
VFS_ROOT() obtains lock on D while shared lock on F is held
vfs_unbusy() releases shared lock on F
vn_lock() obtains lock on A while D is held (violates L1, follows L2)
dounmount() follows L1 (B is locked while F is drained).
Without unmount activity, vfs_busy() will always succeed without blocking
and the deadlock isn't triggered (the system behaves as if L2 is followed).
With unmount, you can get 4 processes in a deadlock:
p1: holds D, want A (in lookup())
p2: holds shared lock on F, want D (in VFS_ROOT())
p3: holds B, want drain lock on F (in dounmount())
p4: holds A, want B (in VOP_LOOKUP())
You can have more than one instance of p2.
The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and
MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs
servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode.
- Tor Egge
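The four-process deadlock in the scenario above can be confirmed by
building a wait-for graph from the "holds" and "wants" columns and looking
for a cycle. The sketch below is a simplification: it treats the shared
lock on F as if it had a single holder, and the function name is made up
for illustration.

```python
# Lock letters and process names are taken from the scenario above.
holds = {"p1": "D", "p2": "F", "p3": "B", "p4": "A"}
wants = {"p1": "A", "p2": "D", "p3": "F", "p4": "B"}


def find_cycle(holds, wants):
    """Return a list of processes forming a wait-for cycle, or None."""
    owner = {lock: proc for proc, lock in holds.items()}
    # Edge p -> q means p is waiting for a lock that q currently holds.
    waits_for = {p: owner.get(lock) for p, lock in wants.items()}
    for start in holds:
        seen = []
        p = start
        while p is not None and p not in seen:
            seen.append(p)
            p = waits_for.get(p)
        if p is not None:
            return seen[seen.index(p):]
    return None


print(find_cycle(holds, wants))

# Without the unmounting process p3 nobody blocks on F, so there is no cycle.
no_unmount_holds = {p: l for p, l in holds.items() if p != "p3"}
no_unmount_wants = {p: l for p, l in wants.items() if p != "p3"}
print(find_cycle(no_unmount_holds, no_unmount_wants))
```

This matches the observation in the note that the deadlock only triggers
when unmount activity is present.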
To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp
is actually not used by the callers of namei. Thus, placeholder deadfs
vnode vp_crossmp is introduced that is filled into ni_dvp.
Idea by: ups
Reviewed by: tegge, ups, jeff, rwatson (mac interaction)
Tested by: Peter Holm
MFC after: 2 weeks
sparc64 GENERIC and the sound device drivers known working on sparc64
to use bus_get_dma_tag() to obtain the parent DMA tag so we can get rid
of the sparc64_root_dma_tag kludge eventually. Except for ath(4), sk(4),
stge(4) and ti(4) these changes are runtime tested (unless I booted up
the wrong kernels again...).
a power saving mode otherwise.
- If the thread is already bound in sched_bind() unbind it before
re-binding it to a new cpu. I don't like these semantics but they are
expected by some code in the tree. Patch by jkoshy.
Don't expose em->shared to the outside world before it's properly
initialized. Might not affect anything but it's at least a better
coding style.
Don't expose em via p->p_emuldata until it's properly initialized.
This also enables us to get rid of some locking and simplify the
code because we are working on a local copy.
In linux_fork and linux_vfork create the process in stopped state
to be sure that the new process runs with fully initialized emuldata
structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
race that could result in never woken up process [2].
Reported by: Scot Hetzel [1]
Suggested by: jhb [2]
Reviewed by: jhb (at least some important parts)
Submitted by: rdivacky
Tested by: Scot Hetzel (on amd64)
Change 2 comments (in the new code) to comply to style(9).
Suggested by: jhb
work when we start requiring this.
- Don't specify an alignment when creating our own parent DMA tag;
the supported DMA engines require no alignment constraint (f.e. the
LANCE child does though) and it's not inherited by the child DMA
tags anyway (which probably is a bug though).
- Fix whitespace nits.
These are shared-memory variants based on Am79C90-compatible chips
that apart from the missing DMA engine are similar to the 'ledma'
variant including using a (pseudo-)bus/device for the buffer that
the actual LANCE device hangs off from. The performance of these is
close to that of the 'ledma' one, as expected, though at a few times the
CPU load.
1) Do not do quota accounting for the actual quota data files
or for file system snapshot files ("system" files). This
prevents a deadlock described in PR kern/30958 if the kernel
ever has to grow the quota file. Snapshot files were already
exempt from the quota checks, but this change generalized the check.
2) Fix a cast that caused extremely large uids/gids to incorrectly
write the quota information to the data file at a truncated
value for a uint32_t id value. The incorrect cast caused quota
files in this case to be around 4GB in size, with the correct cast
they can now be 131GB in size. Also related to PR kern/30958.
3) Check for what appear to be negative UIDs/GIDs and not account
for them. This prevents the quota files from becoming 131GB in
size and causing quotacheck to run forever at bootup. This could
also cause the kernel to try and expand the quota file, which might
deadlock due to the issue in #1. kern/30958 and kern/38156
(and some much older closed PR's).
4) With the deadlock problems gone, the kernel can now expand the
size of the quota database files if it needs to.
5) Pass in the i-node count change value to chkiq and chkiqchg as an
int, like it used to be before the common routine was split up
into 2 different routines to increase / decrease the i-node in-use
count. Prevents an underflow on the i-node count. Related
to PR kern/89247.
6) Prevent the block usage from growing slowly if a file system is
full and the write was denied due to that fact. PR kern/89247.
Some of these changes require an updated quotacheck to prevent
the creation of huge (131GB) quota data files (item #3).
#1/#4 probably fixes a lot of the random hangs when quotas are enabled,
possibly some of the jail hangs.
unlike documented may not take effect without an initialization. So
don't invoke (*sc_mediachange) directly in lance_mediachange() but
go through lance_init_locked(). It's suboptimal to impose this for
all chips but given that besides the affected PCI bus front-end the
only other front-end which supports media selection is and likely
ever will be the 'ledma' front-end I see not enough reason to break
the in-driver API for this (though one could argue both ways here).
the ipi settings. If NEEDRESCHED is set and an ipi is later delivered
it will clear it rather than cause extra context switches. However, if
we miss setting it we can have terrible latency.
- In sched_bind() correctly implement bind. Also be slightly more
tolerant of code which calls bind multiple times. However, we don't
change binding if another call is made with a different cpu. This
does not presently work with hwpmc which I believe should be changed.
front of isp_init so we can read NVRAM even if we're role ISP_NONE.
Prepare for reintroduction of channels (for FC) for N-Port
Virtualization.
Fix a botch in handle assignment that caused us to nuke one device
when a new one arrives and end up with two devices with the same
identity in the virtual target mapping table.
ifmedia_init() invocation. IFM_IMASK makes only sense here when all of
the maximum of 32 PHYs on each MII bus support disjoint sets of media,
which generally isn't the case (though it would be nice if we had a way
to let NIC drivers indicate that for the few card models where the PHY
configuration is known/fixed and IFM_IMASK actually makes sense).
- Add and use a miibus_print_child() for the bus_print_child method which
additionally prints the PHY number (which actually is the PHY address)
so one can figure out the media instance <-> PHY number mapping from the
PHY driver attach output. This is intended to be useful in situations
where the addresses of the PHYs on the bus are known (f.e. of internal/
integrated PHYs) so one can feed the appropriate media instance number
to ifconfig(8) (with the upcoming change for ifconfig(8)).
This is more or less inspired by the NetBSD mii_print().
multiple PHYs. In case some PHYs currently driven by ukphy(4) exhibit
problems when isolating due to incomplete implementations or silicon bugs
we'll need to add specific drivers for these. Looking at NetBSD and
OpenBSD I don't expect problems here though (quite the contrary; we still
seem to set MIIF_NOISOLATE without good reason in a bunch of PHY drivers).
- Fix a style(9) whitespace nit.
capability rather than hardcoded offsets for a particular card. While
I'm here, expand the constants some.
- Change the ahd(4) driver to use pci_find_extcap() to locate the PCI-X
capability to keep up with the first change.
Reviewed by: scottl, gibbs (earlier version)
- Switch back to direct modification of remote CPU run queues. This added
a lot of complexity with questionable gain. It's easy enough to
reimplement if it's shown to help on huge machines.
- Re-implement the old tdq_transfer() call as tdq_pickidle(). Change
sched_add() so we have selectable cpu choosers and simplify the logic
a bit here.
- Implement tdq_pickpri() as the new default cpu chooser. This algorithm
is similar to Solaris in that it tries to always run the threads with
the best priorities. It is actually slightly more complex than
solaris's algorithm because we also tend to favor the local cpu over
other cpus which has a boost in latency but also potentially enables
cache sharing between the waking thread and the woken thread.
- Add a bunch of tunables that can be used to measure effects of different
load balancing strategies. Most of these will go away once the
algorithm is more definite.
- Add a new mechanism to steal threads from busy cpus when we idle. This
is enabled with kern.sched.steal_busy and kern.sched.busy_thresh. The
threshold is the required length of a tdq's run queue before another
cpu will be able to steal runnable threads. This prevents most queue
imbalances that contribute to long latencies.
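The busy-steal threshold can be illustrated with a small user-space
sketch. This is not the sched_ule.c code: struct toy_tdq,
toy_can_steal_from() and toy_pick_victim() are hypothetical names, the
busy_thresh argument merely stands in for kern.sched.busy_thresh, and
choosing the most loaded eligible queue is an assumption made for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU run queue; not the real struct tdq. */
struct toy_tdq {
	int load;	/* number of runnable threads queued on this CPU */
};

/*
 * A queue is eligible as a steal victim only once its run queue length
 * has reached the threshold (modelled on kern.sched.busy_thresh).
 */
static int
toy_can_steal_from(const struct toy_tdq *victim, int busy_thresh)
{
	assert(victim != NULL);
	return (victim->load >= busy_thresh);
}

/*
 * Called on behalf of an idle CPU: return the index of the most loaded
 * eligible queue other than 'self', or -1 if no queue is eligible.
 * Picking the most loaded queue is an assumption of this sketch.
 */
static int
toy_pick_victim(const struct toy_tdq *tdqs, size_t ncpu, size_t self,
    int busy_thresh)
{
	int best = -1;
	size_t i;

	for (i = 0; i < ncpu; i++) {
		if (i == self || !toy_can_steal_from(&tdqs[i], busy_thresh))
			continue;
		if (best < 0 || tdqs[i].load > tdqs[best].load)
			best = (int)i;
	}
	return (best);
}
```

With queue loads of 0, 1 and 4 and a threshold of 3, an idle cpu 0
would only consider cpu 2; raising the threshold to 5 leaves it with
nothing to steal.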
headers in .S directly rather than getting to their macros through
genassym.c/assym.s so there are less headers genassym.c has to be
kept in sync with.
While at it fix some style(9) bugs (indentation, prototype format,
sort headers, etc) and remove trailing whitespace.
that can be used to check whether receive data is ready, i.e. whether
the subsequent call of uart_poll() should return a char, and unlike
uart_poll() doesn't actually receive data.
- Remove the device-specific implementations of uart_poll() and implement
uart_poll() in terms of uart_getc() and the newly added uart_rxready()
in order to minimize code duplication.
- In sunkbd(4) take advantage of uart_rxready() and use it to implement
the polled mode part of sunkbd_check() so we don't need to buffer a
potentially read char in the softc.
- Fix some mis-indentation in sunkbd_read_char().
Discussed with: marcel
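The relationship between the three routines can be sketched in user
space as follows. The toy_uart structure and the toy_* functions are
hypothetical and model the receive side with a memory buffer; they only
illustrate how a generic poll routine can be built from a
non-destructive readiness check plus a getc, and are not the uart(4)
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: a small receive buffer instead of hardware. */
struct toy_uart {
	const char *rxbuf;	/* pending receive data */
	size_t rxlen;		/* bytes remaining in rxbuf */
};

/* Non-destructive check: is at least one character waiting? */
static int
toy_uart_rxready(const struct toy_uart *u)
{
	assert(u != NULL);
	return (u->rxlen > 0);
}

/* Consume and return one character; the caller ensures data is ready. */
static int
toy_uart_getc(struct toy_uart *u)
{
	assert(toy_uart_rxready(u));
	u->rxlen--;
	return ((unsigned char)*u->rxbuf++);
}

/* Generic poll: -1 if nothing is pending, otherwise the next character. */
static int
toy_uart_poll(struct toy_uart *u)
{
	if (!toy_uart_rxready(u))
		return (-1);
	return (toy_uart_getc(u));
}
```

A polled-mode consumer can call toy_uart_rxready() to answer "is there
input?" without having to stash a character it already read.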
may also reflect a Fireplane/Safari or JBus bus (or a virtual bus which
in turn reflects a JBus bus or something like that...).
- In both the sparc64 and sun4v bus_machdep.c use __FBSDID.
- Spell SBus the official way in comments.
- Replace hardcoded function names (all of which were actually outdated)
in panic and status strings with __func__.
- Fix whitespace nits.
hooks get their per hook rcvdata methods, and all functions are organized
corresponding to protocol stack model.
Submitted by: Alexander Motin <mav alkar.net>
Reviewed by: archie, julian
and friends along with all hacks required to implement them. None of
the drivers currently built (as part of GENERIC, LINT or modules) on
sparc64 or sun4v and none of those we might want to use there in
future uses them, AFAICT there actually never was a driver hooked up
to the sparc64 or sun4v build that correctly used these functions
(and it looks like that due to a bug read{b,w,l}()/write{b,w,l}() and
the other functions working on a memory handle never actually worked on
sun4v). All they ever were good for on sparc64 and sun4v was erroneously
dragging in dependencies on isa(4) in drivers like f.e. dpt(4), si(4)
and syscons(4) in source files that supposedly were bus-neutral and
hiding issues with drivers like f.e. ng_bt3c(4) that used these
functions with busses other than isa(4) and therefore couldn't work on
these platforms.
the newly added DEV_EISA. This is done so that these back-ends can
be compiled on platforms not providing in{b,w,l}()/out{b,w,l}() and
friends (but may wish to use them together with bus front-ends other
than the EISA one).
- Finally all splxx() are removed
- Count error fixed in mapping array which might
cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using
a pre-formed connection and then completing the
work so accept won't happen on a non-formed
association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.
Approved by: gnn
to open() [1].
Improve locking for accessing session control structures [2].
Try to document (most likely harmless) races in the code [3].
Based on submission by: Intron (intron at intron ac) [1]
Reviewed by: jhb [2]
Discussed with: netchild, rwatson, jhb [3]
total size of all input reports is < 6.
PR: usb/106435
Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
Approved by: emax (mentor)
MFC after: 3 days
PCI bus' one as the default one, and explicitly use the other one for
non-PCI devices.
This is needed because the PCI bus can only address 64MB of RAM, while some
IXP425 boards have 128MB or more, and most of the PCI drivers do not bother
providing the parent dma tag.
- Add a default parent dma tag, similar to what has been done for sparc64.
- Before invalidating the dcache in POSTREAD, save the bits which are in the
same cachelines as our buffers, but not part of them, and restore them after
the invalidation.
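The save/restore around the invalidation can be modelled in user space
roughly like this. Everything here is hypothetical: TOY_CACHE_LINE is an
assumed line size, toy_invalidate() fakes the invalidation by
overwriting whole lines with a marker byte, and toy_postread_sync() is
not the arm busdma code, only a sketch of the idea.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CACHE_LINE	32	/* assumed line size, must be a power of 2 */
#define TOY_INVALIDATED	0xee	/* marker for "contents reloaded from RAM" */

/* Model an invalidation: whole cache lines lose their cached contents. */
static void
toy_invalidate(uint8_t *mem, size_t start, size_t end)
{
	assert(start % TOY_CACHE_LINE == 0 && end % TOY_CACHE_LINE == 0);
	memset(mem + start, TOY_INVALIDATED, end - start);
}

/*
 * Invalidate the lines covering [off, off + len) while preserving the
 * bytes that share the first and last line but lie outside the buffer.
 */
static void
toy_postread_sync(uint8_t *mem, size_t off, size_t len)
{
	uint8_t head[TOY_CACHE_LINE], tail[TOY_CACHE_LINE];
	size_t first = off & ~(size_t)(TOY_CACHE_LINE - 1);
	size_t last = (off + len + TOY_CACHE_LINE - 1) &
	    ~(size_t)(TOY_CACHE_LINE - 1);
	size_t headlen = off - first;
	size_t taillen = last - (off + len);

	/* Save the neighbouring bytes before the lines are thrown away. */
	memcpy(head, mem + first, headlen);
	memcpy(tail, mem + off + len, taillen);

	toy_invalidate(mem, first, last);

	/* Put the neighbours back; the buffer itself stays invalidated. */
	memcpy(mem + first, head, headlen);
	memcpy(mem + off + len, tail, taillen);
}
```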
from just before extending a file. This has the desired effect
of keeping the write speed constant. And yes, that helps a lot
copying large files always at full speed now, and I have seen
improvements using benchmarks/bonnie.
Stolen from: NetBSD
Reviewed by: bde
bus hanging off from the Fireplane/Safari bus in some USIII machines.
This is part 3/4 of allowing creator(4) to work in these machines.
The little info needed on how to configure the bridge and to work
around the incorrect values contained in the `interrupts' properties
of its children was obtained from OpenSolaris.
The separate bus front-end was inherited from the OpenBSD creator(4),
which at that time had a mainbus(4) (for USI/II machines, which use
an UPA interconnection bus as the nexus) and an upa(4) (for USIII
machines, which use a subordinate/slave UPA bus hanging off from the
Fireplane/Safari interconnection bus) front-end. With FreeBSD and
newbus there is/will be no need to have two separate bus front-ends
for these busses, so we can easily collapse the shared front-end
and the back-end into a single source file (note that the FreeBSD
creator_upa.c was a misnomer anyway; based on what it actually attached
to that should have been creator_nexus.c), actually OpenBSD meanwhile
also has moved to a shared front-end and a single source file. Due
to the low-level console support creator.c also wasn't free from bus
related things before.
While at it, also split sys/sparc64/creator/creator.h into a
sys/dev/fb/creatorreg.h that only contains register macros and move
the structures to the top of sys/dev/fb/creator.c as suggested by
style(9) so creator(4) is no longer scattered over two directories.
- Use OF_decode_addr()/sparc64_fake_bustag() to obtain the bus tags and
handles for the low-level console support instead of hardcoding
support for AFB/FFB hanging off from nexus(4) only. This is part 2/4
of allowing creator(4) to work in USIII machines (which have a UPA
bus hanging off from the Fireplane/Safari bus reflected by the nexus),
which already makes it work as the low-level console there.
- Allocate resources in the bus attach routine regardless of whether
creator(4) is used for the low-level console and thus the required
bus tags and handles have been already obtained or not so the resources
are marked as taken in the respective RMAN.
- For both obtaining the bus tags and handles for the low-level console
support as well as allocating the corresponding resources in the
regular bus attach routine don't bother to get all for the maximum of
24 register banks but only (for) the two tag/handle pairs required for
providing the video interface for syscons(4) support. If we can't
allocate the rest of them just limit the memory range accessible via
creator_fb_mmap() accordingly.
- Sanity check the memory range spanned by the first and last resources
and the resources in between as far as possible, as the XFree86/Xorg
sunffb(4) expects to be able to access the whole region, even though
the backing resources are actually non-continuous. Limit and check
the memory range accessible via creator_fb_mmap() accordingly.
- Reduce the size of buffers for OFW properties to what they actually
need to hold.
- Rename some tables to creator_<foo> for consistency.
- Also for the sizes in the creator_fb_mmap() mapping table entries use
macros for consistency, add macros for the remaining register banks
for completeness.
nexus (which might or might not reflect an UPA interconnection bus;
accordingly UPA_BUS_SPACE should be renamed to NEXUS_BUS_SPACE at a
later point) and subordinate/slave UPA busses. This is part 1/4 of
allowing creator(4) to work in USIII machines (which have a UPA bus
hanging off from the Fireplane/Safari bus reflected by the nexus).
operation as it ran out of free descriptors or if there are too many
segments in the first place, call bus_dmamap_unload() in order to
unload the already loaded segments.
For trying to map the defragmented mbuf (chain) in re_encap() this
introduces re_dma_map_desc() setting arg.rl_maxsegs to 0 as a new
failure mode. Previously we just ignored this case, corrupting our
view of the TX ring.
o In re_txeof():
- Don't clear IFF_DRV_OACTIVE unless there are at least 4 free TX
descriptors. Further down the road re_encap() will bail if there
aren't at least 4 free TX descriptors, causing re_start() to
abort and prepend the dequeued mbuf again so it makes no sense
to pretend we could process mbufs again when in fact we won't.
While at it replace this magic 4 with a macro RL_TX_DESC_THLD
throughout this driver.
- Don't cancel the watchdog timeout as soon as there's at least one
free TX descriptor but instead only if all descriptors have been
handled. It's perfectly normal, especially in the DEVICE_POLLING
case, that re_txeof() is called when only a part of the enqueued
TX descriptors have been handled, causing the watchdog to be
disarmed prematurely.
o In re_encap():
- If m_defrag() fails just drop the packet like other NIC drivers
do. This should only happen when there's a mbuf shortage, in which
case it was possible to end up with an IFQ full of packets which
couldn't be processed as they couldn't be defragmented as they
were taking up all the mbufs themselves. This includes adjusting
re_start() to not trying to prepend the mbuf (chain) if re_encap()
has freed it.
- Remove dupe initialization of members of struct rl_dmaload_arg to
values that didn't change since trying to process the fragmented
mbuf chain.
While at it remove an unused member from struct rl_dmaload_arg.
o In re_start() remove an abandoned, banal comment. The corresponding
code was moved to re_attach() some time ago.
With these changes re(4) now survives one day (until stopped) of
hammering out packets here.
Reviewed by: yongari
MFC after: 2 weeks
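The two re_txeof() rules described above can be sketched as a small
user-space model. The toy_txring structure, TOY_TX_DESC_CNT and
toy_txeof() are hypothetical; TOY_TX_DESC_THLD only mirrors the idea of
RL_TX_DESC_THLD, and the flags are plain ints rather than the real
IFF_DRV_OACTIVE bit and watchdog timer.

```c
#include <assert.h>

#define TOY_TX_DESC_CNT		64	/* hypothetical ring size */
#define TOY_TX_DESC_THLD	4	/* mirrors the RL_TX_DESC_THLD idea */

struct toy_txring {
	int free;		/* free TX descriptors */
	int oactive;		/* models IFF_DRV_OACTIVE */
	int watchdog_armed;	/* models the watchdog timeout */
};

/* Reclaim 'done' completed descriptors and update the two flags. */
static void
toy_txeof(struct toy_txring *r, int done)
{
	assert(done >= 0 && r->free + done <= TOY_TX_DESC_CNT);
	r->free += done;

	/* Only claim we can transmit again if an encap could succeed. */
	if (r->free >= TOY_TX_DESC_THLD)
		r->oactive = 0;

	/* Only disarm the watchdog once every descriptor is handled. */
	if (r->free == TOY_TX_DESC_CNT)
		r->watchdog_armed = 0;
}
```

Starting from a full ring, reclaiming three descriptors changes neither
flag, the fourth clears the OACTIVE model, and the watchdog model stays
armed until all 64 descriptors are free again.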
- Retire the PCI_SUB*_1 constants and don't try to read a subvendor ID out
of them. There isn't a standard subvendor ID field for PCI-PCI bridges.
Instead, the dword at offset 0x34 is actually mostly reserved except for
the LSB which is the capabilities pointer.
- Add support for the PCI-PCI bridge subvendor ID capability (13) and use
it to set the subvendor ID for PCI-PCI bridges.
MFC after: 1 month
functions. The idea is taken from OpenBSD.
- Set/clear jumbo frame configurations for bge(4).
- Re-add BCM5750 PHY workaround for bce(4), which was mistakenly removed
from the previous commit.
- Move some PHY bug detections from brgphy.c to if_bge.c.
- Do not penalize working PHYs.
- Re-arrange bge_flags roughly by their categories.
- Fix minor style(9) nits.
PR: kern/107257
Obtained from: OpenBSD
Tested by: Mike Hibler <mike at flux dot utah dot edu>
The code is modelled after cd9660, including support for simple read-ahead
courtesy of clustered read.
Fix udf_strategy to DTRT.
This change fixes sendfile(2) not to send out garbage.
Reviewed by: scottl
MFC after: 1 month
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was
unsigned long and really is a uint32_t).
- Got rid of the use of MHeaders until they
are truly needed by lower layers.
- Fixed an initialization problem in the readq structure
(ordering was off).
- Found yet another collision bug when the random number
generator returns two numbers on one side (during a collision)
that are the same. Also added some tracking of cookies
that will go away when we know that we have the last collision
bug gone.
- Fixed an init bug for book_size_scale, that was causing
Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with
Early FR but due to above bug also affected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a
static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that
has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately
shutting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we
did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will
not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was an
ordering of locks problem in the way we did timers. It
now conforms to the timeout(9) manual (except for the
_drain part, we had to do this a different way due
to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET
depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read error code issue where we were NOT getting the proper
error when the connection was reset.
Approved by: gnn
Approved by: gnn
Add a new function hashinit_flags() which allows NOT-waiting
for memory (or waiting). The old hashinit() function now
calls hashinit_flags(..., HASH_WAITOK);
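The wrapper relationship can be sketched in user space as below. This
is a model only: the toy_* names and TOY_HASH_* flag values are
hypothetical, calloc() stands in for the kernel allocator, and abort()
merely models the expectation that a waiting allocation does not return
failure. The sizing loop (largest power of two not exceeding
'elements') is likewise a simplifying choice for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_HASH_NOWAIT	0x1	/* hypothetical flag values */
#define TOY_HASH_WAITOK	0x2

struct toy_bucket {
	void *head;		/* placeholder for a list head */
};

/*
 * Allocate a power-of-two sized bucket array and report its mask.
 * With TOY_HASH_NOWAIT an allocation failure is returned as NULL.
 */
static struct toy_bucket *
toy_hashinit_flags(int elements, unsigned long *hashmask, int flags)
{
	unsigned long hashsize;
	struct toy_bucket *tbl;

	assert(elements > 0 && hashmask != NULL);
	/* Exactly one of the two flags must be given. */
	assert(flags == TOY_HASH_WAITOK || flags == TOY_HASH_NOWAIT);

	/* Largest power of two that does not exceed 'elements'. */
	for (hashsize = 1; hashsize <= (unsigned long)elements; hashsize <<= 1)
		continue;
	hashsize >>= 1;

	tbl = calloc(hashsize, sizeof(*tbl));
	if (tbl == NULL) {
		if (flags == TOY_HASH_WAITOK)
			abort();	/* models "waiting never fails" */
		return (NULL);
	}
	*hashmask = hashsize - 1;
	return (tbl);
}

/* The old entry point simply becomes a waiting call of the new one. */
static struct toy_bucket *
toy_hashinit(int elements, unsigned long *hashmask)
{
	return (toy_hashinit_flags(elements, hashmask, TOY_HASH_WAITOK));
}
```

A caller that must not sleep would pass TOY_HASH_NOWAIT and check the
result for NULL; existing callers keep using the old entry point
unchanged.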
o eliminate assumptions that half/quarter rate channels only exist in 11a
o handle frequency mapping between hal and net80211; hal gives us freq's
in the range 2422..2437 that we remap
MFC after: 1 month
o add channel flag to enable freq <-> ieee channel # mapping (can
go away in the future when ieee number is precomputed)
o add mapping between 900mhz freq's and channel #'s that gives a
unique channel # for each half/quarter/full width channel
o remove assumptions that half/quarter rate channels only happen in 11a
o remove assumptions that all 11g channels are full width
o ensure ic_curchan is reset on mode change so changing the channel
list (e.g. on countrycode change) doesn't leave curchan set to an
invalid channel
There is still an issue with switching rate sets; to be fixed separately.
MFC after: 1 month
only support external PHYs (besides not connectable internal ones
which respond at the usual addresses, but which don't hurt if we
let them show up) and don't wedge when isolating PHYs. Actually,
this change special cases limiting PHYs to Am79C97{3,5,8}, for
which this driver doesn't implement switching between the internal
and external PHYs, yet, and Am79C971, where isolating the external
PHY (at least in case it's a DP83840A) wedges the chip. Together
with sys/dev/mii/acphy.c rev. 1.21 this adds support for the
100baseFX port of AT-2700 series adaptors, which use two AC101,
one for the copper and one for the fibre port (there might be
variants which only use one PHY though).
- Fix a bug in the previous revision that prevented the address of
the used (external) PHY from actually being recorded.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
MFC after: 1 week
bridge if it doesn't pass MSI messages up correctly. We set the flag
in pcib_attach() if the device ID is disabled via a PCI quirk.
- Disable MSI for devices behind the AMD 8131 HT-PCIX bridge. Linux has
the same quirk.
Tested by: no one despite repeated calls for testers
laptops.
Tested by: [1] Lion G. <liontanker@hotmail.com>
[2] Pietro Cerutti <pietro.cerutti@gmail.com>
Specialized mixer initialization for STAC9221, much like STAC9220.
Tested by: Devon H. O'Dell
- Set MIIF_NOLOOP as loopback doesn't work with this PHY. The MIIF_NOLOOP
flag currently triggers nothing but hopefully will be respected by
mii_phy_setmedia() later on.
- Use MII_ANEGTICKS instead of 5.
- Remove an unused macro.
- Fix some whitespace nits.
MFC after: 1 week
- In exphy_service() for the MII_TICK case don't bother to check whether
the currently selected media is of type IFM_AUTO as auto-negotiation
doesn't need to be kicked anyway.
- Remove #if 0'ed inapplicable code.
- Fix some whitespace nits.
MFC after: 1 week
and thus the FX_DIS pin indicates fibre media. This is part 1/2 of
adding support for the 100baseFX interface/port of AT-2700 series
adaptors.
Idea from: NetBSD
MFC after: 1 week
indices when manually adding media. Some of these I've missed while
converting drivers to take advantage of said functions recently,
others were longstanding bugs.
- General style(9) cleanup -- white space, braces, line wraps, etc.
- Annotate a lack of synchronization of the global route cache if the input
routine is invoked with parallelism.
- Remove unused debugging code.
routing:
- style(9) cleanup -- white space, braces, etc.
- Make include guards consistent with our more general naming
convention.
- Rearrange and complete forward structure declarations in at_extern.h,
remove testing of guards of various other include files to protect
function declarations.
This leaves an ifdef _KERNEL in at_var.h, but from inspection it seems
likely that this file is not actually safe for inclusion in user space
still. However, since it's not included from within src/, this does
not appear to be an issue (ifconfig, etc, have migrated to the generic
cross-protocol ioctls for address operations).
etc, changes.
Remove a small amount of #if !defined(__FreeBSD__) code.
Add missing include guard for _NETATALK_AARP_H_.
Remove unneeded (and conflicting) extern prototype for aarptfree().
call, its semantics were unintentionally changed. It went from
returning the time state to returning 0 or -1. Since 0 means time
normal, and non-zero effectively only shows up around leap seconds,
this went unnoticed until now. At least unnoticed until someone was
trying to run a binary they didn't have source for and it was
misbehaving...
Submitted by: Judah Levine
MFC After: 2 weeks
members right. However, it also said it was aligned(1), which meant
that gcc generated really bad code. Mark this as aligned(4). This
makes things a little faster on arm (a couple percent), but also saves
about 30k on the size of the kernel for arm.
I talked about doing this with bde, but didn't check with him before
the commit, so I'm hesitant to say 'reviewed by: bde'.
- Use printf() and device_printf() instead of log() in ichsmb(4).
- Create the mutex sooner during ichsmb(4) attach.
- Attach the interrupt handler later during ichsmb(4) attach to avoid
races.
- Don't try to set PCIM_CMD_PORTEN in ichsmb(4) attach as the PCI bus
driver does this already.
- Add locking to alpm(4), amdpm(4), amdsmb(4), intsmb(4), nfsmb(4), and
viapm(4).
- Axe ALPM_SMBIO_BASE_ADDR, it's not really safe to write arbitrary values
into BARs, and the PCI bus layer will allocate resources now if needed.
- Merge intpm(4) and intsmb(4) into just intsmb(4). Previously, intpm(4)
attached to the PCI device and created an intsmb(4) child. Now,
intsmb(4) just attaches to PCI directly.
- Change several intsmb functions to take a softc instead of a device_t
to make things simpler.
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.
MFC after: 2 weeks
preemptions when adjusting the priority of a thread that is on a run
queue. This was only observed when FULL_PREEMPTION was enabled.
Reported by: kris
Diagnosed by: ups
MFC after: 1 week
that piggybacks on bce_tick() callout.
- Instead of unconditionally resetting the controller, try to
skip the reset in case we got a pause frame, like em(4) did.
- Lock bce_tick() using callout_init_mtx().
Discussed with/Reviewed by: glebius, scottl, davidch
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.
Discussed with: ups
Tested on: i386, amd64
allocations were made using improper flags in interrupt context.
Replace with a simple WITNESS warning call. This restores the
invariant that M_WAITOK allocations will always succeed or die
horribly trying, which is relied on by many UMA consumers.
MFC after: 3 weeks
Discussed with: jhb
NOTES though, as ofw_syscons(4) doesn't properly interface with
syscons(4) regarding loading the font specified with SC_DFLT_FONT,
causing a kernel with both options SC_OFWFB and SC_NO_MODE_CHANGE
to not link.
not needed if the proper ordering is done in attach and shutdown.
Remove usage of if_timer/watchdog and roll my own by piggybacking
off the tick() function.
Use the new usb system to allocate task queues instead of using
the system wide thread for taskqueues.
for usb. I hope that this will eventually be used for generic devices
that need full fledged blocking threads for event processing.
Create a taskqueue:
void usb_ether_task_init(device_t, int, struct usb_taskqueue *);
Enqueue a task:
void usb_ether_task_enqueue(struct usb_taskqueue *, struct task *);
Wait for all tasks queued to complete:
void usb_ether_task_drain(struct usb_taskqueue *, struct task *);
Destroy the taskqueue:
void usb_ether_task_destroy(struct usb_taskqueue *);
University of Washington copyrights, which include the
advertising clause. Move $NetBSD$ into standard location for
FreeBSD source files, and normalize formatting.
MFC after: 3 days
the UCB license now excludes the advertising clause. I'm not
interested in it either, so move my copyright. This leaves
only a CGD copyright with the advertising clause.
MFC after: 3 days
the state machine clocks to INIT, node references are not reclaimed.
Add a new routine ieee80211_drain_ifq that does this and use it
instead of IF_DRAIN.
Submitted by: Sepherosa Ziehau
Obtained from: DragonFly
MFC after: 1 month
- Sort by date in license blocks, oldest copyright first.
- All rights reserved after all copyrights, not just the first.
- Use (c) to be consistent with other entries.
MFC after: 3 days
o add IEEE80211_F_JOIN flag to ieee80211_fix_rate to indicate a station
is joining a BSS; this is used to control whether or not we over-write
the basic rate bit in the calculated rate set
o fix ieee80211_fix_rate to honor IEEE80211_F_DODEL when IEEE80211_F_DONEGO
is not specified (e.g. when joining an ibss network)
o on sta join always delete unusable rates from the negotiated rate set,
this was being done only for ibss networks but is also needed for 11g bss
with mixed stations
o on sta join delete unusable rates from the bss node's rate set, not the
scan table entry's rate set
o when calculating a rate set for new neighbors in an ibss calculate a
negotiated rate set so drivers are not presented with rates they should
not use
Submitted by: Sepherosa Ziehau (w/ modifications)
Obtained from: DragonFly
MFC after: 1 month
- Clear the PCI AFSR and status error bits as previous errors still
might be indicated.
- Set up the PCI control and diagnostic registers according to the
capabilities, workarounds, etc of/for specific revisions of the
supported bridges. This includes no longer setting Hummingbird-/
Sabre-specific bits in the PCI control register but preserving
what the firmware has initialized them to like OpenSolaris does.
Previously we were setting these bits according to the example in
the Sabre documentation, which I doubt is appropriate for all
Sabre based designs and especially not for Hummingbirds. This
also includes not enabling bus parking unless the firmware tells
us to.
- Set the PCI latency timer register as this isn't always done by
the firmware.
o Remove a redundant argument from psycho_set_intr() and in this
function check the return value of bus_setup_intr(). [2]
o Let psycho_setup_intr() return ENOMEM instead of 0 when it can't
allocate memory for the interrupt wrapper stub and EINVAL instead
of 0 if it can't find the interrupt vector in the interrupt map.
o Add a workaround for a bug of the Sabre-APB-combination where it
doesn't drain DMA write data for devices behind additional PCI-PCI
bridges underneath the APB PCI-PCI bridge. This workaround (do
things necessary in order to achieve a manual drain when coherency
is required) is currently implemented in psycho_setup_intr() and
psycho_intr_stub() (for easy MFC'ing) and therefore is only applied
for interrupt handlers. This should be moved to psycho(4)-specific
bus_dma_tag_create() and bus_dmamap_sync() methods, respectively,
once this driver is converted to make use of BUS_GET_DMA_TAG(), so
the workaround is also applied for polling(4) callbacks. [3]
o Fix some minor style issues.
Info from: OpenSolaris [1]
Info from: Linux, OpenBSD, OpenSolaris [3]
Suggested by: Coverity Prevent (CID 682) [2]
MFC after: 1 month
firmware (mainly 'pmu' and its 'lomp' dupe found in a couple of
later USII{e,i}-based machines) by checking whether a device with
the same triple of bus number, slot and function already has been
added. This is the simple yet effective approach introduced in
OpenBSD some time ago, but which has the flaw that it assumes
that the device and its dupe(s) found in the OFW device tree are
equal or at least the one encountered first is in some way the
more important one (this is the case with 'pmu' and 'lomp'; the
'pmu' node has a couple of properties and children while the 'lomp'
one misses most of these). If there's ever a device/dupe pair
where we don't encounter the more important node first, we'll
probably need to introduce a quirk list in order to add the
desired device but prevent its dupe(s) from being added.
MFC after: 1 week
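The first-one-wins check on the (bus, slot, function) triple can be
sketched like this. The toy_bsf and toy_devlist types and
toy_add_device() are hypothetical and use a fixed-size array; the real
code walks the OFW device tree and newbus children instead.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS	32	/* hypothetical capacity */

struct toy_bsf {
	int bus, slot, func;
};

struct toy_devlist {
	struct toy_bsf devs[TOY_MAX_DEVS];
	size_t n;
};

/*
 * Add a device unless one with the same bus/slot/function triple was
 * already added. Returns 1 if added, 0 if skipped as a dupe (or if the
 * list is full). The node encountered first always wins.
 */
static int
toy_add_device(struct toy_devlist *l, struct toy_bsf d)
{
	size_t i;

	assert(l != NULL);
	for (i = 0; i < l->n; i++) {
		if (l->devs[i].bus == d.bus && l->devs[i].slot == d.slot &&
		    l->devs[i].func == d.func)
			return (0);
	}
	if (l->n == TOY_MAX_DEVS)
		return (0);
	l->devs[l->n++] = d;
	return (1);
}
```

Because the check keys only on the triple, it cannot tell which of two
nodes is the more complete one, which is exactly the limitation the
message describes.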
link state changes. Instead, build new speed/duplex/flow-control
settings from the values reported from PHY.
This should fix speed/duplex/flow-control mismatches between GMAC and
PHY which resulted in very poor Rx performance due to lots of
out-of-order packet delivery.
Reported by: Arno J. Klaassen <arno AT heho DOT snv DOT jussieu DOT fr>
Tested by: Arno J. Klaassen <arno AT heho DOT snv DOT jussieu DOT fr>
modern dual-core systems as well.
- Parse the _CST packages for each cpu and track all the states individually,
on a per-cpu basis.
- Revert to generic FADT/P_BLK based Cx control if the _CST package
is not present on all cpus. In that case, the new driver will
still support per-cpu Cx state handling. The driver will determine the
highest Cx level that can be supported by all the cpus and configure the
available Cx state based on that.
- Fixed the case where multiple cpus in the system share the same
registers for Cx state handling. To do that, added a new flag
parameter to the acpi_PkgGas and acpi_bus_alloc_gas functions that
enables the caller to add the RF_SHAREABLE flag. This flag could also be
useful to other callers (acpi_throttle?) in the tree but this change is
not yet made.
- For Core Duo cpus, both cores seem to be taken out of C3 state when
any one of the cores needs to transition out. This broke the short sleep
detection logic. For now it is disabled if there is more than one cpu in
the system, as that fixed the problem in my case. This quirk may need to
be re-enabled later in a different form.
- Added support to control cx_lowest on a per-cpu basis. There is still
a generic cx_lowest to enable changing cx_lowest for all cpus with a single
sysctl and for ease of use. Sample output for the new sysctl:
dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 0.00% 43.16% 56.83%
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 0.00% 45.65% 54.34%
hw.acpi.cpu.cx_lowest: C3
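The FADT/P_BLK fallback above picks the highest Cx level that all cpus
support. A minimal sketch of that selection, assuming a hypothetical
array holding the number of Cx states each cpu reports (3 meaning
C1..C3); the function name cx_common_deepest is made up for
illustration:

```c
#include <stddef.h>

/*
 * Return the deepest Cx level that every cpu can enter, i.e. the
 * minimum of the per-cpu counts.  ncpus must be at least 1.
 */
static int
cx_common_deepest(const int *cx_count, size_t ncpus)
{
	int lowest;
	size_t i;

	lowest = cx_count[0];
	for (i = 1; i < ncpus; i++) {
		if (cx_count[i] < lowest)
			lowest = cx_count[i];
	}
	return (lowest);
}
```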
This work was done by Stephane E. Potvin with some simple reworking by
myself. Thank you.
Submitted by: Stephane E. Potvin <sepotvin / videotron.ca>
MFC after: 2 weeks
recording enabled some programs (audio/audacity from ports) can't
correctly enumerate all /dev/dsp devices.
Note: the previous commit did not actually enable any debugging output; I
misread "#undef" as "#define".
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
Now (ok it's been a while...) that FreeBSD has RLIMIT_AS too, we can use
it in the linuxolator instead of ignoring it.
This fixes an LTP test.
Submitted by: rdivacky
No need to lock the prison in the linux_use26 case because the int
setting is atomic and a process cannot leave its jail.
Submitted by: kib
Reviewed by: jhb
Requested by: rdivacky
Don't lock em when just using em->shared->group_pid because
the group_pid never changes.
Submitted by: rdivacky
Reviewed by: kib
Glanced at by: jhb
(due to an early reset or the like), remember to unlock the socket lock.
This will not occur in 7-CURRENT, but could in theory occur in 6-STABLE.
MFC after: 1 week
do not call markvoldirty() until the mount has been flagged as read-write.
Due to the nature of the msdosfs code, this bug only seemed to appear for
FAT-16 and FAT-32.
This fixes the testcase:
#!/bin/sh
dd if=/dev/zero bs=1m count=1 oseek=119 of=image.msdos
mdconfig -a -t vnode -f image.msdos
newfs_msdos -F 16 /dev/md0 fd120m
mount_msdosfs -o ro /dev/md0 /mnt
mount | grep md0
mount -u -o rw /dev/md0; echo $?
mount | grep md0
umount /mnt
mdconfig -d -u 0
PR: 105412
Tested by: Eugene Grosbein <eugen grosbein pp ru>
revision 1.98 is NOT merged, because FreeBSD does not support this
syntax.
revision 1.99 is NOT merged, "const poisoning" part is not applicable
to FreeBSD. There is no variable shadowing, GCC can't find
this one (but there are others)
revision 1.100 is NOT merged, because it was null patch (no changes)
revision 1.101 is NOT merged, there is no BIT() macro in FreeBSD
revision 1.102 is merged
revision 1.103 is partially merged. There is no ai.ifaceh in FreeBSD
revision 1.104 is NOT merged
revision 1.105 is merged
revision 1.106 is not merged, because of rev. 1.107
revision 1.107 is a backout of 1.106
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
---snip---
New features:
1. Optional multichannel recording (32 channels on Live!, 64 channels
on Audigy).
All channels are 16bit/48000Hz/mono, format is fixed.
Half of them are copied from sound output, another half can be
used to record any data from DSP. What should be recorded is
hardcoded in DSP code. In this version it records dummy data, but
can be used to record all DSP inputs, for example.
Because there is no support for more-than-stereo sound streams, the
multichannel stream is presented as one 32(64)*48000 Hz 16bit mono
stream.
Channel map:
SB Live! (4.0/5.1)
offset (words) substream
0x00 Front L
0x01 Front R
0x02 Digital Front L
0x03 Digital Front R
0x04 Digital Center
0x05 Digital Sub
0x06 Headphones L
0x07 Headphones R
0x08 Rear L
0x09 Rear R
0x0A ADC (multi-rate recording) L
0x0B ADC (multi-rate recording) R
0x0C unused
0x0D unused
0x0E unused
0x0F unused
0x10 Analog Center (Live! 5.1) / dummy (Live! 4.0)
0x11 Analog Sub (Live! 5.1) / dummy (Live! 4.0)
0x12..0x1F dummy
Audigy / Audigy 2 / Audigy 2 Value / Audigy 4
offset (words) substream
0x00 Digital Front L
0x01 Digital Front R
0x02 Digital Center
0x03 Digital Sub
0x04 Digital Side L (7.1 cards) / Headphones L (5.1 cards)
0x05 Digital Side R (7.1 cards) / Headphones R (5.1 cards)
0x06 Digital Rear L
0x07 Digital Rear R
0x08 Front L
0x09 Front R
0x0A Center
0x0B Sub
0x0C Side L
0x0D Side R
0x0E Rear L
0x0F Rear R
0x10 output to AC97 input L (muted)
0x11 output to AC97 input R (muted)
0x12 unused
0x13 unused
0x14 unused
0x15 unused
0x16 ADC (multi-rate recording) L
0x17 ADC (multi-rate recording) R
0x18 unused
0x19 unused
0x1A unused
0x1B unused
0x1C unused
0x1D unused
0x1E unused
0x1F unused
0x20..0x3F dummy
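A userland sketch of how one substream could be pulled out of the
combined mono stream, assuming the substreams are interleaved sample
by sample so that the word offset from the channel map selects a slot
within each 32-word (Live!) or 64-word (Audigy) frame. That layout is
an assumption drawn from the "offset (words)" column, and
emu_extract_substream is a hypothetical helper, not part of the
driver.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy one substream out of the interleaved capture buffer.
 * nchan is 32 for Live! and 64 for Audigy; offset is the word offset
 * from the channel map and must be less than nchan.
 * Returns the number of samples written to out.
 */
static size_t
emu_extract_substream(const int16_t *in, size_t nwords, unsigned nchan,
    unsigned offset, int16_t *out, size_t outcap)
{
	size_t frames, i;

	frames = nwords / nchan;	/* ignore a trailing partial frame */
	if (frames > outcap)
		frames = outcap;
	for (i = 0; i < frames; i++)
		out[i] = in[i * nchan + offset];
	return (frames);
}
```

For example, on a Live! card offset 0x08 would select Rear L with
nchan set to 32.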
Fixes:
1. Do not assign negative values to variables used to index the emu_cards
array. The array was never accessed with a negative index, but
Alexander (netchild@) told me that Coverity does not like it.
After this change emu_cards[0] should never be used to identify a
valid sound card.
2. Fix off-by-one errors in interrupt manager. Add more checks there.
3. Fixes to the sound buffering code now allow the driver to use large
playback buffers.
4. Fix memory allocation bug when multichannel recording is not
enabled.
5. Fix interrupt timeout when recording with low bitrate (8kHz).
Hardware:
1. Add one more known Audigy ZS card to list. Add two cards with
PCI IDs between the old known cards and the new one.
Other changes:
1. Do not use ALL CAPS in messages.
Incomplete code:
1. Automute S/PDIF when S/PDIF signal is lost.
Tested on i386 only, gcc 3.4.6 & gcc41/gcc42 (syntax only).
---snip---
This commit enables a little bit of debugging output when the driver is
loaded as a module. I did a cross-build test for amd64.
The code has some style issues; these will be addressed later.
The multichannel recording part is some work in progress to allow playing
around with it until the generic sound code is better able to handle
multichannel streams.
This is supposed to fix
CID: 171187
Found by: Coverity Prevent
Submitted by: Yuriy Tsibizov <Yuriy.Tsibizov@gfk.ru>
Bring the linux mmap code more into line with how linux (2.4.x) behaves.
Tested by: Scot Hetzel <swhetzel@gmail.com> on amd64 without PROT_EXEC
In addition to what the i386 version does, always use PROT_EXEC in the
mapping, as the previous version of the amd64 code did. We need to examine
this further to decide what the right thing to do is. For now this fixes
several problems in the LTP test runs and should behave as before with
regard to PROT_EXEC.
of max() when computing the divisor in SCHED_TICK_PRI(). This prevents
cases where rounding down would allow the quotient to exceed
SCHED_PRI_RANGE.
- Garbage collect some unused flags and fields.
- Replace TDF_HOLD with sched_pin_td()/sched_unpin_td() since it simply
duplicated this functionality.
- Re-enable the rebalancer by default and fix the sysctl so it can be
modified.
marked idle, thus breaking cpu load balancing.
- Change sched_interact_update() to fix cases where the stored history
has expanded significantly rather than handling them in the callers. This
fixes a case where sched_priority() could compute a bad value.
- Add a sysctl to disable the global load balancer for experimentation.
server.
Don't complain about a hard loop id of 0xffff; we get this in
point-to-point topologies with the 2300 and 2K Login firmware.
Up the timeout on register FC4 types commands.
- Rename confusing AGP_INTEL_I845_MCHCFG to AGP_INTEL_I845_AGPM.
- Move E7205 and E7505 from i8x5 to i8x0 family. It probably worked
because the actual offset is the same.
In fact, all three families have the bit at the exact same place. Only
differences are name and width of the registers, i.e., NBXCFG (0x50, dword),
RDCR (0x51, byte), AGPM (0x51, byte), MCHCFG (0x50, word) depending on
the family of the chipsets.
sysctl and socket teardown by adding a reference count to the UNIX domain
pcb object and fixing the sysctl that enumerates unpcbs to grab a
reference on each unpcb while it builds the list to copy out to userland.
- Close a race between UNIX domain pcb garbage collection (unp_gc()) and
file descriptor teardown (fdrop()) by adding a new garbage collection
flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers
in a UNIX domain socket looking for nested file descriptor references
and clears the flag when it is finished. fdrop() checks to see if the
flag is set on a file descriptor whose refcount just dropped to 0 and
waits for unp_gc() to clear the flag before completely destroying the
file descriptor.
MFC after: 1 week
Reviewed by: rwatson
Submitted by: ups
Hopefully makes the panics go away: mx1
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.
MFC after: 1 week
Reviewed by: alc
functions now more closely resemble similar functions in nullfs.
This also eliminates some errors.
Submitted by: daichi, Masanori OZAWA <ozawa ongs co jp>
setting ftick = ltick = ticks in schedinit().
- Update the priority when we are pulled off of the run queue and when we
are inserted onto the run queue so that it more accurately reflects our
present status. This is important for efficient priority propagation
functioning.
- Move the frequency test into sched_pctcpu_update() so we don't repeat it
each time we'd like to call it.
- Put some temporary work-around code in sched_priority() in case the tick
mechanism produces a bad priority. Eventually this should revert to an
assert again.
a spin mutex since it doesn't have an INTR_FAST interrupt handler.
Beyond that the driver is still under Giant anyway.
- Remove unneeded locking during attach across operations that can't be
called with locks held (such as bus_dma_tag_create()).
MFC after: 1 week
Not objected to by: scottl
the most recently chosen index. This significantly improves nice
behavior. This allows a lower priority thread to run some multiple of
times before the higher priority thread makes it to the front of
the queue. A nice +20 cpu hog now only gets ~5% of the cpu when running
with a nice 0 cpu hog and about 1.5% with a nice -20 hog. A nice
difference of 1 makes a 4% difference in cpu usage between two hogs.
- Track a separate insert and removal index. When the removal index is
empty it is updated to point at the current insert index.
- Don't remove and re-add a thread to the runq when it is being adjusted
down in priority.
- Pull some conditional code out of sched_tick(). It's looking a bit
large now.
- Remove the double queue mechanism for timeshare threads. It was slow
due to excess cache lines in play, caused suboptimal scheduling behavior
with niced and other non-interactive processes, complicated priority
lending, etc.
- Use a circular queue with a floating starting index for timeshare threads.
Enforces fairness by moving the insertion point closer to threads with
worse priorities over time.
- Give interactive timeshare threads real-time user-space priorities and
place them on the realtime/ithd queue.
- Select non-interactive timeshare thread priorities based on their cpu
utilization over the last 10 seconds combined with the nice value. This
gives us more sane priorities and behavior in a loaded system as
compared to the old method of using the interactivity score. The
interactive score quickly hit a ceiling if threads were non-interactive
and penalized new hog threads.
- Use one slice size for all threads. The slice is not currently
dynamically set to adjust scheduling behavior of different threads.
- Add some new sysctls for scheduling parameters.
Bug fixes/Clean up:
- Fix zeroing of td_sched after initialization in sched_fork_thread() caused
by recent ksegrp removal.
- Fix KSE interactivity issues related to frequent forking and exiting of
kse threads. We simply disable the penalty for thread creation and exit
for kse threads.
- Cleanup the cpu estimator by using tickincr here as well. Keep ticks and
ltick/ftick in the same frequency. Previously ticks were stathz and
others were hz.
- Lots of new and updated comments.
- Many many others.
Tested on: up x86/amd64, 8way amd64.
- runq_add_pri allows the caller to position the thread at any rqindex
regardless of priority.
- runq_choose_from() chooses the lowest priority thread starting from a given
index. The index is updated with the rqindex of the chosen thread. This
routine is used to pick the lowest priority relative to a given index.
- runq_remove_idx() updates the index if the run queue that held the removed
thread is now empty.
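A toy model of the circular search behind runq_choose_from(), assuming
a run queue that only counts the threads on each queue. The struct,
the RQ_NQS value used here and the toy_ prefix are illustrative; the
real routine operates on the kernel's run queue structures.

```c
#include <stddef.h>

#define RQ_NQS 64	/* number of queues; illustrative value */

/* Toy run queue: only tracks how many threads sit on each queue. */
struct toy_runq {
	int nthreads[RQ_NQS];
};

/*
 * Scan circularly from *idx for the first non-empty queue.  On success
 * store that queue's index back into *idx and return it; return -1 if
 * every queue is empty.  *idx must be less than RQ_NQS on entry.
 */
static int
toy_runq_choose_from(const struct toy_runq *rq, unsigned char *idx)
{
	unsigned i, pri;

	for (i = 0; i < RQ_NQS; i++) {
		pri = (*idx + i) % RQ_NQS;
		if (rq->nthreads[pri] > 0) {
			*idx = (unsigned char)pri;
			return ((int)pri);
		}
	}
	return (-1);
}
```

Starting the scan at a floating index rather than at zero is what lets
the insertion point move over time.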
start working with third party usb modules, where sometimes it
is not easy to set the inclusion order so that there are no multiple
inclusions, yet you want to compile with high WARNS levels).
I am not sure if there is a standard for having a leading and/or trailing _
in the macro name, the usb code seems to use both.
There are still several unprotected headers here so it might be useful
to do the same thing on other files as well as the need arises.
MFC After: 3 days
check length of the pathname in the range 0<=n<=NFS_MAXPATHLEN,
not 0<n<=NFS_MAXPATHLEN. This fixes a minor interoperability problem
that the FreeBSD NFS server did not allow a symlink pointing to the empty
pathname.
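The two range checks can be compared side by side in a small sketch.
TOY_NFS_MAXPATHLEN is a stand-in for NFS_MAXPATHLEN with an assumed
value, and both function names are hypothetical.

```c
#include <stdbool.h>

#define TOY_NFS_MAXPATHLEN 1024	/* stand-in for NFS_MAXPATHLEN */

/* Old check: 0 < n <= max, which rejects the empty pathname. */
static bool
symlink_len_ok_old(long n)
{
	return (n > 0 && n <= TOY_NFS_MAXPATHLEN);
}

/* New check: 0 <= n <= max, which accepts the empty pathname. */
static bool
symlink_len_ok_new(long n)
{
	return (n >= 0 && n <= TOY_NFS_MAXPATHLEN);
}
```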
MFC after: 1 week
mbuf. First moves toward being able to cope better with having layer 2 (or
other encapsulation data) before the IP header in the packet being examined.
More commits to come to round out this functionality. This commit should
have no practical effect but clears the way for what is coming.
Reviewed by: luigi, yar
MFC After: 2 weeks
been introduced to the MAC framework:
mpo_associate_nfsd_label
mpo_create_mbuf_from_firewall
mpo_check_system_nfsd
mpo_check_vnode_mmap_downgrade
mpo_check_vnode_mprotect
mpo_init_syncache_label
mpo_destroy_syncache_label
mpo_init_syncache_from_inpcb
mpo_create_mbuf_from_syncache
MFC after: 2 weeks [1]
[1] The syncache related entry points will NOT be MFCed as the changes in
the syncache subsystem are not present in RELENG_6 yet.
exclusive access if there is at least one thread waiting for it to
become available. This may significantly reduce overhead by reducing
the number of unnecessary wakeups issued whenever the framework becomes
idle.
Annotate that we still signal the CV more than necessary and should
fix this.
Obtained from: TrustedBSD Project
Reviewed by: csjp
Tested by: csjp
Redo the checking for 2.6 emulation. We now cache the value of
use26 and replace calls to linux_get_osrelease() + parsing with
a call to linux_use26(). Typical path is lockless now.
Pointed out by: kib
This allows us to ship RELENG_7_0 with a default osrelease of 2.4.2 and the
possibility to enable 2.6.x emulation without the possible performance
impact of the previous version of the check.
Submitted by: rdivacky
- Micro-optimize the addition of an 802.1q header to match the removal code.
- Consistently check for interfaces being up and running.
- Consistently use NULL instead of 0 with pointers.
With the second (and last) part of my previous Summer of Code work, we get:
-ipfw's in kernel nat
-redirect_* and LSNAT support
General information about nat syntax and some examples are available
in the ipfw(8) man page. The redirect and LSNAT syntax are identical
to natd, so please refer to the natd(8) man page.
To enable in kernel nat in rc.conf, two options were added:
o firewall_nat_enable: equivalent to natd_enable
o firewall_nat_interface: equivalent to natd_interface
Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet
to continue being checked by the firewall ruleset after being
(de)aliased.
NOTA BENE: due to some problems with the libalias architecture, in-kernel
nat won't work with a TSO-enabled nic, thus you have to disable TSO via
ifconfig (ifconfig foo0 -tso).
Approved by: glebius (mentor)
access plus timers. This makes the code
more portable and able to change out the
mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only
the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to <b> aka
worrying about pkthdr's :-)
d) There was one last reorder problem, it seems: if a restart
occurs and we release and relock (at the point where we
set up our alias vtag) we could end up getting the wrong
TSN in place. The code that fixed the TSNs just needed to
be shifted to BEFORE the release of the lock, along with
the code that sets the state (since this also could
contribute).
Approved by: gnn
semantics.
- Stop testing bpf pointers for NULL. In some cases use
bpf_peers_present() and then call the function directly inside the
conditional block instead of the macro.
- For places where the entire conditional block is the macro, remove the
test and make the macro unconditional.
- Use BPF_MTAP() in if_pfsync on FreeBSD instead of an expanded version of
the old semantics.
Reviewed by: csjp (older version)
lookup early. This has some performance implications and should not be
enabled by default, but might help greatly in certain setups. After some
more testing this could be turned into a sysctl.
Tested by: avatar
LOR ids: 17, 24, 32, 46, 191 (conceptual)
MFC after: 6 weeks
MPLOCKED. The cleaning in rev.1.25 was supposed to have been undone
by rev.1.26, but 1.26 could never have actually affected asm files
since atomic.h is full of C declarations so including it in asm files
would just give syntax errors. The asm MPLOCKED is even less needed
than when misplaced definitions of it were first removed, and is now
unused in any asm file in the src tree except in anachronisms in
sys/i386/i386/support.s.
manipulation is visible to the subject process. Remove XXX comments
suggesting this.
Convert one XXX on a difference from Darwin into a note: it's not a
bug, it's a feature.
Obtained from: TrustedBSD Project
system calls on the amd64 architecture.
Some minor white space tweaks for consistency with other syscalls.master
files.
Obtained from: TrustedBSD Project
- Replace XXX with Note: in several cases where observations are made about
future functionality rather than problems or bugs.
- Remove an XXX comment about byte order and au_to_ip() -- IP headers must
be submitted in network byte order. Add a comment to this effect.
- Mention that we don't implement select/poll for /dev/audit.
Obtained from: TrustedBSD Project
kernel<->policy ABI version. Add a comment to the definition describing
it and listing known versions. Modify MAC_POLICY_SET() to reference the
current kernel version by name rather than by number.
Staticize mac_late, which is used only in mac_framework.c.
Obtained from: TrustedBSD Project
mac_framework.c Contains basic MAC Framework functions, policy
registration, sysinits, etc.
mac_syscalls.c Contains implementations of various MAC system calls,
including ENOSYS stubs when compiling without options
MAC.
Obtained from: TrustedBSD Project
consumes and implements, as well as the location of the framework and
policy modules.
Refactor MAC Framework versioning a bit so that the current ABI version can
be exported via a read-only sysctl.
Further update comments relating to locking/synchronization.
Update copyright to take into account these and other recent changes.
Obtained from: TrustedBSD Project
node would send every outgoing frame to the "compress" hook.
Packets received on the "compress" hook were expected to be
compressed and PROT_COMPD tag was put on them unconditionally.
After this commit an alternative compression mode can be set.
In this mode the node doesn't add the PROT_COMPD tag; the compressor
must add it itself. This is important for compressors that can
submit uncompressed frames.
Before this commit, if decompression was enabled, the ng_ppp(4)
node would send an incoming frame to the "decompress" hook
only if it had the PROT_COMPD proto tag on it.
After this commit an alternative decompression mode can be set.
In this mode the node sends all incoming packets to the
decompression hook. This is important for compressors that need
uncompressed packets too, to keep their library in sync.
These new features will be used in new version of mpd4, and in new
compressor nodes.
Submitted by: Alexander Motin <mav alkar.net>
mainly involves removing all __CC_SUPPORTS___INLINE__ ifdefs. These
ifdefs are even less needed for amd64 than for i386, but the i386
atomic.h never had them. The ifdefs here were just an optimization
of obsolescent compatibility cruft (__inline) for a null set of
compilers. I think null sets of compilers should only be supported
in cases where this is more than an optimization, doesn't require
extensive ifdefs, and only involves not-so-obsolescent compatibility
cruft (plain inline here).
o mark 11g mode support on finding 11g or pure 11g (OFDM-only)
channels; was requiring pure 11g which caused some contortions
in drivers that manually setup their channel lists
These functions are used a lot for mutexes, so this reduces the text
size of an average kernel by about 0.75%. This wasn't intended to
be a significant optimization, but it somehow increased the maximum
number of packets per second that can be transmitted by my bge hardware
from 320000 to 460000 (this benchmark is CPU-bound and remarkably
sensitive to changes in the text section).
Details: we would prefer to leave the result of the cmpxchg in %al,
but cannot tell gcc that it is there, so we have to convert it to an
integer register. We converted to %al, then to %[re]ax, but the
latter step is usually wasted since gcc usually only wants the condition
code and can recover it from %al just as easily as from %[re]ax. Let
gcc promote %al in the few cases where this is needed.
Nearby style fixes:
- let gcc manage the load of `res', and don't abuse `res' for a copy of `exp'
- don't echo `res's name in comments
- consistently spell the condition code as 'e' after comparison for equality
- don't hard-code %al anywhere except in constraints
- for the version that doesn't use cmpxchg, there is no requirement to use
%al anywhere, so don't hard-code it in the constraints either.
Style non-fix:
- for the versions that use cmpxchg, keep using "a" (was %[re]ax, now %al)
for the main output operand, although this is not required. The input
and output operands that use the "a" constraint are now decoupled, and
this makes things clearer except for the reason that the output register
is hard-coded. It is now just a hack to tell gcc that the input "a" has
been clobbered without increasing the number of operands.
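A sketch of the operand layout discussed above, for gcc-compatible
compilers on x86: the result of sete is left in a byte-sized "a"
output and the compiler widens it only if a caller needs an int. The
function name toy_cmpset_int and the fallback to the compiler's
__atomic builtin on other targets are assumptions made so that the
example stands alone; this is not the text of atomic.h.

```c
/*
 * Compare *dst with exp and, if equal, store src.  Returns nonzero on
 * success and zero on failure.
 */
static inline int
toy_cmpset_int(volatile unsigned int *dst, unsigned int exp, unsigned int src)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	unsigned char res;

	__asm__ __volatile__(
	    "lock; cmpxchgl %2,%1; sete %0"
	    : "=a" (res),	/* 0: result byte, left in %al */
	      "+m" (*dst)	/* 1: memory operand, read and written */
	    : "r" (src),	/* 2: new value */
	      "a" (exp)		/* 3: expected value, loaded into %eax */
	    : "memory", "cc");
	return (res);
#else
	/* Non-x86 fallback so the sketch still builds. */
	return (__atomic_compare_exchange_n(dst, &exp, src, 0,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
#endif
}
```

The input and output "a" operands are separate here, so the output
also tells the compiler that the register holding exp was overwritten.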
o change handling of regdomain-related mib knobs so they can be set
post-attach: regdomain, countrycode, outdoor, and xchanmode; the
hal will not permit changing the regdomain but we expose it for now
o on regdomain/countrycode change recalculate the channel list and
push it to the net80211 layer (NB: looks to need more tweaking)
o setup rate tables for half/quarter rate channels
o honor half/quarter rate channel configs when changing channels
o honor half/quarter rate channel configs when setting the slot time
o use hack/nonstandard channel numbering scheme for the public safety
band to avoid overlapping 2.4G channels on dual-band cards
o remove setup of ic_sup_rates; the net80211 layer can do this for us
and it simplifies handling of half/quarter rate channels
Tested only in Public Safety Band with cards that have RF5112.
in the Public Safety Band):
o add channel flags to identify half/quarter-rate operation
o add rate sets (need to check spec on 4Mb/s in 1/4 rate)
o add if_media definitions for new rates
o split net80211 channel setup out into ieee80211_chan_init
o fixup ieee80211_mhz2ieee and ieee80211_ieee2mhz to understand half/quarter
rate channels: note we temporarily use a nonstandard/hack numbering that
avoids overlap with 2.4G channels because we don't (yet) have enough
state to identify and/or map overlapping channel sets
o fixup ieee80211_ifmedia_init so it can be called post attach and will
recalculate the channel list and associated state; this enables changing
channel-related state like the regulatory domain after attach (will be
needed for 802.11d support too)
o add ieee80211_get_suprates to return a reference to the supported rate
set for a given channel
o add 3, 4.5, and 27 Mb/s tx rates to rate <-> media conversion routines
o const-poison channel arg to ieee80211_chan2mode
bge_intr(). Some of them are used in bge_poll(). Simplify by only
initializing these for polling mode and not toggling them when switching
modes. This also fixes missing synchronization with the coalescing
engine in the toggling.
Add a pointer to the relevant PR for future reference. The whole comment
will be OK to remove as soon as the general solution is applied.
PR: kern/105943
pmap on i386
- check for change in executable status in pmap_enter
- pmap_qenter and pmap_qremove only need to invalidate the range if one
of the pages has been referenced
- remove pmap_kenter/pmap_kremove as they were only used by pmap_qenter
and pmap_qremove
- in pmap_copy don't copy wired bit to destination pmap
- mpte was unused in pmap_enter_object - remove
- pmap_enter_quick_locked is not called on the kernel_pmap, remove check
- move pmap_remove_write specific logic out of tte_clear_phys_bit
- in pmap_protect check for removal of execute bit
- panic in the presence of a wired page in pmap_remove_all
- pmap_zero_range can call hwblkclr if offset is zero and size is PAGE_SIZE
- tte_clear_virt_bit is only used by pmap_change_wiring - thus it can be
greatly simplified
- pmap_invalidate_page need only be called in tte_clear_phys_bit if there
is a match with flags
- lock the pmap in tte_clear_phys_bit so that clearing the page bits is
atomic with invalidating the page
- these changes result in a 100s reduction in buildworld from a malloc
backed disk to a malloc backed disk - ~2.5%
mbuf is dropped, to preserve the invariant in the PR_ADDR case.
Add a regression test to detect this condition, but do not hook it
up to the build for now.
PR: kern/38495
Submitted by: James Juran
Reviewed by: sam, rwatson
Obtained from: NetBSD
MFC after: 2 weeks
The problem was that I was acquiring the driver sx lock and then waiting
for a taskqueue to drain, however the taskqueue itself would try to
acquire the lock as well leading to a deadlock.
To fix the problem roll my own exclusive lock that allows for lock
cancellation. This is a normal exclusive lock, however if someone
marks it as "dead" then all waiters who request an error return will
get back an error instead of continuing to wait for the lock.
In this particular case, the shutdown and detach functions kill the
lock while the async task thread tries to acquire the lock but will
abort if the lock returns an error.
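A userland sketch of such a lock, written with POSIX threads purely
for illustration. The xlock names and the choice of ECANCELED as the
error value are assumptions; the driver's own lock is built on kernel
primitives and may differ in detail.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cancellable exclusive lock. */
struct xlock {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		held;
	bool		dead;
};

static void
xlock_init(struct xlock *xl)
{
	pthread_mutex_init(&xl->mtx, NULL);
	pthread_cond_init(&xl->cv, NULL);
	xl->held = false;
	xl->dead = false;
}

/*
 * Acquire the lock.  A caller that passes cancellable == true gets
 * ECANCELED back once the lock is dead instead of waiting; other
 * callers behave as with a normal exclusive lock.
 */
static int
xlock_acquire(struct xlock *xl, bool cancellable)
{
	int error = 0;

	pthread_mutex_lock(&xl->mtx);
	for (;;) {
		if (cancellable && xl->dead) {
			error = ECANCELED;
			break;
		}
		if (!xl->held) {
			xl->held = true;
			break;
		}
		pthread_cond_wait(&xl->cv, &xl->mtx);
	}
	pthread_mutex_unlock(&xl->mtx);
	return (error);
}

static void
xlock_release(struct xlock *xl)
{
	pthread_mutex_lock(&xl->mtx);
	xl->held = false;
	pthread_cond_broadcast(&xl->cv);
	pthread_mutex_unlock(&xl->mtx);
}

/* Mark the lock dead and wake all waiters so cancellable ones bail out. */
static void
xlock_kill(struct xlock *xl)
{
	pthread_mutex_lock(&xl->mtx);
	xl->dead = true;
	pthread_cond_broadcast(&xl->cv);
	pthread_mutex_unlock(&xl->mtx);
}
```

In this model detach or shutdown would hold the lock, call
xlock_kill(), and then drain the task queue; the task's cancellable
acquire returns an error, so the drain can finish.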
The other option was to drop the driver lock mid-detach and mid-shutdown;
mid-detach was OK, however mid-shutdown was not.
While I'm here, fix a bug in which the mii link status word in the
softc appears to go out to lunch. Explicitly set the status word
to 1 after initializing the mii. The bug would result in an interface
that never responded to "if_start" requests as the mii interface
would always look down.
nmi handler is used to stop other processors. The nmi handler calls
trap(); however, trap() now accepts a pointer rather than a reference
(this was changed by kmacy@).
non-extattr functions from vfs_extattr.c, and extattr functions from
vfs_syscalls.c.
Change copyright/license on vfs_extattr.c to my copyright/license on
the extended attribute implementation (from extattr.h).
Clean up includes a bit.
Obtained from: TrustedBSD Project
Framework and security modules, to src/sys/security/mac/mac_policy.h,
completing the removal of kernel-only MAC Framework include files from
src/sys/sys. Update the MAC Framework and MAC policy modules. Delete
the old mac_policy.h.
Third party policy modules will need similar updating.
Obtained from: TrustedBSD Project
return an error since it returns a count of battery devices in the system.
Set it to 0 explicitly, since it is the only switch branch that doesn't set
it.
# I guess no one uses it.
It always called MH_ALIGN for small lengths being
prepended (less than MHLEN). This meant that if you did
a prepend on a non M_PKTHDR the system would panic with
the KASSERT in MH_ALIGN. Instead we are now aware of
this and do a MH_ALIGN or M_ALIGN as appropriate.
Reviewed by: andre
Approved by: gnn
subsystems will be a property of policy modules, which may require
access control check entry points to be invoked even when not actively
enforcing (i.e., to track information flow without providing
protection).
Obtained from: TrustedBSD Project
Suggested by: Christopher dot Vance at sparta dot com
than from the slab, but don't.
Document mac_mbuf_to_label(), mac_copy_mbuf_tag().
Clean up white space/wrapping for other comments.
Obtained from: TrustedBSD Project
Expand comments on System V IPC labeling methods, which could use improved
consistency with respect to other object types.
Obtained from: TrustedBSD Project
the ifnet itself. The stack copy has been made while holding the mutex
protecting ifnet labels, so copying from the ifnet copy could result in
an inconsistent version being copied out.
Reported by: Todd.Miller@sparta.com
Obtained from: TrustedBSD Project
MFC after: 3 weeks
- Move linux_nanosleep() from src/sys/amd64/linux32/linux32_machdep.c to
src/sys/compat/linux/linux_time.c.
- Validate timespec ranges before use as Linux kernel does.
- Fix l_timespec structure.
- Clean up style(9) nits.
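A sketch of the kind of timespec range validation mentioned above,
assuming the usual rule that tv_sec must not be negative and tv_nsec
must lie in [0, 999999999]. The function name timespec_is_valid is
hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Return true if ts describes a usable, normalized sleep interval. */
static bool
timespec_is_valid(const struct timespec *ts)
{
	return (ts->tv_sec >= 0 && ts->tv_nsec >= 0 &&
	    ts->tv_nsec < 1000000000L);
}
```

A caller in the style of linux_nanosleep() would return EINVAL when
this check fails rather than passing the value further down.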
Add rudimentary IPC_INFO/MSG_INFO command support for linux_msgctl()
to pacify Linux ipcs(1). While I am here, add more bound checks
for linux_msgsnd() and linux_msgrcv().
copyin()/copyout() for message type is separated from msgsnd()/msgrcv() and
it is done from its wrapper functions to support 32-bit emulations. After I
implemented this, I have briefly referenced NetBSD and Darwin. NetBSD passes
copyin()/copyout() function pointers from wrappers. Darwin passes size of
message type as an argument, which is actually similar to my first
implementation (P4 109706). We may revisit these implementations later.
would be able to work with aac(4).
This approach is used by some other drivers as well. However, we
need a more generic way to do this in order to avoid having to
special case headers in individual drivers for each platform.
Obtained from: Adaptec (version b11518)
Approved by: scottl
been handled instead of when at least one descriptor was just handled.
For bge, it is normal to get a txeof when only a small fraction of the
queued tx descriptors have been handled, so the bug broke the watchdog
in a usual case.
- moved the synchronizing bus read to after the bus write for the first
interrupt ack so that it actually synchronizes everything necessary.
We were acking not only the status update that triggered the interrupt
together with any status updates that occurred before we got around
to the bus write for the ack, but also any status updates that occur
after we do the bus write but before the write reaches the device.
The corresponding race for the second interrupt ack resulted in
sometimes returning from the interrupt handler with acked but
unserviced interrupt events. Such events then remain unserviced
until further events cause another interrupt or the watchdog times
out.
The race was often lost on my 5705, apparently since my 5705 has broken
event coalescing which causes a status update for almost every packet,
so another status update is quite likely to occur while the interrupt
handler is running. Watchdog timeouts weren't very noticeable,
apparently because bge_txeof() has one of the usual bugs resetting the
watchdog.
- don't disable device interrupts while bge_intr() is running. Doing this
just had the side effects of:
- entering a device mode in which different coalescing parameters apply.
Different coalescing parameters can be used to either inhibit or
enhance the chance of getting another status update while in the
interrupt handler. This feature is useless with the current
organization of the interrupt handler but might be useful with a
taskqueue handler.
- giving a race for ack+reenable/return. This cannot be handled
by simply rearranging the order of bus accesses like the race for
ack+keepenable/entry. It is necessary to sync the ack and then
check for new events.
- taking longer, especially with the extra code to avoid the race on
ack+reenable/return.
Reviewed by: ru, gleb, scottl
vnode v_flag. For cluster buffers this would result in dereferencing NULL
b_vp. To prevent the panic, cache relevant vnode flag before calling
bstrategy.
Reported by: Peter Holm, kris
Tested by: Peter Holm
Reviewed by: tegge
Pointy hat to: kib
running thread's id on each cpu. This allows us to add in-kernel adaptive
spinning for user-level mutexes. While spinning in user space is possible,
it can hardly be implemented efficiently, without wasting cpu cycles,
unless the kernel exports the correct thread running state. Exporting that
state is unlikely to be implemented soon, since the interfaces still have
to be designed and stabilized. This implementation is transparent to user
space and can be disabled dynamically. With this change, the performance
of a mutex ping-pong program improves massively on an SMP machine. The
mysql super-smack select benchmark improves by about 7% on an Intel dual
dual-core2 Xeon machine, which indicates that on systems with several cpus
and low system-call overhead (athlon64, opteron, and core-2 are known to
be fast), the adaptive spin does help performance.
Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as the spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets the upper limit of the spin cycle count.
Tested on: Athlon64 X2 3800+, Dual Xeon 5130
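A minimal sketch of how the two sysctls could combine into an effective
spin count. The variable defaults and the helper name umtx_spin_count()
are assumptions, as is applying the maximum as a simple clamp; only the
"zero m_spincount falls back to the default" rule and the existence of an
upper limit come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the two sysctls; the initial values are made up. */
static uint32_t umtx_dflt_spins = 0;	/* kern.threads.umtx_dflt_spins */
static uint32_t umtx_max_spins = 3000;	/* kern.threads.umtx_max_spins */

/* Hypothetical helper: pick the spin count for one lock attempt. */
static uint32_t
umtx_spin_count(uint32_t m_spincount)
{
	uint32_t spins = m_spincount;

	/* A zero per-mutex count falls back to the default, if one is set. */
	if (spins == 0 && umtx_dflt_spins != 0)
		spins = umtx_dflt_spins;
	/* Assumed: the maximum is applied as a clamp. */
	if (spins > umtx_max_spins)
		spins = umtx_max_spins;
	return (spins);
}
```

With both the per-mutex count and the default at zero the result is zero,
which corresponds to not spinning at all.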
re_watchdog() in order to avoid races accessing if_timer.
- Use bus_get_dma_tag() so re(4) works on platforms requiring it.
- Remove invalid BUS_DMA_ALLOCNOW when creating the parent DMA tag
and the tags that are used for static memory allocations.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
- Remove an unused variable in re_intr().
watchdog timer in dc_txeof() in case there are still unhandled
descriptors as dc_poll() invokes dc_txeof() unconditionally.
Otherwise this would result in the watchdog timer constantly being
reloaded and thus prevent the watchdog from ever firing in
the DEVICE_POLLING case.
Pointed out by: bde
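The effect can be modelled in a few lines. This is a simplified sketch
with a hypothetical softc and function names, not the driver code: the
txeof routine only disarms the timer once everything has completed and
never reloads it, so a poll loop that calls it every tick cannot keep a
stuck transmitter alive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down driver state. */
struct nic_softc {
	int tx_cnt;	/* descriptors queued but not yet completed */
	int wdog_timer;	/* ticks until the watchdog fires; 0 = disarmed */
};

/* Queue n descriptors and arm the watchdog (5 ticks is arbitrary). */
static void
nic_start(struct nic_softc *sc, int n)
{
	sc->tx_cnt += n;
	sc->wdog_timer = 5;
}

/* Reclaim completed descriptors; called unconditionally by the poller. */
static void
nic_txeof(struct nic_softc *sc, int completed)
{
	assert(completed >= 0 && completed <= sc->tx_cnt);
	sc->tx_cnt -= completed;
	if (sc->tx_cnt == 0)
		sc->wdog_timer = 0;	/* all done: disarm */
	/* Otherwise leave the timer alone; reloading here would mask a hang. */
}

/* One watchdog tick; returns true when the watchdog fires. */
static bool
nic_tick(struct nic_softc *sc)
{
	if (sc->wdog_timer == 0 || --sc->wdog_timer != 0)
		return (false);
	return (true);
}
```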
pmap.c, and is potentially the cause of hangs reported on machines with a
small amount of memory. On machines with sufficient RAM, and without a lot
of processes running, this situation would probably never occur.
Testing is still incomplete, but it is obviously wrong so remove the
offending code now.
The issue of what to do when both the primary and secondary hash overflow
is still open.
Reported by: Dan Kresja at windriver dot com, via alc
This macro was written expecting a 32-bit unsigned long, and
doesn't work properly on 64-bit systems. This bug caused vn_stat()
to return incorrect values for files larger than 2gb on msdosfs filesystems
on 64-bit systems.
PR: 106703
Submitted by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days
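The commit text does not show the macro. As an illustration only, and
assuming it assembles a 32-bit little-endian on-disk field from individual
bytes (an assumption, not taken from the log), casting each byte to
uint32_t before shifting keeps the value from being sign-extended when it
is later widened to a 64-bit unsigned long. get_le32() is a hypothetical
name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble a little-endian 32-bit value.  Each byte is widened to
 * uint32_t first, so the shift by 24 never happens in a signed int.
 */
static uint32_t
get_le32(const uint8_t *p)
{
	return ((uint32_t)p[0] |
	    ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[3] << 24));
}
```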
This bug caused vn_stat() to fail on files larger than 2gb on msdosfs
filesystems on AMD64.
PR: 106703
Tested by: Axel Gonzalez <loox e-shell net>
MFC after: 3 days
- Do not repeatedly read vendor/device IDs while probing.
- Remove redundant bzero(3) for softc. device_get_softc(9) does it for free[1].
Reviewed by: glebius
Suggested by: glebius[1]
aches as a read-only file. In a number of cases this has led to
compiles failing, usually due to some strange NFS drift which thinks
that the opt_ah.h in the compile directory is out of date wrt the
source it is copied from. When the copy is executed again, it fails
because the target is read-only. Oops. Modify the compile hooks
to avoid this.
Discussed a while back with: Sam Leffler
- If we want mii_phy_add_media() to add 1000baseT media, we need to
supply sc->mii_extcapabilities.
- Fix formatting when announcing autonegotiation support.
usage to conform to that of tl0_trap - the separate code path
for unaligned faults was never getting used (and evidently doesn't
work), so ifdef out for now
Because accessing ID registers in rtl81x9 needs 32bit register access
and RL_IDR4/RL_IDR5 registers are reserved registers, bzero() is
needed before copying ethernet address.
This fixes an unaligned memory access panic on sparc64.
PR: kern/106801
MFC after: 3 days
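One plausible reading of the fix, sketched for the case of programming
the station address: stage the 6-byte address in a zeroed, 32-bit aligned
buffer and hand it to the hardware as two 32-bit words. The register
array and write_idr32() are stand-ins, not the driver's accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/* Stand-in for the two 32-bit wide ID registers. */
static uint32_t fake_idr[2];

static void
write_idr32(int index, uint32_t val)
{
	fake_idr[index] = val;
}

static void
program_eaddr(const uint8_t lladdr[ETHER_ADDR_LEN])
{
	uint32_t eaddr[2];	/* aligned; 8 bytes, only 6 carry the address */

	/* Zero first so the two bytes past the address are well defined. */
	memset(eaddr, 0, sizeof(eaddr));
	memcpy(eaddr, lladdr, ETHER_ADDR_LEN);
	/* Only aligned 32-bit accesses from here on. */
	write_idr32(0, eaddr[0]);
	write_idr32(1, eaddr[1]);
}
```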
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.
Reviewed by: kan
Tested by: kan
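The difference is easy to show in plain C. In this sketch (the struct and
function names are made up) stores into a by-value argument only touch
the callee's copy, so an optimiser is free to drop them, while stores
through a pointer are visible to the caller by definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, tiny stand-in for a trap frame. */
struct trapframe {
	long tf_ret;
	long tf_err;
};

/* By value: the stores below are dead as far as the caller is concerned. */
static void
handler_by_value(struct trapframe frame)
{
	frame.tf_ret = 42;
	frame.tf_err = 0;
}

/* By reference: the caller's frame is updated. */
static void
handler_by_ref(struct trapframe *frame)
{
	assert(frame != NULL);
	frame->tf_ret = 42;
	frame->tf_err = 0;
}
```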
passed by value (trap frames) as if they were in fact being passed by
reference. For better or worse, this incorrect behaviour is no longer
present in gcc 4.1. In this patch I convert all trapframe arguments to
be explicitly passed by reference. I also remove vm86_initflags, pushing
the very little work that it actually does up into vm86_prepcall.
Reviewed by: kan
Tested by: kan
- The PCPU usage was to ensure that there were no faults on the stack while
the tte_hash_bucket lock was held - but this can be avoided by making sure
the address on the stack is already referenced.
- PCPU removal obviates the need for critical_{enter, exit}
- in trying to avoid nested brackets and #ifdef INVARIANTS around i at the
top, I broke booting for INVARIANTS altogether :-(
- the cleanest fix is to simply assign to sq twice if INVARIANTS is enabled
- tested both with and without INVARIANTS :-/
after we perform the operations to delete the export,
call vfs_deleteopt() to delete the "export" mount option from
the linked list of mount options associated with that mount point.
This fixes one scenario:
- put a filesystem in /etc/exports to export it
- remove the filesystem from /etc/exports to delete the export and restart
mountd
- try to do a "mount -u -o ro" or "mount -u -o rw" on that filesystem
now that it is no longer exported.
arguments to fail. The mode field for shmget() appears to have undefined
meaning in the context of an already-present IPC object, but applications
appear to assume any arbitrary passed value will be ignored. I had hoped
to revisit this more quickly, but am removing the change for now to
prevent toe-stubbing.
Reported by: Jaroslav Suchanek <jarda at grisoft dot cz>
PR: kern/106078
- rename skip_utrap to tl0_skip_utrap to indicate its use by the fill trap fault handler
- handle a null kstack by switching to the idle thread's stack and then going to trap
- correctly handle an unaligned or unmapped stack during a fill trap
- save off some extra data in the pcpu pad in ptl1_panic
- add an assert that PCB is valid in vm_machdep.c
- add cnt_hold cnt_lock support for spin mutexes
- make sure contested is initialized to zero to only bump contested when appropriate
- move initialization function to kern_mutex.c to avoid cyclic dependency between
mutex.h and lock_profile.h
behave as expected.
Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
lost your sense of humor, then don't add a define.
Specific changes:
i80321_wdog.c
Don't roll your own passive watchdog tickle as this would defeat the
purpose of an active (userland) watchdog tickle.
ichwd.c / ipmi.c:
WD_ACTIVE means active patting of the watchdog by a userland process,
not whether the watchdog is active. See sys/watchdog.h.
kern_clock.c:
(software watchdog) Remove a check for WD_ACTIVE as this does not make
sense here. This reverts r1.181.
o fixed a comment
o made in-kernel libalias a bit less verbose (disabled automatic
logging every time a new link is added or deleted)
Approved by: glebius (mentor)
work:
- A new PCI quirk (PCI_QUIRK_DISABLE_MSI) is added to the quirk table.
- A new pci_msi_device_blacklisted() determines if a passed in device
matches an MSI quirk in the quirk table. This can be overridden (all
quirks ignored) by setting the hw.pci.honor_msi_blacklist to 0.
- A global blacklist check is performed in the MI PCI bus code by checking
to see if the device at 0:0:0 is blacklisted.
Tested by: jdp
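The lookup described above might look roughly like this. The quirk value,
the table entry, and msi_allowed() are invented for the sketch, and
combining the per-device check with the 0:0:0 check in one helper is an
assumption; only the quirk name, the blacklist predicate, and the
hw.pci.honor_msi_blacklist override come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUIRK_DISABLE_MSI	2	/* hypothetical value */

struct quirk {
	uint32_t devid;	/* vendor/device ID; 0 terminates the table */
	int type;
};

static const struct quirk quirks[] = {
	{ 0x12345678u, QUIRK_DISABLE_MSI },	/* made-up ID */
	{ 0, 0 }
};

/* Stand-in for the hw.pci.honor_msi_blacklist tunable. */
static int honor_msi_blacklist = 1;

static bool
msi_device_blacklisted(uint32_t devid)
{
	const struct quirk *q;

	if (!honor_msi_blacklist)
		return (false);	/* all quirks ignored */
	for (q = quirks; q->devid != 0; q++)
		if (q->devid == devid && q->type == QUIRK_DISABLE_MSI)
			return (true);
	return (false);
}

/* Global check: a blacklisted device at 0:0:0 disables MSI for everyone. */
static bool
msi_allowed(uint32_t dev_at_000, uint32_t devid)
{
	return (!msi_device_blacklisted(dev_at_000) &&
	    !msi_device_blacklisted(devid));
}
```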
1) s/mi/mfi/ in FreeBSD ioctl path
2) add in "\n" on various failure messages
3) cap the length of time to abort an AEN command
4) fix passing sense data back to user to make Dell's Linux firmware
upgrade tool happy.
5) bump the MFI_POLL_TIMEOUT_SECS from 10s to 50s since the
firmware flash command can take ~40s to return.
This is some clean-up and enables RAID firmware to be updated via Dell's
tool. Note Dell's tool requires the updates to the Linux emulator
that have been done in -current with TLS etc.
I need to discuss with scottl how to better submit mfi commands to
the firmware via the ioctl path so we don't do it in polled mode.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to
the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by: gnn
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.
Tested by: Peter Holm
Reviewed by: tegge
MFC after: 2 weeks
have been added erroneously, and it causes problems on some chips. A larger
change is needed to do this write at a more appropriate place, but that
change requires reworking the ASF logic. That will be worked on in the
future.
Submitted by: Bruce Evans
o no more ds_vdata in tx/rx descriptors
o split h/w tx/rx descriptor from s/w status
o as part of the descriptor split change the rate control module api
so the ath_buf is passed in to the module so it can fetch both
descriptor and status information as needed
o add some const poisoning
Also for sample rate control algorithm:
o split debug msgs (node, rate, any)
o uniformly bounds check rate indices (and in some cases correct checks)
o move array index ops to after bounds checking
o use final tsi from the status block instead of the h/w descriptor
o replace h/w descriptor struct's with proper mask+shift defs (this
doesn't belong here; everything is known by the driver and should
just be sent down so there's no h/w-specific knowledge)
MFC after: 1 month
o remove os-specific glue code; it's now the responsibility of
the driver
o add wackelf utility for patching the ELF magic number on arm
builds since no one can agree on how to mark a .o file as not
having any floating point instructions
o remove radar/dfs-related entry points; folks have finally
decided how to support dfs w/o polluting the hal
o properly recognize AR2424 chips (they were being rejected on
attach despite being fully supported)
o add HAL_CAP_RXORN_FATAL capability to control how RXORN errors
are handled; previously RXORN was always treated as fatal because
older chips required a reset; now we do not treat it as fatal
for "newer chips" (no one seems to know what the cutoff is so
this capability can be used to override the current guesstimate)
o HAL_CAP_RXTSTAMP_PREC capability to export the number of bits
of precision for timestamp data returned in the rx descriptor
o remove public exposure of the compression buffer; it is chip
specific and never belonged in the public view
o change definition of HAL_INT_GLOBAL from an enum member to a
#define to workaround compilers that bitch about enum values
that appear to overflow 31 bits
o add support for newer chips that can store the tkip mic key
together with the cipher key in a single key cache entry
o split tx/rx descriptor into a h/w section and a s/w portion;
this permits storing the s/w area in cached memory when the
h/w area is stored in uncached memory; this also shrinks
memory use since only one status block is needed while multiple
tx/rx descriptors may be required per frame
o add final transmit series index to the transmit descriptor status
so rate control algorithms don't need to grovel through h/w state
to find it
o remove ds_vdata field from the descriptor state as part of the
radar changes
o fix excessive stack usage for some 5212 rf backends
o correct rfkill handling when the pin polarity is 0 true
o correct handling of tsf wrap when reading 64-bit values
MFC after: 1 month
kernel. This LOR snuck in with some of the recent syncache changes. To
fix this, the inpcb handling was changed:
- Hang a MAC label off the syncache object
- When the syncache entry is initially created, the PCB lock is held
because we extract information from the PCB while initializing the
syncache entry. While we do this, copy the MAC label associated with
the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry
to the mbuf so it can be processed by security policies which analyze
mbuf labels.
This change required that the MAC framework be extended to support the
label copy operations from the PCB to the syncache entry, and then from
the syncache entry to the mbuf.
These functions really should be referencing the syncache structure instead
of the label. However, due to some of the complexities associated with
exposing this syncache structure we operate directly on its label pointer.
This should be OK since we aren't making any access control decisions within
this code directly, we are merely allocating and copying label storage so
we can properly initialize mbuf labels for any packets the syncache code
might create.
This also has a nice side effect of caching. Prior to this change, the
PCB would be looked up/locked for each packet transmitted. Now the label
is cached at the time the syncache entry is initialized.
Submitted by: andre [1]
Discussed with: rwatson
[1] andre submitted the tcp_syncache.c changes
controller. Due to lack of documentation, this driver is based on the
code from sk(4) and Marvell's myk(4) driver for FreeBSD. I've also
adopted the OpenBSD interface name, msk(4) in order to reduce naming
differences between BSDs.
The msk(4) driver supports the following Gigabit Ethernet adapters.
o SysKonnect SK-9Sxx Gigabit Ethernet
o SysKonnect SK-9Exx Gigabit Ethernet
o Marvell Yukon 88E8021CU Gigabit Ethernet
o Marvell Yukon 88E8021 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8022CU Gigabit Ethernet
o Marvell Yukon 88E8022 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8061CU Gigabit Ethernet
o Marvell Yukon 88E8061 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8062CU Gigabit Ethernet
o Marvell Yukon 88E8062 SX/LX Gigabit Ethernet
o Marvell Yukon 88E8035 Gigabit Ethernet
o Marvell Yukon 88E8036 Gigabit Ethernet
o Marvell Yukon 88E8038 Gigabit Ethernet
o Marvell Yukon 88E8050 Gigabit Ethernet
o Marvell Yukon 88E8052 Gigabit Ethernet
o Marvell Yukon 88E8053 Gigabit Ethernet
o Marvell Yukon 88E8055 Gigabit Ethernet
o Marvell Yukon 88E8056 Gigabit Ethernet
o D-Link 550SX Gigabit Ethernet
o D-Link 560T Gigabit Ethernet
Unlike OpenBSD/NetBSD msk(4), the msk(4) driver supports all hardware
features including TCP/UDP checksum offload for transmit, MSI, TCP
segmentation offload(TSO), hardware VLAN tag stripping/insertion,
and jumbo frames(up to 9022 bytes). The only unsupported hardware
feature except RLMT is Rx checksum offload, which I don't know how to
make work reliably.
Known Issues:
It seems msk(4) does not work on the second port of dual port NIC.
(The first port works without problems.)
Thanks to Marvell for releasing the BSD licensed myk(4) driver and
thanks to all users who helped fix bugs.
Tested by: bz, philip, bms,
YAMAMOTO Shigeru < shigeru AT iij DOT ad DOT jp >,
Dmitry Pryanishnikov < dmitry AT atlantis DOT dp DOT ua >,
Jia-Shiun Li < jiashiun AT gmail DOT com >,
David Duchscher < daved AT tamu DOT edu >,
Arno J. Klaassen < arno AT heho DOT snv DOT jussieu DOT fr>,
Nicolae Namolovan < adrenalinup AT gmail DOT com>,
Andre Guibert de Bruet < andy AT siliconlandmark DOT com >
current ML
Tested on: i386, amd64
subtypes of HT capabilities.
- Add constants for the MSI mapping window HT PCI capability.
- On i386 and amd64, enable the MSI mapping window on any HT bridges we
encounter and report any non-standard mapping window addresses.
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
This is the "+ one more change" missed in the original commit.
Noticed by: tinderbox
Pointy hat to: me (#1)
In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
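A userland sketch of the caller-supplied-buffer convention. The function
name is hypothetical and inet_ntop(3) stands in for the kernel's own
formatter; the caller is assumed to provide at least INET6_ADDRSTRLEN
bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format addr into ip6buf, which the caller owns and which must hold
 * at least INET6_ADDRSTRLEN bytes.  Returns ip6buf for convenience.
 */
static char *
ip6_sprintf_sketch(char *ip6buf, const struct in6_addr *addr)
{
	if (inet_ntop(AF_INET6, addr, ip6buf, INET6_ADDRSTRLEN) == NULL)
		ip6buf[0] = '\0';
	return (ip6buf);
}
```

Because each caller owns its buffer, two formatted addresses can be used
in the same log statement without one overwriting the other, and there is
no shared static state to race on.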
- Use the appropriate register writing method when resetting the chip
- Program the descriptor DMA engine correctly.
- More reliably detect certain chips and their features.
Also add some low-level debugging tools to help future work on this driver.
Submitted by: David Christenson (proof of concept changes)
Sponsored by: www.UIA.net
This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt
for other NFS error responses. For now, disabling wcc pre-op attr checks and
post-op attr loads on NFS errors (sysctl'ed).
Reported by: Kris Kennaway
- Correct RX packet drop counter for BCM5705+. This register is read/clear
and it wraps very quickly under heavy packet drops because only the lower
ten bits are valid according to the documentation. However, it seems a few
more bits are actually valid and the remaining bits are always zero[1].
Therefore, we don't mask them off here. To get accurate packet drop count,
we need to check the register from bge_rxeof(). It is commented out for now,
not to penalize normal operation. Actual performance impact should be
measured later.
- Correct integer casting from u_long to uint32_t. Casting is not really
needed for all supported platforms but we better do this correctly[2].
Tested by: bde[1]
Suggested by: bde[2]
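Handling a read/clear counter generally means folding every read into a
wider software counter, as in the sketch below. The register is simulated
and all names are made up; how often the real register needs to be
sampled is exactly the open question mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated read-to-clear hardware counter. */
static uint32_t fake_drop_reg;

static uint32_t
read_and_clear(void)
{
	uint32_t v = fake_drop_reg;

	fake_drop_reg = 0;	/* the hardware clears it on read */
	return (v);
}

/* Software total; wide enough not to wrap in practice. */
static uint64_t sw_drops;

/* Fold the current hardware value into the total, without masking bits. */
static void
update_drop_stats(void)
{
	sw_drops += read_and_clear();
}
```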
o Remove unused static global variable e1000phy_debug.
o Take advantage of mii_phy_dev_probe().
o Use MII_ANEGTICKS/MII_ANEGTICKS_GIGE instead of magic number 5.
o Add IFM_NONE as e1000phy(4) supports it without issues.
o Nuke magic PHY programming sequence in PHY reset and follow correct
reset sequence. [1]
o Make manual media selection work for all supported media types.
o Don't set MIIF_NOISOLATE so e1000phy(4) can be used in
configurations with multiple PHYs.
o In 1000baseT, when setting the link manually, one side must be the
master and the other the slave. If LINK0 is set, program the PHY
to be a master, otherwise it's a slave.
o When we lose a link, reset mii_ticks immediately so it correctly
counts the number of seconds elapsed in the autonegotiation phase.
o Announce link loss right after it happens.
o After kicking autonegotiation, report PHY status instead of
returning immediately.
o When link state check is in progress, check auto negotiation
completion bit only when auto negotiation is enabled.
o When PHY is resolved to a master, show it with IFM_FLAG2.
Special thanks to marius who fixed several nits in original patch.
In half-duplex mode, nfe(4) fails to send packets. I think it's a bug
in nfe(4) as the same PHY works without problems on msk(4).
Obtained from: em(4) [1]
Reviewed by: marius
Tested by: bz
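The LINK0 rule for manual 1000baseT can be sketched against the standard
1000BASE-T control register (MII register 9), where bit 12 enables manual
master/slave configuration and bit 11 selects master. The macro and
function names are local to this sketch, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.3 1000BASE-T control register bits (names are sketch-local). */
#define SKETCH_GTCR_MS_ENABLE	0x1000	/* manual master/slave config */
#define SKETCH_GTCR_MS_VALUE	0x0800	/* 1 = master, 0 = slave */

/* Compute the register value for a manually selected 1000baseT link. */
static uint16_t
gtcr_for_manual_1000t(uint16_t gtcr, bool link0)
{
	gtcr |= SKETCH_GTCR_MS_ENABLE;
	if (link0)
		gtcr |= SKETCH_GTCR_MS_VALUE;	/* LINK0 set: be the master */
	else
		gtcr = (uint16_t)(gtcr & ~SKETCH_GTCR_MS_VALUE);	/* slave */
	return (gtcr);
}
```

The two link partners need opposite settings, so LINK0 would be set on
exactly one side.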
Fixing the IP accounting issue, if we plan to do so, needs to be better
thought out; the 'fix' introduces a hash lookup and a possible kernel panic.
Reported by: Mark Tinguely
Either they're there early and the ispfw sets have
registered themselves, or they're not.
The module dependency stuff isn't quite what we want
anyway. If the user doesn't want the load placed on
system memory by loading the firmware, they don't
specify it to be loaded (either by being linked in
or via being a module to be loaded and then hooked
in with firmware(9)). It doesn't then make sense to
then override what they want by pulling it in anyway.
This might be able to work if we were able to pull in
just exactly what we needed for the card we have- but
that's an optimization left for the future.
at which the kernel should start allocating physical memory. The primary
purpose of this is to test 64-bit cleanness of the data path by setting
hw.physmemstart=4G so that all physical allocations are above 4GB. AMD64
and i386/PAE could also benefit from having this option.
just the internal phy on parts supported by the rl and re drivers, the
RTL8201BL for example. He also sent me a nice picture of hundreds of
these chips in a tray to bolster his claim. :-) Therefore remove a
comment that suggested that they were...
is already bounded by hw.physmem to calculate phys_avail[] - previously only
real_phys_avail[] was being bound by hw.physmem so we were allocating memory
that wasn't mapped in the direct map
- shuffle memory range following kernel to the beginning of phys_avail
- have the direct area use 256MB pages where possible
- remove dead code from the end of pmap_bootstrap
- have pmap_alloc_contig_pages check all memory ranges in phys_avail before
giving up
- informal benchmarking indicates a ~5% speedup on buildworld
an "export" flag indicating that we are trying to NFS export the
filesystem, and the MSDOSFS_LARGEFS flag is set on the filesystem,
then deny the mount update and export request. Otherwise,
let the full mount update proceed normally.
MSDOSFS_LARGEFS and NFS don't mix because of the way inodes are calculated
for MSDOSFS_LARGEFS.
MFC after: 3 days
The symptoms were that outgoing DHCP requests for diskless kernels
had the IP header corrupt. After long investigations, the source of
the problem was found in ether_output() - for SIMPLEX interfaces
and broadcast traffic, a copy of the packet is passed back to the kernel
through if_simloop(). However if_simloop() modifies the mbuf, while
the copy obtained through m_copym() is a readonly one.
The bug has been there forever, but it has been triggered only recently
by a change in sosend_dgram() which passed down mbufs with sufficient
space to prepend the header.
This fix is trivial - use m_dup() instead of m_copy() to create
the copy. As an alternative, we could try and modify if_simloop()
to play safely with readonly mbufs, but i don't think it is worthwhile
because 1) this is a relatively infrequent code path so we do not need
to worry too much about performance, and 2) the cost of doing an
extra m_pullup in if_simloop() is probably the same as doing the
copy of the cluster, anyways.
MFC after: 1 week
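The failure mode can be reproduced with a toy packet type. This is not
the mbuf API: pkt_share() plays the role of a reference copy that shares
storage and pkt_dup() that of a deep copy, and loop_input() stands in for
a consumer that modifies what it is given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet: a header that points at payload storage. */
struct pkt {
	unsigned char *data;	/* possibly shared with another pkt */
	size_t len;
};

/* Reference copy: new header, same storage.  Must be treated read-only. */
static struct pkt
pkt_share(const struct pkt *p)
{
	struct pkt c = *p;

	return (c);
}

/* Deep copy: private storage.  data is NULL if allocation fails. */
static struct pkt
pkt_dup(const struct pkt *p)
{
	struct pkt c = { malloc(p->len), p->len };

	if (c.data != NULL)
		memcpy(c.data, p->data, p->len);
	else
		c.len = 0;
	return (c);
}

/* Stand-in for a loopback path that writes into the packet it receives. */
static void
loop_input(struct pkt *p)
{
	if (p->len > 0)
		p->data[0] ^= 0xff;
}
```

Feeding the shared copy to loop_input() changes the original's first
byte, while feeding it the duplicate leaves the original intact.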
field to "unsigned long" so that it actually works.
Thanks to Robert Sciuk for sending me a DVD that
demonstrated ISO9660-formatted media with a file >2G.
I've now fixed this both in libarchive and in the cd9660
filesystem.
MFC after: 14 days
timer in xl_txeof()/xl_txeof_90xB(); xl_poll_locked() unconditionally
invokes xl_txeof()/xl_txeof_90xB(), effectively circumventing that
the watchdog ever fires in the DEVICE_POLLING case as its timer is
constantly reloaded.
- Remove the banal and pedantically outdated comment regarding setting
xl_wdog_timer to 0 in xl_txeof().
Pointed out by: bde
(somewhat) meaningful message and terminate the build. It'd be
nice to print a proper URL from which to fetch the file but that
seems problematic. Leave a suggested starting point in this file
(TBD: add it to the man page).
Submitted by: ru
to workaround the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.
MFC after: 3 days
rather than treating them as a fatal exception and halting. At least one
storage BIOS (some newer mpt(4) parts) has a breakpoint instruction in
its disk read routine.
MFC after: 3 days
Make part of John Birrell's KSE patch permanent.
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.
Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu, and Dan Eischen using libthr and libpthread.
driving xl_watchdog() in order to avoid races accessing if_timer.
While at it relax the watchdog a bit by reloading it in xl_txeof()/
xl_txeof_90xB() if there are still packets enqueued.
- Use bus_get_dma_tag() so xl(4) works on platforms requiring it.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
hme_watchdog() in order to avoid races accessing if_timer.
- Use bus_get_dma_tag() so hme(4) works on platforms requiring it.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
gem_watchdog() in order to avoid races accessing if_timer.
While at it relax the watchdog a bit by reloading it in gem_tint()
if there are still packets enqueued.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
- Fix inconsistencies in prototypes.
depending on the NIC and isn't used at all with HomePNA links)
instead of if_slowtimo() for driving dc_watchdog() in order to
avoid races accessing if_timer.
- Use bus_get_dma_tag() so dc(4) works on platforms requiring it.
- Don't bother to set if_mtu to ETHERMTU, ether_ifattach() does that.
- Remove an alpha remnant in dc_softc.
of the nvenet lib upgrade (the constant went from 63 (2^n - 1) to
32 (2^n)). For reasons that are not obvious to me this fixes the driver
on at least some NICs.
MFC after: 3 days
with- not hope for the best. Change some things which were gated
off of 24XX to be gated off of 2K login support. Convert some
isp_prt calls to xpt_print calls.
Add a note that suggests a cleanup.
Note: This patch was derived based on looking at the pvrxxx/pvr250
ports' Makefiles only, and may be incomplete. It is not derived from
anything I saw from Hauppauge.
was written into a user's address space. The fix is to modify uiomove_fromphys
to sync the icache when an executable user-space page is written into.
Alan Cox suggested that there should probably be a higher-level interface
to this in the ptrace code, but agreed that this is an OK short-term solution.
Files changed:
pmap.h - declaration of pmap_page_executable()
pmap_dispatch.c - pass through the page_executable call to the mmu object
mmu_oea.c - implement the page_executable method by examining the PTE_EXEC
field in the vm_page_t
uio_machdep.c - in uiomove_fromphys(), if the op was a UIO_WRITE to user-space,
and if the page is executable, sync the icache since this is at the least
a breakpoint-write from gdb.
Reported by: marcel
Tested by: marcel, grehan on g3+g4
Discussed with: alc
MFC after: 2 weeks
Fixes for 'blocking in fifoor state' problem of LTP tests.
linux_*stat*() functions were opening files with O_RDONLY to get
major/minor pair for char/block special files. Unfortunately,
when these functions are used against fifo, it is blocked forever
because there is no writer. Instead, we only open char/block special
files for major/minor conversion. We have to get rid of kern_open()
entirely from translate_path_major_minor() but today is not the day.
While I am here, add checks for errors before calling
translate_path_major_minor().
if waittime was zero (the lock was uncontested) l->lpo_waittime
in the hash table would not get initialized.
Inspection prompted by questions from: Attilio Rao
- create real_phys_avail which includes all memory ranges to be added to the direct map
- merge in nucleus memory to real_phys_avail
- distinguish between tag VA and index VA in tsb_set_tte_real for cases where page_size != index_page_size
- clean up direct map loop
of the bridge port and path cost have been administratively set or
calculated automatically by RSTP.
Make sure to transition from non-edge to edge when the port goes down
and the edge flag was manually set before.
This is needed to comply with the condition
((!portEnabled && AdminEdge) || ....)
in the Bridge Detection State Machine (IEEE 802.1D-2004, p. 171).
Reviewed by: thompsa
Approved by: bz (mentor)
pthread_cancel()ed, it is expected that the thread will not
consume a pthread_cond_signal(). Therefore, we use thr_wake()
to mark a flag; the flag tells a thread calling do_cv_wait()
in umtx code to not block on a condition variable.
The thread library is expected to work as follows: once a thread
detects that it is in pthread_cond_wait, it calls thr_wake() on
itself from its SIGCANCEL handler.
- In hme_eint() print MIF register contents on MIF interrupts.
- In hme_mifinit() don't bother to preserve the previous MIF config.
This was mainly done in order to preserve the PHY select bit (external
or internal PHY) but which only needs to be set as appropriate when
reading from or writing to the desired PHY in hme_mii_{read,write}reg().
Similarly don't bother to set the PHY select bit in hme_mii_statchg().
- In hme_mii_{read,write}reg() ignore requests to PHYs other than the
external and internal PHY one.
- Move enabling/disabling the MII drivers of the external transceiver
from hme_init_locked() and based on the sheer presence of an external
transceiver to hme_mifinit() and based on the currently selected media,
defaulting to the internal transceiver when the media hasn't been set yet.
Invoke hme_mifinit() from the newly added hme_mediachange_locked() so
the setting of the MII drivers is updated when changing media.
These changes keep the MII bus from wedging (which manifests in the HME
and the PHYs no longer being able to communicate with each other) when
the PHY device drivers isolate the unused PHY in two-PHY configurations
as present in f.e. Netra t1 100 while changing media, either from
hme_init_locked() (see also below) or via ifconfig(8). They also allow
for using both transceivers/PHYs.
- In the newly added hme_mediachange_locked() also reset the PHYs in two-
PHY configurations before invoking mii_mediachg(). This is required
for successfully unisolating the previously unused PHY when switching
between PHYs.
- Now that changing media should no longer cause problems back out rev.
1.27 and re-enable setting the current media in hme_init_locked() (see
the commit message of rev. 1.23 for more info).
These changes are roughly a merge of NetBSD gem.c rev. 1.32 - 1.35 (1.30
was already fixed differently in our 1.36; 1.31 and 1.32 were wrong) with
some parts reworked and things that don't make sense like setting the MII
drivers and restoring the previous MIF and XIF settings in hme_mii_{read,
write}reg() omitted.
MFC after: 2 weeks
Use TAILQ_FOREACH_SAFE instead of the unsafe one where an item is removed
from the queue.
This prevents a panic on kldunload.
Submitted by: rdivacky
Tested by: bsam
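For reference, the safe pattern looks like this. The element type and
drain() are made up for the sketch; the fallback macro mirrors the usual
BSD definition and is only there for systems whose <sys/queue.h> lacks
TAILQ_FOREACH_SAFE.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

#ifndef TAILQ_FOREACH_SAFE	/* some non-BSD <sys/queue.h> copies lack it */
#define	TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = TAILQ_FIRST((head));				\
	    (var) && ((tvar) = TAILQ_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

struct item {
	int id;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

/* Remove and free every element; returns how many were freed. */
static int
drain(struct item_list *head)
{
	struct item *it, *tmp;
	int freed = 0;

	/*
	 * tmp is loaded with the successor before the body runs, so
	 * freeing "it" does not leave the iterator on freed memory.
	 * Plain TAILQ_FOREACH would read it->link after free().
	 */
	TAILQ_FOREACH_SAFE(it, head, link, tmp) {
		TAILQ_REMOVE(head, it, link);
		free(it);
		freed++;
	}
	return (freed);
}
```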
set birthtime to FAT CTime (creation time) and in the other cases
set birthtime to -1.
o Set ctime to mtime instead of FAT CTime which has completely
different meaning.
PR: kern/106018
Submitted by: Oliver Fromme
MFC after: 1 month
read wasn't flagging that SYNC mode was enabled. The temp
values for offset and sync period were uint8_t, but were
being assigned and shifted from a uint32_t value.
This didn't show up in testing because a random number
of 1030 cards set a bit that says "honor BIOS negotiation",
which means this whole code path was skipped.
This should clear up at least some of the negotiation
issues that have been seen.
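One way this class of bug can arise is sketched below. The field layout and the DEMO_* names are assumptions made up for illustration; they are not the real register definitions used by the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, NOT the real register definitions. */
#define DEMO_PERIOD_MASK	0x0000ff00u
#define DEMO_PERIOD_SHIFT	8
#define DEMO_OFFSET_MASK	0x00ff0000u
#define DEMO_OFFSET_SHIFT	16

/* Buggy pattern: the masked value is narrowed to 8 bits before the shift. */
uint8_t
demo_period_buggy(uint32_t reg)
{
	uint8_t tmp = reg & DEMO_PERIOD_MASK;	/* bits 8..15 are lost here */

	tmp >>= DEMO_PERIOD_SHIFT;
	return (tmp);
}

/* Fixed pattern: mask and shift in 32 bits, narrow only at the end. */
uint8_t
demo_period_fixed(uint32_t reg)
{
	uint32_t tmp = reg & DEMO_PERIOD_MASK;

	return ((uint8_t)(tmp >> DEMO_PERIOD_SHIFT));
}

/* Same fixed pattern for the offset field. */
uint8_t
demo_offset_fixed(uint32_t reg)
{
	uint32_t tmp = reg & DEMO_OFFSET_MASK;

	return ((uint8_t)(tmp >> DEMO_OFFSET_SHIFT));
}
```

With a register value of 0x000a0c00 the buggy variant yields a period of 0, while the fixed variants recover 0x0c and 0x0a.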
author can't remember why it was there.
The CTS_SCSI_FLAGS_TAG_ENB remains in place, and makes sense, and is
checked all over the place.
The CTS_SPI_FLAGS_TAG_ENB was probably an attempt to distinguish
protocol and transport tag capabilities. At the very least this can
be confusing and prone to many bugs, so let's just assume that the
transport tag case just flows from the protocol (and vice versa)
for now.
and by only delaying when an RTC register is written to. The delay
after writing to the data register is now not just a workaround.
This reduces the number of ISA accesses in the usual case from 4 to
1. The usual case is 2 rtcin()'s for each RTC interrupt. The index
register is almost always RTC_INTR for this. The 3 extra ISA accesses
were 1 for writing the index and 2 for delays. Some delays are needed
in theory, but in practice they now just slow down slow accesses some
more since almost everyone including us does them wrong so modern systems
enforce sufficient delays in hardware. I used to have the delays ifdefed
out, but with the index register optimization the delays are rarely
executed so the old magic ones can be kept or even implemented non-
magically without significant cost.
Optimizing RTC interrupt handling is more interesting than it used to
be because RTC interrupts are currently needed to fix the more efficient
apic timer interrupts on some systems. apic_timer_hz is normally 2000
so the RTC interrupt rate needs to be 2048 to keep the apic timer
firing on such systems. Without these changes, each RTC interrupt
normally took 10 ISA accesses (2 PIC accesses and 2 sets of 4 RTC
accesses). Each ISA access takes 1-1.5us so 10 of them at 2048 Hz
take 2-3% of a CPU. Now 4 of them take 0.8-1.2% of a CPU.
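The percentages above follow directly from the access counts. The helper below (isa_cpu_percent() is a hypothetical name) just reproduces the arithmetic; the count of 4 is taken to be 2 PIC accesses plus 2 rtcin() calls at 1 access each.

```c
#include <assert.h>

/*
 * Percentage of one CPU spent on ISA accesses, given the number of
 * accesses per interrupt, the cost of one access in microseconds and
 * the interrupt rate in Hz.
 */
double
isa_cpu_percent(int accesses_per_intr, double usec_per_access, int intr_hz)
{
	assert(accesses_per_intr >= 0 && usec_per_access >= 0.0 &&
	    intr_hz >= 0);
	/* microseconds of ISA time per second, as a fraction of 1e6 us */
	return (accesses_per_intr * usec_per_access * intr_hz / 1e6 * 100.0);
}
```

For 10 accesses at 2048 Hz this gives roughly 2.0% (1 us per access) to 3.1% (1.5 us); for 4 accesses it gives roughly 0.8% to 1.2%.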
priority mutex implemented, it is the time to introduce this stuff,
now we can use umutex and ucond together to implement pthread's
condition wait/signal.
Fix things to use the LSI-Logic Fusion Library mask and shift names for
offset and sync, no matter how awkward they are, in preference to just
plain numbers.
- Don't set MIIF_NOISOLATE so amphy(4) can be used in configurations
with multiple PHYs. There doesn't seem to be a problem with isolating
AM79c873 and workalikes per se nor in combination with the NICs they're
used with and amphy(4) was already adding IFM_NONE anyway.
- Use mii_phy_add_media() instead of mii_add_media() so the latter can
be eventually retired.
- Take advantage of mii_phy_setmedia().
- Fix a whitespace nit.
Obtained from: NetBSD dmphy(4) (except for the last item)
MFC after: 2 weeks
the currently selected media is of type IFM_AUTO as auto-negotiation
doesn't need to be kicked anyway.
- Fix a whitespace nit.
- Probe another Altima PHY, which is an AC101 workalike and integrated
in at least ADMtek ADM8511 but apparently is not mentioned in any
publicly available data sheet so the actual identifier is unknown.
- Don't set MIIF_NOISOLATE so acphy(4) can be used in configurations
with multiple PHYs. There doesn't seem to be a problem with isolating
AC101 and workalikes per se nor in combination with the NICs they're
used with.
- Use mii_phy_add_media() instead of mii_add_media() so the latter can
be eventually retired.
- Take advantage of mii_phy_setmedia().
Obtained from: NetBSD (except for the first and second item)
MFC after: 2 weeks
in at least ADMtek ADM8511 but apparently is not mentioned in any
publicly available data sheet so the actual identifier is unknown.
- Add Davicom DM9102 PHY.
- Add DM9101 to the description of AMD 79C873 as at least some Davicom
DM9101F identify identically to AMD 79C873.
Obtained from: NetBSD
MFC after: 2 weeks
with multiple PHYs. There doesn't seem to be a problem with isolating
78Q2120 per se nor in combination with the NICs they're used with and
tdkphy(4) was already adding IFM_NONE anyway.
- Set MIIF_NOLOOP as loopback doesn't work with this PHY. The MIIF_NOLOOP
flag currently triggers nothing but hopefully will be respected by
mii_phy_setmedia() later on.
- Use mii_phy_add_media() instead of mii_add_media() so the latter can
be eventually retired.
- Take advantage of mii_phy_setmedia().
Thanks to Hans-Joerg Sirtl for lending me test hardware.
Obtained from: NetBSD tqphy(4)
MFC after: 2 weeks
with multiple PHYs and un-comment case IFM_NONE in case MII_MEDIACHG of
rgephy_service(). There doesn't seem to be a problem with isolating
RTL8169S and their internal PHY.
- Take advantage of mii_phy_add_media(). [1]
Obtained from: NetBSD [1]
Tested by: yongari
MFC after: 2 weeks
- Fix some whitespace nits.
- Fix some spelling in comments.
- Use MII_ANEGTICKS instead of 5.
- Don't define variables in nested scope.
- Remove superfluous returns at the end of void functions.
- Remove unused static global rgephy_mii_model.
- Remove dupe $Id$ in tdkphy(4).
- Sort brgphys table.
MFC after: 2 weeks
and Daichi GOTO <daichi@FreeBSD.org> for submitting this
major rewrite of unionfs. This rewrite was done to
try to solve many of the longstanding crashing and locking
issues in the existing unionfs implementation. This
implementation also adds a 'MASQUERADE mode', which allows
the user to set different user, group, and file permission
modes in the upper layer.
Submitted by: daichi, Masanori OZAWA
Reviewed by: rodrigc (modified for minor style issues)
- Playback and headphone/speaker automute works.
- Recording untested due to me being deaf doing back-and-forth
remote debugging.
Free Macbook donation is highly appreciated :)
Tested by: Dennis Pielken <mips128@gmx.net>
mii_phy_match() API and takes care of the PHY device probe based on
the struct mii_phydesc array and the match return value provided.
Convert PHY drivers to take advantage of mii_phy_dev_probe(),
converting drivers to provide a mii_phydesc table in the first
place where necessary.
Reviewed by: yongari
MFC after: 2 weeks
- Currently LINUX_MAX_COMM_LEN is smaller than MAXCOMLEN, but should
this change we would have a buffer overflow. Apply some defensive
programming to DTRT if that happens.
- Use copyinstr() instead of copyin where appropriate.
* Fallback to copyin() in case of ENAMETOOLONG. [1]
* Use the right source and destination (it was wrong before).
- Use strlcpy instead of strcpy.
- Properly lock the read case (PR_GET_NAME) like the write case.
Reviewed by: rwatson (except [1])
Suggested by: rwatson [1]
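The defensive copy described in the list above can be sketched in userland as follows. This is not the kernel code: demo_strlcpy() is a portable stand-in with strlcpy(3) semantics, demo_set_comm() is a hypothetical helper, and the DEMO_* sizes are illustrative values chosen only so that the Linux limit is the smaller one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sizes only; the real constants live in kernel headers. */
#define DEMO_LINUX_MAX_COMM_LEN	16
#define DEMO_MAXCOMLEN		19

/*
 * Stand-in with strlcpy(3) semantics: copy at most dstsize - 1 bytes,
 * always NUL-terminate when dstsize > 0, return strlen(src).
 */
size_t
demo_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/*
 * Store a name into a buffer of DEMO_MAXCOMLEN + 1 bytes.  The bound is
 * the destination size, so the copy stays safe whichever of the two
 * limits is larger.  Returns 1 if the name had to be truncated.
 */
int
demo_set_comm(char comm[DEMO_MAXCOMLEN + 1], const char *name)
{
	return (demo_strlcpy(comm, name, DEMO_MAXCOMLEN + 1) > DEMO_MAXCOMLEN);
}
```

Bounding the copy by the destination size rather than by the source limit is what keeps the code correct if the relation between the two constants ever flips.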
on the arm. Add an assert to ensure that the size is 8 to prevent others
from falling into this trap (we should have more of these).
Why the construct:
	struct foo {
		union bar {
			struct {
				...
			} __packed fred;
			...
		} __packed wilma;
	} __packed;
has a different packing than:
	struct foo {
		union bar {
			struct {
				...
			} fred __packed;
			...
		} wilma __packed;
	} __packed;
is beyond my ability to ferret out of the gcc documentation. Most
likely some subtle binding issue (e.g. before it says the struct itself
is packed, while after it means that the whole struct is packed into
the thing it is in). Pointers to relevant documentation would be
appreciated.
sizeof ether_header is 2 * ETHER_ADDR_LEN + 2 (14) bytes long
sizeof ether_addr is ETHER_ADDR_LEN bytes long
On arm, this shows that struct ether_addr needs to be __packed.
The first condition must be true for the bridging code to not dump core.
The second one appears to be implicitly relied upon by wi (but many
of the rids it sends down likely need __packed too to be safe) and
maybe others. It appears to not hurt anything.
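The two size conditions can be checked at compile time. The sketch below uses C11 _Static_assert on hypothetical demo_* stand-ins for the real structures (kernel code of this era would more likely use CTASSERT); the GCC-style packed attribute is spelled out instead of the __packed macro so that the example builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETHER_ADDR_LEN	6

/* Hypothetical stand-ins for struct ether_addr and struct ether_header. */
struct demo_ether_addr {
	uint8_t octet[DEMO_ETHER_ADDR_LEN];
} __attribute__((packed));

struct demo_ether_header {
	uint8_t  dhost[DEMO_ETHER_ADDR_LEN];
	uint8_t  shost[DEMO_ETHER_ADDR_LEN];
	uint16_t type;
} __attribute__((packed));

/* Fail the build if an ABI pads either structure. */
_Static_assert(sizeof(struct demo_ether_addr) == DEMO_ETHER_ADDR_LEN,
    "ether_addr must be exactly ETHER_ADDR_LEN bytes");
_Static_assert(sizeof(struct demo_ether_header) == 2 * DEMO_ETHER_ADDR_LEN + 2,
    "ether_header must be exactly 14 bytes");

/* Runtime accessors, so the sizes can also be inspected from a test. */
size_t
demo_ether_addr_size(void)
{
	return (sizeof(struct demo_ether_addr));
}

size_t
demo_ether_header_size(void)
{
	return (sizeof(struct demo_ether_header));
}
```

A failed assertion stops the build on the offending platform instead of leaving the problem to surface as a crash at run time.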
discarded RX packets to input error for BCM5705 or newer chipset as the others.
Unfortunately we cannot do the same for output errors because ifOutDiscards
equivalent register does not exist. While I am here, replace misleading and
wrong BGE_RX_STATS/BGE_TX_STATS with BGE_MAC_STATS. They were reversed but
worked accidentally.
machines and both TX and RX were broken on big-endian machines.
The chip design is crazy -- on RX, it puts the 16-bit VLAN tag
in network byte order (big-endian) in the 32-bit little-endian
register!
Thanks to John Baldwin for helping me document this change! ;-)
Tested by: sat (amd64), test program (sparc64)
PR: kern/105054
MFC after: 3 days
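The RX quirk described above can be modelled as follows. This is a sketch under stated assumptions: the tag is assumed to sit in the low 16 bits of the register, and demo_le32dec() and demo_rx_vlan_tag() are hypothetical helpers, not the driver's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian register from raw bytes, on any host. */
uint32_t
demo_le32dec(const uint8_t b[4])
{
	return ((uint32_t)b[0] | (uint32_t)b[1] << 8 |
	    (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24);
}

/*
 * Assumption: the tag occupies the low 16 bits and was stored there in
 * network byte order.  After the register has been decoded as
 * little-endian, the two tag bytes are therefore swapped on every host,
 * so an unconditional 16-bit swap is needed (ntohs() would be a no-op
 * on big-endian machines).
 */
uint16_t
demo_rx_vlan_tag(uint32_t reg)
{
	uint16_t raw = (uint16_t)(reg & 0xffffu);

	return ((uint16_t)((raw << 8) | (raw >> 8)));
}
```

For VLAN 100 the chip would store the bytes 0x00 0x64; decoding the register as little-endian gives 0x6400, and the swap recovers 0x0064.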
if_watchdog/if_timer interface doesn't fit modern SMP network
stack design.
Device drivers that need watchdog to monitor their hardware should
implement it themselves.
Eventually the if_watchdog/if_timer API will be removed. For now,
warn that the driver uses it.
Reviewed by: scottl
__stop_<section> symbols generated by the static linker for elf
sections. This is done only for the final link, and not for ld -r.
Augment the elf_obj in-kernel linker by recognizing such special symbols,
and resolving them to the start and end of the section automatically.
As a result, linker sets on amd64 could be used in the same way as on
other architectures, without explicit calls to linker_file_lookup_set().
Requested by: rdivacky
No objections from: peter, jhb
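The start/stop symbol mechanism can be tried in userland on an ELF toolchain. The sketch below assumes GNU ld or LLD and GCC-style attributes; demo_set, struct demo_entry and DEMO_SET_ADD() are hypothetical names, loosely modelled on how linker sets are declared, not the kernel macros themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set member type. */
struct demo_entry {
	const char *name;
	int weight;
};

/* Place a pointer to "sym" into the ELF section named demo_set. */
#define DEMO_SET_ADD(sym)						\
	static const struct demo_entry *const demo_set_ptr_##sym	\
	    __attribute__((section("demo_set"), used)) = &(sym)

static const struct demo_entry alpha = { "alpha", 1 };
static const struct demo_entry beta = { "beta", 2 };
DEMO_SET_ADD(alpha);
DEMO_SET_ADD(beta);

/*
 * The static linker defines these for any section whose name is a valid
 * C identifier; they bracket the section's contents in the final link.
 */
extern const struct demo_entry *const __start_demo_set[];
extern const struct demo_entry *const __stop_demo_set[];

/* Number of entries in the set. */
int
demo_set_count(void)
{
	return ((int)(__stop_demo_set - __start_demo_set));
}

/* Walk the set and sum the weights. */
int
demo_set_total(void)
{
	const struct demo_entry *const *p;
	int total = 0;

	for (p = __start_demo_set; p < __stop_demo_set; p++)
		total += (*p)->weight;
	return (total);
}
```

No registration call is needed: every object file that uses DEMO_SET_ADD() contributes to the same section, and the linker-provided symbols give the iteration bounds.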