Commit graph

284 commits

Author SHA1 Message Date
Edward Tomasz Napierala
5ed1eb2bb0 Fix panic on invalid 'mdconfig -at preload' usage.
PR:		kern/80136
2010-02-27 10:41:30 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
John Baldwin
33fc362512 Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
  shared vnode lock for the leaf vnode only if the mount point has the
  extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
  not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
  O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
  vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
  FIFO's require exclusive vnode locks for their open() and close()
  routines.  (My recent MPSAFE patches for UDF and cd9660 already included
  this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by:	ups
Reviewed by:	pjd (ZFS bits)
MFC after:	1 month
2009-03-11 14:13:47 +00:00
Alan Cox
b72cca38a6 Remove unnecessary page queues locking around vm_page_wakeup(). (This
change is applicable to RELENG_7 but not RELENG_6.)

MFC after:	1 week
2009-02-22 02:50:31 +00:00
Edward Tomasz Napierala
a9ebb31183 Add the possibility to specify "-o force" with "mdconfig -du".
Reviewed by:	scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-10 17:17:18 +00:00
Edward Tomasz Napierala
41c8b468e6 Fix forced mdconfig -du. E.g. the following would previously
result in panic:

mdconfig -af blah.img -o force
mount /dev/md0 /mnt
mdconfig -du 0

Reviewed by:	scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 20:59:27 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Ed Schouten
06d425f92e Remove the distinction between device minor and unit numbers.
Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too agressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macro's. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by:	philip (mentor)
2008-05-29 12:50:46 +00:00
Philip Paeps
3cf74e539b Zero sc->vnode if mdsetcred() fails.
This fixes the panic which happens when mdcreate_vnode() calls vn_close()
and mddestroy() calls it again further down the error handling path.

Reviewed by:	kris, kib
MFC after:	3 days
2008-02-28 18:31:54 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Maxim Sobolev
a03be42da6 Put back devstat support that was lost during GEOM transition. Initially,
I've tried to move md(4) to use geom_disk class, like real disks do, but
this requires major rework of some of the existing features such as
configuration dumping for example. Therefore just putting devstat support
directly into md(4) seems to be optimal solution.

Now you can see md(4) stats in `systat -vm' again.

MFC after:	2 weeks
2007-11-07 22:47:41 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Konstantin Belousov
3b7b5496a7 Resolve two deadlocks that could be caused by busy md device backed
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.

Tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2006-12-14 11:34:07 +00:00
Pawel Jakub Dawidek
a777323904 Style nits. 2006-11-01 18:59:06 +00:00
Pawel Jakub Dawidek
5541f25ec7 Fix md(4) panic which occurs when I/O request different than
BIO_READ/BIO_WRITE is sent to vnode-backed provider (BIO_DELETE or
BIO_FLUSH).

Reported by:	ceri

Add support for BIO_FLUSH to vnode-backed md(4) devices based on
VOP_FSYNC().
2006-11-01 18:56:18 +00:00
John Baldwin
a08d2e7fe1 - Conditionally acquire Giant in mdstart_vnode(), mdcreate_vnode(), and
mddestroy() only if the file is from a non-MPSAFE VFS.
- No longer unconditionally hold Giant in the md kthread for vnode-backed
  kthreads.
- Improve the handling of the thread exit race when destroying an md
  device.
2006-03-28 21:25:11 +00:00
Wojciech A. Koszek
c27a895433 Teach md(4) and mdconfig(8) how to understand XML. Right now there won't be
a problem with listing large number of md(4) devices. Either 'list' or
'query' mode uses XML.

Additionally, new functionality was introduced. It's possible to pass
multiple devices to -u:

	# ./mdconfig -l -u md0,md1

Approved by:	cognet (mentor)
2006-03-26 23:21:11 +00:00
Luigi Rizzo
de64f22aa4 make sure that the start and end preloaded MFS markers are
in contiguous strings, and that the compiler does not optimize them
away because it thinks they are unused.
2006-01-31 13:35:30 +00:00
Pawel Jakub Dawidek
b322d85d53 Call NDFREE() only when vn_open() succeeded.
MFC after:	3 days
2006-01-27 11:27:55 +00:00
Maxim Konovalov
6c3cd0e2f6 o Fix typos in the comments.
Submitted by:	Wojciech A. Koszek
2005-12-28 15:18:18 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Poul-Henning Kamp
947fc8de03 Make sure that the worker thread knows the type early enough to
grab Giant for vnode backing.

Found by:	pho & tegge
2005-10-06 19:47:04 +00:00
Poul-Henning Kamp
9b00ca1961 Fix configuration locking in MD.
Remove  md_mtx.

Remove GIANT from the mdctl device driver and avoid DROP_GIANT,
PICKUP_GIANT and geom events since we can call into GEOM directly
now.

Pick up Giant around vn_close().

Apply an exclusive sx around mdctls ioctl and preloading to protect
lists etc..

Don't initialize our lock (md_mtx or md_sx) from a
SYSINIT when there is a perfectly good pair of _fini/_init
functions to do it from.

Prune any final fractional sector from the mediasize to
keep GEOM happy.

Cleanups:

Unify MDIOVERSION check in (x)mdctlioctl()

Add pointer to start() routine to softc to eliminate a switch{}

Inline guts of mddetach().

Always pass error pointer to mdnew(), simplify implementation.
2005-09-19 06:55:27 +00:00
Poul-Henning Kamp
9fbea3e365 Do not destroy the queue mutex until the thread is done with it. 2005-09-11 12:35:32 +00:00
Pawel Jakub Dawidek
7ee3c044d0 - Add md_mtx lock to protect ID number and list of devices.
- Always check mdnew() return value, as even in !autounit case
  kthread_create() can fail.

Those two changes fix serval panics provked by simple stress test.

Tested by:	Kris The BugMagnet
MFC after:	3 days
2005-08-31 19:45:11 +00:00
Christian S.J. Peron
8677689134 Ensure that file flags such as schg, sappnd (and others) are honored
by md(4). Before this change, it was possible to by-pass these flags
by creating memory disks which used a file as a backing store and
writing to the device.

This was discussed by the security team, and although this is problematic,
it was decided that it was not critical as we never guarantee that root will
be restricted.

This change implements the following behavior changes:

-If the user specifies the readonly flag, unset write operations before
 opening the file. If the FWRITE mask is unset, the device will be
 created with the MD_READONLY mask set. (readonly)
-Add a check in g_md_access which checks to see if the MD_READONLY mask
 is set, if so return EROFS
-Do not gracefully downgrade access modes without telling the user. Instead
 make the user specify their intentions for the device (assuming the file is
 read only). This seems like the more correct way to handle things.

This is a RELENG_6 candidate.

PR:		kern/84635
Reviewed by:	phk
2005-08-17 01:24:55 +00:00
Alan Cox
e340fc602b Request a CPU private mapping from sf_buf_alloc(). If the swap-backed
memory disk is larger than the number of available sf_bufs, this improves
performance on SMPs by eliminating interprocessor TLB shootdowns.  For
example, with 6656 sf_bufs, the default on my test machine, and a 256MB
swap-backed memory disk, I see the command
"dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default,
shared mappings, and ~587MB/sec with CPU private mappings.
2005-02-13 21:51:50 +00:00
Poul-Henning Kamp
d9aaa28f63 Use MAXMINOR 2005-01-29 16:50:04 +00:00
Pawel Jakub Dawidek
1db17c6db2 - Don't destroy UMA zone on error in mdcreate_malloc(), because we need it
in mddestroy() to properly free already allocated memory.
  This fixes a panic when we want to create too big memory backed device
  with preallocate memory (-o reserve).
- Remove redundant { }.

MFC after:	1 week
2005-01-22 19:56:03 +00:00
Poul-Henning Kamp
9d3a77c463 Add a couple of mtx_asserts() to try to narrow down the window on
a bug repeatedly reported.
2005-01-22 19:08:50 +00:00
Warner Losh
098ca2bda9 Start each of the license/copyright comments with /*-, minor shuffle of lines 2005-01-06 01:43:34 +00:00
Alan Cox
c935314fae Add needed synchronization to the error handling code that was introduced
in revision 1.141.

Lock assertion failures reported by: Kris Kennaway
2005-01-05 05:32:52 +00:00
John Baldwin
63710c4d35 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
Pawel Jakub Dawidek
88b5b78d59 Rewrite piece of code which I committed some time ago that allows to
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.

MFC after:	5 days
2004-12-27 17:20:06 +00:00
Marcel Moolenaar
8b6fc67a49 Fix the MDIOCDETACH ioctl() for md(4). Now that the md_file field in
the mdio structure is an array and not a pointer, we cannot test for
it to be NULL. It never is. Instead, test for md_file[0] to be '\0'.
2004-11-13 05:00:12 +00:00
Pawel Jakub Dawidek
e3ed29a739 Be consistent and use 'if (error != 0)' instead of 'if (error)' everywhere. 2004-11-06 13:16:35 +00:00
Pawel Jakub Dawidek
61a6eb62ec For file backed md(4) devices output their source file via
'mdconfig -l -u <unit>'.
Bump version number, as this change breaks ABI/API.
2004-11-06 13:07:02 +00:00
Poul-Henning Kamp
3b66ad07db Don't explicitly call g_waitidle(), it happens automagically now. 2004-10-23 20:50:06 +00:00
Brian Feldman
812851b6c9 Account for failure in vm_pager_allocate() or vm_pager_get_pages() in
md(8).  The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.

There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
2004-10-12 04:47:16 +00:00
Pawel Jakub Dawidek
e4cdd0d4b5 Actually this order (unlock, wakeup) in this case is race-safe and can
save us 2 context switches.

Explained by:	njl
2004-09-18 09:16:19 +00:00
Pawel Jakub Dawidek
b830359bc5 - Make md(4) 64-bit clean.
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
2004-09-16 21:32:13 +00:00
Pawel Jakub Dawidek
fcd57fbe6f There is no need to keep 'npage' value inside our softc structure,
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
2004-09-16 20:38:11 +00:00
Pawel Jakub Dawidek
a8a58d03f6 - Do not use bio_pblkno as it is going away anyway.
- Prefer bio_length than bio_bcount.
2004-09-16 19:42:17 +00:00
Pawel Jakub Dawidek
4b07ede4a7 First wakeup, then unlock. 2004-09-16 18:59:19 +00:00
Pawel Jakub Dawidek
6ab0a0aefe Type 'int' is too small for 'i' and 'lastp' variables. Use proper type,
which is vm_pindex_t (unsigned 64bit on i386).
2004-09-16 18:56:20 +00:00
Pawel Jakub Dawidek
2eafd8b126 Deallocate VM object on failure. 2004-09-14 19:55:07 +00:00
Pawel Jakub Dawidek
7a0970111f One more missing NDFREE(9). 2004-09-14 19:27:59 +00:00
Pawel Jakub Dawidek
52c6716fee - Don't forget about NDFREE() in case of vn_open() failure.
- Don't forget about vn_close() in case of failure.
2004-09-14 18:43:24 +00:00
Pawel Jakub Dawidek
f9963bbc08 Fix UMA zone leak. 2004-09-14 18:32:05 +00:00
Poul-Henning Kamp
affa470653 Use bioq_takefirst() 2004-09-07 07:54:45 +00:00
Colin Percival
972be79add Don't g_waitidle() when initializing a preloaded md. This fixes a
deadlock which otherwise occurs during the boot process.

Reported by:	kensmith
MFC after:	3 days
		(assuming that re@ approves)
2004-08-30 08:38:30 +00:00
Colin Percival
2b004a226d When creating a new md, wait for geom's event queue to become empty
before returning.  Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.

This may be a MFC candidate for 5.3.

Suggested by:	phk
2004-08-22 19:44:24 +00:00
Poul-Henning Kamp
5721c9c76a Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00
Poul-Henning Kamp
199456973f Use a ->fini() from the geom class to destroy the control device.
Use default initialization of geom methods.
2004-08-08 06:47:43 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Pawel Jakub Dawidek
6a40892929 Fix panic which occurs when given sector size for memory-backed device
is less than DEV_BSIZE (512) bytes.

Reported by:	Mike Bristow <mike@urgle.com>
Approved by:	phk
2004-05-18 07:30:04 +00:00
Warner Losh
ed010cdfa2 Ooops, removed this acknowledgement bogusly.
Eagle Eyes: bde
2004-04-09 05:12:47 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Alan Cox
121230a40d In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not.  Add a new parameter so that the caller can specify which is
the case.

Reported by:	dillon
2004-04-03 09:16:27 +00:00
Luigi Rizzo
5d4ca75e56 Fix a bug with preloaded image -- for some reason [that i don't
completely understand], md_takeroot() runs before md_preloaded(),
rendering both useless.
As a fix, move the body (effectively one line!) of md_takeroot()
into md_preloaded(), and get rid of the stuff that has become useless.

Bug and fix reported 10 days ago on -current, no reply.
2004-03-31 21:48:02 +00:00
Alan Cox
07be617f09 - Remove some unused #includes.
- Apply some style fixes to mdstart_swap().
2004-03-19 21:19:15 +00:00
Alan Cox
7cd53fdda8 Utilize sf_buf_alloc() and sf_buf_free() to implement the ephemeral
mappings required by mdstart_swap().  On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md.  Even if the ephemeral mapping is not
cached, this implementation is still faster.  On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.

On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"

before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)

after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)

after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)

malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
2004-03-18 18:23:37 +00:00
Alan Cox
33651381b7 Allow swap-backed devices to run without Giant. 2004-03-14 00:24:30 +00:00
Poul-Henning Kamp
7a6b2b6429 Fix a long-standing deadlock issue with vnode backed md(4) devices:
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.

This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.

The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.

In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.

Unfortunately performance takes a hit.

Add "async" option to give people who know what they are doing the
old behaviour.
2004-03-10 20:41:09 +00:00
John Baldwin
6074439965 kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by:	many
2004-03-05 22:42:17 +00:00
Poul-Henning Kamp
9ed40643ea Make swapbacked md(4) devices respect the -x and -y emulation arguments. 2004-03-02 20:13:23 +00:00
Colin Percival
e07113d65c Use DEV_BSIZE byte sectors instead of PAGE_SIZE byte sectors for
swap-backed memory disks.  This reduces filesystem allocation overhead
and makes swap-backed memory disks compatible with broken code (dd,
for example) which expects to see 512 byte sectors.  The size of a
swap-backed memory disk must still be a multiple of the page size.

When performing page-aligned operations, this change has zero
performance impact.

Reviewed by:	phk
Approved by:	rwatson (mentor)
2004-02-29 15:58:54 +00:00
Poul-Henning Kamp
dc08ffec87 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
Poul-Henning Kamp
d5a929dc17 Allow specification of a geometry for vnode backed devices as well as
for malloc backed devices.
2004-01-12 10:52:00 +00:00
Poul-Henning Kamp
0a93720637 Fix a locking problem with MD_ROOT_SIZE.
Retire md(4)'s static major number.
2003-12-13 18:12:58 +00:00
Poul-Henning Kamp
0eb1430969 Use the class->init() to hitch up preload devices, rather than rely on
the "old" SYSINIT.  This makes sure things happen in the right order.

XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.

Approved by:	re@
2003-11-18 18:19:26 +00:00
Poul-Henning Kamp
6e9a011a6a Don't initialize unused bio_blkno field. 2003-10-18 11:25:42 +00:00
Poul-Henning Kamp
70cd771337 The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.
2003-09-27 12:01:01 +00:00
John Baldwin
8b149b5131 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
Poul-Henning Kamp
8e28326a8b Change the implementation of swap backing to use the VM system in normal
ways, and drop the need for vm_pager_strategy().
2003-08-05 06:54:44 +00:00
Poul-Henning Kamp
7c89f162bc Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
Poul-Henning Kamp
8198a1a472 Remove 256 unit limit, there is no evil minor number encoding to
deal with any more.

Spotted by:	"Darren Freestone" <df@cops.org>
2003-06-22 11:31:38 +00:00
Poul-Henning Kamp
f075585f67 Remove the G_CLASS_INITIALIZER, we do not need it anymore. 2003-05-31 16:59:27 +00:00
Poul-Henning Kamp
17a1391990 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
Alan Cox
f820bc501e Use vm_object_deallocate(), not vm_pager_deallocate(), to destroy a
vm object.  (vm_pager_deallocate() does not, in fact, destroy a vm object.)

Approved by:	re (scottl)
Reviewed by:	phk
2003-05-16 07:28:27 +00:00
Poul-Henning Kamp
6b60a2cd57 Call g_wither_geom(), instead of just setting the flag. 2003-05-02 06:18:58 +00:00
Poul-Henning Kamp
4e8bfe1482 Add a couple of undocumented test options to MD(4) to aid in regression
testting of GEOM.
2003-04-09 11:59:29 +00:00
Poul-Henning Kamp
4eba52a2d2 Remove all references to BIO_SETATTR. We will not be using it. 2003-04-03 19:19:36 +00:00
Poul-Henning Kamp
891619a66d Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.
2003-04-01 15:06:26 +00:00
Poul-Henning Kamp
51a5b7f1ba Don't include <sys/disk.h>. 2003-04-01 13:33:28 +00:00
Poul-Henning Kamp
67ffbc63f3 remove a blank line. 2003-03-29 22:13:32 +00:00
Poul-Henning Kamp
83e13864c3 Allocate the toplevel indir with M_WAITOK to avoid complicating things
needlessly.

Detected by:	rwatsons EvilMalloc(9)
2003-03-27 10:14:36 +00:00
Poul-Henning Kamp
5d445dcb4e Change g_class initialization to sparse format. 2003-03-24 19:46:26 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Poul-Henning Kamp
60794e0478 Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
2003-03-08 08:01:31 +00:00
Poul-Henning Kamp
ebe789d61c Add a "-S sectorsize" option to enable Kirk to find a bug :-) 2003-03-03 13:05:00 +00:00
Poul-Henning Kamp
7ac40f5f59 Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Poul-Henning Kamp
b3b3d1b7fe Mark our provider with G_PF_CANDELETE in the cases where this is actually
the case.
2003-02-11 12:35:44 +00:00
Poul-Henning Kamp
5777c5b989 NO_GEOM cleanup: unifdef 2003-01-30 13:12:31 +00:00
Poul-Henning Kamp
16bcbe8cf7 Implement MDIOCLIST which returns the unit numbers of configured md(4)
devices.

We use the md_pad[] array and if there are more units than its size the
last returned unit number will be -1, but the number of units returned
is correct.
2003-01-27 07:58:18 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Poul-Henning Kamp
26d48b4044 OK Ok, so I didn't check the NO_GEOM case for the final version...
Stumbled on by:	bde
2003-01-13 20:19:04 +00:00
Poul-Henning Kamp
a4f8615810 Enable the new h0h0magic code which on GEOM kernels make the md(4)
driver a _real_ GEOM driver.
2003-01-13 17:31:46 +00:00
Poul-Henning Kamp
0f8500a5b3 Add a mutex around the per unit bioqueue.
Only grab giant in the per unit kthread for SWAP and VNODE backed devices.

Initialize the bioq before the kthread gets a chance to study it.

Don't lock Giant in mddone_swap, we shouldn't need it.
2003-01-13 08:50:23 +00:00
Poul-Henning Kamp
64bfc43b26 Remove the printf which announces the creation of malloc disks: it is
inconsistent when we do not do it for swap or vnode.

We still printf for preloaded disks because of the weak debugging
options people have in embedded/tiny environments where this is
usually used.
2003-01-13 08:01:09 +00:00
Poul-Henning Kamp
6f4f00f114 Add code to make md(4) a GEOM device driver instead of relying in
the disk mini-layer.

This is currently not enabled.
2003-01-12 21:16:49 +00:00
Poul-Henning Kamp
a522a15950 Shift things around a bit in preparation for future evilness. 2003-01-12 17:39:29 +00:00
Ian Dowse
e176446dc8 Move the check for the MD_SHUTDOWN flag to before the tsleep() call
in the per-device kthread. This ensures that synchronisation with
mddestroy() succeeds even if the kthread was not waiting in tsleep()
at the time of the wakeup(). Among other things, this fixes the
problem of mdconfig getting stuck when an attempt is made to use a
zero-length file as a vnode-type backing store.

Approved by:	re
2002-11-30 22:03:53 +00:00
Poul-Henning Kamp
8689acc48c We want /dev/md0 for ramdisk roots, not /dev/md0c.
Sponsored by:	DARPA & NAI Labs
2002-10-21 20:08:28 +00:00
Poul-Henning Kamp
975b628f27 Use ENOSPC error return, not ENOMEM.
Use %jd rather than %lld.
2002-10-20 20:50:31 +00:00
Jake Burkholder
81bb0b95b1 MODINFO_SIZE metadata has type size_t, not unsigned. This makes preloaded
md root work on sparc64.
2002-10-13 18:19:22 +00:00
Scott Long
316ec49abd Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
Poul-Henning Kamp
1cff889a46 Put the casts on the right hand side of =. 2002-09-28 14:17:24 +00:00
Peter Grehan
a0c6726472 Initialize fwsectors/fwheads to allow the DIOCGFWSECTORS and
DIOCGFWHEADS ioctls to return meaningful values to disklabel/newfs

Approved by: phk
2002-09-22 10:07:18 +00:00
Poul-Henning Kamp
7812d86f03 (This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer).  This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by:   DARPA & NAI Labs.
2002-09-20 19:36:05 +00:00
Archie Cobbs
4a6a94d8d8 Replace (ab)uses of "NULL" where "0" is really meant. 2002-08-22 21:24:01 +00:00
Maxime Henrion
6569d6f3ec Yet another warning fix for 64 bits platforms.
Reviewed by:	phk
2002-06-24 12:07:02 +00:00
Poul-Henning Kamp
58f3c42e6d mdcreate_vnode() isn't correctly clearing things out of the linked
list if the file is of 0 size or mdsetcred() fails.

Submitted by:	Martin Faxer <gmh003532@brfmasthugget.se>
2002-06-15 19:18:43 +00:00
Maxim Sobolev
d8a186ebbb - Whitespace only: use return statement consistentlt (return (foo), not
return(foo)), kill extra blank names between function names;
- fix format string in printf(): devtoname() returns string, not pointer.
2002-06-10 19:25:21 +00:00
Ian Dowse
5c97ca54e5 Use a per-device worker thread to avoid blocking in mdstrategy()
until the I/O completes. This fixes some easily reproducable deadlocks
that occur when using md(4) with GEOM.

Reviewed by:	phk
2002-06-03 22:09:04 +00:00
Poul-Henning Kamp
fcf867e9f7 Mis-edit in last commit. 2002-05-26 09:57:59 +00:00
Poul-Henning Kamp
fde2a2e414 Be a bit smarter about rewriting data so we don't loose too much performance.
Sponsored by: DARPA & NAI Labs.
2002-05-26 09:38:51 +00:00
Poul-Henning Kamp
f43b2bac72 Use an umazone per unit for allocating the sectors for malloc backing.
Clean up things properly when we unconfigure malloc backed units.

Sponsored by:	DARPA & NAI Labs.
2002-05-26 06:48:55 +00:00
Poul-Henning Kamp
c6517568df Give the "malloc" backing of md(4) an adaptive multilevel index tree to
remove the need for a contiguous array with pointers to all the sectors.

Try to make failure to malloc(9) memory a non-hang situation.

Eventually this will allow us to test the 64bit cleanness of the disk
I/O patch, but more work is outstanding here and elsewhere.

Sponsored by:	DARPA & NAI Labs.
2002-05-25 20:44:20 +00:00
Poul-Henning Kamp
9589c2561c Fix a memory-leak when configuring a vnode backed md(4) device fails.
Submitted by:	Martin Faxér <gmh003532@brfmasthugget.se>
MFC after:	4 weeks
2002-05-03 17:55:10 +00:00
Jeff Roberson
421e6a659e Remove unused include. 2002-03-20 09:55:07 +00:00
Bruce Evans
1ab0b5f920 The previous commit missed fixing 2 old printf format errors and
introduced a format printf error.
2002-03-19 04:07:29 +00:00
Andrew Gallatin
54ed0c3221 Fix printf warning caused by recent changes in bio_pblkno's type. 2002-03-19 01:45:04 +00:00
Kirk McKusick
0d2af52141 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Poul-Henning Kamp
e087ce2dd2 Staticize the malloc definitions.
Obtained from:	~bde/sys.dif.gz
2002-02-10 21:42:44 +00:00
Poul-Henning Kamp
3ca627fefa Gah! last commit botched indentation, fix indentation and some other
white-space nits while at it.
2002-01-21 20:57:03 +00:00
Poul-Henning Kamp
b4a4f93c5e Restructure slightly, eliminating some repetitive source lines and
making GEOM patches simpler and more readable at the same time.
2002-01-21 20:50:06 +00:00
Dima Dorfman
53d745bc7c Actually make use of the md_version field of 'struct mdio'. In order
not to needlessly break compatibility, decrement MDIOVERSION to 0.

Approved by:	phk
2001-12-20 06:38:21 +00:00
Matthew Dillon
7e76bb562e Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking
in wdrain during a write.  This flag needs to be used in devices whos
strategy routines turn-around and issue another high level I/O, such as
when MD turns around and issues a VOP_WRITE to vnode backing store, in order
to avoid deadlocking the dirty buffer draining code.

Remove a vprintf() warning from MD when the backing vnode is found to be
in-use.  The syncer of buf_daemon could be flushing the backing vnode at
the time of an MD operation so the warning is not correct.

MFC after:	1 week
2001-11-05 18:48:54 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
John Baldwin
9935282d50 Use crhold() instead of crdup(). The md(4) driver doesn't modify the ucred
that it uses, so it merely needs to bump its refcount to make it immutable
rather than obtain its own copy.
2001-10-09 16:37:51 +00:00
Julian Elischer
b40ce4165d KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha
2001-09-12 08:38:13 +00:00
Maxim Sobolev
166cd0b42f OOPS, remove local change that somehow slipped into a commit (I swear that
I already deleted it some time ago). This should fix problem people have with
unsefined reference to `MD_PRELOAD_COMPRESSED'.

Submitted by:	Manfred Antar <null@pozo.com>
2001-08-27 17:48:37 +00:00
Maxim Sobolev
9d4b594558 - On module unload try to detach all configured disks and let unload proceed
if all disks were detached sucessfully;
- use consistent style for return statements and fix several others style
  inconsistencies.

Reviewed by:	ru
Approved by:	phk
2001-08-27 13:25:47 +00:00
Dima Dorfman
e0cebb4027 There is no MD_OBJET disk type, it's actually MD_SWAP. I guess the
former was either a previous or proposed name that kind of snuck in.
2001-08-16 03:04:49 +00:00
Dima Dorfman
26a0ee75c6 Introduce a force option, MD_FORCE, that instructs the driver to
bypass some extra anti-foot-shooting measures.  Currently, its only
effect is to allow detaching a device while it's still open (e.g.,
mounted).  This is useful for testing how the system reacts to a disk
suddenly going away, which can happen with some removeable media.

At this point, the force option is only checked on detach, so it
would've been possible to allow the option to be passed with the
MDIOCDETACH operation.  This was not done to allow the possibility of
having the force flag influence other tests in the future, which may
not necessarily deal with detaching the device.

Reviewed by:	sobomax
Approved by:	phk
2001-08-07 19:23:16 +00:00
Maxim Sobolev
fe603109d1 - Deny detaching requests until device is still open, otherwise it is possible
to hang or panic kernel by detaching disk from which fs is mounted;
- replace "md" with MD_NAME in yet another place.

Reviewed by:	phk
Approved by:	phk
2001-08-02 10:19:13 +00:00
Thomas Moestl
3d3c27fedf Make sure the total number of sectors is not 0 for a vnode-type md to
avoid a division by zero which would occur on open() in this case.

Reviewed by:	phk
2001-07-26 20:05:20 +00:00
Dima Dorfman
10b0e058bb Use MD_NAME and MDCTL_NAME constants where appropriate. 2001-07-18 13:32:38 +00:00
Matthew Dillon
0cddd8f023 With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage).  Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.
2001-07-04 16:20:28 +00:00
John Baldwin
277be4c219 We don't need the vm lock to perform a few simple calculations on the
md device's softc.
2001-06-25 18:24:52 +00:00
Poul-Henning Kamp
8adeb35aff Remove MFS compat bits. 2001-05-29 18:49:23 +00:00