Commit graph

276 commits

Author SHA1 Message Date
Andrey V. Elsukov
ec170744a7 Use g_conf_printf_escaped() to escape illegal symbols in file name.
PR:		202289
MFC after:	1 week
2015-08-13 13:20:29 +00:00
Konstantin Belousov
5d9b4508fd For md(4), posix shm(3) and tmpfs(5), free swap space used by paged in
dirty page, which is written by the process.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-07-28 14:27:05 +00:00
Attilio Rao
0d8243cc34 vm_page_grab() and vm_pager_get_pages() can drop the vm_object lock,
then threads can sleep on the pip condition.
Avoid to deadlock such threads by correctly awakening the sleeping ones
after the pip is finished.
swapoff side of the bug can likely result in shutdown deadlocks.

Sponsored by:	EMC / Isilon Storage Division
Reported by:	pho, pluknet
Tested by:	pho
2014-03-19 01:13:42 +00:00
Konstantin Belousov
60b6e19785 Only assert the length of the passed bio in the mdstart_vnode() when
the bio is unmapped, so we must map the bio pages into pbuf.  This
works around the geom classes which do not follow the MAXPHYS limit on
the i/o size, since such classes do not know about unmapped bios
either.

Reported by:	Paolo Pinto <paolo.pinto@netasq.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-12-10 20:52:31 +00:00
Edward Tomasz Napierala
bc2308d49e Change comment to match code.
Discussed with:	thompsa
Sponsored by:	The FreeBSD Foundation
2013-12-04 09:48:52 +00:00
Edward Tomasz Napierala
0efd9bfd47 Add "null" backend to mdconfig(8). This does exactly what the name
suggests, and is somewhat useful for benchmarking.

MFC after:	1 month
No objections from:	kib
Sponsored by:	The FreeBSD Foundation
2013-12-04 07:38:23 +00:00
Alexander Motin
40ea77a036 Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.

The defined now safety requirements are:
 - caller should not hold any locks and should be reenterable;
 - callee should not depend on GEOM dual-threaded concurency semantics;
 - on the way down, if request is unmapped while callee doesn't support it,
   the context should be sleepable;
 - kernel thread stack usage should be below 50%.

To keep compatibility with GEOM classes not meeting above requirements
new provider and consumer flags added:
 - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request);
 - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done);
 - G_PF_DIRECT_SEND -- provider code meets caller requirements (done);
 - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request).
Capable GEOM class can set them, allowing direct dispatch in cases where
it is safe.  If any of requirements are not met, request is queued to
g_up or g_down thread same as before.

Such GEOM classes were reviewed and updated to support direct dispatch:
CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE,
VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL,
MAP, FLASHMAP, etc).

To declare direct completion capability disk(9) KPI got new flag equivalent
to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION.  da(4) and ada(4) disk
drivers got it set now thanks to earlier CAM locking work.

This change more then twice increases peak block storage performance on
systems with manu CPUs, together with earlier CAM locking changes reaching
more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to
256 user-level threads).

Sponsored by:	iXsystems, Inc.
MFC after:	2 months
2013-10-22 08:22:19 +00:00
Konstantin Belousov
1a42d14a80 Give the page allocations initiated by the swap-backed md(4) a higher
priority.  If the write is requested by a system daemon, sleeping
there would starve resources and cause deadlock.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
2013-08-30 20:12:23 +00:00
Konstantin Belousov
5944de8ecd Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-22 07:39:53 +00:00
Attilio Rao
c7aebda8a1 The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
  and vm_page_grab are being executed.  This will be very helpful
  once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by:	EMC / Isilon storage division
Discussed with:	alc
Reviewed by:	jeff, kib
Tested by:	gavin, bapt (older version)
Tested by:	pho, scottl
2013-08-09 11:11:11 +00:00
Konstantin Belousov
537cc627d7 Fix the data corruption on the swap-backed md.
Assign the rv variable a success code if the pager was not asked for
the page.  Using an error code from the previous processed page caused
zeroing of the valid page, when e.g. the previous page was not
available in the pager.

Reported by:	lstewart
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2013-05-24 09:48:42 +00:00
Konstantin Belousov
1ef76554fb Do not declare that preloaded md(4) supports unmapped bio requests, it
does not.

Reported by:	<mh@kernel32.de>
Sponsored by:	The FreeBSD Foundation
2013-04-02 19:39:31 +00:00
Konstantin Belousov
59ec9023ca Support unmapped i/o for the md(4).
The vnode-backed md(4) has to map the unmapped bio because VOP_READ()
and VOP_WRITE() interfaces do not allow to pass unmapped requests to
the filesystem. Vnode-backed md(4) uses pbufs instead of relying on
the bio_transient_map, to avoid usual md deadlock.

Sponsored by:	The FreeBSD Foundation
Tested by:	pho, scottl
2013-03-19 14:53:23 +00:00
Attilio Rao
89f6b8632c Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
    (in order to avoid visibility of implementation details)
  - The read-mode operations are added:
    VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
    VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
  sys/mutex.h in consumers directly to cater its inlining functions
  using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
  consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
  the compat layer because the name clash between FreeBSD and solaris
  versions must be avoided.
  At this purpose zfs redefines the vm_object locking functions
  directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit.  Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by:	EMC / Isilon storage division
Reviewed by:	jeff
Reviewed by:	pjd (ZFS specific review)
Discussed with:	alc
Tested by:	pho
2013-03-09 02:32:23 +00:00
Jaakko Heinonen
341b240dc7 Print correct unit number when attaching preloaded memory disks.
Retire now unused mdunits variable.
2012-11-21 17:05:57 +00:00
Jaakko Heinonen
734e78dfcb Disallow attaching preloaded memory disks via ioctl.
- The feature is dangerous because the kernel code didn't check
  validity of the memory address provided from user space.
- It seems that mdconfig(8) never really supported attaching preloaded
  memory disks.
- Preloaded memory disks are automatically attached during md(4)
  initialization. Thus there shouldn't be much use for the feature.

PR:		kern/169683
Discussed on:	freebsd-hackers
2012-11-21 16:56:47 +00:00
Konstantin Belousov
e9f581ba31 Zero the newly allocated md(4) swap-backed page to prevent random
kernel memory leakage to userspace. For the typical use, when a
filesystem put on the md disk, the change only results in CPU and
memory bandwidth spent to zero the page, since filsystems make sure
that user never see unwritten content.  But if md disk is used as raw
device by userspace, the garbage is exposed.

Reported by:	Paul Schenkeveld <freebsd@psconsult.nl>
MFC after:	2 weeks
2012-11-08 03:17:41 +00:00
Marcel Moolenaar
22ff74b2f4 Add a MD_ROOT_FSTYPE kernel option. The option specifies the
file system part for the MD_ROOT mount string. Hardcoding the
the file system type as "ufs" is too restrictive.
2012-11-03 21:20:55 +00:00
Konstantin Belousov
5050aa86cf Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by:	attilio
Tested by:	pho
2012-10-22 17:50:54 +00:00
Konstantin Belousov
1c771f9222 After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
Konstantin Belousov
2ddfc13d8d Remove verbose unused commented out debugging printf.
MFC after:	1 week
Reviewed by:	alc
2012-08-04 18:10:04 +00:00
Jaakko Heinonen
8cb51643e4 Disallow sectorsize larger than MAXPHYS and mediasize smaller than
sectorsize.

PR:		169947
Submitted by:	Filip Palian (original version)
Reviewed by:	kib
2012-08-02 15:05:34 +00:00
Edward Tomasz Napierala
dc604f0cf6 Make it possible to resize md(4) devices.
Reviewed by:	kib
Sponsored by:	FreeBSD Foundation
2012-07-07 20:32:21 +00:00
Eitan Adler
3eb9ab5255 Document a large number of currently undocumented sysctls. While here
fix some style(9) issues and reduce redundancy.

PR:		kern/155491
PR:		kern/155490
PR:		kern/155489
Submitted by:	Galimov Albert <wtfcrap@mail.ru>
Approved by:	bde
Reviewed by:	jhb
MFC after:	1 week
2011-12-13 00:38:50 +00:00
Andrey V. Elsukov
1f1928092d Add information about MD_READONLY and MD_COMPRESS flags to the
configuration dump.

MFC after:	1 week
2011-10-31 10:53:27 +00:00
Andrey V. Elsukov
657bd8b132 Include sys/sbuf.h directly. 2011-07-11 05:19:28 +00:00
Matthew D Fleming
cfb00e5aa7 Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk.  Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file.  (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by:	alc
MFC after:	1 week
MFC with:	r221853
2011-05-13 19:35:01 +00:00
Matthew D Fleming
89cb2a19ec Usa a globally visible region of zeros for both /dev/zero and the md
device.  There are likely other kernel uses of "blob of zeros" than can
be converted.

Reviewed by:	alc
MFC after:	1 week
2011-05-13 18:48:00 +00:00
Dag-Erling Smørgrav
0abd21bdb8 Implement BIO_DELETE for vnode devices by simply overwriting the deleted
sectors with all-zeroes.

The zeroes come from a static buffer; null(4) uses a dynamic buffer for
the same purpose (for /dev/zero).  It might be a good idea to have a
static, shared, read-only all-zeroes page somewhere in the kernel that
md(4), null(4) and any other code that needs zeroes could use.

Reviewed by:	kib
MFC after:	3 weeks
2011-04-29 21:18:41 +00:00
Marcel Moolenaar
8d5ac6c3cf Use the preload_fetch_addr() and preload_fetch_size() convenience
functions and only create the MD device when we have a non-zero
pointer and size.

Sponsored by: Juniper Networks
2011-02-09 19:31:10 +00:00
Konstantin Belousov
4a13a769dc Add support for BIO_DELETE on swap-backed md(4). In the case of BIO_DELETE
covering the whole page, free the page. Otherwise, clear the region and
mark it clean. Not marking the page dirty could reinstantiate cleared
data, but it is allowed by BIO_DELETE specification and saves unneeded
write to swap.

Reviewed by:	alc
Tested by:	pho
MFC after:	2 weeks
2011-01-27 16:10:25 +00:00
Konstantin Belousov
96410b9575 Bio shall not be accessed after g_io_deliver(9).
Reported and tested by:	pho
Reviewed by:	ae, phk
MFC after:	1 week
2011-01-25 14:00:30 +00:00
Konstantin Belousov
007777f137 Add missed ().
Noted by:	alc
MFC after:	3 days
2011-01-19 16:48:07 +00:00
Alan Cox
18a22f96e0 There is no point in calling vm_object_set_writeable_dirty() on an object
that is definitively known to be swap backed since its only effects are on
vnode-backed objects.

Reviewed by:	kib
2011-01-19 15:43:54 +00:00
Konstantin Belousov
d91e813c7b Add reporting of GEOM::candelete BIO_GETATTR for md(4) and geom_disk(4).
Non-zero value of attribute means that device supports BIO_DELETE.

Suggested and reviewed by:	pjd
Tested by:	pho
MFC after:	1 week
2010-12-29 12:11:07 +00:00
Konstantin Belousov
c44d423ed8 Add sysctl vm.md_malloc_wait, non-zero value of which switches malloc-backed
md(4) to using M_WAITOK malloc calls.

M_NOWAITOK allocations may fail when enough memory could be freed, but not
immediately. E.g. SU UFS becomes quite unhappy when metadata write return
error, that would happen for failed malloc() call.

Reported and tested by:	pho
MFC after:	1 week
2010-12-29 11:39:15 +00:00
Marcel Moolenaar
3d5c947d9d Allow the MDIOCATTACH ioctl operation to originate from within the kernel.
To protect against malicious software, we demand that the file name is at
a particular location (i.e. appended to the mdio structure) for it to be
treated as in-kernel.
2010-10-18 04:26:32 +00:00
Jaakko Heinonen
b42f40b8eb - Remove some extra white space.
- Wrap g_md_dumpconf() prototype to 80 columns.
2010-07-26 10:37:14 +00:00
Jaakko Heinonen
f4e7c5a894 Convert md(4) to use alloc_unr(9) and alloc_unr_specific(9) for unit
number allocation. The old approach had some problems such as it allowed
an overflow to occur in the unit number calculation.

PR:		kern/122288
2010-07-22 10:24:28 +00:00
Konstantin Belousov
d12fc952b7 Calculate nshift only once.
Also noted by:	avg
MFC after:	1 week
2010-07-06 18:22:57 +00:00
Alan Cox
ecd5dd957d Eliminate unnecessary page queues locking. 2010-06-15 18:37:31 +00:00
Konstantin Belousov
fc0c3802f0 Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by:	alc
2010-05-03 20:31:13 +00:00
Edward Tomasz Napierala
5ed1eb2bb0 Fix panic on invalid 'mdconfig -at preload' usage.
PR:		kern/80136
2010-02-27 10:41:30 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Konstantin Belousov
3364c323e6 Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with:	pho
Reviewed by:	alc
Approved by:	re (kensmith)
2009-06-23 20:45:22 +00:00
Marcel Moolenaar
dbb95048da Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.
2009-05-18 18:37:18 +00:00
John Baldwin
33fc362512 Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
  shared vnode lock for the leaf vnode only if the mount point has the
  extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
  not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
  O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
  vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
  FIFO's require exclusive vnode locks for their open() and close()
  routines.  (My recent MPSAFE patches for UDF and cd9660 already included
  this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by:	ups
Reviewed by:	pjd (ZFS bits)
MFC after:	1 month
2009-03-11 14:13:47 +00:00
Alan Cox
b72cca38a6 Remove unnecessary page queues locking around vm_page_wakeup(). (This
change is applicable to RELENG_7 but not RELENG_6.)

MFC after:	1 week
2009-02-22 02:50:31 +00:00
Edward Tomasz Napierala
a9ebb31183 Add the possibility to specify "-o force" with "mdconfig -du".
Reviewed by:	scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-10 17:17:18 +00:00
Edward Tomasz Napierala
41c8b468e6 Fix forced mdconfig -du. E.g. the following would previously
result in panic:

mdconfig -af blah.img -o force
mount /dev/md0 /mnt
mdconfig -du 0

Reviewed by:	scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 20:59:27 +00:00
Attilio Rao
0359a12ead Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-08-28 15:23:18 +00:00
Ed Schouten
06d425f92e Remove the distinction between device minor and unit numbers.
Even though we got rid of device major numbers some time ago, device
drivers still need to provide unique device minor numbers to make_dev().
These numbers are only used inside the kernel. They are not related to
device major and minor numbers which are visible in devfs. These are
actually based on the inode number of the device.

It would eventually be nice to remove minor numbers entirely, but we
don't want to be too agressive here.

Because the 8-15 bits of the device number field (si_drv0) are still
reserved for the major number, there is no 1:1 mapping of the device
minor and unit numbers. Because this is now unused, remove the
restrictions on these numbers.

The MAXMAJOR definition was actually used for two purposes. It was used
to convert both the userspace and kernelspace device numbers to their
major/minor pair, which is why it is now named UMINORMASK.

minor2unit() and unit2minor() have now become useless. Both minor() and
dev2unit() now serve the same purpose. We should eventually remove some
of them, at least turning them into macro's. If devfs would become
completely minor number unaware, we could consider using si_drv0 directly,
just like si_drv1 and si_drv2.

Approved by:	philip (mentor)
2008-05-29 12:50:46 +00:00
Philip Paeps
3cf74e539b Zero sc->vnode if mdsetcred() fails.
This fixes the panic which happens when mdcreate_vnode() calls vn_close()
and mddestroy() calls it again further down the error handling path.

Reviewed by:	kris, kib
MFC after:	3 days
2008-02-28 18:31:54 +00:00
Attilio Rao
22db15c06f VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
2008-01-13 14:44:15 +00:00
Attilio Rao
cb05b60a89 vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by:	Diego Sardina <siarodx at gmail dot com>,
		Andrea Di Pasquale <whyx dot it at gmail dot com>
2008-01-10 01:10:58 +00:00
Maxim Sobolev
a03be42da6 Put back devstat support that was lost during GEOM transition. Initially,
I've tried to move md(4) to use geom_disk class, like real disks do, but
this requires major rework of some of the existing features such as
configuration dumping for example. Therefore just putting devstat support
directly into md(4) seems to be optimal solution.

Now you can see md(4) stats in `systat -vm' again.

MFC after:	2 weeks
2007-11-07 22:47:41 +00:00
Julian Elischer
3745c395ec Rename the kthread_xxx (e.g. kthread_create()) calls
to kproc_xxx as they actually make whole processes.
Thos makes way for us to add REAL kthread_create() and friends
that actually make theads. it turns out that most of these
calls actually end up being moved back to the thread version
when it's added. but we need to make this cosmetic change first.

I'd LOVE to do this rename in 7.0  so that we can eventually MFC the
new kthread_xxx() calls.
2007-10-20 23:23:23 +00:00
Jeff Roberson
982d11f836 Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
   sychronization.
 - Use the per-process spinlock rather than the sched_lock for per-process
   scheduling synchronization.

Tested by:      kris, current@
Tested on:      i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
2007-06-05 00:00:57 +00:00
Konstantin Belousov
9e223287c0 Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by:	jhb
Reviewed by:	daichi (unionfs)
Approved by:	re (kensmith)
2007-05-31 11:51:53 +00:00
Konstantin Belousov
3b7b5496a7 Resolve two deadlocks that could be caused by busy md device backed
by vnode. Allow for md thread and the thread that owns lock on vnode
backing the md device to do the write even when runningbufspace is
exhausted.

Tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2006-12-14 11:34:07 +00:00
Pawel Jakub Dawidek
a777323904 Style nits. 2006-11-01 18:59:06 +00:00
Pawel Jakub Dawidek
5541f25ec7 Fix md(4) panic which occurs when I/O request different than
BIO_READ/BIO_WRITE is sent to vnode-backed provider (BIO_DELETE or
BIO_FLUSH).

Reported by:	ceri

Add support for BIO_FLUSH to vnode-backed md(4) devices based on
VOP_FSYNC().
2006-11-01 18:56:18 +00:00
John Baldwin
a08d2e7fe1 - Conditionally acquire Giant in mdstart_vnode(), mdcreate_vnode(), and
mddestroy() only if the file is from a non-MPSAFE VFS.
- No longer unconditionally hold Giant in the md kthread for vnode-backed
  kthreads.
- Improve the handling of the thread exit race when destroying an md
  device.
2006-03-28 21:25:11 +00:00
Wojciech A. Koszek
c27a895433 Teach md(4) and mdconfig(8) how to understand XML. Right now there won't be
a problem with listing large number of md(4) devices. Either 'list' or
'query' mode uses XML.

Additionally, new functionality was introduced. It's possible to pass
multiple devices to -u:

	# ./mdconfig -l -u md0,md1

Approved by:	cognet (mentor)
2006-03-26 23:21:11 +00:00
Luigi Rizzo
de64f22aa4 make sure that the start and end preloaded MFS markers are
in contiguous strings, and that the compiler does not optimize them
away because it thinks they are unused.
2006-01-31 13:35:30 +00:00
Pawel Jakub Dawidek
b322d85d53 Call NDFREE() only when vn_open() succeeded.
MFC after:	3 days
2006-01-27 11:27:55 +00:00
Maxim Konovalov
6c3cd0e2f6 o Fix typos in the comments.
Submitted by:	Wojciech A. Koszek
2005-12-28 15:18:18 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Poul-Henning Kamp
947fc8de03 Make sure that the worker thread knows the type early enough to
grab Giant for vnode backing.

Found by:	pho & tegge
2005-10-06 19:47:04 +00:00
Poul-Henning Kamp
9b00ca1961 Fix configuration locking in MD.
Remove  md_mtx.

Remove GIANT from the mdctl device driver and avoid DROP_GIANT,
PICKUP_GIANT and geom events since we can call into GEOM directly
now.

Pick up Giant around vn_close().

Apply an exclusive sx around mdctls ioctl and preloading to protect
lists etc..

Don't initialize our lock (md_mtx or md_sx) from a
SYSINIT when there is a perfectly good pair of _fini/_init
functions to do it from.

Prune any final fractional sector from the mediasize to
keep GEOM happy.

Cleanups:

Unify MDIOVERSION check in (x)mdctlioctl()

Add pointer to start() routine to softc to eliminate a switch{}

Inline guts of mddetach().

Always pass error pointer to mdnew(), simplify implementation.
2005-09-19 06:55:27 +00:00
Poul-Henning Kamp
9fbea3e365 Do not destroy the queue mutex until the thread is done with it. 2005-09-11 12:35:32 +00:00
Pawel Jakub Dawidek
7ee3c044d0 - Add md_mtx lock to protect ID number and list of devices.
- Always check mdnew() return value, as even in !autounit case
  kthread_create() can fail.

Those two changes fix serval panics provked by simple stress test.

Tested by:	Kris The BugMagnet
MFC after:	3 days
2005-08-31 19:45:11 +00:00
Christian S.J. Peron
8677689134 Ensure that file flags such as schg, sappnd (and others) are honored
by md(4). Before this change, it was possible to by-pass these flags
by creating memory disks which used a file as a backing store and
writing to the device.

This was discussed by the security team, and although this is problematic,
it was decided that it was not critical as we never guarantee that root will
be restricted.

This change implements the following behavior changes:

-If the user specifies the readonly flag, unset write operations before
 opening the file. If the FWRITE mask is unset, the device will be
 created with the MD_READONLY mask set. (readonly)
-Add a check in g_md_access which checks to see if the MD_READONLY mask
 is set, if so return EROFS
-Do not gracefully downgrade access modes without telling the user. Instead
 make the user specify their intentions for the device (assuming the file is
 read only). This seems like the more correct way to handle things.

This is a RELENG_6 candidate.

PR:		kern/84635
Reviewed by:	phk
2005-08-17 01:24:55 +00:00
Alan Cox
e340fc602b Request a CPU private mapping from sf_buf_alloc(). If the swap-backed
memory disk is larger than the number of available sf_bufs, this improves
performance on SMPs by eliminating interprocessor TLB shootdowns.  For
example, with 6656 sf_bufs, the default on my test machine, and a 256MB
swap-backed memory disk, I see the command
"dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default,
shared mappings, and ~587MB/sec with CPU private mappings.
2005-02-13 21:51:50 +00:00
Poul-Henning Kamp
d9aaa28f63 Use MAXMINOR 2005-01-29 16:50:04 +00:00
Pawel Jakub Dawidek
1db17c6db2 - Don't destroy UMA zone on error in mdcreate_malloc(), because we need it
in mddestroy() to properly free already allocated memory.
  This fixes a panic when we want to create too big memory backed device
  with preallocate memory (-o reserve).
- Remove redundant { }.

MFC after:	1 week
2005-01-22 19:56:03 +00:00
Poul-Henning Kamp
9d3a77c463 Add a couple of mtx_asserts() to try to narrow down the window on
a bug repeatedly reported.
2005-01-22 19:08:50 +00:00
Warner Losh
098ca2bda9 Start each of the license/copyright comments with /*-, minor shuffle of lines 2005-01-06 01:43:34 +00:00
Alan Cox
c935314fae Add needed synchronization to the error handling code that was introduced
in revision 1.141.

Lock assertion failures reported by: Kris Kennaway
2005-01-05 05:32:52 +00:00
John Baldwin
63710c4d35 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
Pawel Jakub Dawidek
88b5b78d59 Rewrite piece of code which I committed some time ago that allows to
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.

MFC after:	5 days
2004-12-27 17:20:06 +00:00
Marcel Moolenaar
8b6fc67a49 Fix the MDIOCDETACH ioctl() for md(4). Now that the md_file field in
the mdio structure is an array and not a pointer, we cannot test for
it to be NULL. It never is. Instead, test for md_file[0] to be '\0'.
2004-11-13 05:00:12 +00:00
Pawel Jakub Dawidek
e3ed29a739 Be consistent and use 'if (error != 0)' instead of 'if (error)' everywhere. 2004-11-06 13:16:35 +00:00
Pawel Jakub Dawidek
61a6eb62ec For file backed md(4) devices output their source file via
'mdconfig -l -u <unit>'.
Bump version number, as this change breaks ABI/API.
2004-11-06 13:07:02 +00:00
Poul-Henning Kamp
3b66ad07db Don't explicitly call g_waitidle(), it happens automagically now. 2004-10-23 20:50:06 +00:00
Brian Feldman
812851b6c9 Account for failure in vm_pager_allocate() or vm_pager_get_pages() in
md(8).  The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.

There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
2004-10-12 04:47:16 +00:00
Pawel Jakub Dawidek
e4cdd0d4b5 Actually this order (unlock, wakeup) in this case is race-safe and can
save us 2 context switches.

Explained by:	njl
2004-09-18 09:16:19 +00:00
Pawel Jakub Dawidek
b830359bc5 - Make md(4) 64-bit clean.
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
2004-09-16 21:32:13 +00:00
Pawel Jakub Dawidek
fcd57fbe6f There is no need to keep 'npage' value inside our softc structure,
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
2004-09-16 20:38:11 +00:00
Pawel Jakub Dawidek
a8a58d03f6 - Do not use bio_pblkno as it is going away anyway.
- Prefer bio_length than bio_bcount.
2004-09-16 19:42:17 +00:00
Pawel Jakub Dawidek
4b07ede4a7 First wakeup, then unlock. 2004-09-16 18:59:19 +00:00
Pawel Jakub Dawidek
6ab0a0aefe Type 'int' is too small for 'i' and 'lastp' variables. Use proper type,
which is vm_pindex_t (unsigned 64bit on i386).
2004-09-16 18:56:20 +00:00
Pawel Jakub Dawidek
2eafd8b126 Deallocate VM object on failure. 2004-09-14 19:55:07 +00:00
Pawel Jakub Dawidek
7a0970111f One more missing NDFREE(9). 2004-09-14 19:27:59 +00:00
Pawel Jakub Dawidek
52c6716fee - Don't forget about NDFREE() in case of vn_open() failure.
- Don't forget about vn_close() in case of failure.
2004-09-14 18:43:24 +00:00
Pawel Jakub Dawidek
f9963bbc08 Fix UMA zone leak. 2004-09-14 18:32:05 +00:00
Poul-Henning Kamp
affa470653 Use bioq_takefirst() 2004-09-07 07:54:45 +00:00
Colin Percival
972be79add Don't g_waitidle() when initializing a preloaded md. This fixes a
deadlock which otherwise occurs during the boot process.

Reported by:	kensmith
MFC after:	3 days
		(assuming that re@ approves)
2004-08-30 08:38:30 +00:00
Colin Percival
2b004a226d When creating a new md, wait for geom's event queue to become empty
before returning.  Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.

This may be a MFC candidate for 5.3.

Suggested by:	phk
2004-08-22 19:44:24 +00:00
Poul-Henning Kamp
5721c9c76a Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00