OPNsense - FreeBSD source
Find a file
Justin T. Gibbs 439d30d121 Enhance the ZFS vdev layer to maintain both a logical and a physical
minimum allocation size for devices.  Use this information to
automatically increase ZFS's minimum allocation size for new top-level
vdevs to a value that more closely matches the optimum device
allocation size.

Use GEOM's stripesize attribute, if set, as the physical sector
size of the GEOM.

Calculate the minimum blocksize of each metaslab class.  Use the
calculated value instead of SPA_MINBLOCKSIZE (512b) when determining
the likelyhood of compression yeilding a reduction in physical space
usage.

Report devices with sub-optimal block size configuration in "zpool
status".  Also properly fail attempts to attach devices with a
logical block size greater than 8kB, since this will cause corruption
to ZFS's label area.

Sponsored by:	Spectra Logic Corporaion
MFC after:	2 weeks

Background
==========
Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands.  Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size).  Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.

Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:

1) Existing pools created with devices that have different logical
   and physical block sizes, but were configured to use the logical
   block size (e.g. because the OS version used for pool construction
   reported the logical block size instead of the physical block
   size) will suddenly find that the vdev allocation size has
   increased.  This can be easily tolerated for active members of
   the array, but ZFS would prevent replacement of a vdev with
   another identical device because it now appears that the smaller
   allocation size required by the pool is not supported by the new
   device.

2) The device's physical block size may be too large to be supported
   by ZFS.  The optimal allocation size for the vdev may be quite
   large.  For example, a RAID controller may export a vdev that
   requires read-modify-write cycles unless accessed using 64k
   aligned/sized requests.  ZFS currently has an 8k minimum block
   size limit.

Reporting both the logical and physical allocation sizes for vdevs
solves these problems.  A device may be used so long as the logical
block size is compatible with the configuration.  By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
	Add the SPA_ASHIFT constant.  ZFS currently has a hard upper
	limit of 13 (8k) for ashift and this constant is used to
	both document and enforce this limit.

sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h:
	Add the VDEV_AUX_ASHIFT_TOO_BIG error code.

	Add fields for exporting the configured, logical, and
	physical ashift to the vdev_stat_t structure.

	Add VDEV_STAT_VALID() macro which can be used to verify the
	presence of required vdev_stat_t fields in nvlist data.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
	Provide a SYSCTL_PROC handler for "max_auto_ashift".  Since
	the limit is only referenced long after boot when a create
	operation occurs, there's no compelling need for it to be
	a boot time configurable tunable.  This also allows the
	validation code for the max_auto_ashift value to be contained
	within the sysctl handler.

	Populate the new fields in the vdev_stat_t structure.

	Fail vdev opens if the vdev reports an ashift larger than
	SPA_MAXASHIFT.

	Propogate vdev_logical_ashift and vdev_physical_ashift between
	child and parent vdevs as is done for vdev_ashift.

	In vdev_open(), restore code that fails opens for devices
	where vdev_ashift grows.  This can only happen now if the
	device's logical ashift grows, which means it really isn't
	safe to use the device.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c:
	Update the vdev_open() API so that both logical (what was
	just ashift before) and physical ashift are reported.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h:
	Add two new fields, vdev_physical_ashift and vdev_logical_ashift,
	to vdev_t.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
	Add vdev_ashift_optimize().  Call it anytime a new top-level
	vdev is allocated.

cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
	Add text for the VDEV_AUX_ASHIFT_TOO_BIG error.

	For each sub-optimally configured leaf vdev, report configured
	and native block sizes.

cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
	Introduce a new zpool status: ZPOOL_STATUS_NON_NATIVE_ASHIFT.
	This status is reported on healthy pools containing vdevs
	configured to use a block size smaller than their reported
	physical block size.

cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c:
	Update find_vdev_problem() and supporting functions to
	provide the full vdev_stat_t structure to problem checking
	routines, and to allow decent into replacing vdevs.

	Add a vdev_non_native_ashift() validator which is used on
	the full vdev tree to check for ZPOOL_STATUS_NON_NATIVE_ASHIFT.

cddl/contrib/opensolaris/lib/libzpool/common/kernel.c:
cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h:
	Enhance sysctl userland stubs now that a SYSCTL_PROC handler
	is used in vdev.c.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h:
	When the group membership of a metaslab class changes (i.e.
	when a vdev is added or removed from a pool), walk the group
	list to determine the smallest block size currently available
	and record this in the metaslab class.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:
	Add the metaslab_class_get_minblocksize() accessor.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_compress.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
	In zio_compress_data(), take the minimum blocksize as an
	input parameter instead of assuming SPA_MINBLOCKSIZE.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:
	In l2arc_compress_buf(), pass SPA_MINBLOCKSIZE as the minimum
	blocksize of the device.  The l2arc code performs has it's own
	code for deciding if compression is worth while, so this
	effectively disables zio_compress_data() from second guessing
	the original decision.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:
	In zio_write_bp_init(), use the minimum blocksize of the
	normal metaslab class when compressing data.
2013-08-21 04:10:24 +00:00
bin sh: Remove unnecessary reset functions. 2013-08-16 20:24:41 +00:00
cddl Enhance the ZFS vdev layer to maintain both a logical and a physical 2013-08-21 04:10:24 +00:00
contrib Pull in r182983 from upstream clang trunk: 2013-08-20 20:51:32 +00:00
crypto Apply upstream revision 1.151 (fix relative symlinks) 2013-08-13 09:06:18 +00:00
etc Pass pidfile to bsnmpd if it's been changed (parts cut/pasted from 2013-08-19 05:37:49 +00:00
games Remove a reference to instant-server which has been removed from the 2013-03-21 12:42:25 +00:00
gnu Teach libstdc++ about logl(3). 2013-08-13 20:28:21 +00:00
include Implement fdclosedir(3) function, which is equivalent to the closedir(3) 2013-08-18 20:11:34 +00:00
kerberos5 Fix the getpwnam_r() call in the pname_to_uid() kerberos library function so 2013-05-02 12:52:49 +00:00
lib Implement fdclosedir(3) function, which is equivalent to the closedir(3) 2013-08-18 20:11:34 +00:00
libexec Revert r253748,253749 2013-07-28 18:44:17 +00:00
release Remove get_rev_branch(), functionality exists in the release/Makefile. 2013-08-13 21:01:23 +00:00
rescue - Trim an unused and bogus Makefile for mount_smbfs. 2013-06-28 21:00:08 +00:00
sbin Fix the zeroing loop. I must have been drunk when I wrote this... 2013-08-20 07:19:58 +00:00
secure Remove references to MK_IDEA. 2013-04-27 05:44:39 +00:00
share Update the SDT(9) man page with the macros added in 254468. Also change the 2013-08-17 22:06:30 +00:00
sys Enhance the ZFS vdev layer to maintain both a logical and a physical 2013-08-21 04:10:24 +00:00
tools Catch up with various changes to if_data and make this compile again 2013-08-20 14:37:06 +00:00
usr.bin Subversion requires atomic functions we only support on arm with clang. 2013-08-19 17:44:19 +00:00
usr.sbin Add entry for packages-9.2-release directory. 2013-08-19 14:04:35 +00:00
COPYRIGHT Happy New Year 2013! 2012-12-31 11:22:55 +00:00
LOCKS Test commit to make sure commit mail works after moving the server. 2012-12-29 16:03:23 +00:00
MAINTAINERS Add myself as maintainer for nvme(4), nvd(4) and nvmecontrol(8). 2013-07-31 18:18:02 +00:00
Makefile Don't let user specified DESTDIR, break building our chosen make. 2013-08-17 04:41:35 +00:00
Makefile.inc1 Update nvi-1.79 to 2.1.1-4334a8297f 2013-08-11 20:03:12 +00:00
ObsoleteFiles.inc In r227839, when removing libkvm dependency on procfs(5), 2013-07-10 19:44:43 +00:00
README Import nvi-2.1.1-4334a8297f into the work area. This is the gsoc-2011 2013-08-11 09:44:58 +00:00
UPDATING Add a note that if you were WITH_ICONV before, you should turn on 2013-08-13 07:31:27 +00:00

This is the top level of the FreeBSD source directory.  This file
was last revised on:
$FreeBSD$

For copyright information, please see the file COPYRIGHT in this
directory (additional copyright information also exists for some
sources in this tree - please see the specific source directories for
more information).

The Makefile in this directory supports a number of targets for
building components (or all) of the FreeBSD source tree, the most
commonly used one being ``world'', which rebuilds and installs
everything in the FreeBSD system from the source tree except the
kernel, the kernel-modules and the contents of /etc.  The ``world''
target should only be used in cases where the source tree has not
changed from the currently running version.  See:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html
for more information, including setting make(1) variables.

The ``buildkernel'' and ``installkernel'' targets build and install
the kernel and the modules (see below).  Please see the top of
the Makefile in this directory for more information on the
standard build targets and compile-time flags.

Building a kernel is a somewhat more involved process, documentation
for which can be found at:
   http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig.html
And in the config(8) man page.
Note: If you want to build and install the kernel with the
``buildkernel'' and ``installkernel'' targets, you might need to build
world before.  More information is available in the handbook.

The sample kernel configuration files reside in the sys/<arch>/conf
sub-directory (assuming that you've installed the kernel sources), the
file named GENERIC being the one used to build your initial installation
kernel.  The file NOTES contains entries and documentation for all possible
devices, not just those commonly used.  It is the successor of the ancient
LINT file, but in contrast to LINT, it is not buildable as a kernel but a
pure reference and documentation file.


Source Roadmap:
---------------
bin		System/user commands.

cddl		Various commands and libraries under the Common Development
		and Distribution License.

contrib		Packages contributed by 3rd parties.

crypto		Cryptography stuff (see crypto/README).

etc		Template files for /etc.

games		Amusements.

gnu		Various commands and libraries under the GNU Public License.
		Please see gnu/COPYING* for more information.

include		System include files.

kerberos5	Kerberos5 (Heimdal) package.

lib		System libraries.

libexec		System daemons.

release		Release building Makefile & associated tools.

rescue		Build system for statically linked /rescue utilities.

sbin		System commands.

secure		Cryptographic libraries and commands.

share		Shared resources.

sys		Kernel sources.

tools		Utilities for regression testing and miscellaneous tasks.

usr.bin		User commands.

usr.sbin	System administration commands.


For information on synchronizing your source tree with one or more of
the FreeBSD Project's development branches, please see:

  http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/synching.html