opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-21 06:07:31 -04:00

Author	SHA1	Message	Date
Warner Losh	29077eb456	Use atomic loads and stores to ensure that the compiler doesn't optimize away these loops. Change boolean to int to match what atomic API supplies. Remove wmb() since the atomic_store_rel() on status.done ensures the prior writes to status are visible first. It also fixes the fact that there wasn't a rmb() before reading done. This should also be more efficient since wmb() is fairly heavyweight. Sponsored by: Netflix Reviewed by: kib@, jim harris Differential Revision: https://reviews.freebsd.org/D14053	2018-01-29 00:00:52 +00:00
Pedro F. Giffuni	ac2fffa4b7	Revert r327828, r327949, r327953, r328016-r328026, r328041: Uses of mallocarray(9). The use of mallocarray(9) has rocketed the required swap to build FreeBSD. This is likely caused by the allocation size attributes which put extra pressure on the compiler. Given that most of these checks are superfluous we have to choose better where to use mallocarray(9). We still have more uses of mallocarray(9) but hopefully this is enough to bring swap usage to a reasonable level. Reported by: wosch PR: 225197	2018-01-21 15:42:36 +00:00
Warner Losh	7e5f6f2588	Move setting of CAM_SIM_QUEUED to before we actually submit it to the hardware. Setting it after is racy, and we can lose the race on a heavily loaded system. Reviewed by: scottl@, gallatin@ Sponsored by: Netflix	2018-01-17 17:08:26 +00:00
Pedro F. Giffuni	26c1d774b5	dev: make some use of mallocarray(9). Focus on code where we are doing multiplications within malloc(9). None of these is likely to overflow, however the change is still useful as some static checkers can benefit from the allocation attributes we use for mallocarray. This initial sweep only covers malloc(9) calls with M_NOWAIT. No good reason but I started doing the changes before r327796 and at that time it was convenient to make sure the surrounding code could handle NULL values.	2018-01-13 22:30:30 +00:00
Warner Losh	4484c8f5d2	Return domain, bus, slot, and function for the transport settings in PATH_INQ requests for nvme. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13546	2017-12-20 19:13:55 +00:00
Warner Losh	989c7f0b7c	Although we only have one quirk at the moment, guard against the day we have more than one by checking the actual quirk bit before delaying the reset. Noticed by: rpokala@	2017-12-18 20:11:21 +00:00
Warner Losh	ce1ec9c178	When we're disabling the nvme device, some drives have a controller bug that requires 'hands off' for a period of time (2.3s) before we check the RDY bit. Since this is a very odd quirk for a very limited selection of drives, do this as a quirk. This prevented a successful reset of the card when the card wedged. Also, make sure that we comply with the advice from section 3.1.5 of the 1.3 spec, which says that transitioning CC.EN from 0 to 1 when CSTS.RDY is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has undefined results". Short circuit when EN == RDY == desired state. Finally, fail the reset if the disable fails. This will lead to a failed device, which is what we want. (note: nda device needs work for coping with a failed device). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13389	2017-12-18 18:38:00 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 14:52:40 +00:00
Warner Losh	eab9d0a85b	Inline pcie_link_{status,caps} where needed. Remove them as they aren't really needed and I don't want to document them. Suggested by: jhb@ Sponsored by: Netflix	2017-11-15 02:24:47 +00:00
Warner Losh	4e3b274457	Provide link speed data in XPT_GET_TRAN_SETTINGS. Provide full version information for that and XPT_PATH_INQ. Provide macros to encode/decode major/minor versions. Read the link speed and lane count to compute the base_transfer_speed for XPT_PATH_INQ. Sponsored by: Netflix	2017-11-14 05:05:16 +00:00
Warner Losh	fa271a5d09	Closer examination shows that nvme and CAM both normally zero-fill allocations (for req and ccb, which ultimately contain the nvme_cmd). As such, we can micro-optimize these routines. Add a comment to this effect, and bzero the ccb used to make the requests for the nda dump routine so it more closely matches a ccb allocated with xpt_get_ccb(). Sponsored by: Netflix	2017-10-15 23:53:55 +00:00
Warner Losh	29431e54b9	Use nvme_ctrlr_poll instead of nvme_ctrlr_intx_handler since it is more general and doesn't try to access registers that may be undefined when the card is in MSIX mode. This change, along with r324630, r324631, r324632, makes nda crash dumps work again. Previously, they only worked on CPU 0 when the stack garbage was just so. Sponsored by: Netflix Suggested by: scottl@ (who provided earlier version of the patch)	2017-10-15 16:19:09 +00:00
Warner Losh	bb1c7be429	Create general polling function for the nvme controller. Use it when we're doing the various pin-based interrupt modes. Adjust nvme_ctrlr_intx_handler to use nvme_ctrlr_poll. Sponsored by: Netflix Suggested by: scottl@	2017-10-15 16:18:08 +00:00
Warner Losh	fbed8df259	Explicitly set reserved fields and 'fuse' to 0. This prevents us from accidentally sending bogus values in these fields, which some drives may reject with an error or worse (undefined behavior). This is especially needed for the ndadump routine which allocates the cmd from stack garbage.... Sponsored by: Netflix	2017-10-15 16:17:59 +00:00
Warner Losh	cfb43eb12e	Tweak performance of nda completions Use xpt_done_direct in preference to xpt_done when completing a successful I/O. Continue to use xpt_done when there's an error, or for completion of the submission of a CCB. This eliminates a context switch to the cam_doneq thread. Sponsored by: Netflix Suggested by: scottl@	2017-09-28 01:27:00 +00:00
Warner Losh	5fff95cc1d	Fix queue depth for nda. 1/4 of the number of queues times queue entries is too limiting. It works up to about 4k IOPS / 3.0GB/s for hardware that can do 4.4k/3.2GB/s with nvd. 3/4 works better, though it highlights issues in the fairness of nda's choice of TRIM vs READ. That will be fixed separately.	2017-09-20 21:42:25 +00:00
Konstantin Belousov	5a21cd1941	The nvme module should explicitly declare dependency on the cam. If both nvme and cam are compiled as modules, nvme cannot be kldloaded otherwise. Reviewed by: imp Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-08-31 14:21:32 +00:00
Warner Losh	c2005bba77	Fix a few overlooked spots where the code uses 16-bit NSIDs. Chuck Tuffli had submitted a more thorough patch that I was unaware of when I did my work and this brings in the bits I missed from that patch. PR: 220267 Submitted by: Chuck Tuffli	2017-08-29 15:46:34 +00:00
Warner Losh	519772814d	Add CAM/NVMe support for CAM_DATA_SG This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address. Differential Revision: https://reviews.freebsd.org/D11361 Submitted by: Chuck Tuffli Reviewed by: imp@, scottl@, kenm@	2017-08-29 15:29:57 +00:00
Warner Losh	850564b948	Add new compile-time option NVME_USE_NVD that sets the default value of the runtime hw.nvme.use_nvd tunable. We still default to nvd unless otherwise requested. Sponsored by: Netflix	2017-08-28 23:54:25 +00:00
Warner Losh	c02565f9fa	Set the max transactions for NVMe drives better. Provided a better estimate for the number of transactions that can be pending at one time. This will be number of queues * number of trackers / 4, as suggested by Jim Harris. This gives a better estimate of the number of transactions that CAM should queue before applying back pressure. This should be revisited when we have real multi-queue support in CAM and the upper layers of the I/O stack. Sponsored by: Netflix	2017-08-28 23:54:20 +00:00
Warner Losh	030edcce02	Fill in reserved areas from NVMe spec in the IDENTIFY structure (struct nvme_controller_data) as defined in the NVM Express specification, revision 1.3. Sponsored by: Netflix	2017-08-25 21:38:43 +00:00
Warner Losh	696c950297	NVME Namespace ID is 32-bits, so widen interface to reflect that. Sponsored by: Netflix	2017-08-25 21:38:38 +00:00
Warner Losh	223a9b93ac	Add feature codes from NVMe 1.3 specification: o Autonomous Power State Transition o Host Memory Buffer o Timestamp o Keep Alive Timer o Host Controlled Thermal Management o Non-Operational Power State Config Also note that feature codes 0x78-0x7f are reserved for the NVMe Management Interface. Sponsored by: Netflix	2017-08-25 21:38:29 +00:00
Warner Losh	0012e436e3	Use _Static_assert These files are compiled in userland too, so we can't use sys/systm.h and rely on CTASSERT. Switch to using _Static_assert instead. MFC After: 3 days Sponsored by: Netflix	2017-08-25 04:33:06 +00:00
Warner Losh	0c26c1992f	Sanity check sizes Add compile time sanity checks to make sure that packed structures are the proper size, typically as defined in the NVMe standard.	2017-08-25 04:05:53 +00:00
Warner Losh	abb61405a6	Enable bus mastering on the device before resetting the device. The card has to do PCIe transactions to complete the reset process, but can't do them, per the PCIe spec, unless bus mastering is enabled. Submitted by: Kinjal Patel PR: 22166	2017-08-25 03:15:18 +00:00
Nathan Whitehorn	c670f31f19	Move NVME controller shutdown from being called as part of module unloading to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that the NVME controller gets shut down before the device and bus disappear and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs. PR: kern/211852 Reviewed by: imp MFC after: 2 weeks	2017-08-12 22:13:06 +00:00
Warner Losh	d0e75394cf	Use the correct queue depth for nda devices. Submitted by: Matt Williams	2017-08-08 16:06:16 +00:00
Warner Losh	8a5d94f94d	Make nvd vs nda choice boot-time rather than build-time Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and nda to be installed in the kernel, while allowing only one of them to create devices. This is an all-or-nothing setting, and you can't change it after boot-time. However, it will allow easier A/B testing. Differential Revision: https://reviews.freebsd.org/D11825	2017-08-04 03:40:01 +00:00
Warner Losh	df4245150a	This adds CAM pass(4) support for NVMe IO's. Applications indicate the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or XPT_NVME_IO. Submitted by: Chuck Tuffli <chuck@tuffli.net> Differential Revision: https://reviews.freebsd.org/D10247	2017-07-14 14:52:20 +00:00
Warner Losh	594ffc03cd	Add new definitions for namespaces. Sponsored by: Netflix Submitted by: Matt Williams (via D11330)	2017-06-27 20:24:39 +00:00
Warner Losh	824073fbd6	Avoid dereferencing uninitialized elements in the error path. Some drives sometimes have errors for things like setting the number of queue entries in the submission queue. The error paths taken for these drives ensure a panic dereferencing uninitialized data. Sponsored by: Netflix	2017-03-07 23:06:41 +00:00
Warner Losh	05ee702af6	cdw10 takes the low 32-bits and cdw11 takes the upper 32-bits of the lba. Rather than do a cast to uint64_t, which clang warns might be unaligned, do the stores 32-bits at a time. Sponsored by: Netflix	2017-03-07 23:02:59 +00:00
Warner Losh	a8a18dd590	Make multi-namespace nvme drives more robust. Fix assumptions about name spaces in NVME driver. First, it assumes cdata.nn is the number of configured devices. However, it is the number of supported name spaces. Second, it assumes that there will never be more than 16 name spaces supported, but a certain drive I'm testing reports 1024. It assumes that name spaces are a tightly packed namespace, but the standard seems to indicate otherwise. Finally, it assumes that an error would be generated when querying an unconfigured namespace. Instead, it succeeds but the identify data is all zeros. Fix these by limiting the number of name spaces we probe to 16. Remove aborting when we find one in error. When the size of the name space is zero, ignore it. This is admittedly a band-aid. The long term fix will be to participate in the enumeration and name space change protocols defined in the NVMe standard. Sponsored by: Netflix	2017-03-07 21:47:54 +00:00
Warner Losh	adc8145e6f	Remove obsolete comment after prior rev.	2017-02-19 17:38:17 +00:00
Alexander Motin	950c5aca4a	Remove dead mentions of CAM target mode APIs from drivers. This makes grepping kernel for target mode implementation much easier.	2017-02-19 17:27:58 +00:00
Warner Losh	a3a6c48d66	Ensure that the passthrough request will fit in MAXPHYS bytes after it has been rounded to full pages. This avoids a panic in vm_fault_quick_hold_pages due to this off-by-one error passing one page too many into vmapbuf.	2017-02-02 23:04:06 +00:00
Ravi Pokala	d3c06026c2	In the same vein as r311350, fix whitespace in handling of XPT_PATH_INQ in several more drivers. Sponsored by: Panasas	2017-01-05 03:08:57 +00:00
Alan Somers	4195c7de24	Always null-terminate ccb_pathinq.(sim_vid|hba_vid|dev_name) The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are fixed-length strings. AFAICT the only place they're read is in sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated. However, the kernel doesn't null-terminate them. A bunch of copy-pasted code uses strncpy to write them, and doesn't guarantee null-termination. For at least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually overflows. You can see the result by doing "camcontrol negotiate da0 -v". This change null-terminates those fields everywhere they're set in the kernel. It also shortens a few strings to ensure they'll fit within the 16-character field. PR: 215474 Reported by: Coverity CID: 1009997 1010000 1010001 1010002 1010003 1010004 1010005 CID: 1331519 1010006 1215097 1010007 1288967 1010008 1306000 CID: 1211924 1010009 1010010 1010011 1010012 1010013 1010014 CID: 1147190 1010017 1010016 1010018 1216435 1010020 1010021 CID: 1010022 1009666 1018185 1010023 1010025 1010026 1010027 CID: 1010028 1010029 1010030 1010031 1010033 1018186 1018187 CID: 1010035 1010036 1010042 1010041 1010040 1010039 Reviewed by: imp, sephe, slm MFC after: 4 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D9037 Differential Revision: https://reviews.freebsd.org/D9038	2017-01-04 20:26:42 +00:00
Warner Losh	0cf14228c4	Implement HGST Log page 0xc1, as documented in the HGST SN100 and SN150 product manuals. Subpage 0x32 is documented, but not implemented. Sponsored by: Netflix, Inc	2016-11-19 17:13:08 +00:00
Warner Losh	ab1dd0917b	Print Intel's expanded Temperature log page. Sponsored by: Netflix, Inc	2016-11-19 17:13:03 +00:00
Warner Losh	d01f26f590	Add log pages that Intel SSDs provide. It turns out that many of these are widely implemented beyond just Intel drives. Sponsored by: Netflix, Inc	2016-11-19 17:12:58 +00:00
Warner Losh	aea528795a	Add log pages defined through NVM Express 1.2.1. Sponsored by: Netflix, Inc	2016-11-19 17:12:53 +00:00
Warner Losh	dc58cdf95e	Expand the SMART / Health Information Log Page (Page 02) printout based on NVM Express 1.2.1 Standard. Sponsored by: Netflix, Inc	2016-11-19 17:12:49 +00:00
Scott Long	a965389b5a	Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a bunch of safety belts and error handling in related codepaths. Reviewed by: jimharris Obtained from: Netflix Differential Revision: D8453	2016-11-08 00:24:49 +00:00
Warner Losh	34dc8f1bb4	Kill a few stray debug printfs.	2016-07-28 22:40:31 +00:00
Warner Losh	3a31c31c22	Actually import nvme_sim so the CAM attachment for NVME (nda) actually works. MFC after: 1 week	2016-07-21 03:11:39 +00:00
Scott Long	49e20d2420	Support flushing the dump before returning, and simplify/combine the logic. Switch to a 5us delay since most NVME devices can easily do 200,000 iops. Submitted by: imp MFC after: 3 days Sponsored by: Netflix, Inc.	2016-07-19 19:09:23 +00:00
Scott Long	a498975ef7	Implement crashdump support on NVME MFC after: 3 days Sponsored by: Netflix, Inc.	2016-07-19 03:13:51 +00:00
Warner Losh	f24c011beb	Commit the bits of nda that were missed. This should fix the build. Approved by: re@	2016-06-10 06:04:53 +00:00
Alexander Motin	ee7f4d8187	Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K I believe that this patch handled the problem from the wrong side. Instead of making ZFS properly handle large stripe sizes, it made unrelated driver to lie in reported parameters to workaround that. Alternative solution for this problem from ZFS side was committed at r296615. Discussed with: smh	2016-03-10 17:13:10 +00:00
Jim Harris	361e1fb408	nvme: fix intx handler to not dereference ioq during initialization This was a regression from r293328, which deferred allocation of the controller's ioq array until after interrupts are enabled during boot. PR: 207432 Reported and tested by: Andy Carrel <wac@google.com> MFC after: 3 days Sponsored by: Intel	2016-02-24 00:01:10 +00:00
Justin Hibbits	43cd61606b	Replace several bus_alloc_resource() calls using default arguments with bus_alloc_resource_any() Since these calls only use default arguments, bus_alloc_resource_any() is the right call. Differential Revision: https://reviews.freebsd.org/D5306	2016-02-19 03:37:56 +00:00
Jim Harris	7b036d7790	nvme: avoid duplicate SET_NUM_QUEUES commands nvme(4) issues a SET_NUM_QUEUES command during device initialization to ensure enough I/O queues exist for each of the MSI-X vectors we have allocated. The SET_NUM_QUEUES command is then issued again during nvme_ctrlr_start(), to ensure that it is properly set after any controller reset. At least one NVMe drive exists which fails this second SET_NUM_QUEUES command during device initialization. So change nvme_ctrlr_start() to only issue its SET_NUM_QUEUES command when it is coming out of a reset - avoiding the duplicate SET_NUM_QUEUES during device initialization. Reported by: gallatin MFC after: 3 days Sponsored by: Intel	2016-02-11 17:32:41 +00:00
Warner Losh	038659e7dd	Implement power command to list all power modes, find out the power mode we're in and to set the power mode.	2016-01-30 22:48:06 +00:00
Jim Harris	9c6b5d40eb	nvme: replace NVME_CEILING macro with howmany() Suggested by: rpokala MFC after: 3 days	2016-01-07 20:35:26 +00:00
Jim Harris	50dea2da12	nvme: add hw.nvme.min_cpus_per_ioq tunable Due to FreeBSD system-wide limits on number of MSI-X vectors (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321), it may be desirable to allocate fewer than the maximum number of vectors for an NVMe device, in order to save vectors for other devices (usually Ethernet) that can take better advantage of them and may be probed after NVMe. This tunable is expressed in terms of minimum number of CPUs per I/O queue instead of max number of queues per controller, to allow for a more even distribution of CPUs per queue. This avoids cases where some number of CPUs have a dedicated queue, but other CPUs need to share queues. Ideally the PR referenced above will eventually be fixed and the mechanism implemented here becomes obsolete anyways. While here, fix a bug in the CPUs per I/O queue calculation to properly account for the admin queue's MSI-X vector. Reviewed by: gallatin MFC after: 3 days Sponsored by: Intel	2016-01-07 20:32:04 +00:00
Jim Harris	2b647da7a0	nvme: do not revert to single I/O queue when per-CPU queues not possible Previously nvme(4) would revert to a single I/O queue if it could not allocate enough interrupt vectors or NVMe submission/completion queues to have one I/O queue per core. This patch determines how to utilize a smaller number of available interrupt vectors, and assigns (as closely as possible) an equal number of cores to each associated I/O queue. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:18:32 +00:00
Jim Harris	d400f790b1	nvme: break out interrupt setup code into a separate function MFC after: 3 days Sponsored by: Intel	2016-01-07 16:12:42 +00:00
Jim Harris	e5af5854ff	nvme: do not pre-allocate MSI-X IRQ resources The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:11:31 +00:00
Jim Harris	c75ad8ce5a	nvme: remove per_cpu_io_queues from struct nvme_controller Instead just use num_io_queues to make this determination. This prepares for some future changes enabling use of multiple queues when we do not have enough queues or MSI-X vectors for one queue per CPU. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:09:56 +00:00
Jim Harris	d85f84abb8	nvme: simplify some of the nested ifs in interrupt setup code This prepares for some follow-up commits which do more work in this area. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:08:04 +00:00
Steven Hartland	fdf16a68ab	Limit stripesize reported from nvd(4) to 4K Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB. This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation. This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize. MFC after: 1 week Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4446	2015-12-11 02:06:03 +00:00
Jim Harris	fdbd3d8068	nvd, nvme: report stripesize through GEOM disk layer MFC after: 3 days Sponsored by: Intel	2015-10-30 16:35:18 +00:00
Jim Harris	e7e7bad3d7	nvme: fix race condition in split bio completion path Fixes race condition observed under following circumstances: 1) I/O split on 128KB boundary with Intel NVMe controller. Current Intel controllers produce better latency when I/Os do not span a 128KB boundary - even if the I/O size itself is less than 128KB. 2) Per-CPU I/O queues are enabled. 3) Child I/Os are submitted on different submission queues. 4) Interrupts for child I/O completions occur almost simultaneously. 5) ithread for child I/O A increments bio_inbed, then immediately is preempted (rendezvous IPI, higher priority interrupt). 6) ithread for child I/O B increments bio_inbed, then completes parent bio since all children are now completed. 7) parent bio is freed, and immediately reallocated for a VFS or gpart bio (including setting bio_children to 1 and clearing bio_driver1). 8) ithread for child I/O A resumes processing. bio_children for what it thinks is the parent bio is set to 1, so it thinks it needs to complete the parent bio. Result is either calling a NULL callback function, or double freeing the bio to its uma zone. PR: 203746 Reported by: Drew Gallatin <gallatin@netflix.com>, Marc Goroff <mgoroff@quorum.net> Tested by: Drew Gallatin <gallatin@netflix.com> MFC after: 3 days Sponsored by: Intel	2015-10-30 16:06:34 +00:00
Jim Harris	0e1fd2dda3	nvme: do not notify a consumer about failures that occur during initialization MFC after: 3 days Sponsored by: Intel	2015-07-29 21:29:50 +00:00
Jeff Roberson	fade8dd714	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division	2015-07-23 19:13:41 +00:00
Jim Harris	cbdec09c1c	nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable PR: 200458 MFC after: 3 days Sponsored by: Intel	2015-07-23 15:50:39 +00:00
Jim Harris	de9a58f4ee	nvme: properly handle case where pci_alloc_msix does not alloc all vectors Reported by: Sean Kelly <smkelly@smkelly.org> MFC after: 3 days Sponsored by: Intel	2015-07-23 15:35:08 +00:00
Jim Harris	3345ed9a55	nvme: use BUS_SPACE_MAXSIZE for bus_dma_tag_create maxsize parameter This fixes i386 PAE build fallout from r281281. Reported by: bz MFC after: 1 week	2015-04-09 00:37:55 +00:00
Jim Harris	36b0e4ee1f	nvme: remove CHATHAM related code Chatham was an internal NVMe prototype board used for early driver development. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:52:06 +00:00
Jim Harris	eb4929fb41	nvme: add device strings for Intel DC series NVMe SSDs MFC after: 1 week Sponsored by: Intel	2015-04-08 21:50:45 +00:00
Jim Harris	a6e3096392	nvme: create separate DMA tag for non-payload DMA buffers Submission and completion queue memory need to use a separate DMA tag for mappings than payload buffers, to ensure mappings remain contiguous even with DMAR enabled. Submitted by: kib MFC after: 1 week Sponsored by: Intel	2015-04-08 21:49:45 +00:00
Jim Harris	e5ce537999	nvme: fall back to a smaller MSI-X vector allocation if necessary Previously, if per-CPU MSI-X vectors could not be allocated, nvme(4) would fall back to INTx with a single I/O queue pair. This change will still fall back to a single I/O queue pair, but allocate MSI-X vectors instead of reverting to INTx. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:46:18 +00:00
Jim Harris	2efb5fb1ec	Use bitwise OR instead of logical OR when constructing value for SET_FEATURES/NUMBER_OF_QUEUES command. Sponsored by: Intel MFC after: 3 days	2014-06-10 21:40:43 +00:00
Jim Harris	f42ca756b9	nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:10:35 +00:00
Jim Harris	496a27520d	nvme: Close hole where nvd(4) would not be notified of all nvme(4) instances if modules loaded during boot. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:09:08 +00:00
Jim Harris	1416ef361e	nvme: NVMe specification dictates 4-byte alignment for PRPs (not 8). Sponsored by: Intel MFC after: 3 days	2014-03-17 22:37:17 +00:00
Jim Harris	2b26030cbc	nvme: Remove the software progress marker SET_FEATURE command during controller initialization. The spec says OS drivers should send this command after controller initialization completes successfully, but other NVMe OS drivers are not sending this command. This change will therefore reduce differences between the FreeBSD and other OS drivers. Sponsored by: Intel MFC after: 3 days	2014-03-17 22:36:04 +00:00
Jim Harris	448cffc859	For IDENTIFY passthrough commands to Chatham prototype controllers, copy the spoofed identify data into the user buffer rather than issuing the command to the controller, since Chatham IDENTIFY data is always spoofed. While here, fix a bug in the spoofed data for Chatham submission and completion queue entry sizes. Sponsored by: Intel MFC after: 3 days	2014-01-06 23:51:26 +00:00
Jim Harris	d603c3d73b	Create a unique unit number for each controller and namespace cdev. Sponsored by: Intel MFC after: 3 days	2013-11-01 23:30:54 +00:00
Jim Harris	8a959ae073	Fix the LINT build. Approved by: re (implicit) MFC after: 1 week	2013-10-08 23:23:04 +00:00
Jim Harris	7aa27dbac5	Do not leak resources during attach if nvme_ctrlr_construct() or the initial controller resets fail. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 16:01:43 +00:00
Jim Harris	bb2f67fd72	Log and then disable asynchronous notification of persistent events after they occur. This prevents repeated notifications of the same event. Status of these events may be viewed at any time by viewing the SMART/Health Info Page using nvmecontrol, whether or not asynchronous events notifications for those events are enabled. This log page can be viewed using: nvmecontrol logpage -p 2 <ctrlr id> Future enhancements may re-enable these notifications on a periodic basis so that if the notified condition persists, it will continue to be logged. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 16:00:12 +00:00
Jim Harris	d5fc982133	Do not enable temperature threshold as an asynchronous event notification on NVMe controllers that do not support it. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 15:49:14 +00:00
Jim Harris	992db80f1d	Extend some 32-bit fields and variables to 64-bit to prevent overflow when calculating stats in nvmecontrol perftest. Sponsored by: Intel Reported by: Joe Golio <joseph.golio@emc.com> Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 15:47:22 +00:00
Jim Harris	a40e72a695	Add driver-assisted striping for upcoming Intel NVMe controllers that can benefit from it. Sponsored by: Intel Reviewed by: kib (earlier version), carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 15:44:04 +00:00
Kenneth D. Merry	ce625ec719	Change the way that unmapped I/O capability is advertised. The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic	2013-08-15 22:52:39 +00:00
Jim Harris	086d23cfd3	If a controller fails to initialize, do not notify consumers (nvd) of its namespaces. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-08-13 21:49:32 +00:00
Jim Harris	56183abc2b	Send a shutdown notification in the driver unload path, to ensure notification gets sent in cases where system shuts down with driver unloaded. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-08-13 21:47:08 +00:00
Jim Harris	38441bd9a9	Add message when nvd disks are attached and detached. As part of this commit, add an nvme_strvis() function which borrows heavily from cam_strvis(). This will allow stripping of leading/trailing whitespace and also handle unprintable characters in model/serial numbers. This function goes into a new nvme_util.c file which is used by both the driver and nvmecontrol. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-07-19 21:40:57 +00:00
Jim Harris	2fb37e8f1a	Fix nvme(4) and nvd(4) to support non 512-byte sector sizes. Recent testing with QEMU that has variable sector size support for NVMe uncovered some of these issues. Chatham prototype boards supported only 512 byte sectors. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-07-19 21:33:24 +00:00
Jim Harris	8e0ac13f5a	Use pause() instead of DELAY() when polling for completion of admin commands during controller initialization. DELAY() does not work here during config_intrhook context - we need to explicitly relinquish the CPU for the admin command completion to get processed. Sponsored by: Intel Reported by: Adam Brooks <adam.j.brooks@intel.com> Reviewed by: carl MFC after: 3 days	2013-07-17 23:26:56 +00:00
Jim Harris	e8f25c6266	Define constants for the lengths of the serial number, model number and firmware revision in the controller's identify structure. Also modify consumers of these fields to ensure they only use the specified number of bytes for their respective fields. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-07-17 23:23:38 +00:00
Jim Harris	66619178b5	Fix a poorly worded comment in nvme(4). MFC after: 3 days	2013-07-11 15:02:38 +00:00
Jim Harris	bd6b0ac5be	Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h if not already defined elsewhere. Requested by: attilio MFC after: 3 days	2013-07-09 21:24:19 +00:00
Jim Harris	e9efbc134f	Update copyright dates. MFC after: 3 days	2013-07-09 21:22:17 +00:00
Jim Harris	ec526ea90b	Do not retry failed async event requests. Sponsored by: Intel MFC after: 3 days	2013-07-09 21:03:39 +00:00
Jim Harris	eb32b874f6	Add pci_enable_busmaster() and pci_disable_busmaster() calls in nvme_attach() and nvme_detach() respectively. Sponsored by: Intel MFC after: 3 days	2013-07-09 21:02:45 +00:00
Jim Harris	49fac6101d	Add firmware replacement and activation support to nvmecontrol(8) through a new firmware command. NVMe controllers may support up to 7 firmware slots for storing of different firmware revisions. This new firmware command supports firmware replacement (i.e. firmware download) with or without immediate activation, or activation of a previously stored firmware image. It also supports selection of the firmware slot during replacement operations, using IDENTIFY information from the controller to check that the specified slot is valid. Newly activated firmware does not take effect until the next controller reset, either via a reboot or separate 'nvmecontrol reset' command to the same controller. Submitted by: Joe Golio <joseph.golio@emc.com> Obtained from: EMC / Isilon Storage Division MFC after: 3 days	2013-06-27 00:08:25 +00:00
Jim Harris	bbd412dd05	Remove remaining uio-related code. The nvme_physio() function was removed quite a while ago, which was the only user of this uio-related code. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:37:11 +00:00
Jim Harris	7b68ae1e5e	Fail any passthrough command whose transfer size exceeds the controller's max transfer size. This guards against rogue commands coming in from userspace. Also add KASSERTS for the virtual address and unmapped bio cases, if the transfer size exceeds the controller's max transfer size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:32:45 +00:00
Jim Harris	8d09e3c400	Use MAXPHYS to specify the maximum I/O size for nvme(4). Also allow admin commands to transfer up to this maximum I/O size, rather than the artificial limit previously imposed. The larger I/O size is very beneficial for upcoming firmware download support. This has the added benefit of simplifying the code since both admin and I/O commands now use the same maximum I/O size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:27:17 +00:00
Jim Harris	5076698e19	Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace them with the NVMe passthrough equivalent. Sponsored by: Intel	2013-04-12 17:56:47 +00:00
Jim Harris	7c3f19d7bb	Add support for passthrough NVMe commands. This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than separate IOCTLs for each. Sponsored by: Intel	2013-04-12 17:52:17 +00:00
Jim Harris	ca269f32ef	Move the busdma mapping functions to nvme_qpair.c. This removes nvme_uio.c completely. Sponsored by: Intel	2013-04-12 17:48:45 +00:00
Jim Harris	611060cab5	Remove the NVMe-specific physio and associated routines. These were added early on for benchmarking purposes to avoid the mapped I/O penalties incurred in kern_physio. Now that FreeBSD (including kern_physio) supports unmapped I/O, the need for these NVMe-specific routines no longer exists. Sponsored by: Intel	2013-04-12 17:44:55 +00:00
Jim Harris	97fafe2580	Add a mutex to each namespace, for general locking operations on the namespace. Sponsored by: Intel	2013-04-12 17:41:24 +00:00
Jim Harris	a90b810492	Rename the controller's fail_req_lock, so that it can be used for other locking operations on the controller. Sponsored by: Intel	2013-04-12 17:36:48 +00:00
Jim Harris	e2b9900498	Do not panic when a busdma mapping operation fails. Instead, print an error message and fail the associated command with DATA_TRANSFER_ERROR NVMe completion status. Sponsored by: Intel	2013-04-12 17:34:49 +00:00
Jim Harris	5fdf9c3c8e	Add unmapped bio support to nvme(4) and nvd(4). Sponsored by: Intel	2013-04-01 16:23:34 +00:00
Jim Harris	1e526bc478	Add "type" to nvme_request, signifying if its payload is a VADDR, UIO, or NULL. This simplifies decisions around if/how requests are routed through busdma. It also paves the way for supporting unmapped bios. Sponsored by: Intel	2013-03-29 20:34:28 +00:00
Jim Harris	64432b473b	Remove obsolete comment. This code has now been tested with the QEMU NVMe device emulator.	2013-03-28 16:57:48 +00:00
Jim Harris	bb852ae89b	Delete extra IO qpairs allocated based on number of MSI-X vectors, but later found to not be usable because the controller doesn't support the same number of queues. This is not the normal case, but does occur with the Chatham prototype board. Sponsored by: Intel	2013-03-28 16:54:19 +00:00
Jim Harris	bdd1fd402c	Fix printf format issue on i386. Reported by: bz	2013-03-27 00:37:00 +00:00
Jim Harris	547d523eb8	Clean up debug prints. 1) Consistently use device_printf. 2) Make dump_completion and dump_command into something more human-readable. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:17:10 +00:00
Jim Harris	dd433dd0fb	Move common code from the different nvme_allocate_request functions into a separate function. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:13:07 +00:00
Jim Harris	237d2019e5	Change a number of malloc(9) calls to use M_WAITOK instead of M_NOWAIT. Sponsored by: Intel Suggested by: carl Reviewed by: carl	2013-03-26 22:11:34 +00:00
Jim Harris	955910a916	Replace usages of mtx_pool_find used for admin commands with a polling mechanism. Now that all requests are timed, we are guaranteed to get a completion notification, even if it is an abort status due to a timed out admin command. This has the effect of simplifying the controller and namespace setup code, so that it reads straight through rather than broken up into a bunch of different callback functions. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:09:51 +00:00
Jim Harris	43a3725688	Abort and do not retry any outstanding admin commands left over after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 22:06:05 +00:00
Jim Harris	232e2edb6c	Add the ability to internally mark a controller as failed, if it is unable to start or reset. Also add a notifier for NVMe consumers for controller fail conditions and plumb this notifier for nvd(4) to destroy the associated GEOM disks when a failure occurs. This requires a bit of work to cover the races when a consumer is sending I/O requests to a controller that is transitioning to the failed state. To help cover this condition, add a task to defer completion of I/Os submitted to a failed controller, so that the consumer will still always receive its completions in a different context than the submission. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:58:38 +00:00
Jim Harris	3d7eb41c1b	Just disable the controller instead of deleting IO queues during detach. This is just as effective, and removes the need for a bunch of admin commands to a controller that's going to be disabled shortly anyways. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:48:41 +00:00
Jim Harris	74019d4b67	Set Pre-boot Software Load Count to 0 at the end of the controller start process. The spec indicates the OS driver should use Set Features (Software Progress Marker) to set the pre-boot software load count to 0 after the OS driver has successfully been initialized. This allows pre-boot software to determine if there have been any issues with the OS loading. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:42:53 +00:00
Jim Harris	be34f21609	Remove the is_started flag from struct nvme_controller. This flag was originally added to communicate to the sysctl code which oids should be built, but there are easier ways to do this. This needs to be cleaned up prior to adding new controller states - for example, controller failure. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:19:26 +00:00
Jim Harris	02e3348484	Ensure the controller's MDTS is accounted for in max_xfer_size. The controller's IDENTIFY data contains MDTS (Max Data Transfer Size) to allow the controller to specify the maximum I/O data transfer size. nvme(4) already provides a default maximum, but make sure it does not exceed what MDTS reports. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:16:53 +00:00
Jim Harris	cb5b7c1304	Cap the number of retry attempts to a configurable number. This ensures that if a specific I/O repeatedly times out, we don't retry it indefinitely. The default number of retries will be 4, but can be adjusted using hw.nvme.retry_count. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:14:51 +00:00
Jim Harris	0d7e13ecb2	Pass associated log page data to async event consumers, if requested. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:08:32 +00:00
Jim Harris	2868353a57	When an asynchronous event request is completed, automatically fetch the specified log page. This satisfies the spec condition that future async events of the same type will not be sent until the associated log page is fetched. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:05:15 +00:00
Jim Harris	0692579bf3	Add structure definitions and controller command function for firmware log pages. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:03:03 +00:00
Jim Harris	0892778256	Add structure definitions and a controller command function for error log pages. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:01:53 +00:00
Jim Harris	cf81529ce3	Create struct nvme_status. NVMe error log entries include status, so breaking this out into its own data structure allows it to be included in both the nvme_completion data structure as well as error log entry data structures. While here, expose nvme_completion_is_error(), and change all of the places that were explicitly looking at sc/sct bits to use this macro instead. Sponsored by: Intel Reviewed by: carl	2013-03-26 21:00:18 +00:00
Jim Harris	f37c22a3bd	Make nvme_ctrlr_reset a nop if a reset is already in progress. This protects against cases where a controller crashes with multiple I/O outstanding, each timing out and requesting controller resets simultaneously. While here, remove a debugging printf from a previous commit, and add more logging around I/O that need to be resubmitted after a controller reset. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:56:58 +00:00
Jim Harris	48ce317898	By default, always escalate to controller reset when an I/O times out. While aborts are typically cleaner than a full controller reset, many times an I/O timeout indicates other controller-level issues where aborts may not work. NVMe drivers for other operating systems are also defaulting to controller reset rather than aborts for timed out I/O. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:32:57 +00:00
Jim Harris	941433323c	Add a tunable for the I/O timeout interval. Default is still 30 seconds, but can be adjusted between a min/max of 5 and 120 seconds. Sponsored by: Intel Reviewed by: carl	2013-03-26 20:02:35 +00:00
Jim Harris	12d191ec12	Add handling for controller fatal status (csts.cfs). On any I/O timeout, check for csts.cfs==1. If set, the controller is reporting fatal status and we reset the controller immediately, rather than trying to abort the timed out command. This changeset also includes deferring the controller start portion of the reset to a separate task. This ensures we are always performing a controller start operation from a consistent context. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:58:17 +00:00
Jim Harris	dbba74428b	Add API for nvme consumers to access controller and namespace identify data. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:52:57 +00:00
Jim Harris	b846efd7ec	Add controller reset capability to nvme(4) and ability to explicitly invoke it from nvmecontrol(8). Controller reset will be performed in cases where I/O are repeatedly timing out, the controller reports an unrecoverable condition, or when explicitly requested via IOCTL or an nvme consumer. Since the controller may be in such a state where it cannot even process queue deletion requests, we will perform a controller reset without trying to clean up anything on the controller first. Sponsored by: Intel Reviewed by: carl	2013-03-26 19:50:46 +00:00
Jim Harris	65c2474e6d	Keep a doubly-linked list of outstanding trackers. This enables in-order re-submission of I/O after a controller reset. Sponsored by: Intel	2013-03-26 18:45:16 +00:00
Jim Harris	5f1e251de6	Create a generic nvme_ctrlr_cmd_get_log_page function, and change the health information log page function to use it. Sponsored by: Intel	2013-03-26 18:43:53 +00:00
Jim Harris	99d99f7408	Expose the get/set features API to nvme consumers. Sponsored by: Intel	2013-03-26 18:42:05 +00:00
Jim Harris	038a5ee403	Add an interface for nvme shim drivers (i.e. nvd) to register for notifications when new nvme controllers are added to the system. Sponsored by: Intel	2013-03-26 18:39:54 +00:00
Jim Harris	0a0b08cc30	Enable asynchronous event requests on non-Chatham devices. Also add logic to clean up all outstanding asynchronous event requests when resetting or shutting down the controller, since these requests will not be explicitly completed by the controller itself. Sponsored by: Intel	2013-03-26 18:37:36 +00:00
Jim Harris	990e741c18	Move controller destruction code from nvme_detach() to new nvme_ctrlr_destruct() function. Sponsored by: Intel	2013-03-26 18:34:19 +00:00
Jim Harris	274b3a88fa	Specify command timeout interval on a per-command type basis. This is primarily driven by the need to disable timeouts for asynchronous event requests, which by nature should not be timed out. Sponsored by: Intel	2013-03-26 18:31:46 +00:00
Jim Harris	879de69910	Explicitly abort a timed out command, if the ABORT command sent to the controller indicates the command was not found. Sponsored by: Intel	2013-03-26 18:29:04 +00:00
Jim Harris	6cb0607039	Break out the code for completing an nvme_tracker object into a separate function. This allows for completions outside the normal completion path, for example when an ABORT command fails due to the controller reporting the targeted command does not exist. This is mainly for protection against a faulty controller, but we need to clean up our internal request nonetheless. Sponsored by: Intel	2013-03-26 18:27:22 +00:00
Jim Harris	448195e764	Add support for ABORT commands, including issuing these commands when an I/O times out. Also ensure that we retry commands that are aborted due to a timeout. Sponsored by: Intel	2013-03-26 18:23:35 +00:00
Jim Harris	d6f54866ea	Add an internal _nvme_qpair_submit_request function, which performs the submit action assuming the qpair lock has already been acquired. Also change nvme_qpair_submit_request to just lock/unlock the mutex around a call to this new function. This fixes a recursive mutex acquisition in the retry path. Sponsored by: Intel	2013-03-26 18:20:11 +00:00
Jim Harris	aaf6b84a4e	Make the DSM range count 0-based. Previously we were deallocating one more LBA than we should have been. Sponsored by: Intel	2013-03-26 18:16:30 +00:00

278 commits