opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-21 14:17:06 -04:00

Author	SHA1	Message	Date
Mateusz Guzik	d87b31e159	nvme: clean up empty lines in .c and .h files	2020-09-01 22:03:10 +00:00
Alexander Motin	ead7e10308	Make polled request timeout less invasive. Instead of panic after one second of polling, make the normal timeout handler to activate, reset the controller and abort the outstanding requests. If all of it won't happen within 10 seconds then something in the driver is likely stuck bad and panic is the only way out. In particular this fixed device hot unplug during execution of those polled commands, allowing clean device detach instead of panic. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-18 19:16:03 +00:00
Alexander Motin	67abaee9fc	Add Host Memory Buffer support to nvme(4). This allows cheapest DRAM-less NVMe SSDs to use some of host RAM (about 1MB per 1GB on the devices I have) for its metadata cache, significantly improving random I/O performance. Device reports minimal and preferable size of the buffer. The code limits it to 1% of physical RAM by default. If the buffer can not be allocated or below minimal size, the device will just have to work without it. MFC after: 2 weeks Relnotes: yes Sponsored by: iXsystems, Inc.	2020-01-07 21:17:11 +00:00
Warner Losh	7588c6cc36	Move to using bool instead of boolean_t While there are subtle semantic differences between bool and boolean_t, none of them matter in these cases. Prefer true/false when dealing with bool type. Preserve a couple of TRUEs since they are passed into int args into CAM. Preserve a couple of FALSEs when used for status.done, an int. Differential Revision: https://reviews.freebsd.org/D20999	2019-12-13 18:35:48 +00:00
Alexander Motin	1eab19cbec	Make nvme(4) driver some more NUMA aware. - For each queue pair precalculate CPU and domain it is bound to. If queue pairs are not per-CPU, then use the domain of the device. - Allocate most of queue pair memory from the domain it is bound to. - Bind callouts to the same CPUs as queue pair to avoid migrations. - Do not assign queue pairs to each SMT thread. It just wasted resources and increased lock congestions. - Remove fixed multiplier of CPUs per queue pair, spread them even. This allows to use more queue pairs in some hardware configurations. - If queue pair serves multiple CPUs, bind different NVMe devices to different CPUs. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-23 17:53:47 +00:00
Warner Losh	f93b7f954e	Support doorbell strides != 0. The NVMe standard (1.4) states >>> 8.6 Doorbell Stride for Software Emulation >>> The doorbell stride,...is useful in software emulation of an NVM >>> Express controller. ... For hardware implementations of the NVM >>> Express interface, the expected doorbell stride value is 0h. However, hardware in the wild exists with a doorbell stride of 1 (meaning 8 byte separation). This change supports that hardware, as well as software emulators as envisioned in Section 8.6. Since this is the fast path, care has been taken to make this computation efficient. The bit of math to compute an offset for each is replaced by a memory load from cache of a pre-computed value. MFC After: 3 days Reviewed by: scottl@ Differential Revision: https://reviews.freebsd.org/D21514	2019-09-04 20:08:36 +00:00
Warner Losh	4d5475613e	Implement nvme suspend / resume for pci attachment When we suspend, we need to properly shutdown the NVME controller. The controller may go into D3 state (or may have the power removed), and to properly flush the metadata to non-volatile RAM, we must complete a normal shutdown. This consists of deleting the I/O queues and setting the shutdown bit. We have to do some extra stuff to make sure we reset the software state of the queues as well. On resume, we have to reset the card twice, for reasons described in the attach function. Once we've done that, we can restart the card. If any of this fails, we'll fail the NVMe card, just like we do when a reset fails. Set is_resetting for the duration of the suspend / resume. This keeps the reset taskqueue from running a concurrent reset, and also is needed to prevent any hw completions from queueing more I/O to the card. Pass resetting flag to nvme_ctrlr_start. It doesn't need to get that from the global state of the ctrlr. Wait for any pending reset to finish. All queued I/O will get sent to the hardware as part of nvme_ctrlr_start(), though the upper layers shouldn't send any down. Disabling the qpairs is the other failsafe to ensure all I/O is queued. Rename nvme_ctrlr_destory_qpairs to nvme_ctrlr_delete_qpairs to avoid confusion with all the other destroy functions. It just removes the queues in hardware, while the other _destroy_ functions tear down driver data structures. Split parts of the hardware reset function up so that I can do part of the reset in suspend. Split out the software disabling of the qpairs into nvme_ctrlr_disable_qpairs. Finally, fix a couple of spelling errors in comments related to this. Relnotes: Yes MFC After: 1 week Reviewed by: scottl@ (prior version) Differential Revision: https://reviews.freebsd.org/D21493	2019-09-03 15:26:11 +00:00
Warner Losh	31b11bb3f2	In nvme_completion_poll, add a sanity check to make sure that we complete the polling within a second. Panic if we don't. All the commands that use this interface should typically complete within a few tens to hundreds of microseconds. Panic rather than return ETIMEDOUT because if the command somehow does later complete, it will randomly corrupt memory. Also, it helps to get a traceback from where the unexpected failure happens, rather than an infinite loop.	2019-09-02 17:11:32 +00:00
Warner Losh	ab0681aac9	In all the places that we use the polled for completion interface, except crash dump support code, move the while loop into an inline function. These aren't done in the fast path, so if the compiler chooses to not inline, any performance hit is tiny.	2019-09-02 17:11:27 +00:00
Warner Losh	f182f928db	Separate the pci attachment from the rest of nvme Nvme drives can be attached in a number of different ways. Separate out the PCI attachment so that we can have other attachment types, like ahci and various types of NVMeoF. Submitted by: cognet@	2019-08-21 22:17:55 +00:00
Alexander Motin	97be8b969d	Report NOIOB and NPWG fields as stripe size. Namespace Optimal I/O Boundary field added in NVMe 1.3 and Namespace Preferred Write Granularity added in 1.4 allow upper layers to align I/Os for improved SSD performance and endurance. I don't have hardware reporting those yet, but NPWG could probably be reported by bhyve. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-08-14 16:12:03 +00:00
Warner Losh	5e83c2ffaa	Keep track of the number of commands that exhaust their retry limit. While we print failure messages on the console, sometimes logs are lost or overwhelmed. Keeping a count of how many times we've failed retriable commands helps get a magnitude of the problem.	2019-07-19 18:39:24 +00:00
Warner Losh	c37fc318c4	Keep track of the number of retried commands. Retried commands can indicate a performance degradation of an nvme drive. Keep track of the number of retries and report it out via sysctl, just like number of commands and interrupts.	2019-07-19 18:39:18 +00:00
Warner Losh	1071b50a65	Use sysctl + CTLRWTUN for hw.nvme.verbose_cmd_dump. Also convert it to a bool. While the rest of the driver isn't yet bool clean, this will help. Reviewed by: cem@ Differential Revision: https://reviews.freebsd.org/D20988	2019-07-19 00:32:56 +00:00
Warner Losh	c75bdc044d	Provide new tunable hw.nvme.verbose_cmd_dump The nvme driver dumps only the most relevant details about a command when it fails. However, there are times this is not sufficient (such as debugging weird issues for a new drive with a vendor). Setting hw.nvme.verbose_cmd_dump=1 in loader.conf will enable more complete debugging information about each command that fails. Reviewed by: rpokala Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20988	2019-07-18 21:58:51 +00:00
Warner Losh	2ffd6fce5b	Don't print all the I/O we abort on a reset, unless we're out of retries. When resetting the controller, we abort I/O. Prior to this fix, we printed a ton of abort messages for I/O that we're going to retry. This imparts no useful information. Stop printing them unless our retry count is exhausted. Clarify code for when we don't retry, and remove useless arg to a routine that's always called with it as 'true'. All the other debug is still printed (including multiple reset messages if we have multiple timeouts before the taskqueue runs the actual reset) so that we know when we reset. Reviewed by: jimharris@, chuck@ Differential Revision: https://reviews.freebsd.org/D19431	2019-03-09 01:18:16 +00:00
Warner Losh	45d7e233a5	Unconditionally support unmapped BIOs. This was another shim for supporting older kernels. However, all supported versions of FreeBSD have unmapped I/Os (as do several that have gone EOL), remove it. It's unlikely the driver would work on the older kernels anyway at this point.	2019-02-27 22:16:59 +00:00
Warner Losh	d706306d49	Remove #ifdef code to support FreeBSD versions that haven't been supported in years. A number of changes have been made to the driver that likely wouldn't work on those older versions that aren't properly ifdef'd and it's project policy to GC such code once it is stale.	2019-02-27 22:05:01 +00:00
Warner Losh	09efa3dfb2	Put a workaround in for command timeout malfunctioning At least one NVMe drive has a bug that makes the Command Time Out PCIe feature unreliable. The workaround is to disable this feature. The driver wouldn't deal correctly with a timeout anyway. Only do this for drives that are known bad. Sponsored by: Netflix, Inc Differential Revision: https://reviews.freebsd.org/D17708	2018-10-26 14:27:37 +00:00
Alexander Motin	f439e3a4ff	Refactor NVMe CAM integration. - Remove layering violation, when NVMe SIM code accessed CAM internal device structures to set pointers on controller and namespace data. Instead make NVMe XPT probe fetch the data directly from hardware. - Cleanup NVMe SIM code, fixing support for multiple namespaces per controller (reporting them as LUNs) and adding controller detach support and run-time namespace change notifications. - Add initial support for namespace change async events. So far only in CAM mode, but it allows run-time namespace arrival and departure. - Add missing nvme_notify_fail_consumers() call on controller detach. Together with previous changes this allows NVMe device detach/unplug. Non-CAM mode still requires a lot of love to stay on par, but at least CAM mode code should not stay in the way so much, becoming much more self-sufficient. Reviewed by: imp MFC after: 1 month Sponsored by: iXsystems, Inc.	2018-05-25 03:34:33 +00:00
Warner Losh	d85d964829	Try polling the qpairs on timeout. On some systems, we're getting timeouts when we use multiple queues on drives that work perfectly well on other systems. On a hunch, Jim Harris suggested I poll the completion queue when we get a timeout. This patch polls the completion queue if no fatal status was indicated. If it had pending I/O, we complete that request and return. Otherwise, if aborts are enabled and no fatal status, we abort the command and return. Otherwise we reset the card. This may clear up the problem, or we may see it result in lots of timeouts and a performance problem. Either way, we'll know the next step. We may also need to pay attention to the fatal status bit of the controller. PR: 211713 Suggested by: Jim Harris Sponsored by: Netflix	2018-03-16 05:23:48 +00:00
Wojciech Macek	0d787e9b35	NVMe: Add big-endian support Remove bitfields from defined structures as they are not portable. Instead use shift and mask macros in the driver and nvmecontrol application. NVMe is now working on powerpc64 host. Submitted by: Michal Stanek <mst@semihalf.com> Obtained from: Semihalf Reviewed by: imp, wma Sponsored by: IBM, QCM Technologies Differential revision: https://reviews.freebsd.org/D13916	2018-02-22 13:32:31 +00:00
Warner Losh	29077eb456	Use atomic load and stores to ensure that the compiler doesn't optimize away these loops. Change boolean to int to match what atomic API supplies. Remove wmb() since the atomic_store_rel() on status.done ensure the prior writes to status. It also fixes the fact that there wasn't a rmb() before reading done. This should also be more efficient since wmb() is fairly heavy weight. Sponsored by: Netflix Reviewed by: kib@, jim harris Differential Revision: https://reviews.freebsd.org/D14053	2018-01-29 00:00:52 +00:00
Warner Losh	ce1ec9c178	When we're disabling the nvme device, some drives have a controller bug that requires 'hands off' for a period of time (2.3s) before we check the RDY bit. Since this is a very odd quirk for a very limited selection of drives, do this as a quirk. This prevented a successful reset of the card when the card wedged. Also, make sure that we comply with the advice from section 3.1.5 of the 1.3 spec, which says that transitioning CC.EN from 0 to 1 when CSTS.RDY is 1 or transitioning CC.EN from 1 to 0 when CSTS.RDY is 0 "has undefined results". Short circuit when EN == RDY == desired state. Finally, fail the reset if the disable fails. This will lead to a failed device, which is what we want. (note: nda device needs work for coping with a failed device). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13389	2017-12-18 18:38:00 +00:00
Pedro F. Giffuni	718cf2ccb9	sys/dev: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 14:52:40 +00:00
Warner Losh	bb1c7be429	Create general polling function for the nvme controller. Use it when we're doing the various pin-based interrupt modes. Adjust nvme_ctrlr_intx_handler to use nvme_ctrlr_poll. Sponsored by: Netflix Suggested by: scottl@	2017-10-15 16:18:08 +00:00
Warner Losh	519772814d	Add CAM/NVMe support for CAM_DATA_SG This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address. Differential Revision: https://reviews.freebsd.org/D11361 Submitted by: Chuck Tuffli Reviewed by: imp@, scottl@, kenm@	2017-08-29 15:29:57 +00:00
Warner Losh	c02565f9fa	Set the max transactions for NVMe drives better. Provided a better estimate for the number of transactions that can be pending at one time. This will be number of queues * number of trackers / 4, as suggested by Jim Harris. This gives a better estimate of the number of transactions that CAM should queue before applying back pressure. This should be revisited when we have real multi-queue support in CAM and the upper layers of the I/O stack. Sponsored by: Netflix	2017-08-28 23:54:20 +00:00
Warner Losh	696c950297	NVME Namespace ID is 32-bits, so widen interface to reflect that. Sponsored by: Netflix	2017-08-25 21:38:38 +00:00
Scott Long	a965389b5a	Convert the Q-Pair and PRP list memory allocations to use BUSDMA. Add a bunch of safety belts and error handling in related codepaths. Reviewed by: jimharris Obtained from: Netflix Differential Revision: D8453	2016-11-08 00:24:49 +00:00
Warner Losh	3a31c31c22	Actually import nvme_sim so the CAM attachment for NVME (nda) actually works. MFC after: 1 week	2016-07-21 03:11:39 +00:00
Warner Losh	f24c011beb	Commit the bits of nda that were missed. This should fix the build. Approved by: re@	2016-06-10 06:04:53 +00:00
Jim Harris	2b647da7a0	nvme: do not revert to single I/O queue when per-CPU queues not possible Previously nvme(4) would revert to a single I/O queue if it could not allocate enough interrupt vectors or NVMe submission/completion queues to have one I/O queue per core. This patch determines how to utilize a smaller number of available interrupt vectors, and assigns (as closely as possible) an equal number of cores to each associated I/O queue. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:18:32 +00:00
Jim Harris	e5af5854ff	nvme: do not pre-allocate MSI-X IRQ resources The issue referenced here was resolved by other changes in recent commits, so this code is no longer needed. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:11:31 +00:00
Jim Harris	c75ad8ce5a	nvme: remove per_cpu_io_queues from struct nvme_controller Instead just use num_io_queues to make this determination. This prepares for some future changes enabling use of multiple queues when we do not have enough queues or MSI-X vectors for one queue per CPU. MFC after: 3 days Sponsored by: Intel	2016-01-07 16:09:56 +00:00
Jim Harris	36b0e4ee1f	nvme: remove CHATHAM related code Chatham was an internal NVMe prototype board used for early driver development. MFC after: 1 week Sponsored by: Intel	2015-04-08 21:52:06 +00:00
Jim Harris	a6e3096392	nvme: create separate DMA tag for non-payload DMA buffers Submission and completion queue memory need to use a separate DMA tag for mappings than payload buffers, to ensure mappings remain contiguous even with DMAR enabled. Submitted by: kib MFC after: 1 week Sponsored by: Intel	2015-04-08 21:49:45 +00:00
Jim Harris	f42ca756b9	nvme: Allocate all MSI resources up front so that we can fall back to INTx if necessary. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:10:35 +00:00
Jim Harris	496a27520d	nvme: Close hole where nvd(4) would not be notified of all nvme(4) instances if modules loaded during boot. Sponsored by: Intel MFC after: 3 days	2014-03-18 18:09:08 +00:00
Jim Harris	bb2f67fd72	Log and then disable asynchronous notification of persistent events after they occur. This prevents repeated notifications of the same event. Status of these events may be viewed at any time by viewing the SMART/Health Info Page using nvmecontrol, whether or not asynchronous events notifications for those events are enabled. This log page can be viewed using: nvmecontrol logpage -p 2 <ctrlr id> Future enhancements may re-enable these notifications on a periodic basis so that if the notified condition persists, it will continue to be logged. Sponsored by: Intel Reviewed by: carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 16:00:12 +00:00
Jim Harris	a40e72a695	Add driver-assisted striping for upcoming Intel NVMe controllers that can benefit from it. Sponsored by: Intel Reviewed by: kib (earlier version), carl Approved by: re (hrs) MFC after: 1 week	2013-10-08 15:44:04 +00:00
Jim Harris	56183abc2b	Send a shutdown notification in the driver unload path, to ensure notification gets sent in cases where system shuts down with driver unloaded. Sponsored by: Intel Reviewed by: carl MFC after: 3 days	2013-08-13 21:47:08 +00:00
Jim Harris	bd6b0ac5be	Add comment explaining why CACHE_LINE_SIZE is defined in nvme_private.h if not already defined elsewhere. Requested by: attilio MFC after: 3 days	2013-07-09 21:24:19 +00:00
Jim Harris	e9efbc134f	Update copyright dates. MFC after: 3 days	2013-07-09 21:22:17 +00:00
Jim Harris	bbd412dd05	Remove remaining uio-related code. The nvme_physio() function was removed quite a while ago, which was the only user of this uio-related code. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:37:11 +00:00
Jim Harris	8d09e3c400	Use MAXPHYS to specify the maximum I/O size for nvme(4). Also allow admin commands to transfer up to this maximum I/O size, rather than the artificial limit previously imposed. The larger I/O size is very beneficial for upcoming firmware download support. This has the added benefit of simplifying the code since both admin and I/O commands now use the same maximum I/O size. Sponsored by: Intel MFC after: 3 days	2013-06-26 23:27:17 +00:00
Jim Harris	ca269f32ef	Move the busdma mapping functions to nvme_qpair.c. This removes nvme_uio.c completely. Sponsored by: Intel	2013-04-12 17:48:45 +00:00
Jim Harris	97fafe2580	Add a mutex to each namespace, for general locking operations on the namespace. Sponsored by: Intel	2013-04-12 17:41:24 +00:00
Jim Harris	a90b810492	Rename the controller's fail_req_lock, so that it can be used for other locking operations on the controller. Sponsored by: Intel	2013-04-12 17:36:48 +00:00
Jim Harris	5fdf9c3c8e	Add unmapped bio support to nvme(4) and nvd(4). Sponsored by: Intel	2013-04-01 16:23:34 +00:00
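
Illustration for commit f93b7f954e (doorbell strides != 0): a minimal sketch, not the driver's code, of precomputing per-queue doorbell offsets once so the fast path only loads a stored value. The offset formula is the one given in the NVMe specification (doorbells start at 0x1000 and are spaced 4 << CAP.DSTRD bytes apart). The struct and function names below are hypothetical and do not come from nvme(4).

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the first doorbell register, per the NVMe specification. */
#define DOORBELL_BASE 0x1000u

/* Hypothetical per-queue record; not the driver's struct nvme_qpair. */
struct qpair_doorbells {
	uint32_t sq_tdbl_off;	/* submission queue tail doorbell offset */
	uint32_t cq_hdbl_off;	/* completion queue head doorbell offset */
};

/*
 * Compute both doorbell offsets for queue `qid` once, at queue setup.
 * `dstrd` is the CAP.DSTRD field: 0 means 4-byte spacing, 1 means
 * 8-byte spacing, and so on.
 */
static void
precompute_doorbells(struct qpair_doorbells *q, uint32_t qid, uint32_t dstrd)
{
	uint32_t stride;

	assert(dstrd <= 15);	/* CAP.DSTRD is a 4-bit field */
	stride = 4u << dstrd;
	q->sq_tdbl_off = DOORBELL_BASE + (2 * qid) * stride;
	q->cq_hdbl_off = DOORBELL_BASE + (2 * qid + 1) * stride;
}
```

With a stride of 1, queue 1 gets offsets 0x1010 and 0x1018, matching the 8-byte separation the commit message describes.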

87 commits