opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-06-11 09:41:03 -04:00

Author	SHA1	Message	Date
Gordon Bergling	6e8ab6715d	nvme(4): Fix a typo in a source code comment - s/inaccessable/inaccessible/ MFC after: 3 days	2022-06-04 11:46:03 +02:00
Warner Losh	3740a8db13	nvme: Further refinements in Host Memory Buffer Sizing Host Memory Buffer units are a mix. For those in the identify structure, the size is in 4kiB chunks. For specifying the buffer description, though, they are in terms of the drive's MPS. Add comments to this effect and change PAGE_SIZE to ctrlr->page_size where needed, as well as correct a mistaken use of NVME_HPS_UNITS in `214df80a9c` as pointed out by rpokala@ after the commit. No functional change is intended, as page_size is still 4k which matches all current hosts' PAGE_SIZE, but to support 16k pages on arm, we need to differentiate these two cases. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34871	2022-04-15 14:46:19 -06:00
Warner Losh	3086efe895	nvme: Remove NVME_MAX_XFER_SIZE, replace inline calculation NVME_MAX_XFER_SIZE used to be a constant (back when MAXPHYS was a constant) to denote the smaller of MAXPHYS or the largest PRP we could encode with our preallocation scheme. However, it's no longer constant since MAXPHYS varies at runtime. In addition, the actual maximum is now based on the drive's currently in use page_size, which is also a runtime expression. As such, remove the define and expand it inline in the one place it's still used in the tree. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34870	2022-04-15 14:46:18 -06:00
Warner Losh	3a468f2010	nvme: Use saved mps when initializing drive Make sure we set the MPS we cached (currently the drive's minimum mps) in CC (Controller Configuration) when reinitializing the drive. It must match the page_size that we're going to use. Also retire less specific NVME_PAGE_SHIFT since it's now unused. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34869	2022-04-15 14:46:18 -06:00
Warner Losh	55412ef90a	nvme: Rename min_page_size to page_size and save mps The Memory Page Size sets the basic unit of operation for the drive. We currently set this to the drive's minimum page size, but we could set it to any page size the drive supports in the future. Replace min_page_size (it's now unused for that purpose) with page_size to reflect this and cache the MPS we want to use. Use NVME_MPS_SHIFT to compute page_size. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34868	2022-04-15 14:46:18 -06:00
Warner Losh	6e3deec8ca	nvme: Base maximum data transfer size directly on MPSMIN in cap_hi Calculate the maximum transfer size based on the MPSMIN we have in our cached copy of cap_hi rather than using min_page_size in the controller. Sponsored by: Netflix Reviewed by: chuck Differential Revision: https://reviews.freebsd.org/D34867	2022-04-15 14:46:18 -06:00
Warner Losh	214df80a9c	nvme: new define for size of host memory buffer sizes The nvme spec defines the various fields that specify sizes for host memory buffers in terms of 4096 chunks. So, rather than use a bare 4096 here, use NVME_HMB_UNITS. This is explicitly not the host page size of 4096, nor the default memory page size (mps) of the NVMe drive, but its own thing and needs its own define. No functional change is intended, only the logical spelling of 4k. Sponsored by: Netflix	2022-04-08 23:05:25 -06:00
Warner Losh	6af6a52ee4	nvme: Save cap_lo and cap_hi Save the capabilities for the drive. Sponsored by: Netflix	2022-03-31 21:12:38 -06:00
Warner Losh	a70b5660f3	nvme: MPS is a power of two, not a size / 8k Setting MPS in the CC should be a power of 2 number (it specifies the page size of the host is 2^(12+MPS)), so adjust the calculation. There is no functional change because we do not support any architectures != 4k pages (yet). Other changes are needed for architectures with 16k or 64k pages, especially when the underlying NVMe drive doesn't support that page size (Most drives support a range that's small, and many only support 4k), but let's at least do this calculation correctly. 12 - 12 is just as much 0 as 4096 >> 13 is :) Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D34707	2022-03-31 21:12:38 -06:00
Warner Losh	83581511d9	nvme: Use adaptive spinning when polling for completion or state change We only use nvme_completion_poll in the initialization path. The commands they queue and wait for finish quickly as they involve no I/O to the drive's media. These commands take about 20-200 microseconds each. Set the wait time to 1us and then increase it by a factor of 1.5 each successive iteration (max 1ms). This reduces initialization time by 80ms in cperciva's tests. Use this same technique waiting for RDY state transitions. This saves another 20ms. In total we're down from ~330ms to ~2ms. Tested by: cperciva Sponsored by: Netflix Reviewed by: mav Differential Review: https://reviews.freebsd.org/D32259	2021-10-01 19:17:55 -06:00
Warner Losh	4b3da659bf	nvme: Only reset once on attach. The FreeBSD nvme driver has reset the nvme controller twice on attach to address a theoretical issue assuring the hardware is in a known state. However, experience has shown the second reset is unnecessary and increases the time to boot. Eliminate the second reset. Should there be a situation when you need a second reset (for buggy or at least somewhat out of the mainstream hardware), the hardware option NVME_2X_RESET will restore the old behavior. Document this in nvme(4). If there's any trouble at all with this, I'll add a sysctl tunable to control it. Sponsored by: Netflix Reviewed by: cperciva, mav Differential Revision: https://reviews.freebsd.org/D32241	2021-10-01 11:09:34 -06:00
Warner Losh	e5e26e4a24	nvme: Remove pause while resetting After some study of the code and the standard, I think we can just drop the pause(), unconditionally. If we're not initialized, then there's nothing to wait for from a software perspective. If we are initialized, then there might be outstanding I/O. If so, then the qpair 'recovery state' will transition to WAITING in nvme_ctrlr_disable_qpairs, which will ignore any interrupts for items that complete before we complete the reset by setting cc.en=0. If we go on to fail the controller, we'll cancel the outstanding I/O transactions. If we reset the controller, the hardware throws away pending transactions and we retry all the pending I/O transactions. Any transactions that happened to complete before cc.en=0 will have the same effect in the end (doing the same transaction twice is just inefficient, it won't affect the state of the device any differently than having done it once). The standard imposes no wait times here, so it isn't needed from that perspective. Unanswered Question: Do we need to disable interrupts while we disable in legacy mode, since those are level-sensitive? Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32248	2021-10-01 11:09:05 -06:00
Warner Losh	77054a897f	nvme: Explain a workaround a little better The rule not to touch the MMIO of the drive after we do an EN 1->0 transition applies only to a tiny number of drives that have this unfortunate issue. Sponsored by: Netflix	2021-10-01 10:56:10 -06:00
Warner Losh	a245627a4e	nvme_ctrlr_enable: Small style nits Rewrite the nested if's using the preferred FreeBSD style for branches of ifs that return. NFC. Minor tweaks to the comments to better fit new code layout. Sponsored by: Netflix Reviewed by: mav, chuck (prior rev, but comments rolled in) Differential Revision: https://reviews.freebsd.org/D32245	2021-10-01 10:56:10 -06:00
Warner Losh	26259f6ab9	nvme: Use MS_2_TICKS rather than rolling our own Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32246	2021-10-01 10:56:10 -06:00
Warner Losh	d5fca1dc1d	nvme_ctrlr_enable: Remove unnecessary 5ms delays Remove the 5ms delays after writing the administrative queue registers. These delays are from the very earliest days of the driver (they are in the first commit) and were most likely vestiges of the Chatham NVMe prototype card that was used to create this driver. Many of the workarounds necessary for it aren't necessary for standards compliant cards. The original driver had other areas marked for Chatham, but these were not. They are unneeded. There's three lines of supporting evidence. First, the NVMe standards make no mention of a delay time after these registers are written. Second, the Linux driver doesn't have them, even as an option. Third, all my nvme cards work w/o them. To be safe, add a write barrier between setting up the admin queue and enabling the controller. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32247	2021-10-01 10:56:10 -06:00
Warner Losh	502dc84a8b	nvme: Use shared timeout rather than timeout per transaction Keep track of the approximate time commands are 'due' and the next deadline for a command. Twice a second, wake up to see if any commands have entered timeout. If so, quiesce and then enter a recovery mode half the timeout further in the future to allow the ISR to complete. Once we exit recovery mode, we go back to operations as normal. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D28583	2021-09-23 16:42:08 -06:00
Colin Percival	bad42df9bf	Add some nvme initialization routines to TSLOG About 335 ms of EC2 instance boot time is being spent here.	2021-09-05 12:48:43 -07:00
Alexander Motin	e3bdf3da76	nvme(4): Add MSI and single MSI-X support. If we can't allocate more MSI-X vectors, accept using single shared. If we can't allocate any MSI-X, try to allocate 2 MSI vectors, but accept single shared. If still no luck, fall back to shared INTx. This provides maximal flexibility in some limited scenarios. For example, vmd(4) does not support INTx and can handle only limited number of MSI/MSI-X vectors without sharing. MFC after: 1 week	2021-08-31 13:45:46 -04:00
Alexander Motin	31111372e6	nvme(4): Do not panic on admin queue construct error. MFC after: 1 week	2021-08-30 20:38:23 -04:00
Warner Losh	f0f4712165	nvme: fix a race between failing the controller and failing requests Part of the nvme recovery process for errors is to reset the card. Sometimes, this results in failing the entire controller. When nda is in use, we free the sim, which will sleep until all the I/O has completed. However, with only one thread, the request fail task never runs once the reset thread sleeps here. Create two threads to allow I/O to fail until it's all processed and the reset task can proceed. This is a temporary kludge until I can work out questions that arose during the review, not least is what was the race that queueing to a failure task solved. The original commit is vague and other error paths in the same context do a direct failure. I'll investigate that more completely before committing changing that to a direct failure. mav@ raised this issue during the review, but didn't otherwise object. Multiple threads, though, solve the problem in the mean time until other such means can be perfected. Reviewed by: jhb@ Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D30366	2021-05-28 23:05:40 -06:00
Alexander Motin	4fbbe52365	nvme: Replace potentially long DELAY() with pause(). In some cases like broken hardware nvme(4) may wait minutes for controller response before timeout. Doing so in a tight spin loop made whole system unresponsive. Reviewed by: imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29309 Sponsored by: iXsystems, Inc.	2021-03-17 10:35:49 -04:00
Warner Losh	8423f5d4c1	nvme: use config_intrhook_drain to avoid removable card races nvme drives are configured early in boot. However, a number of the configuration steps take a while, so we defer those to a config intrhook that runs before the root filesystem is mounted. At the same time, the PCI hot plug wakes up and tests the status of the card. It may decide that the card has gone away and deletes the child. As part of that process nvme_detach is called. If this call happens after the config_intrhook starts to run, but before it is finished, there's a race where we can tear down the device's soft state while the config_intrhook is still using it. Use the new config_intrhook_drain to disestablish the hook. Either it will be removed w/o running, or the routine will wait for it to finish. This closes the race and allows safe hotplug at any time, even very early in boot. Sponsored by: Netflix, Inc Reviewed by: jhb, mav Differential Revision: https://reviews.freebsd.org/D29006	2021-03-11 09:45:10 -07:00
Warner Losh	dd2516fc07	nvme: Make nvme_ctrlr_hw_reset static nvme_ctrlr_hw_reset is no longer used outside of nvme_ctrlr.c, so make it static. If we need to change this in the future we can.	2021-02-08 13:29:24 -07:00
Warner Losh	9600aa31aa	nvme: use NVME_GONE rather than hard-coded 0xffffffff Make it clearer that the value 0xffffffff is being used to detect the device is gone. We use it other places in the driver for other meanings.	2021-02-08 13:08:48 -07:00
Alexander Motin	1770bae5f8	Remove alignment requirements for passthrough buffer. After r368124 vmapbuf() should happily map misaligned maxphys-sized buffers thanks to extra page added to pbuf_zone.	2020-11-29 00:57:19 +00:00
Alexander Motin	ac90f70d1e	Increase nvme(4) maximum transfer size from 1MB to 2MB. With 4KB page size the 2MB is the maximum we can address with one page PRP. Going further would require chaining, that would add some more complexity. On the other side, to reduce memory consumption, allocate the PRP memory respecting maximum transfer size reported in the controller identify data. Many of NVMe devices support much smaller values, starting from 128KB. To do that we have to change the initialization sequence to pull the data earlier, before setting up the I/O queue pairs. The admin queue pair is still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal, since there is only one such queue with only 16 trackers. Reviewed by: imp MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2020-11-29 00:20:31 +00:00
Konstantin Belousov	cd85379104	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225	2020-11-28 12:12:51 +00:00
Alexander Motin	0bed3eabc5	Add PMRCAP printing and fix earlier CAP_HI. MFC after: 3 days	2020-11-14 01:45:34 +00:00
Alexander Motin	46fbd8004f	Fix panic if NVMe is detached before the intrhook call. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-11-12 20:20:43 +00:00
Alexander Motin	c44441f8fd	Print NVMe controller capabilities in verbose dmesg. Those values are not reported in controller identification, while sometimes interesting for development and debugging. MFC after: 1 week	2020-10-28 15:43:29 +00:00
Brooks Davis	44ca4575ea	vmapbuf: don't smuggle address or length in buf Instead, add arguments to vmapbuf. Since this argument is always a pointer use a type of void * and cast to vm_offset_t in vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to take a pointer and check its bounds.) In no other situation does b_data contain a user pointer and vmapbuf replaces b_data with the actual mapping. Suggested by: jhb Reviewed by: imp, jhb Obtained from: CheriBSD MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26784	2020-10-21 16:00:15 +00:00
Alexander Motin	915f019715	Use RTD3 Entry Latency value as shutdown timeout. This field was not in specs when the driver was written, but now there are SSDs with the reported latency of 10s, where hardcoded value of 5s seems to be not enough sometimes, causing shutdown timeout messages. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-10-14 15:50:28 +00:00
David Bright	e32d47f32d	Add an ioctl to get an NVMe device's maximum transfer size Reviewed by: imp, chuck Obtained from: Dell EMC Isilon MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D26390	2020-09-21 15:41:47 +00:00
Mateusz Guzik	d87b31e159	nvme: clean up empty lines in .c and .h files	2020-09-01 22:03:10 +00:00
Warner Losh	881534f09c	Use symbolic names for async events Rather than |= 0x300, define and use async event names for the name space changes and the firmware activations that we're asking for.	2020-08-31 19:38:03 +00:00
Alexander Motin	701267ad19	Fix few panics on NVMe's timing out initialization requests. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-25 20:29:29 +00:00
Alexander Motin	ead7e10308	Make polled request timeout less invasive. Instead of panic after one second of polling, make the normal timeout handler to activate, reset the controller and abort the outstanding requests. If all of it won't happen within 10 seconds then something in the driver is likely stuck bad and panic is the only way out. In particular this fixed device hot unplug during execution of those polled commands, allowing clean device detach instead of panic. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-18 19:16:03 +00:00
Alexander Motin	550d5d64fe	Fix admin qpair leak if detached during initial reset. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-17 17:51:40 +00:00
Alexander Motin	92390644e3	Fix config_intrhook leak on initial reset failure. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-06-12 14:14:01 +00:00
David Bright	4053f8ac4d	Fix various Coverity-detected errors in nvme driver This fixes several Coverity-detected errors in the nvme driver. CIDs addressed: 1008344, 1009377, 1009380, 1193740, 1305470, 1403975, 1403980 Reviewed by: imp@, vangyzen@ MFC after: 5 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24532	2020-05-02 20:47:58 +00:00
Warner Losh	4e6a434b6b	Make sure that we get the sbuf resources we need. Since we're calling sbuf_new with NOWAIT, make sure it can allocate a buffer to use. Don't print anything if we can't get it. Noticed by: rpokala	2020-04-30 00:43:11 +00:00
Warner Losh	244b805397	Generate a devctl event for interesting events When we reset the controller, and when the controller tells us about a critical warning, send an event.	2020-04-30 00:27:19 +00:00
Alexander Motin	b2cdfb72f4	Fix copy-paste bug in HMB free code. MFC after: 2 weeks X-MFC-with: r356474	2020-01-08 18:26:23 +00:00
Alexander Motin	6de4e458fa	Minor adjustments to r356474 and r356480. Reported by: jkim, imp MFC after: 2 weeks X-MFC-with: r356474	2020-01-07 23:29:54 +00:00
Alexander Motin	1c7dd40e58	Increase HMB limit from 1% to 5%. SSD capacity in laptops is growing faster than RAM size, so my original guess seems too low on second thought. Hopefully nobody will build large array of those crappy SSDs. MFC after: 2 weeks X-MFC-with: 356474	2020-01-07 23:10:38 +00:00
Alexander Motin	67abaee9fc	Add Host Memory Buffer support to nvme(4). This allows cheapest DRAM-less NVMe SSDs to use some of host RAM (about 1MB per 1GB on the devices I have) for its metadata cache, significantly improving random I/O performance. Device reports minimal and preferable size of the buffer. The code limits it to 1% of physical RAM by default. If the buffer can not be allocated or below minimal size, the device will just have to work without it. MFC after: 2 weeks Relnotes: yes Sponsored by: iXsystems, Inc.	2020-01-07 21:17:11 +00:00
Warner Losh	7588c6cc36	Move to using bool instead of boolean_t While there are subtle semantic differences between bool and boolean_t, none of them matter in these cases. Prefer true/false when dealing with bool type. Preserve a couple of TRUEs since they are passed into int args into CAM. Preserve a couple of FALSEs when used for status.done, an int. Differential Revision: https://reviews.freebsd.org/D20999	2019-12-13 18:35:48 +00:00
Warner Losh	66e5985084	Move reset to the interrupt processing stage This trims the boot time a bit more for AWS and other platforms that have nvme drives. There's no reason to do this inline. This has been in my tree a while, but IIRC I talked to Jim Harris about this at one of our face to face meetings. MFC After: 2 weeks	2019-12-11 22:51:02 +00:00
Alexander Motin	1eab19cbec	Make nvme(4) driver some more NUMA aware. - For each queue pair precalculate CPU and domain it is bound to. If queue pairs are not per-CPU, then use the domain of the device. - Allocate most of queue pair memory from the domain it is bound to. - Bind callouts to the same CPUs as queue pair to avoid migrations. - Do not assign queue pairs to each SMT thread. It just wasted resources and increased lock congestions. - Remove fixed multiplier of CPUs per queue pair, spread them even. This allows to use more queue pairs in some hardware configurations. - If queue pair serves multiple CPUs, bind different NVMe devices to different CPUs. MFC after: 1 month Sponsored by: iXsystems, Inc.	2019-09-23 17:53:47 +00:00
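
The page-size arithmetic described in a70b5660f3 (page size is 2^(12+MPS)) and in ac90f70d1e (2MB per single-page PRP list with 4KB pages) can be checked with a small sketch. This is an illustration in Python, not driver code: the function names are hypothetical, the value 12 for NVME_MPS_SHIFT is assumed from the commit text, and the 8-byte PRP entry size is taken from the NVMe specification rather than from this log.

```python
# Illustrative arithmetic only; names below are hypothetical, not the driver's.
NVME_MPS_SHIFT = 12   # assumed: page size = 2 ** (12 + MPS), per a70b5660f3
PRP_ENTRY_SIZE = 8    # assumed: one PRP entry is a 64-bit address


def mps_from_page_size(page_size: int) -> int:
    """Return the MPS exponent for a power-of-two page size >= 4 KiB."""
    if page_size < (1 << NVME_MPS_SHIFT) or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two >= 4096")
    return page_size.bit_length() - 1 - NVME_MPS_SHIFT


def page_size_from_mps(mps: int) -> int:
    """Inverse of mps_from_page_size()."""
    return 1 << (NVME_MPS_SHIFT + mps)


def max_xfer_one_prp_page(page_size: int) -> int:
    """Bytes addressable when the PRP list is limited to a single page."""
    entries = page_size // PRP_ENTRY_SIZE
    return entries * page_size


if __name__ == "__main__":
    for size in (4096, 16384, 65536):
        # The old "size >> 13" form agrees at 4k and 16k but not at 64k.
        print(size, mps_from_page_size(size), size >> 13)
    print(max_xfer_one_prp_page(4096))
```

With 4KB pages the last function gives 512 entries of 4KB each, which matches the 2MB figure quoted in ac90f70d1e.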


151 commits
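
The adaptive spinning described in 83581511d9 (start at 1us, grow by 1.5 each iteration, cap at 1ms) can be sketched as a delay schedule. This is a minimal Python illustration under assumptions: the growth is read as a multiplier of 1.5, and the function names, the injected sleep callback, and the timeout handling are hypothetical and do not reflect the driver's actual code.

```python
import itertools


def backoff_delays_us(start_us=1.0, factor=1.5, cap_us=1000.0):
    """Yield wait times: start_us, then multiplied by factor, capped at cap_us."""
    delay = start_us
    while True:
        yield delay
        delay = min(delay * factor, cap_us)


def poll(is_done, sleep_us, timeout_us=1_000_000.0):
    """Poll is_done() with growing waits; return total microseconds waited.

    sleep_us is injected so the sketch can run without real delays.
    """
    waited = 0.0
    for delay in backoff_delays_us():
        if is_done():
            return waited
        if waited >= timeout_us:
            raise TimeoutError("condition not met before timeout")
        sleep_us(delay)
        waited += delay


if __name__ == "__main__":
    print(list(itertools.islice(backoff_delays_us(), 5)))
    checks = iter([False, False, True])
    print(poll(lambda: next(checks), lambda us: None))
```

Short waits at the start suit commands that finish in tens of microseconds, while the cap keeps a slow controller from being polled in a tight loop.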