nvme: Warn if there are system interrupt issues.

Issue a warning if we have system interrupt issues. If you get this
warning, then we submitted a request and it timed out without an interrupt
being posted, but when we polled the card's completion queue, we found
completion events. This indicates that we're missing interrupts. To
date, every time I've helped people track down an issue like this, it has
been a system issue, not an NVMe driver issue.
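
The warn-once logic described above can be sketched in isolation. This is a minimal userland illustration, not the driver code: the `sketch_*` names, the `pending` counter, and the `warnings` counter are hypothetical stand-ins, and only the `isr_warned` flag and the message text come from this commit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's controller and queue-pair state. */
struct sketch_ctrlr {
	int  unit;        /* controller unit number, as in "nvme%d" */
	bool isr_warned;  /* set once the warning has been printed */
	int  warnings;    /* number of warnings printed (illustration only) */
};

struct sketch_qpair {
	struct sketch_ctrlr *ctrlr;
	int pending;      /* completion records posted but not yet processed */
};

/* Drain any pending completions; return true if at least one was found. */
static bool
sketch_process_completions(struct sketch_qpair *qpair)
{
	bool found = qpair->pending > 0;

	qpair->pending = 0;
	return (found);
}

/*
 * Called when the oldest request has passed its deadline. Finding
 * completions here means the device finished the work but no interrupt
 * told us about it, so warn, but only once per controller.
 */
static void
sketch_timeout(struct sketch_qpair *qpair)
{
	struct sketch_ctrlr *ctrlr = qpair->ctrlr;

	if (sketch_process_completions(qpair) && !ctrlr->isr_warned) {
		printf("nvme%d: System interrupt issues?\n", ctrlr->unit);
		ctrlr->isr_warned = true;
		ctrlr->warnings++;
	}
}
```

Calling `sketch_timeout()` repeatedly on a queue that keeps accumulating unprocessed completions prints the message only on the first call; a timeout that finds no completions leaves the flag untouched, since that case points at something other than a lost interrupt.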

Sponsored by:		Netflix
Reviewed by:		gallatin
Differential Revision:	https://reviews.freebsd.org/D46031
Warner Losh 2024-07-23 17:02:33 -06:00
parent aa41354349
commit bb7f7d5b52
3 changed files with 17 additions and 2 deletions

@@ -239,6 +239,15 @@ detects that the AHCI device supports RST and when it is enabled.
 See
 .Xr ahci 4
 for more details.
+.Sh DIAGNOSTICS
+.Bl -diag
+.It "nvme%d: System interrupt issues?"
+The driver found a timed-out transaction had a pending completion record,
+indicating an interrupt had not been delivered.
+The system is either not configuring interrupts properly, or the system drops
+them under load.
+This message will appear at most once per boot per controller.
+.El
 .Sh SEE ALSO
 .Xr nda 4 ,
 .Xr nvd 4 ,

@@ -303,6 +303,7 @@ struct nvme_controller {
 bool is_failed;
 bool is_dying;
+bool isr_warned;
 STAILQ_HEAD(, nvme_request) fail_req;
 /* Host Memory Buffer */

@@ -1145,9 +1145,14 @@ do_reset:
 /*
 * There's a stale transaction at the start of the queue whose
 * deadline has passed. Poll the competions as a last-ditch
-* effort in case an interrupt has been missed.
+* effort in case an interrupt has been missed. Warn the user if
+* transactions were found of possible interrupt issues, but
+* just once per controller.
 */
-_nvme_qpair_process_completions(qpair);
+if (_nvme_qpair_process_completions(qpair) && !ctrlr->isr_warned) {
+nvme_printf(ctrlr, "System interrupt issues?\n");
+ctrlr->isr_warned = true;
+}
 /*
 * Now that we've run the ISR, re-rheck to see if there's any