Chain frames required to satisfy all 2K of declared I/Os of 128KB each take
more then a megabyte of a physical memory, all of which existing code tries
allocate as physically contiguous. This patch removes that physical
contiguousness requirement, leaving only virtual contiguousness. I was
thinking about other ways of allocation, but the less granular allocation
becomes, the bigger is the overhead and/or complexity, reaching about 100%
overhead if allocate each frame separately.
The patch also bumps the chain frames hard limit from 2K to 16K. It is more
than enough for the case of default REQ_FRAMES and MAXPHYS (the drivers will
allocate less than that automatically), while in case of increased MAXPHYS
it will control maximal memory usage.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14420
This is a first part of the change. It makes the drivers to calculate
the required number of chain frames to satisfy worst case scenarios, but
it does not change existing overly strict limits on them. The next step
will be to rewrite the allocator to not require megabytes of physically
contiguous address space, that may be problematic if done after boot,
after doing which the limits can be removed. Until that this code can
just correct user set limits, if they are set too high.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14261
a bit in the normal operation of the driver. Covert it to represent bytes
instead of 32bit words. Fix what I believe to be is a bug in this respect
with the Tri-mode cards.
Sponsored by: Netflix
Both drivers were found to report CAM bigger queue depth then they really
can handle. It made them later under high load with many disks return
some of submitted requests back with CAM_REQUEUE_REQ status for later
resubmission.
Reviewed by: scottl
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D14215
In mp{r,s}_diag_register(), which is used to register diagnostic
buffers with the mp{r,s}(4) firmware, we allocate DMAable memory.
There were several issues here:
o No checking of the bus_dmamap_load() return value. If the load
failed or got deferred, mp{r,s}_diag_register() continued on as if
nothing had happened. We now check the return value and bail
out if it fails.
o No waiting for a deferred load callback. bus_dmamap_load()
calls a supplied callback when the mapping is done. This is
generally done immediately, but it can be deferred.
mp{r,s}_diag_register() did not check to see whether the callback
was already done before proceeding on. We now sleep until the
callback is done if it is deferred.
o No call to bus_dmamap_sync(... BUS_DMASYNC_PREREAD) after the
memory is allocated and loaded. This is necessary on some
platforms to synchronize host memory that is going to be updated
by a device.
Both drivers would also panic if the firmware was reinitialized while
a diagnostic buffer operation was in progress. This fixes that problem
as well. (The driver will reinitialize the firmware in various
circumstances, but the problem I ran into was that the firmware would
generate an IOC Fault due to a PCIe error.)
mp{r,s}var.h:
Add a new structure, struct mpr_busdma_context, that is
used for deferred busdma load callbacks.
Add a prototype for mp{r,s}_memaddr_wait_cb().
mp{r,s}.c:
Add a new busdma callback function, mp{r,s}_memaddr_wait_cb().
This provides synchronization for callers that want to
wait on a deferred bus_dmamap_load() callback.
mp{r,s}_user.c:
In bus_dmamap_register(), add a call to bus_dmamap_sync()
with the BUS_DMASYNC_PREREAD flag set after an allocation
is loaded.
Also, check the return value of bus_dmamap_load(). If it
fails, bail out. If it is EINPROGRESS, wait for the
callback to happen. We use an interruptible sleep (msleep
with PCATCH) and let the callback clean things up if we get
interrupted.
In mpr_diag_read_buffer() and mps_diag_read_buffer(), call
bus_dmamap_sync(..., BUS_DMASYNC_POSTREAD) before copying
the data out to make sure the data is in stable storage.
In mp{r,s}_post_fw_diag_buffer() and
mp{r,s}_release_fw_diag_buffer(), check the reply to see
whether it is NULL. It can be NULL (and the command non-NULL)
if the controller gets reinitialized while we're waiting for
the command to complete but the driver structures aren't
reallocated. The driver structures generally won't be
reallocated unless there is a firmware upgrade that changes
one of the IOCFacts.
When freeing diagnostic buffers in mp{r,s}_diag_register()
and mp{r,s}_diag_unregister(), zero/NULL out the buffer after
freeing it. This will prevent a duplicate free in some
situations.
Sponsored by: Spectra Logic
Reviewed by: mav, scottl
MFC after: 1 week
Differential Revision: D13453
SGList elements, but there's only enough space in the request frame for
either 1 element or a chain frame pointer. Previously, the code would
hit the wrong case, add the SGList element, but then fail to add the
chain frame due to lack of space. Re-arrange the code to catch this case
earlier and handle it.
Sponsored by: Netflix
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.
MFC after: 3 weeks
Uses of mallocarray(9).
The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.
Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.
Reported by: wosch
PR: 225197
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
would attempt to re-allocate interrupts during a chip reset without
first de-allocating them. Doing that right is going to be tricky, so
just band-aid it for now so that a re-init doesn't guarantee a failure
due to resource re-use.
Reported by: gallatin
Sponsored by: Netflix
needed, but it silences an erroneous Coverity warning and makes the code a
little more logically consistent. Also mark the sysctl as MPSAFE.
Sponsored by: Netflix
commit it to make initiazation less chatty in the normal case, and more useful
and informative when real debugging is turned on.
Reviewed by: ken (earlier version)
Sponsored by: Netflix
When the mps(4) and mpr(4) drivers need to reinitialize the
firmware, they sometimes need to reallocate all of the memory
allocated by the driver. The reallocation happens whenever the IOC
Facts change. That should only happen after a firmware upgrade.
If the reinitialization happens as a result of a timed out command
sent to the card, the command that timed out and triggered the
reinit may have been freed if iocfacts_allocate() reallocated all
memory. If the caller attempts to access the command after that,
the kernel will panic because the caller will be dereferencing
freed memory.
The solution is to set a flag in the softc when we reallocate,
and avoid dereferencing the command strucure if we've reallocated.
The changes are largely the same in both drivers, since mpr(4) is a
derivative of mps(4).
o In iocfacts_allocate(), if the IOC Facts have changed and we
need to reallocate, set the REALLOCATED flag in the softc.
o Change wait_command() to take a struct mps_command ** instead of
a struct mps_command *. This allows us to NULL out the caller's
command pointer if we have to reinit the controller and the data
structures get reallocated. (The REALLOCATED flag will be set
in the softc if that has happened.)
o In every place that calls wait_command(), make sure we handle
the case where the command is NULL after the call.
o The mpr(4) driver has mpr_request_polled() which can also
reinitialize the card. Also check for reallocation there.
Reviewed by: scottl, slm
MFC after: 1 week
Sponsored by: Spectra Logic
the informational print functions. Collapse the debug API a bit to be
more generic and not require as much code duplication. While here, fix
a bug in MPS that was already fixed in MPR.
Do the allocation before requesting the IOCFacts message. This triggers
the LSI firmware to recognize the multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching latter checks, leading to potential problems
with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Reviewed by: slm
Obtained from: Netflix
Differential Revision: D9243
mps_wait_command() and mpr_wait_command() were using getmicrotime() to
determine elapsed time when checking for a timeout in polled mode.
getmicrotime() isn't guaranteed to monotonically increase, and that
caused spurious timeouts occasionally.
Switch to using getmicrouptime(), which does increase monotonically.
This fixes the spurious timeouts in my test case.
Reviewed by: slm, scottl
MFC after: 3 days
Sponsored by: Spectra Logic
that are apparently misconfigured by the manufacturer and cause the mapping
logic to fail. The fallback allows drive numbers to be assigned based on the
PHY number that they're attached to. Add sysctls and tunables to overrid
this new behavior, but they should be considered only necessary for debugging.
Reviewed by: imp, smh
Obtained from: Netflix
MFC after: 3 days
Sponsored by: D8403
Use MPI2_IOCSTATUS_MASK when checking IOCStatus to mask off the log bit, and
make a few more things endian-safe.
- Fix possible use of invalid pointer.
It was possible to use an invalid pointer to get the target ID value. To fix
this, initialize a local Target ID variable to an invalid value and change that
variable to a valid value only if the pointer to the Target ID is not NULL.
- No need to set the MPSSAS_SHUTDOWN flag because it's never used.
- done_ccb pointer can be used if it is NULL.
To prevent this, move check for done_ccb == NULL to before done_ccb is used in
mpssas_stop_unit_done().
- Disks can go missing until a reboot is done in some cases.
This is due to the DevHandle not being released, which causes the Firmware to
not allow that disk to be re-added.
Reviewed by: ken
Approved by: re (gjb), ken, scottl, ambrisko (mentors)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6872
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
dates.
- Changed all of the PCI device strings from LSI to Avago Technologies (LSI).
- Added a sysctl variable to control how StartStopUnit behavior works. User can
select to spin down disks based on if disk is SSD or HDD.
- Inquiry data is required to tell if a disk will support SSU at shutdown or
not. Due to the addition of mpssas_async, which gets Advanced Info but not
Inquiry data, the setting of supports_SSU was moved to the
mpssas_scsiio_complete function, which snoops for any Inquiry commands. And,
since disks are shutdown as a target and not a LUN, this process was
simplified by basing it on targets and not LUNs.
- Added a sysctl variable that sets the amount of time to retry after sending a
failed SATA ID command. This helps with some bad disks and large disks that
require a lot of time to spin up. Part of this change was to add a callout to
handle timeouts with the SATA ID command. The callout function is called
mpssas_ata_id_timeout(). (Fixes PR 191348)
- Changed the way resets work by allowing I/O to continue to devices that are
not currently under a reset condition. This uses devq's instead of simq's and
makes use of the MPSSAS_TARGET_INRESET flag. This change also adds a function
called mpssas_prepare_tm().
- Some changes were made to reduce code duplication when getting a SAS address
for a SATA disk.
- Fixed some formatting and whitespace.
- Bump version of mps driver to 20.00.00.00-fbsd
PR: 191348
Reviewed by: ken, scottl
Approved by: ken, scottl
MFC after: 2 weeks
- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days
Sponsored by: Mellanox Technologies
* Implements Start Stop Unit for SATA direct-attach devices in IR mode to avoid
data corruption.
* Use CAM_DEV_NOT_THERE instead of CAM_SEL_TIMEOUT and CAM_TID_INVALID
Obtained from: LSI
MFC after: 2 weeks
Add a tunable that allows such a device to be excluded from the driver.
The id parameter is the target id that the driver assigns to a given device.
dev.mps.X.exclude_ids=<id>,<id>
Obtained from: Netflix
MFC after: 3 days
---------------------------------------------------------------
System panics during a Port reset with ouststanding I/O
---------------------------------------------------------------
It is possible to call mps_mapping_free_memory after this
memory is already freed, causing a panic. Removed this extra
call to mps_mappiing_free_memory and call mps_mapping_exit
in place of the mps_mapping_free_memory call so that any
outstanding mapping items can be flushed before memory is
freed.
---------------------------------------------------------------
Correct memory leak during a Port reset with ouststanding I/O
---------------------------------------------------------------
In mps_reinit function, the mapping memory was not being
freed before being re-allocated. Added line to call the
memory free function for mapping memory.
---------------------------------------------------------------
Use CAM_SIM_QUEUED flag in Driver IO path.
---------------------------------------------------------------
This flag informs the XPT that successful abort of a CCB
requires an abort ccb to be issued to the SIM. While
processing SCSI IO's, set the CAM_SIM_QUEUED flag in the
status for the IO. When the command completes, clear this
flag.
---------------------------------------------------------------
Check for CAM_REQ_INPROG in I/O path.
---------------------------------------------------------------
Added a check in mpssas_action_scsiio for the In Progress
status for the IO. If this flag is set, the IO has already
been aborted by the upper layer (before CAM_SIM_QUEUED was
set) and there is no need to send the IO. The request will
be completed without error.
---------------------------------------------------------------
Improve "doorbell handshake method" for mps_get_iocfacts
---------------------------------------------------------------
Removed call to get Port Facts since this information is
not used currently.
Added mps_iocfacts_allocate function to allocate memory
that is based on IOC Facts data. Added mps_iocfacts_free
function to free memory that is based on IOC Facts data.
Both of the functions are used when a Diag Reset is performed
or when the driver is attached/detached. This is needed in
case IOC Facts changes after a Diag Reset, which could
happen if FW is upgraded.
Moved call of mps_bases_static_config_pages from the attach
routine to after the IOC is ready to process accesses based
on the new memory allocations (instead of polling through
the Doorbell).
---------------------------------------------------------------
Set TimeStamp in INIT message in millisecond format Set the IOC
---------------------------------------------------------------
---------------------------------------------------------------
Prefer mps_wait_command to mps_request_polled
---------------------------------------------------------------
Instead of using mps_request_polled, call mps_wait_command
whenever possible. Change the mps_wait_command function to
check the current context and either use interrupt context
or poll if required by using the pause or DELAY function.
Added a check after waiting 50mSecs to see if the command
has timed out. This is only done if polliing, the msleep
command will automatically timeout if the command has taken
too long to complete.
---------------------------------------------------------------
Integrated RAID: Volume Activation Failed error message is
displayed though the volume has been activated.
---------------------------------------------------------------
Instead of failing an IOCTL request that does not have a
large enough buffer to hold the complete reply, copy as
much data from the reply as possible into the user's buffer
and log a message saying that the user's buffer was smaller
than the returned data.
---------------------------------------------------------------
mapping_add_new_device failure due to persistent table FULL
---------------------------------------------------------------
When a new device is added, if it is determined that the
device persistent table is being used and is full, instead
of displaying a message for this condition every time, only
log a message if the MPS_INFO bit is set in the debug_flags.
Submitted by: LSI
MFC after: 1 week
sys/dev/mps/mps_user.c
Fix uninitialized memory reference in mps_read_config_page. It was
referencing a field (params->hdr.Ext.ExtPageType) that would only be
set when reading an Extended config page. The symptom was that
MPSIO_READ_CFG_PAGE ioctls would randomly fail with
MPI2_IOCSTATUS_CONFIG_INVALID_PAGE errors. The solution is to
determine whether an extended or an ordinary config page is requested
by looking at the PageType field, which should be available regardless.
Similarly, mps_user_read_extcfg_header and mps_user_read_extcfg_page,
which call mps_read_config_page, had to be fixed to always set the
PageType field. They were implicitly assuming that
mps_read_config_page always operated on Extended pages.
Reviewed by: ken
Approved by: ken (mentor)
MFC after: 3 days
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)