source/ipfw: retry divert socket sendto on transient failure

The verdict path issues a single sendto() to the ipfw divert socket and treats any errno other than EHOSTDOWN/ENETDOWN as a hard failure that silently drops the packet. Under sustained load (e.g. many TCP segments of a single large response on OPNsense divert-to), the kernel's divert socket can transiently return ENOBUFS / EAGAIN / EINTR, in which case Suricata aborts the verdict and the segment is lost on the wire. The client never sees the data, and after its retransmit/abort timeout the connection fails. Small (single-segment) responses don't trigger the buffer pressure and work fine.

NFQ handles the same class of transient failure with a retry loop (NFQ_VERDICT_RETRY_COUNT) in NFQVerdictCacheFlush and VerdictNFQ. Mirror that pattern on the ipfw side: wrap the sendto() in an equivalent retry loop bounded by IPFW_VERDICT_RETRY_COUNT (3). Preserve the existing errno classification on the final failure so persistent errors are still reported as before.
Reported on Redmine #8377.
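
For illustration only, the bounded retry described above can be sketched outside Suricata as follows. This is a minimal sketch, not the code in the commit: `send_with_retry`, `send_fn_t`, `struct flaky_sender`, `flaky_send`, and `VERDICT_RETRY_COUNT` are hypothetical names, and the callback stands in for the real sendto() on the divert socket. Like the loop in the diff, it retries on any -1 return rather than filtering by errno, so one initial attempt plus up to three retries are made.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical constant; the commit defines IPFW_VERDICT_RETRY_COUNT as 3. */
#define VERDICT_RETRY_COUNT 3

/* Hypothetical callback type standing in for sendto() on the divert socket. */
typedef ssize_t (*send_fn_t)(const void *buf, size_t len, void *ctx);

/*
 * Call send_fn once, then retry up to VERDICT_RETRY_COUNT more times while
 * it keeps returning -1. The last return value is passed back and errno is
 * left as set by the last attempt, so the caller can still classify it.
 */
static ssize_t send_with_retry(send_fn_t send_fn, const void *buf, size_t len,
                               void *ctx)
{
    assert(send_fn != NULL);

    ssize_t ret;
    int iter = 0;
    do {
        ret = send_fn(buf, len, ctx);
    } while (ret == -1 && (iter++ < VERDICT_RETRY_COUNT));
    return ret;
}

/* Test double: fails with ENOBUFS a fixed number of times, then succeeds. */
struct flaky_sender {
    int failures_left;
    int calls;
};

static ssize_t flaky_send(const void *buf, size_t len, void *ctx)
{
    struct flaky_sender *s = ctx;
    (void)buf;
    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        errno = ENOBUFS; /* simulate transient buffer pressure */
        return -1;
    }
    return (ssize_t)len;
}
```

With a sender that fails twice, the third attempt succeeds; with a sender that always fails, the helper gives up after four attempts and the caller sees -1 with errno still set to ENOBUFS.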
Samaresh Kumar Singh 2026-05-15 15:18:20 -05:00
parent bb4e79c4f7
commit fe3d2ef8fe


@@ -99,6 +99,8 @@ TmEcode NoIPFWSupportExit(ThreadVars *tv, const void *initdata, void **data)
 #define IPFW_SOCKET_POLL_MSEC 300
+#define IPFW_VERDICT_RETRY_COUNT 3
 extern uint32_t max_pending_packets;
 /**
@@ -561,7 +563,14 @@ TmEcode IPFWSetVerdict(ThreadVars *tv, IPFWThreadVars *ptv, Packet *p)
 #endif
 IPFWMutexLock(nq);
-if (sendto(nq->fd, GET_PKT_DATA(p), GET_PKT_LEN(p), 0,(struct sockaddr *)&nq->ipfw_sin, nq->ipfw_sinlen) == -1) {
+ssize_t ret;
+int iter = 0;
+do {
+    ret = sendto(nq->fd, GET_PKT_DATA(p), GET_PKT_LEN(p), 0,
+            (struct sockaddr *)&nq->ipfw_sin, nq->ipfw_sinlen);
+} while (ret == -1 && (iter++ < IPFW_VERDICT_RETRY_COUNT));
+if (ret == -1) {
     int r = errno;
     switch (r) {
     default: