Commit graph

6970 commits

Author SHA1 Message Date
Marius Halden
0909e05779 carp: deal with negative net.inet.carp.demotion
Given nodes 1 and 2, where node 1 has an advskew of 0 and node 2 has an
advskew of 100, making them master and backup respectively.

If net.inet.carp.demotion is set to a negative value on node 1, node 2
might become master while node 1 still retains it master status. Wether
or not node 2 becomes master seems to depend on the nodes advskew and
what the demotion sysctl was set to on node 1.

The reason for node 2 becoming master seems to be that the calculated
advskew taking demotion into account is truncated to a single unsigned
byte when copied into the carp header for sending, and node 1 stays
master since it takes uses the whole non-truncated calculated advskew
when deciding wether to stay master.

PR:		259528
Reviewed by:	donner, glebius
MFC after:	3 weeks
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D32759

(cherry picked from commit 1019354b54)
2021-11-22 02:55:02 +01:00
Roy Marples
ec5691aa2f net: Allow binding of unspecified address without address existance
Previously in_pcbbind_setup returned EADDRNOTAVAIL for empty
V_in_ifaddrhead (i.e., no IPv4 addresses configured) and in6_pcbbind
did the same for empty V_in6_ifaddrhead (no IPv6 addresses).

An equivalent test has existed since 4.4-Lite.  It was presumably done
to avoid extra work (assuming the address isn't going to be found
later).

In normal system operation *_ifaddrhead will not be empty: they will
at least have the loopback address(es).  In practice no work will be
avoided.

Further, this case caused net/dhcpd to fail when run early in boot
before assignment of any addresses.  It should be possible to bind the
unspecified address even if no addresses have been configured yet, so
just remove the tests.

The now-removed "XXX broken" comments were added in 59562606b9,
which converted the ifaddr lists to TAILQs.  As far as I (emaste) can
tell the brokenness is the issue described above, not some aspect of
the TAILQ conversion.

PR:		253166
Reviewed by:	ae, bz, donner, emaste, glebius
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D32563

(cherry picked from commit 5c5340108e)
2021-11-18 19:28:56 -05:00
Andrey V. Elsukov
faba420cb9 ip_divert: calculate delayed checksum for IPv6 adress family
Before passing an IPv6 packet to application apply delayed checksum
calculation. Mbuf flags will be lost when divert listener will return a
packet back, so we will not be able to do delayed checksum calculation
later. Also an application will get a packet with correct checksum.

Reviewed by:	donner
Differential Revision: https://reviews.freebsd.org/D32807

(cherry picked from commit 4a9e95286c)
2021-11-12 15:19:19 +03:00
Gordon Bergling
e3f2519c5c Fix a common typo in syctl descriptions
- s/maxiumum/maximum/

(cherry picked from commit c28e39c3d6)
2021-11-06 08:52:57 +01:00
Gordon Bergling
d843e777a5 netinet: Fix a common typo in source code comments
- s/writting/writing/

(cherry picked from commit bb91496a85)
2021-11-06 08:52:38 +01:00
Mike Karels
3ee882bf21 Change lowest address on subnet (host 0) not to broadcast by default.
The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122.  Until
now, we would broadcast the host zero address as well as the configured
address.  Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it.  Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.

See https:/datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316.  Note, Linux
now implements this.

Reviewed by:	rgrimes, tuexen; melifaro (previous version)
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D31861

(cherry picked from commit fd0765933c)
2021-10-19 08:16:32 -05:00
Marko Zec
602f81ea50 [fib_algo][dxr] Retire counters which are no longer used
The number of chunks can still be tracked via vmstat -z|fgrep dxr.

MFC after:	3 days
2021-10-13 22:06:49 +02:00
Marko Zec
0eeef61aec [fib_algo][dxr] Improve incremental updating strategy
Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were triggered, which would happen in vain by far too frequently,
particularly with live BGP feeds.

Instead, track the total unused space inside the trie and range table
structures, and trigger rebuilds if the percentage of unused space
exceeds a sysctl-tunable threshold.

MFC after:	3 days
PR:		257965
2021-10-13 22:06:10 +02:00
Mark Johnston
f983298883 socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:	tuexen, gallatin
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit f94acf52a4)
2021-10-07 09:56:47 -04:00
Marko Zec
94ad8d7c7a [fib_algo][dxr] Split unused range chunk list in multiple buckets
Traversing a single list of unused range chunks in search for a block
of optimal size was suboptimal.

The experience with real-world BGP workloads has shown that on average
unused range chunks are tiny, mostly in length from 1 to 4 or 5, when
DXR is configured with K = 20 which is the current default (D16X4R).

Therefore, introduce a limited amount of buckets to accomodate descriptors
of empty blocks of fixed (small) size, so that those can be found in O(1)
time.  If no empty chunks of the requested size can be found in fixed-size
buckets, the search continues in an unsorted list of empty chunks of
variable lengths, which should only happen infrequently.

This change should permit us to manage significantly more empty range
chunks without sacrifying the speed of incremental range table updating.

MFC after:	3 days
2021-09-29 22:40:56 +02:00
Marko Zec
c5981a8130 [fib_algo][dxr] Merge adjacent empty range table chunks.
MFC after:	3 days
2021-09-29 22:40:01 +02:00
Gordon Bergling
81d34d466c sctp: Fix a typo in a comment
- s/assue/assume/

(cherry picked from commit d2e616147d)
2021-09-29 19:18:27 +02:00
Mark Johnston
32f1d05f78 sctp: Allow blocking on I/O locks even with non-blocking sockets
There are two flags to request a non-blocking receive on a socket:
MSG_NBIO and MSG_DONTWAIT.  They are handled a bit differently in that
soreceive_generic() and soreceive_stream() will block on the socket I/O
lock when MSG_NBIO is set, but not if MSG_DONTWAIT is set.  In general,
MSG_NBIO seems to mean, "don't block if there is no data to receive" and
MSG_DONTWAIT means "don't go to sleep for any reason".

SCTP's soreceive implementation did not allow blocking on the I/O lock
if either flag is set, but this violates an assumption in
aio_process_sb(), which specifies MSG_NBIO but nonetheless
expects to make progress if data is available to read.  Change
sctp_sorecvmsg() to block on the I/O lock only if MSG_DONTWAIT
is not set.

Reported by:	syzbot+c7d22dbbb9aef509421d@syzkaller.appspotmail.com
Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit e6c19aa94d)
2021-09-21 09:38:39 -04:00
Marko Zec
ec47ee78b4 [fib algo][dxr] Fix division by zero.
A division by zero would occur if DXR would be activated on a vnet
with no IP addresses configured on any interfaces.

PR:		257965
MFC after:	3 days
Reported by:	Raul Munoz

(cherry picked from commit eb3148cc4d)
2021-09-18 19:38:09 +02:00
Marko Zec
ad2cca48ed [fib algo][dxr] Optimize trie updating.
Don't rebuild in vain trie parts unaffected by accumulated incremental
RIB updates.

PR:		257965
Tested by:	Konrad Kreciwilk
MFC after:	3 days

(cherry picked from commit b51f8bae57)
2021-09-18 19:37:35 +02:00
Marko Zec
d3b9b83623 [fib algo][dxr] Fix undefined behavior.
The result of shifting uint32_t by 32 (or more) is undefined: fix it.

(cherry picked from commit 442c8a245e)
2021-09-18 19:36:32 +02:00
orange30
7959799d93 net: Fix memory leaks upon arp_fillheader() failures
Free memory before return from arprequest_internal().  In in_arpinput(),
if arp_fillheader() fails, it should use goto drop.

Reviewed by:	melifaro, imp, markj
Pull Request:	https://github.com/freebsd/freebsd-src/pull/534

(cherry picked from commit f5777c123a)
2021-09-17 09:14:12 -04:00
Mark Johnston
adfb7f807c sctp: Clear assoc socket references when freeing a PCB
This restores behaviour present in the first import of SCTP.  Commit
ceaad40ae7 commented this out and commit
62fb761ff2 removed it.  However, once
sctp_inpcb_free() returns, the socket reference is gone no matter what,
so we need to clear it.

Reported by:	syzbot+30dd69297fcbc5f0e10a@syzkaller.appspotmail.com
Reported by:	syzbot+7b2f9d4bcac1c9569291@syzkaller.appspotmail.com
Reported by:	syzbot+ed3e651f7d040af480a6@syzkaller.appspotmail.com
Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 4250aa1188)
2021-09-16 08:37:53 -04:00
Mark Johnston
97d24f3dfa sctp: Fix iterator synchronization in sctp_sendall()
- The SCTP_PCB_FLAGS_SND_ITERATOR_UP check was racy, since two threads
  could observe that the flag is not set and then both set it.  I'm not
  sure if this is actually a problem in practice, i.e., maybe there's no
  problem having multiple sends for a single PCB in the iterator list?
- sctp_sendall() was modifying sctp_flags without the inp lock held.

The change simply acquires the PCB write lock before toggling the flag,
fixing both problems.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 173a7a4ee4)
2021-09-14 08:51:54 -04:00
Mark Johnston
086a3ea828 sctp: Remove an unused sctp_inpcb field
This appears to be unused in usrsctp as well.  No functional change
intended.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit e8e23ec127)
2021-09-14 08:51:45 -04:00
Mark Johnston
072901b7bc sctp: Fix races around sctp_inpcb_free()
sctp_close() and sctp_abort() disassociate the PCB from its socket.
As a part of this, they attempt to free the PCB, which may end up
lingering.  Fix some bugs in this area:

- For some reason, sctp_close() and sctp_abort() set
  SCTP_PCB_FLAGS_SOCKET_GONE using an atomic compare-and-set without the
  PCB lock held.  This is racy since sctp_flags is normally updated
  without atomics, using the PCB lock to synchronize.  So, the update
  can be lost, which can cause all sort of races with other SCTP
  components which look for the _GONE flag.  Fix the problem simply by
  acquiring the PCB lock in order to set the flag.  Note that we have to
  drop and re-acquire the lock again in sctp_inpcb_free(), but I don't
  see a good way around that for now.  If it's a real problem, the _GONE
  flag could be split out of sctp_flags and into a dedicated sctp_inpcb
  field.
- In sctp_inpcb_free(), load sctp_socket after acquiring the PCB lock,
  to avoid possible races with parallel sctp_inpcb_free() calls.
- Add an assertion sctp_inpcb_free() to verify that _ALLGONE is not set.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit c17b531bed)
2021-09-14 08:51:35 -04:00
Artem Khramov
8d783b1dcd netinet: prevent NULL pointer dereference in in_aifaddr_ioctl()
It appears that maliciously crafted ifaliasreq can lead to NULL
pointer dereference in in_aifaddr_ioctl(). In order to replicate
that, one needs to

1. Ensure that carp(4) is not loaded

2. Issue SIOCAIFADDR call setting ifra_vhid field of the request
   to a negative value.

A repro code would look like this.

int main() {
    struct ifaliasreq req;
    struct sockaddr_in sin, mask;
    int fd, error;

    bzero(&sin, sizeof(struct sockaddr_in));
    bzero(&mask, sizeof(struct sockaddr_in));

    sin.sin_len = sizeof(struct sockaddr_in);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = inet_addr("192.168.88.2");

    mask.sin_len = sizeof(struct sockaddr_in);
    mask.sin_family = AF_INET;
    mask.sin_addr.s_addr = inet_addr("255.255.255.0");

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return (-1);

    memset(&req, 0, sizeof(struct ifaliasreq));
    strlcpy(req.ifra_name, "lo0", sizeof(req.ifra_name));
    memcpy(&req.ifra_addr, &sin, sin.sin_len);
    memcpy(&req.ifra_mask, &mask, mask.sin_len);
    req.ifra_vhid = -1;

    return ioctl(fd, SIOCAIFADDR, (char *)&req);
}

To fix, discard both positive and negative vhid values in
in_aifaddr_ioctl, if carp(4) is not loaded. This prevents NULL pointer
dereference and kernel panic.

Reviewed by:	imp@
Pull Request:	https://github.com/freebsd/freebsd-src/pull/530

(cherry picked from commit 620cf65c2b)
2021-09-12 19:12:31 -06:00
Mark Johnston
aacbd4dd57 sctp: Implement sctp_inpcb_bind_locked()
This will be used by sctp_listen() to avoid dropping locks when
performing an implicit bind.  No functional change intended.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 457abbb857)
2021-09-08 08:41:16 -04:00
Mark Johnston
6bfe4afe73 sctp: Release the socket reference when detaching an association
Later in sctp_free_assoc(), when we clean up chunk lists,
sctp_free_spbufspace() is used to reset the byte count in the socket
send buffer.  However, if the PCB is going away, the socket may already
have been detached from the PCB, in which case this becomes a use-after
free.  Clear the socket reference from the association before detaching
it from the PCB, if the PCB has already lost its socket reference.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 65f30a39e1)
2021-09-08 08:40:36 -04:00
Mark Johnston
d30602a2b4 sctp: Hold association locks across socket wakeups when freeing
At this point we do not hold the inpcb lock, so the only thing holding
the socket reference live is the TCB lock, which needs to be acquired by
sctp_inpcb_free() in order to destroy associations.  Defer the unlock to
until after we dereference the socket reference.

Reported by:	syzbot+1d0f2c4675de76a4cf1e@syzkaller.appspotmail.com
Reported by:	syzbot+fabee77954fe69d3a5ad@syzkaller.appspotmail.com
Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit d35be50f57)
2021-09-08 08:40:33 -04:00
Mark Johnston
2d0d1d6e07 sctp: Add macros to assert on inp info lock state
Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit be8ee77e9e)
2021-09-08 08:40:29 -04:00
Zhenlei Huang
e8df60a69a routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549).
Implement kernel support for RFC 5549/8950.

* Relax control plane restrictions and allow specifying IPv6 gateways
 for IPv4 routes. This behavior is controlled by the
 net.route.rib_route_ipv6_nexthop sysctl (on by default).

* Always pass final destination in ro->ro_dst in ip_forward().

* Use ro->ro_dst to exract packet family inside if_output() routines.
 Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.

* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
 It leverages recent lltable changes committed in c541bd368f.

Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
  route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0

Differential Revision: https://reviews.freebsd.org/D30398

(cherry picked from commit 62e1a437f3)
2021-09-07 21:25:06 +00:00
Alexander V. Chernikov
48f38f47b1 lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.
Currently we use pre-calculated headers inside LLE entries as prepend data
 for `if_output` functions. Using these headers allows saving some
 CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
 nexthops more complex, as it is not possible to store multiple
 pre-calculated headers inside lle. Additionally, the solution space is
 limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
 custom family encaps and store mbufs pending resolution. To simplify handling
 of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
 Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
 to the "parent" LLE - it is not possible to delete "child" when parent is alive.
 Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
 machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
 allowing to return such child LLEs. This change uses `LLE_SF()` macro which
 packs family and flags in a single int field. This is done to simplify merging
 back to stable/. Once this code lands, most of the cases will be converted to
 use a dedicated `family` parameter.

Differential Revision: https://reviews.freebsd.org/D31379

(cherry picked from commit c541bd368f)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
10e0976103 Simplify nhop operations in ip_output().
Consistently use `nh` instead of always dereferencing
 ro->ro_nh inside the if block.
Always use nexthop mtu, as it provides guarantee that mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
 of updating ro flags based on different nexthop.

Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by:	kp

(cherry picked from commit 9748eb7427)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
0ea561762b Use lltable calculated header when sending lle holdchain after successful lle resolution.
Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391

(cherry picked from commit 8482aa7748)
2021-09-07 21:02:58 +00:00
Alexander V. Chernikov
2802014380 [lltable] Unify datapath feedback mechamism.
Use newly-create llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapatch usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

Differential Revision: https://reviews.freebsd.org/D31390

(cherry picked from commit f3a3b06121)
2021-09-07 21:02:58 +00:00
Mark Johnston
6053349c46 sctp: Fix racy UNBOUND flag check in sctp_inpcb_bind()
SCTP needs to avoid binding a given socket twice.  The check used to
avoid this is racy since neither the inpcb lock nor the global info lock
is held.  Fix it by synchronizing using the global info lock.  In
particular, sctp_inpcb_bind() may drop the inpcb lock in some cases, but
the info lock is sufficient to prevent double insertion into PCB hash
tables.

Reported by:	syzbot+548a8560d959669d0e12@syzkaller.appspotmail.com
Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 4a36122b1d)
2021-09-07 09:36:19 -04:00
Mark Johnston
8522f7ddac sctp: Simplify the free port search in sctp_inpcb_bind()
Eliminate a flag variable and reduce indentation.  No functional change
intended.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 2496d812a9)
2021-09-07 09:36:19 -04:00
Mark Johnston
96ec1edc4a sctp: Avoid unnecessary refcount bumps in sctp_inpcb_bind()
We only drop the inp lock when binding to a specific port.  So, only
acquire an extra reference when required.  This simplifies error
handling a bit.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 93908fce72)
2021-09-07 09:36:19 -04:00
Mark Johnston
53fcd24b1e sctp: Remove always-false checks in sctp_inpcb_bind()
No functional change intended.

Reviewed by:	tuexen
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit 0d29e4bc01)
2021-09-07 09:36:18 -04:00
Gordon Bergling
6a93acc380 Fix a common typo in source code comments
- s/existant/existent/

(cherry picked from commit 631504fb34)
2021-09-07 09:24:05 +02:00
Gordon Bergling
64986351d3 inet(3): Fix a few common typos in source code comments
- s/funtion/function/

(cherry picked from commit 586c9dc374)
2021-08-31 08:11:48 +02:00
Luiz Otavio O Souza
09e25aff54 ipfw: use unsigned int for dummynet bandwidth
This allows the maximum value of 4294967295 (~4Gb/s) instead of previous
value of 2147483647 (~2Gb/s).

Reviewed by:	np, scottl
Obtained from:	pfSense
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31582

(cherry picked from commit 20ffd88ed5)
2021-08-26 14:05:26 +02:00
Mateusz Guzik
990e592dae ip_reass: do less work in ipreass_slowtimo if possible
ipreass_slowtimo avoidably uses CPU on otherwise idle boxes

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31526

(cherry picked from commit 3be3cbe06d)
2021-08-18 09:44:44 +00:00
Mateusz Guzik
ad9671955a ip_reass: drop the volatile keyword from nfrags and mark with __exclusive_cache_line
The keyword adds nothing as all operations on the var are performed
through atomic_*

Reviewed by:	kp
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D31526

(cherry picked from commit d2b95af1c2)
2021-08-18 09:44:43 +00:00
Konstantin Kukushkin
2141aab35b udp: Fix soroverflow SOCKBUF unlocking
We hold the SOCKBUF_LOCK so use soroverflow_locked here.
This bug may manifest as a non-killable process stuck in [*so_rcv].

Approved by:	scottl
Reviewed by:	Roy Marples <roy@marples.name>
Fixes:	7045b1603b
MFC after:  10 days
Differential Revision:	https://reviews.freebsd.org/D31374

(cherry picked from commit a61c24ddb7)
2021-08-10 18:54:18 -07:00
Roy Marples
f452713408 socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by:	philip (network), kbowling (transport), gbe (manpages)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26652

(cherry picked from commit 7045b1603b)
2021-08-10 18:54:00 -07:00
Mark Johnston
668a555de6 rip: Add missing minimum length validation in rip_output()
If the socket is configured such that the sender is expected to supply
the IP header, then we need to verify that it actually did so.

Reported by:	syzkaller+KMSAN
Reviewed by:	donner
Sponsored by:	The FreeBSD Foundation

(cherry picked from commit ba21825202)
2021-08-02 15:01:11 -04:00
Richard Scheffenegger
e4ee2a39ad tcp: Add PRR cwnd reduction for non-SACK loss
This completes PRR cwnd reduction in all circumstances
for the base TCP stack (SACK loss recovery, ECN window reduction,
non-SACK loss recovery), preventing the arriving ACKs to
clock out new data at the old, too high rate. This
reduces the chance to induce additional losses while
recovering from loss (during congested network conditions).

For non-SACK loss recovery, each ACK is assumed to have
one MSS delivered. In order to prevent ACK-split attacks,
only one window worth of ACKs is considered to actually
have delivered new data.

MFC after: 6 weeks
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29441

(cherry picked from commit 74d7fc8753)
2021-08-02 13:59:23 +02:00
Kristof Provost
c3d03672e1 pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist
TCP SYN floods by only creating states once the remote host completes
the TCP handshake rather than when the initial SYN packet is received.

This is accomplished by using the initial sequence numbers to encode a
cookie (hence the name) in the SYN+ACK response and verifying this on
receipt of the client ACK.

Reviewed by:	kbowling
Obtained from:	OpenBSD
MFC after:	1 week
Sponsored by:	Modirum MDPay
Differential Revision:	https://reviews.freebsd.org/D31138

(cherry picked from commit 8e1864ed07)
2021-07-27 09:42:25 +02:00
Michael Tuexen
9b1219b24a tcp: fix RACK and BBR when using VIMAGE enabled kernel
Fix a bug in VNET handling, which occurs when using specific NICs.
PR:			257195
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D31212

(cherry picked from commit a730d82378)
2021-07-22 11:13:31 +02:00
Stefan Eßer
791035c8da libalias: fix divide by zero causing panic
The packet_limit can fall to 0, leading to a divide by zero abort in
the "packets % packet_limit".

An possible solution would be to apply a lower limit of 1 after the
calculation of packet_limit, but since any number modulo 1 gives 0,
the more efficient solution is to skip the modulo operation for
packet_limit <= 1.

Reported by:	Karl Denninger <karl@denninger.net>

(cherry picked from commit 58080fbca0)
2021-07-14 13:49:21 +02:00
Andrew Gallatin
7751a6b585 tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate
TCP stacks do not build when various combinations of
ipv4 and ipv6 are disabled.

Reviewed by:		rrs, tuexen
Differential Revision:	https://reviews.freebsd.org/D31094
Sponsored by:		 Netflix

(cherry picked from commit b1e806c0ed)
2021-07-13 22:00:50 +02:00
Randall Stewart
1bb521ab7d tcp: Fix 32 bit platform breakage
This fixes the incorrect use of a sysctl add to u64. It
was for a useconds time, but on 32 bit platforms its
not a u64. Instead use the long directive.

Reviewed by:		tuexen
Sponsored by:		Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D31107

(cherry picked from commit 7312e4e5cf)
2021-07-13 21:59:50 +02:00
Randall Stewart
deb3c279d1 tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints
about performance. This bit of work restructures hpts to help reduce CPU
overhead. It does this by now instead of relying on the timer/callout to
drive it instead use user return from a system call as well as lro flushes
to drive hpts. The timer becomes a backstop that dynamically adjusts
based on how "late" we are.

Reviewed by:		tuexen, glebius
Sponsored by:		Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D31083

(cherry picked from commit d7955cc0ff)
2021-07-13 21:58:30 +02:00