and fix assigning IP addresses to the gif(4) interface when it is a
member of a if_bridge(4) interface.
When setting the sysctl net.link.bridge.member_ifaddrs to 1, if_bridge(4)
can eliminate unnecessary walk of the member list to determine whether
the inbound unicast packets are for us or not.
Well when a gif(4) interface is member of a if_bridge(4) interface, it
acts as the tunnel endpoint to tunnel Ethernet frames over IP network,
aka the EtherIP protocol, so the IP addresses configured on it are
independent of the if_bridge(4) interface or other if_bridge(4) members,
hence the sysctl net.link.bridge.member_ifaddrs should not have any
influnce over gif(4) interfaces's behavior of assigning IP addresses.
PR: 227450
Reported by: Siva Mahadevan <me@svmhdvn.name>
Reviewed by: ivy, #bridge
MFC after: 1 week
Fixes: 0a1294f6c610 bridge: allow IP addresses on members to be disabled
Differential Revision: https://reviews.freebsd.org/D52200
(cherry picked from commit 9764aa1ccad08a7ec53ed9b80741b9553f3fa4e6)
While diagnosing PR 279653 and PR 285129, I observed that thread may
write to freed memory but the system does not crash. This hides the
real problem. A clear NULL pointer derefence is much better than writing
to freed memory.
PR: 279653
PR: 285129
Reviewed by: glebius
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D49444
(cherry picked from commit b5c46895fdddcdb7dd1994598925d6989ea7c8f2)
Since there are multicast and broadcast specific error counters,
use them.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51869
(cherry picked from commit 0312f80349eedfc2b0d2f24b4fd073795148d3d5)
When reflecting a packet, use an offset of 0 and clear all three bits,
in particular the DF bit.
PR: 288558
Reviewed by: markj, zlei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51991
(cherry picked from commit b9a2d84b1bf7f9cf556e2f0b68023d5af8362797)
If a blind attacker wants to guess by sending ACK segments if there
exists a TCP connection , this might trigger a challenge ACK on an
existing TCP connection. To make this hit non-observable for the
attacker, also increment the global counter, which would have been
incremented if it would have been a non-hit.
This issue was reported as issue number 11 in Keyu Man et al.:
SCAD: Towards a Universal and Automated Network Side-Channel
Vulnerability Detection
Reviewed by: Nick Banks, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D51724
(cherry picked from commit f0f6e50388963cae44bb92bb69ed7a1135dd2eec)
It is not used anymore...
Reviewed by: rscheff, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50900
(cherry picked from commit 124120d44ba23ccc44144f9fc48d35818c660dc1)
The sysctl-variable net.inet.tcp.blackhole_local should affect
TCP segments from an IPv6 address of the local host, not of a host
on the local area network.
Thanks to cc@ for pointing me to the issue.
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50828
(cherry picked from commit de8fb1b3835758998a53d772deeebcdb71bbb823)
The sysctl-variable net.inet.tcp.nolocaltimewait should affect
TCP connections where the remote endpoint is on the local host and
not on the local area network.
Reported by: cc
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D50830
(cherry picked from commit 49eabd405f661fa3a9f0a005c2e54dc4cad07e48)
When the socket has a tunneling function attached, udp_append() drops
the inpcb lock before calling it. To keep the inpcb alive, we bump the
refcount. After commit 742e7210d0 we only dropped the reference if
the tunnel consumed the packet, but it needs to be dropped in either
case. if_ovpn is the only driver that can trigger this bug.
Fixes: 742e7210d0 ("udp: allow udp_tun_func_t() to indicate it did not eat the packet")
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: Stormshield
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D51505
(cherry picked from commit e1751ef896119d7372035b1b60f18a6342bd0e3b)
Just like we already do for IPv6 set the PFIL_FWD flag when we're forwarding
IPv4 traffic. This allows firewalls to make more precise decisions.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D48824
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
This removes the if_output calls in the pf(4) code that escape further
processing by defering the forwarding execution to the network stack
using on/off style sysctls for both IPv4 and IPv6.
Also see: https://reviews.freebsd.org/D8877
This commit also includes the original refactoring changes
This change allows the kernel to operate with the default netisr cpu-affinity settings while having RSS compiled in. Normally, RSS changes quite a bit of the behaviour of the kernel dispatch service - this change allows for reducing impact on incompatible hardware while preserving the option to boost throughput speeds based on packet flow CPU affinity.
Make sure to compile the following options in the kernel:
options RSS
As well as setting the following sysctls:
net.inet.rss.enabled: 1
net.isr.bindthreads: 1
net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)
And optionally (to force a 1:1 mapping between CPUs and buckets):
net.inet.rss.bits: 3 (for 8 CPUs)
net.inet.rss.bits: 2 (for 4 CPUs)
etc.
Set pin_default_swi to 0 by default in the RSS case.
This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.
Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by: Stormshield
Sponsored-by: Wiktel
Sponsored-by: Klara, Inc.
Fix KASSERT in 80044c78 causing build failures
Move the KASSERT to where struct ip6_hdr is populated
Fixes: 80044c785cb040a2cf73779d23f9e1e81a00c6c3
Reported-by: bapt
Reviewed-by: markj
Sponsored-by: Klara, Inc.
Based on a patch originally found in m0n0wall, expanded
to IPv6 and aligned with FreeBSD's IP input path.
The limit may not be correctly accounted for on the WAN
interface due to dummynet counting the packet again even
though it was already processed.
The problem here is that there's no proper way to reinject
the packet at the point where it was previously removed
from so we make the assumption that ip input was already
done (including pfil) and more or less directly move to
packet output processing.
While here move the passin label up to take the extra check
but avoiding a second label. Also remove the spurious tag
read for forward check since we don't use it and we should
really trust the mbuf flag.
Do not clear the SCTP_ADDR_IFA_UNUSEABLE flag, if it was set due
to the address being deprecated. Also don't declare tentative
addresses as unusable.
While there, cleanup the code.
PR: 230242
(cherry picked from commit 9639de2a6f7eec8b2158782fbfab3419d507fdc5)
Only call sctp_gather_internal_ifa_flags() for IPv6 addresses and
also compile this code only, when IPv6 is supported.
This fixes the compilation of IPv4 only kernels.
Reported by: bz@
Fixes: 6ab4b0c0df57 ("sctp: initilize local address flags correctly")
(cherry picked from commit 99c58ad021b2f7dc0496e16d313c5e28a552f0d0)
When binding to an address, which is not available, use
consistently EADDRNOTAVAIL.
(cherry picked from commit 79952cd7649b63fa312ecafcffb719f5060929d4)
When reporting the local addresses of an endpoint (inp without
stcb), ignore unusable addresses.
(cherry picked from commit 8f5f6680efa28135bf37f3def2aa71f35bd30333)
add a new sysctl, net.link.bridge.member_ifaddrs, which defaults to 1.
if it is set to 1, bridge behaviour is unchanged.
if it is set to 0:
- an interface which has AF_INET6 or AF_INET addresses assigned cannot
be added to a bridge.
- an interface in a bridge cannot have an AF_INET6 or AF_INET address
assigned to it.
- the bridge will no longer consider the lladdrs on bridge members to be
local addresses, i.e. frames sent to member lladdrs will not be
processed by the host.
update bridge.4 to document this behaviour, as well as the existing
recommendation that IP addresses should not be configured on bridge
members anyway, even if it currently partially works.
in testing, setting this to 0 on a bridge with 50 member interfaces
improved throughput by 22% (4.61Gb/s -> 5.67Gb/s) across two member
epairs due to eliding the bridge member list walk in GRAB_OUR_PACKETS.
Reviewed by: kp, des
Approved by: des (mentor)
Differential Revision: https://reviews.freebsd.org/D49995
(cherry picked from commit 0a1294f6c610948d7447ae276df74a6d5269b62e)
PR: 286539
MFC after: 3 days
Approved by: re (cperciva)
(cherry picked from commit 75d173a84836d14b12a0f747ffed7d37766dd274)
(cherry picked from commit 02dde7c43fe76a5dcdc170de1c2740a31629e106)
When doing a limited retransmit, allow up to 2 * MSS - 1 if the
Nagle algorithm has been disabled.
PR: 282605
Approved by: re (cperciva)
Reviewed by: cc, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49922
(cherry picked from commit 934caaec3afc43638c2a1da8fbe3b672566db4fe)
(cherry picked from commit 0906658c3409996b26518e67df48c01052ef934c)
Clear the black box logging containing union rather than the u_bbr
structure for clarity and consistency. Currently u_bbr, u_raw, and
u64_raw are the same size.
No functional change intended.
Reviewed by: tuexen
Sponsored by: Netflix, Inc.
(cherry picked from commit 382af4d38b62675f00f64275793a6b5fccfe62fa)
beta and beta_ecn were stored using a variable of type struct newreno
in struct rack_control. Later, struct newreno was extended and now
contains several more fields.
This results in a memory inefficiency and also in copying around
uninitialized memory.
This patch fixes this by storing beta and beta_ecn individually in
struct rack_control.
Please note that the newreno_flags field was only stored and never
used. Therefore, this is not stored anymore in struct rack_control.
No functional change intended.
CID: 1523796
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49578
(cherry picked from commit f6deb9ea0a0ee760e5ff9ad5c15d0bd7a1714355)
Remove the macros COPY_STAT and COPY_STAT_T, since they do not improve
the readability of the code. No functional change intended.
Thanks to glebius@ for suggesting the change.
Reviewed by: glebius, rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49670
(cherry picked from commit 46fc12741e70f2d1f0510a0dacd2f9dd3aa116c0)
In general we are working towards making public headers self-contained.
cdefs.h is included for __packed; just assume that types.h includes
cdefs.h as that's a very common assumption.
PR: 285924
Reviewed by: emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49735
(cherry picked from commit 31d3a94bdda4a9ca4c4d7d4e8e8a0ba1b05c7f18)
struct tcp_log_rack is not used, therefore remove it.
Reviewed by: Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49669
(cherry picked from commit b1c62081feec535a4f2eeb4f8deb58913d9e281c)
The sendfile black box logging struct is much smaller than the
encompassing stack specific logging union. Be sure to clear the
trailing unused memory when logging.
Reviewed by: tuexen
Sponsored by: Netflix, Inc.
(cherry picked from commit 3bd1e85fc13cb90853046300dcaa31d63b45ee21)
Initialize the fields in the tcp_log_buffer in the sequence they
appear in the structure and add the initialization of tlb_flex1,
tlb_flex2, and _pad[].
Reviewed by: rrs, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49652
(cherry picked from commit 94acddd2ad0142221124c3fb7fe3778a5a1f8036)
Thanks to glebius@ for pointing to the problem.
Reported by: syzbot+1d5c164f1c10de84ad8a@syzkaller.appspotmail.com
Fixes: 2d5c48eccd ("sctp: Tighten up locking around sctp_aloc_assoc()")
(cherry picked from commit e8623834ca29b562687db945bdd12a3e2fe4aeb1)
These routines were all assuming that the sysctl handler has some new
value, but this is not the case. SYSCTL_IN() returns 0 in this
scenario, so they were all operating on an uninitialized address. This
is mostly harmless, but trips KMSAN checks, so let's fix them.
Reviewed by: zlei, rrs, glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49348
(cherry picked from commit 3ff865c6a7948b2cfc01d7056c619145b696700a)
If timestamps are enabled, the actions performed by a retransmission
timeout were rolled back, when they should not.
It is needed to make sure the incoming segment advances SND.UNA.
To do this, remove the incorrect upfront check and extend the check in
the fast path to handle also the case of timestamps.
PR: 282605
Reviewed by: cc, rscheff, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D49414
(cherry picked from commit fbcf3b74e8f2c0c5ba37f1839bfe9395eb2fd0b1)
The SUS doesn't mention this error code as a possible one [1]. The FreeBSD
manual page specifies a possible ECONNRESET for close(2):
[ECONNRESET] The underlying object was a stream socket that was
shut down by the peer before all pending data was
delivered.
In the past it had been EINVAL (see 21367f630d), and this EINVAL was
added as a safety measure in 623dce13c6. After conversion to
ECONNRESET it had been documented in the manual page in 78e3a7fdd5, but
I bet wasn't ever tested to actually be ever returned, cause the
tcp-testsuite[2] didn't exist back then. So documentation is incorrect
since 2006, if my bet wins. Anyway, in the modern FreeBSD the condition
described above doesn't end up with ECONNRESET error code from close(2).
The error condition is reported via SO_ERROR socket option, though. This
can be checked using the tcp-testsuite, temporarily disabling the
getsockopt(SO_ERROR) lines using sed command [3]. Most of these
getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm
that close(2) doesn't return ECONNRESET even on a socket that has the
error stored, neither it is returned in the case described in the manual
page. The latter case is covered by multiple tests residing in tcp-
testsuite/state-event-engine/rcv-rst-*.
However, the deleted block of code could be entered in a race condition
between close(2) and processing of incoming packet, when connection had
already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK.
This was reported in the bug 146845. With the block deleted, we will
continue into tcp_disconnect() which has proper handling of INP_DROPPED.
The race explanation follows. The connection is in TCPS_LAST_ACK. The
network input thread acquires the tcpcb lock first, sets INP_DROPPED,
acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED.
Meanwhile, the syscall thread goes through sodisconnect() which checks for
SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on
the tcpcb lock in tcp_usr_disconnect(). Once input thread releases the
lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.
- Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
- Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()
Note that the lockless operation in sodisconnect() isn't correct, but
enforcing the socket lock there will not fix the problem.
[1] https://pubs.opengroup.org/onlinepubs/9799919799/
[2] https://github.com/freebsd-net/tcp-testsuite
[3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)
PR: 146845
Reviewed by: tuexen, rrs, imp
Differential Revision: https://reviews.freebsd.org/D48148
(cherry picked from commit 053a988497342a6fd0a717cc097d09c23f83e103)
One variable that became critical to correctly calculate
the cwnd during limited transmit was not properly reverted
on detection of spurious timeouts.
PR: 282605
Reviewed By: cc, tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D48652
(cherry picked from commit 6f6c07813b38ab04d8b1b2bb87c0291dbae25a25)