For pppoe/ng interfaces we sometimes enter ip6_tryforward() with
a NULL pointer array, and IN6_LINKMTU() glances over the fact
that this is not a valid destination because the if_afdata structure
is not initialized.
While here remove the RT_LINK_IS_UP macro since nothing outside
of nhop is using it.
This is probably a side effect generator, but fixing one spot
instead of the general case would leave other holes in the stack.
Do not return a route destination if the address families were not
yet attached.
Just like we already do for IPv6 set the PFIL_FWD flag when we're forwarding
IPv4 traffic. This allows firewalls to make more precise decisions.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D48824
garp_rexmit() is a callback, so is not in net_epoch, which
arprequest_internal() expects.
Enter and exit the net_epoch.
PR: 284073
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 38fdcca05d09b4d5426a253d3c484f9481a73ac2)
To comply with LINCE certification, it's necessary to ensure that
packets to 0.0.0.0/::0 are dropped and logged by the firewall. Such
packets are dropped by ip_input() and ip6_input() before reaching pfil
hooks; reorder the checks to give firewalls a chance to drop the packets
themselves, as this gives better observability.
Note that ip_forward() and ip6_forward() ensure that such packets are
not forwarded; they are passed back unmodified.
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
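
The ownership convention described above can be illustrated with a small
userland sketch. Everything in it is a hypothetical stand-in (enum
hook_result, struct pkt, run_hook(), drop_empty_hook()); it is not the
pfil(9) API, and it models "freed or owned by the firewall" by having the
hook set the caller's pointer to NULL, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real pfil(9) API. */
enum hook_result { HOOK_PASS, HOOK_DROPPED, HOOK_CONSUMED };

struct pkt {
	int len;
};

typedef enum hook_result (*hook_fn)(struct pkt **);

/* Example hook: drops (and frees) empty packets, passes everything else. */
static enum hook_result
drop_empty_hook(struct pkt **pp)
{
	if ((*pp)->len == 0) {
		free(*pp);
		*pp = NULL;	/* the caller must not touch the packet again */
		return (HOOK_DROPPED);
	}
	return (HOOK_PASS);
}

/* Run one hook and check the ownership convention on its result. */
static enum hook_result
run_hook(hook_fn hook, struct pkt **pp)
{
	enum hook_result rv = hook(pp);

	if (rv == HOOK_PASS)
		assert(*pp != NULL);	/* a passed packet must still exist */
	else
		assert(*pp == NULL);	/* dropped or consumed: no longer ours */
	return (rv);
}
```

With the assertion in the dispatcher, callers can rely on the return value
alone and skip their own NULL checks after a pass.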
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
This patch provides UDP encapsulation of ESP packets over IPv6.
Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c.
As required by the RFC and unlike in IPv4 encapsulation,
UDP checksums are calculated.
Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu>
Sponsored-by: Stormshield
Sponsored-by: Wiktel
Sponsored-by: Klara, Inc.
Fix KASSERT in 80044c78 causing build failures
Move the KASSERT to where struct ip6_hdr is populated
Fixes: 80044c785cb040a2cf73779d23f9e1e81a00c6c3
Reported-by: bapt
Reviewed-by: markj
Sponsored-by: Klara, Inc.
This removes the if_output calls in the pf(4) code that escape further
processing by deferring the forwarding execution to the network stack
using on/off style sysctls for both IPv4 and IPv6.
Also see: https://reviews.freebsd.org/D8877
This commit also includes the original refactoring changes.
This change allows the kernel to operate with the default netisr
cpu-affinity settings while having RSS compiled in. Normally, RSS
changes quite a bit of the behaviour of the kernel dispatch service.
This change allows for reducing impact on incompatible hardware while
preserving the option to boost throughput speeds based on packet flow
CPU affinity.
Make sure to compile the following options in the kernel:
options RSS
As well as setting the following sysctls:
net.inet.rss.enabled: 1
net.isr.bindthreads: 1
net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)
And optionally (to force a 1:1 mapping between CPUs and buckets):
net.inet.rss.bits: 3 (for 8 CPUs)
net.inet.rss.bits: 2 (for 4 CPUs)
etc.
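
The relation between the CPU count and net.inet.rss.bits in the examples
above (3 for 8 CPUs, 2 for 4 CPUs) is a base-2 logarithm. A minimal sketch,
assuming a hypothetical helper rss_bits_for_cpus() that is not part of the
kernel; for CPU counts that are not a power of two it rounds up, so there
would be more buckets than CPUs and the mapping is no longer 1:1:

```c
#include <assert.h>

/*
 * Smallest number of bits b such that (1 << b) >= ncpus.
 * Hypothetical helper for choosing a net.inet.rss.bits value.
 */
static unsigned int
rss_bits_for_cpus(unsigned int ncpus)
{
	unsigned int bits = 0;

	assert(ncpus > 0);
	/* Cap at 31 so the shift below stays well defined. */
	while (bits < 31 && (1u << bits) < ncpus)
		bits++;
	return (bits);
}
```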
Set pin_default_swi to 0 by default in the RSS case.
Based on a patch originally found in m0n0wall, expanded
to IPv6 and aligned with FreeBSD's IP input path.
The limit may not be correctly accounted for on the WAN
interface due to dummynet counting the packet again even
though it was already processed.
The problem here is that there's no proper way to reinject
the packet at the point where it was previously removed
from so we make the assumption that ip input was already
done (including pfil) and more or less directly move to
packet output processing.
While here, move the passin label up to take the extra check
while avoiding a second label. Also remove the spurious tag
read for the forward check since we don't use it and we should
really trust the mbuf flag.
(cherry picked from commit 518a1163d0aa73b26da1dd1a4bb186042ea3c66e)
(cherry picked from commit 0e8faabc270f89fbc54bbc118b2ebe2a38364375)
Approved by: re (cperviva)
Identify interfaces consistently by the pair of the ifn pointer
and the index.
This avoids a use after free when the ifn and/or index was reused.
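
A minimal sketch of the pair comparison, using hypothetical names
(struct ifn_entry, ifn_entry_matches()) rather than the actual kernel
structures. The pointer is treated as an opaque token and never
dereferenced; a match requires both the pointer and the index to agree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached interface entry. */
struct ifn_entry {
	const void *ifn_p;	/* opaque interface pointer, never dereferenced */
	uint32_t ifn_index;	/* interface index recorded at the same time */
};

/* An entry matches only if both the pointer and the index agree. */
static bool
ifn_entry_matches(const struct ifn_entry *e, const void *ifn, uint32_t index)
{
	assert(e != NULL);
	return (e->ifn_p == ifn && e->ifn_index == index);
}
```

Comparing only one of the two could produce a false match once the
pointer value or the index has been reused for a different interface.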
Reported by: bz, pho, and others
(cherry picked from commit 523913c94371ab50a8129cbab820394d25f7a269)
(cherry picked from commit 331db93815afb49b01f269aeff0fe899acd47455)
Approved by: re (cperviva)
(cherry picked from commit 470a63cde4285ea4a317b0bba966514c11f4ed5b)
(cherry picked from commit e3f26ce52b71d4005e666ced22c0855dbc70b28e)
Approved by: re (cperviva)
Checking the interface name cannot be done consistently, so
don't do it.
(cherry picked from commit bf11fdaf0d095fecca61fa8b457d06e27fae5946)
(cherry picked from commit 66628552a38751ed5c395858d1754660557674cd)
Approved by: re (cperviva)
Improve consistency, no functional change intended.
(cherry picked from commit d839cf2fbb47c52d5153fb366c51bd6f6a3dd0fd)
(cherry picked from commit 107704217b)
Approved by: re (cperviva)
Actually assert the locking instead of describing it in a comment.
No functional change intended.
(cherry picked from commit 4466a97e83fd9484cb22dd2867b6972f6b185e8b)
The address lock is always held, so no need for the second
parameter.
No functional change intended.
(cherry picked from commit 2e9761eb80f3e58c116efc10c739ed0d8497c1d6)
When the sysctl-variable net.inet.ip.accept_sourceroute is non-zero,
an mbuf would be leaked when processing a SYN-segment containing an
IPv4 strict or loose source routing option, when the on-stack
syncache entry is used or there is an error related to processing
TCP MD5 options.
Fix this by freeing the mbuf whenever an error occurred or the
on-stack syncache entry is used.
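
The ownership rule behind this fix can be sketched in userland C. The
names below (struct entry, attach_or_free()) are hypothetical and only
model the idea: a buffer handed to the function is either stored in a
long-lived entry or freed on the spot, never silently dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a syncache-like entry. */
struct entry {
	void *ipopts;
};

/*
 * Takes ownership of 'ipopts'. Returns true if the buffer is now owned
 * by 'e', false if it was freed here because the entry is a temporary
 * on-stack copy or an error occurred.
 */
static bool
attach_or_free(struct entry *e, void *ipopts, bool on_stack, bool error)
{
	assert(e != NULL);
	if (on_stack || error) {
		free(ipopts);		/* nobody else will release it */
		e->ipopts = NULL;
		return (false);
	}
	e->ipopts = ipopts;		/* released together with the entry */
	return (true);
}
```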
Reviewed by: markj, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46839
(cherry picked from commit 01eb635d12953e24ee5fae69692c28e4aab4f0f6)
Don't leak a reference count for so->so_cred when processing an
incoming SYN segment with an on-stack syncache entry and the
sysctl variable net.inet.tcp.syncache.see_other is false.
Reviewed by: cc, markj, rscheff
Sponsored by: Netflix, Inc.
Pull Request: https://reviews.freebsd.org/D46793
(cherry picked from commit cbc9438f0505bd971e9eba635afdae38a267d76e)
Don't leak a maclabel when SYN segments are processed which results
in an error due to MD5 signature handling.
Tweak the #ifdef MAC to allow additional upcoming changes.
Reviewed by: markj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46766
(cherry picked from commit 78e1b031d2e8ef0e1cbc8874891f5476dc7868bc)
tcp_lro_flush() is not used anymore outside of tcp_lro.c. Therefore
make it static.
Reviewed by: rscheff, glebius, Peter Lei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46435
(cherry picked from commit e06cf0fc5dd626c34acdef308b696b4995371a4b)
When the initial sending of the SYN ACK segment using
syncache_respond() fails, it is handled as a permanent error.
To improve consistency, apply this policy in all cases, where
syncache_respond() is called. These include
* timer based retransmissions of the SYN ACK
* retransmitting a SYN ACK in response to a SYN retransmission
* sending of challenge ACKs in response to received RST segments
In these cases, fall back to SYN cookies, if enabled.
While there, also improve consistency of the TCP stats counters.
Reviewed by: cc, glebius (earlier version)
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46428
(cherry picked from commit ef438f7706be48f1cf7fd4c8a60329e1619cfe30)
Do not report an error, if it is stored as a soft error. This avoids,
for example, the dropping of TCP connections using an interface,
while enabling or disabling LRO on that interface.
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46427
(cherry picked from commit b2044c4557443bbce974101f04e2b465d1bbe769)
When snd_nxt doesn't track snd_max, partial SACK ACKs may elicit
unexpected duplicate retransmissions. This is usually masked by
LRO not necessarily ACKing every individual segment, and prior
to RFC6675 SACK loss recovery, harder to trigger even when an
RTO happens while SACK loss recovery is ongoing.
Address this by improving the logic when to start a SACK loss recovery
and how to deal with a RTO, as well as improvements to the adjusted
congestion window during transmission selection.
Reviewed By: tuexen, cc, #transport
Sponsored by: NetApp, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43355
(cherry picked from commit 440f4ba18e3ab7be912858bbcb96a419fcf14809)
While processing the ECN flags of an incoming packet, the code
incorrectly cleared all other syncache flags.
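
This class of bug usually comes from assigning to a flag word instead of
masking it. A minimal sketch with a hypothetical flag layout (F_ECN_MASK,
F_OTHER and set_ecn_bits() are illustrative and do not reflect the real
syncache flag values):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout; the real syncache flag values differ. */
#define	F_ECN_MASK	0x03u	/* bits carrying the ECN state */
#define	F_OTHER		0x10u	/* an unrelated flag that must survive */

/* Replace only the ECN bits, leaving every other flag untouched. */
static uint32_t
set_ecn_bits(uint32_t flags, uint32_t ecn)
{
	assert((ecn & ~F_ECN_MASK) == 0);
	return ((flags & ~F_ECN_MASK) | ecn);
}
```

A plain assignment such as "flags = ecn" would be the faulty variant: it
updates the ECN bits but also discards F_OTHER.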
Reported by: tuexen
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D46694
(cherry picked from commit 0a05ea1f56e65ec0477d56daf5ed623087464082)
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().
Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 2 weeks
Reviewed by: zlei, allanjude, emaste
Differential Revision: https://reviews.freebsd.org/D46454
(cherry picked from commit 0c605af3f9d9e66be6af0a3bbc36dbedc5dfe516)
See the discussion in Bugzilla PR 280705 for context.
PR: 280705
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46259
(cherry picked from commit 417b35a97b7669eb0bf417b43e97cccbedbce6f9)
The format for CTLTYPE_UINT is "IU", not "UI", as specified
in sysctl.9.
Reviewed by: cc, zlei
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46408
(cherry picked from commit 498286d4e807d6b9e4caad22b96ebca7f16e9b18)
Originally, a SYN-cache entry was always allocated and later freed,
when not needed anymore. Then the allocation was avoided, when no
SYN-cache entry was needed, and a copy on the stack was used.
But the logic regarding freeing was not updated.
This patch doesn't re-check conditions (which may have changed) when
deciding to insert or free the entry, but uses the result of
the earlier check.
This simplifies the code and also improves consistency.
Reviewed by: glebius
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46410
(cherry picked from commit e41364711ca3f7e214f9607ebedf62e03e51633d)
There will be at most lro_entries entries in the LRO hash table. So no
need to take lro_mbufs into account, which only results in the
LRO hash table being too large and therefore wasting memory.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46378
(cherry picked from commit aa6c490bf80fcef15cfc0d3f562fae19ef2375aa)
Use LIST_FOREACH_SAFE(), since the list element is removed from
the list in the loop body, zeroed out, and inserted in the free list.
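
A userland sketch of the pattern with queue(3) lists. struct ent and
recycle_all() are hypothetical names, and the fallback macro definition is
only there because some C libraries ship a <sys/queue.h> without the _SAFE
variants:

```c
#include <assert.h>
#include <string.h>
#include <sys/queue.h>

/* Fallback for C libraries whose <sys/queue.h> lacks the _SAFE variants. */
#ifndef LIST_FOREACH_SAFE
#define	LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = LIST_FIRST((head));				\
	    (var) && ((tvar) = LIST_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

struct ent {
	int payload;
	LIST_ENTRY(ent) next;
};
LIST_HEAD(ent_list, ent);

/* Move every element from 'active' to 'free_list', zeroing it on the way. */
static int
recycle_all(struct ent_list *active, struct ent_list *free_list)
{
	struct ent *e, *tmp;
	int moved = 0;

	assert(active != NULL && free_list != NULL);
	LIST_FOREACH_SAFE(e, active, next, tmp) {
		LIST_REMOVE(e, next);
		memset(e, 0, sizeof(*e));	/* also wipes the link fields */
		LIST_INSERT_HEAD(free_list, e, next);
		moved++;
	}
	return (moved);
}
```

With plain LIST_FOREACH() the loop would read the next pointer from an
element that has already been zeroed and relinked, so the traversal would
either stop early or walk the wrong list.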
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46383
(cherry picked from commit 64443828bbe7c571db8d8731758ec8c4b8364c86)
When we release a multicast address (e.g. on interface shutdown) we may
still have packets queued in inm_scq. We have to free those, or we'll
leak memory.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43033
(cherry picked from commit c2e340452c147b551180f2a1600ae76491342b0e)
This reverts commit fa03d37432caf17d56a931a9e6f5d9b06f102c5b.
This commit caused us to not send IGMP leave messages if the inpcb went
away. In other words: we freed pending packets whenever the socket
closed rather than when the interface (or address) goes away.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43032
(cherry picked from commit c196e43243b83840cc9f3d1dadc7dacb3b0f556f)
sys/netinet/libalias/alias_db.c has internal static function UseLink()
that passes a link to CleanupLink() to verify if the link has expired.
If so, UseLink() may return NULL.
_FindLinkIn()'s usage of UseLink() is not quite correct.
Assume there is "redirect_port udp" configured to forward incoming
traffic for specific port to some internal address.
Such a rule creates a partially specified permanent link.
After the first such incoming packet, libalias creates a new fully
specified temporary LINK_UDP with a default timeout of 60 seconds.
Also, in case of low traffic libalias may assign "timestamp"
for this new temporary link way in the past because
LibAliasTime is updated seldom and can keep old value
for tens of seconds, and it will be used for the temporary link.
It may happen that next incoming packet for redirected port
passed to _FindLinkIn() results in a call to UseLink()
that returns NULL due to detected expiration.
Immediate return of NULL results in broken translation:
either a packet is dropped (deny_incoming mode) or delivered to
original destination address instead of internal one.
Fix it with additional check for NULL to proceed with a search
for original partially specified link. In case of UDP,
it also recreates temporary fully specified link
with a call to ReLink().
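
The control flow of the fix can be sketched as follows. This is a
simplified userland model, not the libalias code: struct link, use_link()
and find_link_in() are hypothetical stand-ins, expiry is reduced to a
single timestamp comparison, and the ReLink() step that recreates the
temporary link is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified link record. */
struct link {
	int expire_at;	/* time at which a temporary link expires */
	int permanent;	/* non-zero for partially specified permanent links */
};

/* Return the link if it is still usable at 'now', NULL if it expired. */
static struct link *
use_link(struct link *l, int now)
{
	if (l != NULL && !l->permanent && now >= l->expire_at)
		return (NULL);
	return (l);
}

/*
 * Prefer the fully specified temporary link; if it is missing or has
 * just expired, continue with the partially specified permanent link
 * instead of returning NULL right away.
 */
static struct link *
find_link_in(struct link *full, struct link *partial, int now)
{
	struct link *l;

	assert(partial == NULL || partial->permanent);
	l = use_link(full, now);
	if (l != NULL)
		return (l);
	return (use_link(partial, now));
}
```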
Practical examples are "redirect_port udp" rules for unidirectional
SYSLOG protocol (port 514) or some low volume VPN encapsulated in UDP.
Thanks to Peter Much for initial analysis and first version of a patch.
Reported by: Peter Much <pmc@citylink.dinoex.sub.org>
PR: 269770
(cherry picked from commit 8132e959099f0c533f698d8fbc17386f9144432f)
(cherry picked from commit e5b85380836378c9e321a4e6d300591e6faf622a)
Update the ddb printing of t_flags and t_flags2 to the current state of
definitions in tcp_var.h.
Reviewed by: cc
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D46222
(cherry picked from commit 093d9b46f4720392e53c171eaabfd7a6a8101170)