For pppoe/ng interfaces we sometimes enter ip6_tryforward() with
a NULL pointer array, and IN6_LINKMTU() glosses over the fact
that this is not a valid destination because the if_afdata structure
is not initialized.
While here remove the RT_LINK_IS_UP macro since nothing outside
of nhop is using it.
This is probably a side effect generator, but fixing one spot
instead of the general case would leave other holes in the stack.
Do not return a route destination if the address families were not
yet attached.
commit 49f39043a02d6011c1907e1b07eb034652a1269c
Author: phessler <phessler@openbsd.org>
Date: Fri Apr 28 14:08:34 2023 +0000
Relax the "pass all" rule so all forms of neighbor advertisements are allowed
in either direction.
This more closely matches the IPv4 ARP behaviour.
From sashan@
discussed with kn@ deraadt@
Commit 20c4899a8e modified pf_test_eth_rule() to not acquire the
rules read lock, so pf_commit_eth() was changed to wait until the
now-inactive rules are no longer in use before freeing them. In
particular, it uses the net_epoch to schedule callbacks once the
inactive rules are no longer visible to packet processing threads.
However, since commit 812839e5aa, pf_test_eth_rule() acquires the
rules read lock, so this deferred action is unneeded. This patch
reverts a portion of 20c4899a8e such that we avoid using deferred
callbacks to free inactive rules.
The main motivation is performance: epoch_drain_callbacks() is quite
slow, especially on busy systems, and its use in the DIOCXBEGIN handler
in particular causes long stalls in relayd when reloading configuration.
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D48822
(cherry picked from commit 7a66b3008693ce61957e8b2a3d99829063e1e4af)
When an interface is moving to/from a vnet jail, it may still have BPF
descriptors attached. Userland (e.g. tcpdump) is not notified
that the interface is departing and keeps its BPF descriptors open, which
may leak sensitive traffic (e.g. an interface is moved
back to the parent jail but a user is still sniffing traffic over it in
the child jail).
Detach BPF descriptors so that the userland will be signaled.
Reviewed by: ae
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D45727
(cherry picked from commit 1ed9b381d4701fc9f66741256e93b96e22273217)
ifnet: Fix build without BPF
The newly introduced function bpf_ifdetach() is only available when
device bpf is enabled.
Fixes: 1ed9b381d470 ifnet: Detach BPF descriptors on interface vmove event
(cherry picked from commit d8413a1c3ba235a79ae6b8cc35767a861855c7e2)
if_detach_internal() never fails since change [1]. As a consequence,
neither does its caller if_vmove(). While here, remove a stale comment.
No functional change intended.
This reverts commit c7bab2a7ca.
[1] a779388f8b if: Protect V_ifnet in vnet_if_return()
Reviewed by: glebius
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D48820
(cherry picked from commit bb0348a17974d83671becbd32ea0e4bd2ea61906)
d82c3e81:
net: if_media for 100BASE-BX
Renumber 1000BASE-BX and add 100BASE-BX sequentially
I added 1000BASE-BX in 78c63ed260fa20b3500aedfe41dc0dcae9593f51 but
did not connect it to any code yet; apologies for the churn.
7835a4ad:
net: if_media fix syntax/build
Fixes: d82c3e815a5f ("net: if_media for 100BASE-BX")
(cherry picked from commit d82c3e815a5fc0069562b69145ad695f9aa183f9)
(cherry picked from commit 7835a4ad6948290c92ea55c7be34ae72f4e2b0bd)
Allow users to choose to allow permitted SCTP connections to set up additional
multihomed connections regardless of the ruleset. That is, allow an already
established connection to set up flows that would otherwise be disallowed.
In case of if-bound connections we initially set the extra associations to
be floating, because we don't know what path they'll be taking when they're
created. Once we see the first traffic we can bind them.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D48453
(cherry picked from commit e4f2733df8c9d2fd0c5e8fdc8bec002bf39811f3)
There are two possible race conditions:
1. Concurrent bpfattach() and bpf_setif(), i.e., BIOCSETIF ioctl,
2. Concurrent bpfdetach() and bpf_setif().
For the first case, userland may see the BPF interface attached before it
has been added to the attached interfaces list `bpf_iflist`. It
will be added eventually, so this case does not matter.
For the second one, bpf_setif() may reference `dead_bpf_if` and the
kernel will panic (spotted by change [1]; without that change we would
end up with silently corrupted memory).
A simple fix would be to add an additional check for `dead_bpf_if`
in the function `bpf_setif()`. But that requires extending the protection
of the global lock (BPF_LOCK), i.e., BPF_LOCK would also have to protect the
assignment of `ifp->if_bpf`. That simple fix works but is
not a good design. Since the attached interfaces list `bpf_iflist` is
the single source of truth, we look through it rather than check
the interface's side, aka `ifp->if_bpf`.
This change introduces a performance regression: the cost of the BPF
interface attach operation (BIOCSETIF ioctl) goes back from O(1) to O(N)
(where N is the number of BPF interfaces). We normally have a sane number
of interfaces, so O(N) should be affordable.
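The list walk described above can be sketched in userland C as follows. This
is only an illustration: the struct and function names are hypothetical
stand-ins, not the kernel's definitions, and locking is omitted.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct ifnet and struct bpf_if. */
struct ifnet_sketch {
    int if_index;
};

struct bpf_if_sketch {
    struct bpf_if_sketch *bif_next; /* next entry in the attached list */
    struct ifnet_sketch  *bif_ifp;  /* interface this entry belongs to */
};

/*
 * Walk the attached list and return the entry for ifp, or NULL when the
 * interface is not (or no longer) on the list. The cost is O(N) in the
 * number of attached BPF interfaces.
 */
static struct bpf_if_sketch *
bpf_iflist_find(struct bpf_if_sketch *head, const struct ifnet_sketch *ifp)
{
    for (struct bpf_if_sketch *bp = head; bp != NULL; bp = bp->bif_next) {
        if (bp->bif_ifp == ifp)
            return (bp);
    }
    return (NULL);
}
```

Because the lookup never dereferences a pointer stored on the interface
itself, a detached interface simply yields NULL.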
[1] 7a974a649848 bpf: Make dead_bpf_if const
Fixes: 16d878cc99 Fix the following bpf(4) race condition ...
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D45725
(cherry picked from commit 7def047a1ae93b3b10bd57ed1bd28e861f94b596)
This driver does not need to retrieve those tunables during early boot.
Meanwhile SYSCTL_INT can provide rich info such as a description.
Also `sysctl net.link.vxlan.[legacy_port|reuse_port]` can report the
current settings.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D48621
(cherry picked from commit 92632371b360010709fad60146f1aee0b8b99776)
It is harmless but pointless to invoke vxlan_stop event handler when the
interface was not previously configured. This change will also prevent
an assert panic from t4_vxlan_stop_handler().
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D48494
(cherry picked from commit 960c5bb0f6bf44aeb09fa14fd0f82c2e82ebe2e2)
When a DCO client reconnects (e.g. on server restart) OpenVPN may create a new
socket rather than reusing the existing one. This used to be rejected because we
expect all peers to use the same socket. However, if there are no peers it's
safe to release the previous socket and install the tunnel function on the new
one.
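A rough userland sketch of the rule described above. All names are
hypothetical and the EBUSY error code is an assumption, not necessarily what
the driver returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the tunnel softc and its socket. */
struct sock_sketch {
    int fd;
};

struct tunnel_sketch {
    struct sock_sketch *so;     /* socket carrying the tunnel, or NULL */
    int                 npeers; /* number of peers currently installed */
};

/*
 * Install 'so' as the tunnel socket. A different socket is accepted only
 * while no peers exist, since all peers are expected to share one socket.
 */
static int
tunnel_set_socket(struct tunnel_sketch *t, struct sock_sketch *so)
{
    if (t->so == so)
        return (0);             /* same socket: nothing to do */
    if (t->so != NULL && t->npeers > 0)
        return (EBUSY);         /* assumed error code for the sketch */
    t->so = so;                 /* no peers: safe to switch sockets */
    return (0);
}
```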
See also: https://redmine.pfsense.org/issues/15928
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 3624de5394991c0cacd42d5a3b33e35c1a002e09)
Reading PR 273559 made me realize that commit 767723ddebe9 is
incomplete. iflib should set the NUMA domain of received packets before
passing them to protocol layers.
PR: 273559
Reviewed by: zlei, kbowling, erj
Fixes: 767723ddebe9 ("iflib: Use if_alloc_dev() to allocate the ifnet")
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47841
(cherry picked from commit 3d642b0f71c501dd9ee7aa0487788f619900d297)
Currently pf_get_translation() returns a pointer to a matching
nat/rdr/binat rule, or NULL if no rule was matched or an error occurred
while applying the translation. That is, we don't distinguish between
errors and the lack of a matching rule. Thus, if an error (e.g., a
memory allocation failure or a state conflict) occurs, we simply handle
the packet as if no translation rule was present. This is not
desirable.
Make pf_get_translation() return the matching rule as an out-param and
instead return a reason code which indicates whether there was no
translation rule, or there was a translation rule and we failed to apply
it, or there was a translation rule and we applied it successfully.
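The three-way result can be sketched like this. The enum and function names
are hypothetical and do not reflect pf's actual identifiers; whether the
out-param is also set on failure is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reason codes; pf's real names and values may differ. */
enum xlate_result {
    XLATE_NO_RULE,  /* no nat/rdr/binat rule matched */
    XLATE_FAILED,   /* a rule matched but could not be applied */
    XLATE_APPLIED   /* a rule matched and was applied */
};

struct xlate_rule {
    int id;
};

/*
 * 'match' stands in for the result of the ruleset search and 'apply_ok'
 * for the outcome of applying it. The matching rule is reported through
 * the out-param 'rp' (assumed here to be set on failure as well).
 */
static enum xlate_result
get_translation(struct xlate_rule *match, bool apply_ok,
    struct xlate_rule **rp)
{
    *rp = match;
    if (match == NULL)
        return (XLATE_NO_RULE);
    return (apply_ok ? XLATE_APPLIED : XLATE_FAILED);
}

/* A caller can now drop on failure instead of passing untranslated. */
static bool
should_drop(enum xlate_result r)
{
    return (r == XLATE_FAILED);
}
```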
Reviewed by: kp, allanjude
MFC after: 3 months
Sponsored by: Klara, Inc.
Sponsored by: Modirum
Differential Revision: https://reviews.freebsd.org/D45672
(cherry picked from commit 7e65cfc9bbe5a9d735ef38f7ed49965b234b8a20)
If a packet is malformed, it is dropped by pf(4). The rule referenced
in pflog(4) is the default rule. As the default rule is a pass
rule, tcpdump printed "pass" although the packet was actually
dropped. Use the actual action, rather than the rule's action, or an
attempt at guessing the correct action.
Inspired by OpenBSD's 'pflog(4) logs packet dropped by default rule with block.' commit.
Sponsored by: Rubicon Communications, LLC ("Netgate")
pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed
to them. (E.g. when rejecting a packet, or when gathering up packets
for reassembly).
If the hook returns PFIL_PASS the mbuf must still be present. Assert
this in pfil_mem_common() and ensure that ipfilter follows this
convention. pf and ipfw already did.
Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf
must have been freed (or now be owned by the firewall for further
processing, like packet scheduling or reassembly).
This allows us to remove a few extraneous NULL checks.
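The convention can be illustrated with a small userland sketch. The types,
the hook signature and the result names are hypothetical simplifications of
the pfil interface; clearing the caller's pointer after a non-pass result is
a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplifications of PFIL_PASS / PFIL_DROPPED / PFIL_CONSUMED. */
enum hook_result { HOOK_PASS, HOOK_DROPPED, HOOK_CONSUMED };

struct mbuf_sketch {
    int len;
};

typedef enum hook_result (*hook_fn)(struct mbuf_sketch **);

/* A hook that lets the packet through untouched. */
static enum hook_result
pass_hook(struct mbuf_sketch **mp)
{
    (void)mp;
    return (HOOK_PASS);
}

/* A hook that rejects the packet and therefore frees it. */
static enum hook_result
drop_hook(struct mbuf_sketch **mp)
{
    free(*mp);
    *mp = NULL;
    return (HOOK_DROPPED);
}

/*
 * Run one hook and enforce the contract: on pass the mbuf must still be
 * present; otherwise the caller no longer owns it and must not touch it.
 */
static enum hook_result
run_hook(hook_fn hook, struct mbuf_sketch **mp)
{
    enum hook_result rv = hook(mp);

    if (rv == HOOK_PASS)
        assert(*mp != NULL);
    else
        *mp = NULL;     /* freed or owned by the firewall now */
    return (rv);
}
```

With the assertion in one place, callers can rely on the result code alone
and skip their own NULL checks.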
Suggested by: tuexen
Reviewed by: tuexen, zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43617
This commit also includes the original refactoring changes
This change allows the kernel to operate with the default netisr cpu-affinity settings while having RSS compiled in. Normally, RSS changes quite a bit of the behaviour of the kernel dispatch service - this change allows for reducing impact on incompatible hardware while preserving the option to boost throughput speeds based on packet flow CPU affinity.
Make sure to compile the following options in the kernel:
options RSS
As well as setting the following sysctls:
net.inet.rss.enabled: 1
net.isr.bindthreads: 1
net.isr.maxthreads: -1 (automatically sets it to the number of CPUs)
And optionally (to force a 1:1 mapping between CPUs and buckets):
net.inet.rss.bits: 3 (for 8 CPUs)
net.inet.rss.bits: 2 (for 4 CPUs)
etc.
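The bits values above follow from the number of buckets being a power of two.
A small helper (hypothetical, not part of the kernel) that derives the value
for a given CPU count, rounding up when the count is not a power of two:

```c
/*
 * Smallest number of bits b such that (1 << b) >= ncpus. Rounding up for
 * non-power-of-two CPU counts is an assumption of this sketch; in that
 * case the CPU-to-bucket mapping is no longer 1:1.
 */
static unsigned
rss_bits_for_cpus(unsigned ncpus)
{
    unsigned bits = 0;

    while (bits < 31 && (1u << bits) < ncpus)
        bits++;
    return (bits);
}
```

For example, rss_bits_for_cpus(8) yields 3 and rss_bits_for_cpus(4) yields 2,
matching the settings listed above.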
Set pin_default_swi to 0 by default in the RSS case.
Similar to how the network stack needs to use mb_unmapped_to_ext() to
convert mbufs before passing them to an unsupported driver, if_bridge
needs to avoid passing M_EXTPG mbufs to interfaces that don't support
them. Thus, clear IFCAP_MEXTPG on the bridge if any member interfaces
don't handle unmapped mbufs.
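In other words, the bridge may advertise the capability only if every member
does. A minimal sketch with a hypothetical capability bit (not the real
IFCAP_MEXTPG value) and a plain array standing in for the member list:

```c
#include <stddef.h>

#define CAP_MEXTPG_SKETCH 0x1u  /* hypothetical bit, not the kernel's value */

/*
 * Return the bridge capabilities with the unmapped-mbuf bit cleared if any
 * member interface lacks it.
 */
static unsigned
bridge_caps(unsigned bridge_cap, const unsigned *member_caps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((member_caps[i] & CAP_MEXTPG_SKETCH) == 0) {
            bridge_cap &= ~CAP_MEXTPG_SKETCH;
            break;
        }
    }
    return (bridge_cap);
}
```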
Approved by: re (kib)
PR: 278245
Reviewed by: jhb, gallatin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47294
(cherry picked from commit 2bbfbf80d3bb828ac782c2d990a1fba0eb51e45a)
(cherry picked from commit 01a3c17d18)
It makes no sense to assign NULL vnet to an interface when the kernel
option VIMAGE is enabled. Add an assertion to catch that.
This will also help diagnose problem reports [1] and [2].
1. https://bugs.freebsd.org/275381
2. https://bugs.freebsd.org/282168
Reviewed by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46982
(cherry picked from commit d1d839d0b593541174ca48c675c9eff4ddb4715e)
This driver allows exactly one instance to be created. Attempting to
clone-create additional interfaces, e.g. `ifconfig enc1 create`, fails
with error EEXIST, which is somewhat confusing.
Convert to the new KPI for the less confusing error ENOSPC.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D45758
(cherry picked from commit eacad82f3ad0af7d74968e73ed383fc4531d1924)
This might be useful when adding bounds checks to mtod(). No functional
change intended.
MFC after: 1 week
(cherry picked from commit 5c385a54fe9ccbd3f28f20b5a025a856d229fa05)
It is declared as static. Make the definition consistent with the
declaration.
This follows 7ff9ae90f0 and partially reverts 09f6ff4f1a.
Reviewed by: erj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46185
(cherry picked from commit 8161000892830ee52bc8048be91b40cdad25fea8)
If we fail to change the vlan id we have to undo the removal (and vlan id
change) in the error path. Otherwise we'll have removed the vlan object from the
hash table, and have the wrong vlan id as well. Subsequent modification attempts
will then try to remove an entry which doesn't exist, and panic.
Undo the vlan id modification if the insertion in the hash table fails, and
re-insert it under the original vlan id.
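The error path can be sketched as follows. The table is a hypothetical
array indexed by vlan id rather than the kernel's hash table, and all names
are invented for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define NVID 4096

struct vlan_sketch {
    int vid;
};

struct trunk_sketch {
    struct vlan_sketch *slot[NVID]; /* stand-in for the vlan hash table */
};

static int
trunk_insert(struct trunk_sketch *t, struct vlan_sketch *v)
{
    if (v->vid < 0 || v->vid >= NVID)
        return (EINVAL);
    if (t->slot[v->vid] != NULL)
        return (EEXIST);
    t->slot[v->vid] = v;
    return (0);
}

static void
trunk_remove(struct trunk_sketch *t, struct vlan_sketch *v)
{
    t->slot[v->vid] = NULL;
}

/* Change the vlan id; on failure restore the old id and table entry. */
static int
vlan_change_vid(struct trunk_sketch *t, struct vlan_sketch *v, int newvid)
{
    int oldvid = v->vid;
    int error;

    trunk_remove(t, v);
    v->vid = newvid;
    error = trunk_insert(t, v);
    if (error != 0) {
        v->vid = oldvid;
        /* The old slot was vacated just above, so this cannot fail. */
        (void)trunk_insert(t, v);
    }
    return (error);
}
```

After a failed change the object is still reachable under its original id,
so a later removal finds the entry it expects.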
PR: 279195
Reviewed by: zlei
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D45285
(cherry picked from commit bdd12889eaa64032b3d09ef47e9a6f7081863378)
As for the consumer `enc_add_hhooks()`, `hhook_add_hook()` will never
fail for the given parameters. Meanwhile, to build the module if_enc(4),
at least option INET or INET6 is required, so no need for the error
EPFNOSUPPORT.
No functional change intended.
Reviewed by: ae
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46770
(cherry picked from commit 7643141e9314f1eac0d9ac08457410509e6829ad)
This ensures that the ifnet's NUMA affinity is accurate.
Reviewed by: kbowling
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D46667
(cherry picked from commit 767723ddebe9c76a2d4a45a50d9b0efc9f2f91d7)
Follow the pattern from iflib_irq_alloc_generic function and use
iflib_fast_intr as a handler for RX only interrupts.
Also remove some intermediate variables and reference the queue
structures in a consistent way.
Signed-off-by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D46061
(cherry picked from commit 36a001923836e280e750b76947b8705fcc47b0b7)
The indentation style for the SYSCTL_* macros used was not matching KNF.
Reported by: jhb
Differential Revision: https://reviews.freebsd.org/D44811
(cherry picked from commit e4a0c92e7aea50654290e3082668932cea16b64f)
Some of the QUAD sysctls are actually for unsigned quad values.
Switch to using UQUAD instead, as that is meant for unsigned.
Reviewed by: erj, jhb
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D44620
(cherry picked from commit 303dea74c2cb3a41fba455fce8577993e637c3da)
This comment was introduced by fix [1]. The fix was later refined by
change [2], the context of the usage of `m_get2()` and `m_getjcl()`
got lost, and the comment became obscure.
Update to reflect the current behavior.
1. f13da24715 net/bpf: Fix writing of buffer bigger than PAGESIZE
2. a051ca72e2 Introduce m_get3()
Fixes: a051ca72e2 Introduce m_get3()
MFC after: 3 days
(cherry picked from commit 343bf78e487190557889c8ba53d8080b268867f7)
An interface's bpf could feasibly not exist, in which case
bpf_peers_present() would panic from a NULL pointer dereference. Solve
this by adding a new IfAPI that could deal with a NULL bpf, if such
could occur in the network stack.
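A NULL-tolerant check of this kind might look like the sketch below. The
struct layout and the function name are hypothetical and do not reflect the
actual IfAPI that was added.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_if and struct ifnet. */
struct bpf_if_s {
    int n_listeners;
};

struct ifnet_s {
    struct bpf_if_s *if_bpf;    /* may be NULL when no bpf is attached */
};

/* Report whether anyone is listening, treating a NULL bpf as "nobody". */
static bool
peers_present_if(const struct ifnet_s *ifp)
{
    return (ifp->if_bpf != NULL && ifp->if_bpf->n_listeners > 0);
}
```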
Reviewed by: zlei
Sponsored by: Juniper Networks, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42082
(cherry picked from commit 8f31b879ecaf9e738dba631df4606742ee404e8e)
bpf: Prefer the boolean form when calling bpf_peers_present()
Reviewed by: markj, kp, #network
MFC with: 8f31b879ecaf
Differential Revision: https://reviews.freebsd.org/D45509
(cherry picked from commit 89204d9dcbe28558fae65936a0e93f44d926b88f)
IFF_ALLMULTI has an associated activation counter and so needs special
treatment, like IFF_PROMISC. Introduce IFF_PALLMULTI, akin to
IFF_PPROMISC, which indicates that userspace requested allmulti mode,
and handle it specially in ifhwioctl().
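The interaction between the counter and the userspace flag can be sketched
as below. The flag values, struct and function names are hypothetical, and
the logic is modelled on the description above rather than on the actual
ifhwioctl() code.

```c
/* Hypothetical flag bits, not the kernel's IFF_* values. */
#define SK_ALLMULTI  0x1u   /* interface is in allmulti mode */
#define SK_PALLMULTI 0x2u   /* userspace asked for allmulti mode */

struct ifnet_am_sketch {
    unsigned flags;
    int      amcount;   /* number of active allmulti requests */
};

/* Take or drop one reference; the mode is on while the count is > 0. */
static void
allmulti_ref(struct ifnet_am_sketch *ifp, int on)
{
    if (on) {
        if (ifp->amcount++ == 0)
            ifp->flags |= SK_ALLMULTI;
    } else if (ifp->amcount > 0 && --ifp->amcount == 0) {
        ifp->flags &= ~SK_ALLMULTI;
    }
}

/*
 * Userspace request: the "permanent" flag records that userspace holds
 * exactly one reference, so repeated requests do not skew the counter.
 */
static void
user_set_allmulti(struct ifnet_am_sketch *ifp, int on)
{
    int was = (ifp->flags & SK_PALLMULTI) != 0;

    if (on && !was) {
        ifp->flags |= SK_PALLMULTI;
        allmulti_ref(ifp, 1);
    } else if (!on && was) {
        ifp->flags &= ~SK_PALLMULTI;
        allmulti_ref(ifp, 0);
    }
}
```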
Reviewed by: zlei, glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D46524
(cherry picked from commit 58f194223ab8578269772a6874a8444e5e03afaf)
All uses of this function were incorrect. if_amcount is a reference
count which tracks the number of times the network stack internally set
IFF_ALLMULTI. (if_pcount is the corresponding counter for IFF_PROMISC.)
Remove if_getamcount() and fix up callers to get the number of assigned
multicast addresses instead, since that's what they actually want.
Sponsored by: Klara, Inc.
Reviewed by: zlei, glebius
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D46523
(cherry picked from commit 408c909dc64f77d2696d6fec77a2e0b00255cf96)
If pf_icmp_state_lookup() finds a state but rejects it for not matching the
expected direction we should unlock the state (and NULL out *state). This
simplifies life for callers, and also ensures there's no confusion about what a
non-NULL returned state means.
Previously it could have been left in there by the caller, resulting in callers
unlocking the same state twice.
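The resulting contract, sketched in userland C. The struct, the lock counter
and the function name are hypothetical; the real function has a different
signature.

```c
#include <stddef.h>

struct state_sketch {
    int direction;  /* direction the state was created for */
    int locked;     /* lock depth, stands in for the state lock */
};

/*
 * Look up a state for 'dir'. On success *sp points to the locked state.
 * On any failure the state is unlocked again and *sp is NULL, so the
 * caller has nothing to unlock.
 */
static int
state_lookup(struct state_sketch *found, int dir, struct state_sketch **sp)
{
    *sp = NULL;
    if (found == NULL)
        return (-1);
    found->locked++;            /* the lookup returns the state locked */
    if (found->direction != dir) {
        found->locked--;        /* reject: unlock before returning */
        return (-1);
    }
    *sp = found;
    return (0);
}
```

A caller only unlocks when it gets a non-NULL state back, which rules out
the double unlock.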
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 0578fe492284ded4745167060be794032e6e22f0)
Following bluhm's advice this changes the way we set up state keys and
perform state lookups for ICMPv6 Neighbor Discovery packets:
- replace the NS-dst with ND target address;
- replace the NA-src with ND target address;
- replace the NA-dst with unspecified address if it is a multicast.
This allows pf to match Address Resolution, Neighbor Unreachability
Detection and Duplicate Address Detection packets to the corresponding
states without the need to create new ones or match unrelated ones.
As a side effect we're now doing one state table lookup for ND packets
instead of two.
Fixes a bug uncovered by one of the previous commits that virtually
breaks IPv6 connectivity after a few minutes of use.
ok stsp henning, with and ok bluhm
PR: 280701
MFC after: 1 week
Obtained from: OpenBSD, mikeb <mikeb@openbsd.org>, 2633ae8c4c8a
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 5ab1e5f7e5585558a73b723f07528977a82cee82)