7397 commits

Author SHA1 Message Date
Richard Scheffenegger
66605ff791 tcp: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error.
If an error occurs while processing a TCP segment with some data and the FIN
flag, the back out of the sequence number advance does not take into account the
increase by 1 due to the FIN flag.

Reviewed By: jch, gnn, #transport, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D2970
2022-07-14 03:18:19 +02:00
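
The back-out described in 66605ff791 can be illustrated with a minimal sketch. This is not the committed code: SKETCH_TH_FIN, sketch_advance() and sketch_backout() are hypothetical names. The only assumption taken from the message is that a FIN consumes one unit of sequence space on top of the payload, so undoing the advance has to subtract it as well.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TH_FIN 0x01	/* hypothetical stand-in for the FIN flag bit */

/* Advance the send sequence number for a segment that is about to be sent. */
static uint32_t
sketch_advance(uint32_t snd_nxt, uint32_t len, int flags)
{
	/* The FIN occupies one sequence number in addition to the data. */
	return (snd_nxt + len + ((flags & SKETCH_TH_FIN) ? 1 : 0));
}

/* Undo the advance after a transient send error. */
static uint32_t
sketch_backout(uint32_t snd_nxt, uint32_t len, int flags)
{
	/* Subtract the payload length and, if set, the extra 1 for the FIN. */
	uint32_t undone = snd_nxt - len - ((flags & SKETCH_TH_FIN) ? 1 : 0);

	/* Re-advancing from the restored value must reproduce snd_nxt. */
	assert(sketch_advance(undone, len, flags) == snd_nxt);
	return (undone);
}
```

Subtracting only len in sketch_backout() would leave the sequence number one too high whenever the failed segment carried a FIN.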
Mike Karels
efe58855f3 IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8
Combined changes to allow experimentation with net 0/8 (network 0),
240/4 (Experimental/"Class E"), and part of the loopback net 127/8
(all but 127.0/16).  All changes are disabled by default, and can be
enabled by the following sysctls:

    net.inet.ip.allow_net0=1
    net.inet.ip.allow_net240=1
    net.inet.ip.loopback_prefixlen=16

When enabled, the corresponding addresses can be used as normal
unicast IP addresses, both as endpoints and when forwarding.

Add descriptions of the new sysctls to inet.4.

Add <machine/param.h> to vnet.h, as CACHE_LINE_SIZE is undefined in
various C files when in.h includes vnet.h.

The proposals motivating this experimentation can be found in

    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-240
    https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-127

Reviewed by:	rgrimes, pauamma_gundo.com; previous versions melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D35741
2022-07-13 09:46:05 -05:00
Gleb Smirnoff
aeb6948d43 bbr: check proper flag for connection had been closed
An older version of D35663 slipped through final reviews.

Submitted by:	Peter Lei
Fixes:		74703901d8
2022-07-08 22:04:44 -07:00
Gleb Smirnoff
1b91978f63 tcp: remove a condition in tcp_usr_detach() that never happens
The comment from Robert Watson doubts that this condition ever happens.
Our analysis confirms that.  Also, we found that if you manage to create
such a connection with the help of some other bug, then after the "second
case" code is executed, the kernel will panic in another part of the stack.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D35714
2022-07-06 21:09:45 -07:00
Mitchell Horne
258958b3c7 ddb: use _FLAGS command macros where appropriate
Some command definitions were forced to use DB_FUNC in order to specify
their required flags, CS_OWN or CS_MORE. Use the new macros to simplify
these.

Reviewed by:	markj, jhb
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35582
2022-07-05 11:56:55 -03:00
Gleb Smirnoff
d8596171c5 sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already
  existing soref()/sorele().
o Convert SS_PROTOREF into soref()/sorele().
o Change reference model for the listen queues, see below.
o Make sofree() private.  The correct KPI to use is only sorele().
o Make soabort() respect the model and sorele() instead of sofree().

Note on listening queues.  Until now the sockets on a queue had a zero
reference count, and a reference was given only upon accept(2).  The
assumption was that there is no way to see the queued socket from anywhere
except its head.  This is not true, since queued sockets already have pcbs,
which are linked at least into the global pcb lists.  With this change we
put the reference right in the sonewconn() and on accept(2) path we just
hand the existing reference to the file descriptor.

Differential revision:	https://reviews.freebsd.org/D35679
2022-07-04 12:40:51 -07:00
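
The listen-queue reference model from d8596171c5 can be sketched in user space as follows. All names (struct sketch_socket, sketch_soref(), sketch_sorele(), sketch_newconn(), sketch_accept(), sketch_close()) are hypothetical and only mirror the roles named in the message. The sketch assumes a plain counter with no locking, which the real kernel code does not.

```c
#include <assert.h>

struct sketch_socket {
	unsigned int refs;	/* the single reference count */
	int freed;		/* set once the last reference is dropped */
};

static void
sketch_soref(struct sketch_socket *so)
{
	so->refs++;
}

static void
sketch_sorele(struct sketch_socket *so)
{
	assert(so->refs > 0);
	if (--so->refs == 0)
		so->freed = 1;	/* stands in for the private free routine */
}

/* A new connection placed on a listen queue already holds a reference. */
static void
sketch_newconn(struct sketch_socket *so)
{
	so->refs = 0;
	so->freed = 0;
	sketch_soref(so);
}

/* accept(2) path: the queue's reference is handed to the descriptor. */
static void
sketch_accept(struct sketch_socket *so)
{
	assert(so->refs > 0);	/* no extra soref(), ownership just moves */
}

/* Closing the descriptor releases the reference it was handed. */
static void
sketch_close(struct sketch_socket *so)
{
	sketch_sorele(so);
}
```

In this model a queued socket is never observable with a zero count, which is the property the note on listening queues is after.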
Gleb Smirnoff
74703901d8 tcp: use a TCP flag to check if connection has been close(2)d
The flag SS_NOFDREF is a private flag of the socket layer.  It also
is supposed to be read with SOCK_LOCK(), which we don't own here.

Reviewed by:		rrs, tuexen
Differential revision:	https://reviews.freebsd.org/D35663
2022-07-04 12:40:51 -07:00
Gleb Smirnoff
ad3ad06477 blackhole(4): fix operator precedence
Fixes:	3ea9a7cf7b
2022-06-27 17:52:19 -07:00
Michael Tuexen
121ecca0d8 sctp: add KASSERTs to ensure correct handling of listeners
This was suggested by markj@.

MFC after:	3 days
2022-06-27 19:04:45 +02:00
Gleb Smirnoff
bafe71fd27 sctp: do not clobber listening socket with sockbuf operations
The problem was here since 779f106aa1, but a4fc41423f turned it
into a panic.

Reviewed by:	tuexen
Reported by:	syzkaller
2022-06-27 09:24:49 -07:00
Hans Petter Selasky
f5766992c0 tcp: Correctly compute the TCP goodput in bits per second by using SEQ_SUB().
TCP sequence number differences should be computed using SEQ_SUB().

Differential Revision:	https://reviews.freebsd.org/D35505
Reviewed by:	rscheff@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-23 21:10:39 +02:00
Claudio Jeker
97453e5e72 Unlock inp when handling TCP_MD5SIG socket options
Unlock the inp when handling TCP_MD5SIG socket options. tcp_ipsec_pcbctl
handles locking the inp when the option is being modified.

This was found by Claudio Jeker while working on the OpenBGPd port.

On 14 we get a panic when trying to call getsockopt; on 13.1 the process
locks up using 100% CPU.

Reviewed by:	rscheff (transport), tuexen
MFC after:	3 days
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D35532
2022-06-23 15:57:56 +01:00
Michael Tuexen
bf6c6162c7 tcp: fix TCPPCAP for kernels enabling VNET
Reviewed by:		rscheff
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D35503
2022-06-15 23:28:54 +02:00
Michael Tuexen
ee9ee699d6 sctp: remove book keeping not needed anymore
MFC after:	3 days
2022-06-08 23:30:52 +02:00
Michael Tuexen
ad6ae52d1c sctp: cleanup, no functional change
MFC after:	3 days
2022-06-08 22:35:14 +02:00
Richard Scheffenegger
57317c8971 tcp: exclude KASSERTS when rescue retransmissions are in play.
The KASSERT criteria need to be checked against the
send buffer so_snd in a subsequent version.

Reviewed By:	tuexen, #transport
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35431
2022-06-08 14:51:31 +02:00
Richard Scheffenegger
ce2525c810 tcp: remove goto and address another NULL deref in SACK
Missed another NULL dereference during KASSERTS after traversing
the scoreboard. While at it, scratch the goto by making the
traversal conditional, and remove duplicate checks using an
unconditional loop with all checks inside.

Reviewed By:	hselasky
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35428
2022-06-08 09:18:32 +02:00
Richard Scheffenegger
231e0dd5d1 tcp: skip sackhole checks on NULL
Inadvertently introduced a NULL pointer dereference during
the sackhole sanity check in D35387.

Reviewed By:	glebius
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35423
2022-06-07 18:18:42 +02:00
Richard Scheffenegger
91d6afe6e2 tcp: Sanity check of SACK holes on retransmissions
Add a few KASSERT()s to validate the sanity of SACK holes, and
bail out if a SACK hole is inconsistent to avoid panicking non-invariant builds.

Reviewed By:	hselasky, glebius
PR:		263445
MFC after:	1 week
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D35387
2022-06-07 09:38:16 +02:00
Arseny Smalyuk
81cac3906e ipfw: add support radix tables and table lookup for MAC addresses
By analogy with IP address matching, add a way to use ipfw radix
tables for MAC matching. This is implemented using a new ipfw table
with the mac:radix type. The src-mac and dst-mac lookup commands are
also added.

Usage example:
  ipfw table 1 create type mac
  ipfw table 1 add 11:22:33:44:55:66/48
  ipfw add skipto tablearg src-mac 'table(1)'
  ipfw add deny src-mac 'table(1, 100)'
  ipfw add deny lookup dst-mac 1

Note: sysctl net.link.ether.ipfw=1 should be set to enable ipfw
filtering on L2.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D35103
2022-06-04 19:12:29 +03:00
Gordon Bergling
32a01b2b86 rack: Fix a common typo in comments and a sysctl description
- s/multipler/multiplier/

MFC after:	3 days
2022-06-04 17:56:56 +02:00
Gordon Bergling
c93db89231 rack: Fix a typo in a source code comment
- s/enought/enough/

MFC after:	3 days
2022-06-04 15:32:59 +02:00
Gordon Bergling
bd9e23c0a9 rack: Fix a typo in a source code comment
- s/continous/continuous/

MFC after:	3 days
2022-06-04 13:27:29 +02:00
Michael Tuexen
a5c2009dd8 sctp: improve handling of sctp inpcb flags
Use an atomic operation when the inp is not write locked.

Reported by:	syzbot+bf27083e9a3f8fde8b4d@syzkaller.appspotmail.com
MFC after:	3 days
2022-06-04 07:38:19 +02:00
Gordon Bergling
21b923c330 tcp_rack: Fix two typos in sysctl descriptions
- s/higest/highest/

MFC after:	3 days
2022-06-04 11:24:18 +02:00
Hans Petter Selasky
28173d49dc tcp: Correctly compute the retransmit length for all 64-bit platforms.
When the TCP sequence number subtracted is greater than 2**32 minus
the window size, or 2**31 minus the window size, the use of unsigned
long as an intermediate variable may result in an incorrect retransmit
length computation on all 64-bit platforms.

While at it create a helper macro to facilitate the computation of
the difference between two TCP sequence numbers.

Differential Revision:	https://reviews.freebsd.org/D35388
Reviewed by:	rscheff
MFC after:	3 days
Sponsored by:	NVIDIA Networking
2022-06-03 10:49:17 +02:00
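
The 64-bit problem described in 28173d49dc can be reproduced in a small user-space sketch. The names sketch_seq_sub(), sketch_len_wide() and sketch_retransmit_len() are hypothetical, and uint64_t stands in for unsigned long on an LP64 platform. The sketch only assumes that TCP sequence numbers are 32-bit values that wrap, so it shows the class of error rather than the exact kernel expression that was fixed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: difference of two sequence numbers, modulo 2**32. */
static int32_t
sketch_seq_sub(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b));
}

/* Flawed variant: the operands are widened before the subtraction. */
static uint64_t
sketch_len_wide(uint32_t end, uint32_t start)
{
	/* The borrow lands in the upper 32 bits instead of wrapping away. */
	return ((uint64_t)end - (uint64_t)start);
}

/* Length to retransmit from start up to, but not including, end. */
static uint32_t
sketch_retransmit_len(uint32_t end, uint32_t start)
{
	int32_t len = sketch_seq_sub(end, start);

	assert(len >= 0);	/* end must not be behind start */
	return ((uint32_t)len);
}
```

With end = 0x10 and start = 0xFFFFFFF0, the 32-bit helper yields 32, while the widened variant yields a value far above 2**32.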
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packets via ndp -d.
This happens because lltable_drop_entry_queue() relies on the `la_numheld`
counter when dropping NDP entries (lles). It turned out the NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00
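
The leak fixed in d18b4bec98 comes down to a queue and its counter getting out of sync. A minimal sketch follows, using hypothetical types and names (struct sketch_pkt, struct sketch_lle, sketch_append(), sketch_drop_queue()) rather than the real lltable structures. It assumes the drop routine stops once the counter reaches zero, which is how the message describes the dependency on `la_numheld`.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_pkt {
	struct sketch_pkt *next;
	int freed;		/* stands in for freeing the mbuf */
};

struct sketch_lle {
	struct sketch_pkt *hold;	/* queue of packets awaiting resolution */
	int numheld;			/* number of packets on the queue */
};

/* Single append path: link the packet and keep the counter in step. */
static void
sketch_append(struct sketch_lle *lle, struct sketch_pkt *p)
{
	struct sketch_pkt **pp = &lle->hold;

	assert(p != NULL);
	while (*pp != NULL)
		pp = &(*pp)->next;
	p->next = NULL;
	*pp = p;
	lle->numheld++;
}

/* Drop routine that trusts the counter, as the commit message describes. */
static int
sketch_drop_queue(struct sketch_lle *lle)
{
	int dropped = 0;

	while (lle->numheld > 0 && lle->hold != NULL) {
		struct sketch_pkt *p = lle->hold;

		lle->hold = p->next;
		p->freed = 1;
		lle->numheld--;
		dropped++;
	}
	return (dropped);
}
```

If a caller linked packets onto hold without incrementing numheld, sketch_drop_queue() would return 0 and free nothing. Routing every caller through one append function avoids that.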
KUROSAWA Takahiro
77001f9b6d lltable: introduce the llt_post_resolved callback
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after:	2 weeks
2022-05-30 10:53:33 +00:00
Michael Tuexen
a6a596e102 sctp: improve handling of listen() call
Fail the listen() call for 1-to-1 style sockets when the SCTP
association has been shut down or aborted.

Reported by:	syzbot+6c484f116b9dc88f7db1@syzkaller.appspotmail.com
MFC after:	3 days
2022-05-29 20:40:30 +02:00
Dmitry Chagin
31d1b816fe sysent: Get rid of bogus sys/sysent.h include.
Where appropriate hide sysent.h under proper condition.

MFC after:	2 weeks
2022-05-28 20:52:17 +03:00
Michael Tuexen
2646cd0858 sctp: use a consistent view of the send parameters
Reported by:	syzbot+e26628a755f78bacff16@syzkaller.appspotmail.com
MFC after:	3 days
2022-05-28 19:35:58 +02:00
Michael Tuexen
e2ceff3028 sctp: ignore SCTP_SENDALL flag on 1-to-1 style sockets
MFC after:	3 days
2022-05-28 19:07:10 +02:00
Michael Tuexen
64b297e803 sctp: improve handling of send() when association is shutdown
Accept send() calls only when the association is not being
shut down or the explicit message EOR mode is used and the
application provides follow-up data.

Reported by:	syzbot+341e9ebd9d24ca7dc62a@syzkaller.appspotmail.com
MFC after:	3 days
2022-05-28 17:40:17 +02:00
Michael Tuexen
f21168e614 sctp: cleanup of error paths
MFC after:	3 days
2022-05-28 17:15:14 +02:00
Michael Tuexen
9cb70cb476 sctp: cleanup, no functional change except on error paths
MFC after:	3 days
2022-05-28 11:34:20 +02:00
Konrad Sewiłło-Jopek
c9a5c48ae8 arp: Implement sticky ARP mode for interfaces.
Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.

The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).

The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.

Sponsored by:		Conclusive Engineering sp. z o.o.
Reviewed By:		melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after:		2 weeks
2022-05-27 12:41:30 +00:00
Michael Tuexen
5cebd8305a sctp: more sb_cc related cleanups
No functional change intended. It allows a simpler patch for PR 260116.

MFC after:	3 days
2022-05-23 16:09:23 +02:00
Gleb Smirnoff
b46667c63e sockbuf: merge two versions of sbcreatecontrol() into one
No functional change.
2022-05-17 10:10:42 -07:00
Michael Tuexen
edc5b6ea88 sctp: use sb_avail() when accessing sb_acc for reading
This is a cleanup to simplify a patch for PR 260116.

PR:		260116
MFC after:	3 days
2022-05-14 12:38:43 +02:00
Michael Tuexen
f210e4fbc5 sctp: cleanup, no functional change intended
MFC after:	3 days
2022-05-14 08:30:41 +02:00
Michael Tuexen
aab6e5bd1e sctp: improve path verification
Ensure that a HB can be sent faster than a HB.Interval when performing
path verification of a reachable peer address.

Thanks to Alexander Funke for finding the issue and proposing a fix.

MFC after:	3 days
2022-05-14 08:07:28 +02:00
Michael Tuexen
9312ba239e sctp: improve path verification
When sending path confirmation heartbeats, do not take HB.interval
into account when the path is still reachable.

Thanks to Alexander Funke for finding the issue and suggesting a fix.

MFC after:	3 days
2022-05-14 08:05:03 +02:00
Michael Tuexen
9b2a35b3a9 sctp: improve consistency
No functional change intended.

MFC after:	3 days
2022-05-14 06:28:19 +02:00
Mitchell Horne
38a36057ae netdump: check the support status of the interface
If the interface does not support debugnet(4) we should bail early,
rather than having the user find this out at the time of the panic.
dumpon(8) already expects this return value and will print a helpful
error message.

Reviewed by:	cem, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35180
2022-05-14 10:27:53 -03:00
Gleb Smirnoff
808b7d80e0 mbuf: remove PH_vt alias for mbuf packet header persistent shared data
Mechanical sed change s/PH_vt\.vt_nrecs/vt_nrecs/g
2022-05-13 13:32:43 -07:00
Mitchell Horne
489ba22236 kerneldump: remove physical argument from d_dumper
The physical address argument is essentially ignored by every dumper
method. In addition, the dump routines don't actually pass a real
address; every call to dump_append() passes a value of zero for
physical.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35173
2022-05-13 10:42:48 -03:00
Gleb Smirnoff
4581cffb3d sockets: fix build, convert missed sbreserve_locked() calls
Fixes:	4328318445
2022-05-12 14:29:19 -07:00
Gleb Smirnoff
4328318445 sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it.  In 74a68313b5 macros that access
this mutex directly were added.  Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally.  However, it allows operation of a
protocol that doesn't use them.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35152
2022-05-12 13:22:12 -07:00
Randall Stewart
04831efd9f tcp: Rack idle reduce not working.
Rack converted to microseconds quite some time ago, but in testing
we have found a miss in that work. The idle reduce time is still based
on ticks, so it must be converted to microseconds before any comparison;
otherwise idle reduce will likely not happen.

Reviewed by: tuexen, thj
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D35066
2022-05-10 09:46:05 -04:00
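
The unit mismatch described in 04831efd9f can be sketched as below. sketch_ticks_to_usecs() and sketch_idle_exceeds() are hypothetical helpers, hz is passed in as a parameter rather than read from the kernel, and the comparison direction is an assumption made for illustration. The actual check in the RACK stack may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to microseconds for a given tick rate (hz). */
static uint64_t
sketch_ticks_to_usecs(uint32_t ticks, uint32_t hz)
{
	assert(hz > 0);
	return ((uint64_t)ticks * 1000000u / hz);
}

/* Compare an idle time in microseconds against a limit kept in ticks. */
static int
sketch_idle_exceeds(uint64_t idle_usecs, uint32_t limit_ticks, uint32_t hz)
{
	/* Both sides are in microseconds once the limit is converted. */
	return (idle_usecs >= sketch_ticks_to_usecs(limit_ticks, hz));
}
```

Comparing idle_usecs against limit_ticks directly would mix two units that differ by a factor of 1000000 / hz, so the result of the check would be meaningless.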
Kristof Provost
017e7d0390 in_rss: fix set but not used warning
If 'options RSS' is set.

MFC after:	1 week
Sponsored by:	Orange Business Services
2022-05-07 18:17:33 +02:00