Commit graph

6057 commits

Author SHA1 Message Date
Navdeep Parhar
32d2623ae2 Add the ability to look up the 3b PCP of a VLAN interface. Use it in
toe_l2_resolve to fill up the complete vtag and not just the vid.

Reviewed by:	kib@
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D16752
2018-08-16 23:46:38 +00:00
Matt Macy
f9be038601 Fix in6_multi double free
This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
  - add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
  goes to zero OR when the interface goes away
  - decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
  - add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the incpb cleanup code simpler but opens the
  door wider still to races
  - call inp_gcmoptions synchronously after dropping the the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
2018-08-15 20:23:08 +00:00
Luiz Otavio O Souza
59b2022f94 Late style follow up on r312770.
Submitted by:	glebius
X-MFC with:	r312770
MFC after:	3 days
2018-08-15 15:44:30 +00:00
Jonathan T. Looney
a967df1c8f Lower the default limits on the IPv4 reassembly queue.
In particular, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.

Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:30:46 +00:00
Jonathan T. Looney
ff790bbad0 Implement a limit on on the number of IPv4 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv4 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv4 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.

Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet.ip.maxfragbucketsize).

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:23:05 +00:00
Jonathan T. Looney
7b9c5eb0a5 Add a global limit on the number of IPv4 fragments.
The IP reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
of enough VNETs), it is possible that the sum of all the VNET limits
will exceed the number of mbuf clusters available in the system.

Given the fact that the fragment limit is intended (at least in part) to
regulate access to a global resource, the fragment limit should
be applied on a global basis.

VNET-specific limits can be adjusted by modifying the
net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket
sysctls.

To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0.
To disable fragment reassembly for a particular VNET, set
net.inet.ip.maxfragpackets to 0.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:19:49 +00:00
Jonathan T. Looney
5d9bd45518 Improve hashing of IPv4 fragments.
Currently, IPv4 fragments are hashed into buckets based on a 32-bit
key which is calculated by (src_ip ^ ip_id) and combined with a random
seed. However, because an attacker can control the values of src_ip
and ip_id, it is possible to construct an attack which causes very
deep chains to form in a given bucket.

To ensure more uniform distribution (and lower predictability for
an attacker), calculate the hash based on a key which includes all
the fields we use to identify a reassembly queue (dst_ip, src_ip,
ip_id, and the ip protocol) as well as a random seed.

Reviewed by:	jhb
Security:	FreeBSD-SA-18:10.ip
Security:	CVE-2018-6923
2018-08-14 17:15:47 +00:00
Michael Tuexen
0f1346f7f4 Remove a set but not used warning showing up in usrsctp. 2018-08-14 08:32:33 +00:00
Andrey V. Elsukov
62484790e0 Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via
sysctl. Otherwise will be used default forwarding function.

PR:		221137
Submitted by:	mckay@
MFC after:	2 weeks
2018-08-14 07:54:14 +00:00
Michael Tuexen
839d21d62e Use the stacb instead of the asoc in state macros.
This is not a functional change. Just a preparation for upcoming
dtrace state change provider support.
2018-08-13 13:58:45 +00:00
Michael Tuexen
61a2188021 Use consistently the macors to modify the assoc state.
No functional change.
2018-08-13 11:56:21 +00:00
Michael Tuexen
812649d86f Add explicit cast to silence a warning for the userland stack.
Thanks to Felix Weinrank for providing the patch.
2018-08-12 14:05:15 +00:00
Devin Teske
ab9ed8a1bd Fix misspellings of transmitter/transmitted
Reviewed by:	emaste, bcr
Sponsored by:	Smule, Inc.
Differential Revision:	https://reviews.freebsd.org/D16025
2018-08-10 20:37:32 +00:00
Andrey V. Elsukov
16bbf600d9 Remove unneeded ipsec-related includes.
Reviewed by:	rrs
Differential Revision:	https://reviews.freebsd.org/D16637
2018-08-10 07:24:01 +00:00
Leandro Lupori
c8e2123b6a [ppc] Fix kernel panic when using BOOTP_NFSROOT
On PowerPC (and possibly other architectures), that doesn't use
EARLY_AP_STARTUP, the config task queue may be used initialized.
This was observed while trying to mount the root fs from NFS, as
reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230168.

This patch has 2 main changes:
1- Perform a basic initialization of qgroup_config, similar to
what is done in taskqgroup_adjust, but simpler.
This makes qgroup_config ready to be used during NFS root mount.

2- When EARLY_AP_STARTUP is not used, call inm_init() and
in6m_init() right before SI_SUB_ROOT_CONF, because bootp needs
to send multicast packages to request an IP.

PR:		Bug 230168
Reported by:	sbruno
Reviewed by:	jhibbits, mmacy, sbruno
Approved by:	jhibbits
Differential Revision:	D16633
2018-08-09 14:04:51 +00:00
Randall Stewart
d18ea344e6 Fix a small bug in rack where it will
end up sending the FIN twice.
Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D16604
2018-08-08 13:36:49 +00:00
Jonathan T. Looney
95a914f631 Address concerns about CPU usage while doing TCP reassembly.
Currently, the per-queue limit is a function of the receive buffer
size and the MSS.  In certain cases (such as connections with large
receive buffers), the per-queue segment limit can be quite large.
Because we process segments as a linked list, large queues may not
perform acceptably.

The better long-term solution is to make the queue more efficient.
But, in the short-term, we can provide a way for a system
administrator to set the maximum queue size.

We set the default queue limit to 100.  This is an effort to balance
performance with a sane resource limit.  Depending on their
environment, goals, etc., an administrator may choose to modify this
limit in either direction.

Reviewed by:	jhb
Approved by:	so
Security:	FreeBSD-SA-18:08.tcp
Security:	CVE-2018-6922
2018-08-06 17:36:57 +00:00
Randall Stewart
936b2b64ae This fixes a bug in Rack where we were
not properly using the correct value for
Delayed Ack.

Sponsored by:	Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D16579
2018-08-06 09:22:07 +00:00
Gleb Smirnoff
cc7963191d Now that after r335979 the kernel addresses in API structures are
fixed size, there is no reason left for the unions.

Discussed with:	brooks
2018-08-04 00:03:21 +00:00
Michael Tuexen
7bda966394 Add a dtrace provider for UDP-Lite.
The dtrace provider for UDP-Lite is modeled after the UDP provider.
This fixes the bug that UDP-Lite packets were triggering the UDP
provider.
Thanks to dteske@ for providing the dwatch module.

Reviewed by:		dteske@, markj@, rrs@
Relnotes:		yes
Differential Revision:	https://reviews.freebsd.org/D16377
2018-07-31 22:56:03 +00:00
Michael Tuexen
51e08d53ae Fix INET only builds.
r336940 introduced an "unused variable" warning on platforms which
support INET, but not INET6, like MALTA and MALTA64 as reported
by Mark Millard. Improve the #ifdefs to address this issue.

Sponsored by:		Netflix, Inc.
2018-07-31 06:27:05 +00:00
Michael Tuexen
888973f5ae Allow implicit TCP connection setup for TCP/IPv6.
TCP/IPv4 allows an implicit connection setup using sendto(), which
is used for TTCP and TCP fast open. This patch adds support for
TCP/IPv6.
While there, improve some tests for detecting multicast addresses,
which are mapped.

Reviewed by:		bz@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16458
2018-07-30 21:27:26 +00:00
Michael Tuexen
e2662978b8 Send consistent SEG.WIN when using timewait codepath for TCP.
When sending TCP segments from the timewait code path, a stored
value of the last sent window is used. Use the same code for
computing this in the timewait code path as in the main code
path used in tcp_output() to avoiv inconsistencies.

Reviewed by:		rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16503
2018-07-30 21:13:42 +00:00
Michael Tuexen
8db239dc6b Fix some TCP fast open issues.
The following issues are fixed:
* Whenever a TCP server with TCP fast open enabled, calls accept(),
  recv(), send(), and close() before the TCP-ACK segment has been received,
  the TCP connection is just dropped and the reception of the TCP-ACK
  segment triggers the sending of a TCP-RST segment.
* Whenever a TCP server with TCP fast open enabled, calls accept(), recv(),
  send(), send(), and close() before the TCP-ACK segment has been received,
  the first byte provided in the second send call is not transferred.
* Whenever a TCP client with TCP fast open enabled calls sendto() followed
  by close() the TCP connection is just dropped.

Reviewed by:		jtl@, kbowling@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16485
2018-07-30 20:35:50 +00:00
Michael Tuexen
6138da62a9 Add missing send/recv dtrace probes for TCP.
These missing probe are mostly in the syncache and timewait code.

Reviewed by:		markj@, rrs@
MFC after:		1 month
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16369
2018-07-30 20:13:38 +00:00
Alan Somers
6040822c4e Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
Randall Stewart
4ad5b7a0ac This fixes a hole where rack could end up
sending an invalid segment into the reassembly
queue. This would happen if you enabled the
data after close option.

Sponsored by:	Netflix
Differential Revision: https://reviews.freebsd.org/D16453
2018-07-30 10:23:29 +00:00
Andrew Turner
1e0582fd55 icmp_quotelen was accidentially changes in r336676, undo this.
Sponsored by:	DARPA, AFRL
2018-07-24 16:45:01 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Randall Stewart
399973c33d Delete the example tcp stack "fastpath" which
was only put in has an example.

Sponsored by:	Netflix inc.
Differential Revision:	https://reviews.freebsd.org/D16420
2018-07-24 14:55:47 +00:00
Matt Macy
e5e3e746fe Fix a potential use after free in getsockopt() access to inp_options
Discussed with: jhb
Reviewed by:	sbruno, transport
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-22 20:02:14 +00:00
Matt Macy
2269988749 NULL out cc_data in pluggable TCP {cc}_cb_destroy
When ABE was added (rS331214) to NewReno and leak fixed (rS333699) , it now has
a destructor (newreno_cb_destroy) for per connection state. Other congestion
controls may allocate and free cc_data on entry and exit, but the field is
never explicitly NULLed if moving back to NewReno which only internally
allocates stateful data (no entry contstructor) resulting in a situation where
newreno_cb_destory might be called on a junk pointer.

 -    NULL out cc_data in the framework after calling {cc}_cb_destroy
 -    free(9) checks for NULL so there is no need to perform not NULL checks
     before calling free.
 -    Improve a comment about NewReno in tcp_ccalgounload

This is the result of a debugging session from Jason Wolfe, Jason Eggleston,
and mmacy@ and very helpful insight from lstewart@.

Submitted by: Kevin Bowling
Reviewed by: lstewart
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16282
2018-07-22 05:37:58 +00:00
Michael Tuexen
34fc9072ce Set the IPv4 version in the IP header for UDP and UDPLite. 2018-07-21 02:14:13 +00:00
Michael Tuexen
e1526d5a5b Add missing dtrace probes for received UDP packets.
Fire UDP receive probes when a packet is received and there is no
endpoint consuming it. Fire the probe also if the TTL of the
received packet is smaller than the minimum required by the endpoint.

Clarify also in the man page, when the probe fires.

Reviewed by:		dteske@, markj@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16046
2018-07-20 15:32:20 +00:00
Michael Tuexen
0053ed28ff Whitespace changes due to changes in ident. 2018-07-19 20:16:33 +00:00
Michael Tuexen
b0471b4b95 Revert https://svnweb.freebsd.org/changeset/base/336503
since I also ran the export script with different parameters.
2018-07-19 20:11:14 +00:00
Michael Tuexen
7679e49dd4 Whitespace changes due to change if ident. 2018-07-19 19:33:42 +00:00
Randall Stewart
8de9ac5eec Bump the ICMP echo limits to match the RFC
Reviewed by:	tuexen
Sponsored by: Netflix Inc.
Differential Revision:		https://reviews.freebsd.org/D16333
2018-07-18 22:49:53 +00:00
Andrey V. Elsukov
acf673edf0 Move invoking of callout_stop(&lle->lle_timer) into llentry_free().
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR:		209682, 225927
Submitted by:	hselasky (previous version)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D4605
2018-07-17 11:33:23 +00:00
Sean Bruno
c8b1bdc31c There was quite a bit of feedback on r336282 that has led to the
submitter to want to revert it.
2018-07-14 23:53:51 +00:00
Sean Bruno
179a28b098 Fixup memory management for fetching options in ip_ctloutput()
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14621
2018-07-14 16:19:46 +00:00
Mark Johnston
aaf268f9f6 Remove a duplicate check.
PR:		229663
Submitted by:	David Binderman <dcb314@hotmail.com>
MFC after:	3 days
2018-07-11 14:54:56 +00:00
Brooks Davis
3a20f06a1c Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
Michael Tuexen
c9da58534d Add support for printing the TCP FO client-side cookie cache via the
sysctl interface. This is similar to the TCP host cache.

Reviewed by:		pkelsey@, kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D14554
2018-07-10 10:50:43 +00:00
Michael Tuexen
a026a53a76 Use appropriate MSS value when populating the TCP FO client cookie cache
When a client receives a SYN-ACK segment with a TFP fast open cookie,
but without an MSS option, an MSS value from uninitialised stack memory is used.
This patch ensures that in case no MSS option is included in the SYN-ACK,
the appropriate value as given in RFC 7413 is used.

Reviewed by:		kbowling@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16175
2018-07-10 10:42:48 +00:00
Steven Hartland
65c3a353e6 Removed pointless NULL check
Removed pointless NULL check after malloc with M_WAITOK which can never
return NULL.

Sponsored by:	Multiplay
2018-07-10 08:05:32 +00:00
Andrey V. Elsukov
f7c4fdee1a Add "record-state", "set-limit" and "defer-action" rule options to ipfw.
"record-state" is similar to "keep-state", but it doesn't produce implicit
O_PROBE_STATE opcode in a rule. "set-limit" is like "limit", but it has the
same feature as "record-state", it is single opcode without implicit
O_PROBE_STATE opcode. "defer-action" is targeted to be used with dynamic
states. When rule with this opcode is matched, the rule's action will
not be executed, instead dynamic state will be created. And when this
state will be matched by "check-state", then rule action will be executed.
This allows create a more complicated rulesets.

Submitted by:	lev
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D1776
2018-07-09 11:35:18 +00:00
Michael Tuexen
5f1347d7c9 Allow alternate TCP stack to populate the TCP FO client cookie
cache.

Without this patch, TCP FO could be used when using alternate
TCP stack, but only existing entires in the TCP client cookie
cache could be used. This cache was not populated by connections
using alternate TCP stacks.

Sponsored by:		Netflix, Inc.
2018-07-07 12:28:16 +00:00
Michael Tuexen
c556884f8e When initializing the TCP FO client cookie cache, take into account
whether the TCP FO support is enabled or not for the client side.

The code in tcp_fastopen_init() implicitly assumed that the sysctl
variable V_tcp_fastopen_client_enable was initialized to 0. This
was initially true, but was changed in r335610, which unmasked this
bug.

Thanks to Pieter de Goeje for reporting the issue on freebsd-net@
2018-07-07 11:18:26 +00:00
Brooks Davis
5c5e39e3d5 One more 32-bit fix for r335979.
Reported by:	tuexen
2018-07-06 13:34:45 +00:00