opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-04-23 15:19:31 -04:00

Author	SHA1	Message	Date
Richard Scheffenegger	a743fc8826	tcp: fix cwnd restricted SACK retransmission loop While doing the initial SACK retransmission segment while heavily cwnd constrained, tcp_output can erroneously send out the entire sendbuffer again. This may happen after a retransmission timeout, which resets snd_nxt to snd_una while the SACK scoreboard is still populated. Reviewed By: tuexen, #transport PR: 264257 PR: 263445 PR: 260393 MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36637	2022-09-22 13:28:43 +02:00
Michael Tuexen	5ae83e0d87	tcp: send ACKs when requested When doing Limited Transmit send an ACK when needed by the protocol processing (like sending ACKs with a DSACK block). PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36631	2022-09-22 12:12:11 +02:00
Gleb Smirnoff	9453ec6619	tcp: increment tcpstats in tcp_respond() tcp_respond() crafts a packet and sends it directly to ip[6]output(), bypassing tcp_output(). Hence it must increment TCP send statistics. Reviewed by: rscheff, tuexen, rrs (implicitly) Differential revision: https://reviews.freebsd.org/D36641	2022-09-21 14:03:33 -07:00
Gleb Smirnoff	493105c2a8	tcp: fix simultaneous open and refine `e80062a2d4` - The soisconnected() call on transition from SYN_RCVD to ESTABLISHED is also necessary for a half-synchronized connection. Fix that just setting the flag, when we transfer SYN-SENT -> SYN-RECEIVED. - Provide a comment that explains at what conditions the call to soisconnected() is necessary. - Hence mechanically rename the TF_INCQUEUE flag to TF_SONOTCONN. - Extend the change to the BBR and RACK stacks. Note: the interaction between the accept_filter(9) and the socket layer is not fully consistent, yet. For most accept filters this call to soisconnected() will not move the connection from the incomplete queue to the complete. The move would happen only when the filter has received the desired data, and soisconnected() would be called once again from sorwakeup(). Ideally, we should mark socket as connected only there, and leave the soisconnected() from SYN_RCVD->ESTABLISHED only for the simultaneous open case. However, this doesn't yet work. Reviewed by: rscheff, tuexen, rrs Differential revision: https://reviews.freebsd.org/D36641	2022-09-21 14:02:49 -07:00
Gleb Smirnoff	0c7f3ae8c6	tcpcb: fix tabulation count in i4012ef7754c and abbreviate "packets" This lines up comments to the rest of the file. Abbreviation helps to fit in to 80 char terminal. Not a functional change.	2022-09-19 10:29:53 -07:00
Michael Tuexen	6d9e911fba	tcp: fix computation of offset Only update the offset if actually retransmitting from the scoreboard. If not done correctly, this may result in trying to (re)-transmit data not being in the socket buffer and therefore resulting in a panic. PR: 264257 PR: 263445 PR: 260393 Reviewed by: rscheff@ MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36626	2022-09-19 12:49:31 +02:00
Gleb Smirnoff	da6715bbb1	ip_output: always increase "cantfrag" stat if ip_fragment() fails While here, join two unlikely cases into one if clause. Submitted by: Ivan Rozhuk <rozhuk.im gmail.com> PR: 265718 Reviewed by: mjg, melifaro Differential revision: https://reviews.freebsd.org/D36584	2022-09-14 19:22:40 -07:00
Gleb Smirnoff	15b73a2a14	ip_reass: use correct comparison in ipreass_callout() Reported-by: syzbot+55415dc73f9b89b87fce@syzkaller.appspotmail.com	2022-09-14 08:32:07 -07:00
Richard Scheffenegger	bb1d472d79	tcp: make CUBIC the default congestion control mechanism. This changes the default TCP Congestion Control (CC) to CUBIC. For small, transactional exchanges (e.g. web objects <15kB), this will not have a material effect. However, for long duration data transfers, CUBIC allocates a slightly higher fraction of the available bandwidth, when competing against NewReno CC. Reviewed By: tuexen, mav, #transport, guest-ccui, emaste Relnotes: Yes Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36537	2022-09-13 12:09:21 +02:00
Richard Scheffenegger	ea6d0de299	tcp: Make all references to CUBIC uppercase Consistently refer to the CUBIC congestion control mechanism in uppercase throughout all comments. No functional change. Reviewed By: #transport, tuexen, mav, guest-ccui, emaste Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36547	2022-09-13 12:07:06 +02:00
Dag-Erling Smørgrav	c198adf394	siftr: spell PFIL_PASS correctly. Sponsored by: NetApp Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D36539	2022-09-12 19:20:10 +02:00
Mateusz Guzik	1760a6950a	Fixup build after recent getsock changes	2022-09-10 20:40:43 +00:00
Mateusz Guzik	3212ad15ab	Add getsock All but one consumers of getsock_cap only pass 4 arguments. Take advantage of it.	2022-09-10 19:47:47 +00:00
Gleb Smirnoff	29b4b63c59	ip_reass: optimize ipreass_drain_vnet() - Call ipreass_reschedule() only once per slot [1] - Aggregate stats and update them once Suggested by: jtl [1]	2022-09-10 02:17:15 -07:00
Gleb Smirnoff	13018bfae8	ip_reass: make stray callout assertion more verbose Syzkaller hits this assertion, but can't find a reproducer. I have also never seen it hit in my testing. Try to get more information via syzkaller.	2022-09-10 02:11:39 -07:00
Gleb Smirnoff	c8bc874172	ip_reass: fixup the just added tunable - Don't use hardcoded hash mask - free the memory on VNET destroy Fixes: `1494f4776a`	2022-09-09 09:19:39 -07:00
Randall Stewart	81560c5582	TCP: Rack ends up sending all that is outstanding every timeout. In doing some testing for a different problem, I have found rack retransmitting all outstanding data every time a timeout occurs. The outstanding is sent 1ms apart between each packet, and then the timeout runs off again. This causes extra retransmissions when we should be waiting for an ack after sending the very first segment. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36494	2022-09-09 08:59:21 -04:00
Gleb Smirnoff	1494f4776a	ip_reass: add loader tunable to tune the reassembly hash size	2022-09-08 13:49:58 -07:00
Gleb Smirnoff	a30cb31589	ip_reass: retire ipreass_slowtimo() in favor of per-slot callout o Retire global always running ipreass_slowtimo(). o Instead use one callout entry per hash slot. The per-slot callout would be scheduled only if a slot has entries, and would be driven by TTL of the very last entry. o Make net.inet.ip.fragttl read/write and document it. o Retire IPFRAGTTL, which used to be meaningful only with PR_SLOWTIMO. Differential revision: https://reviews.freebsd.org/D36275	2022-09-08 13:49:58 -07:00
Mateusz Guzik	dda6376b04	net: employ newly added pfil_mbuf_{in,out} where appropriate Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36454	2022-09-08 16:21:08 +00:00
Gleb Smirnoff	e80062a2d4	tcp: avoid call to soisconnected() on transition to ESTABLISHED This call existed since pre-FreeBSD times, and it is hard to understand why it was there in the first place. After `6f3caa6d81` it definitely became necessary always and commit message from `f1ee30ccd6` confirms that. Now that `6f3caa6d81` is effectively backed out by `07285bb4c2`, the call appears to be useful only for sockets that landed on the incomplete queue, e.g. sockets that have accept_filter(9) enabled on them. Provide a new TCP flag to mark connections that are known to be on the incomplete queue, and call soisconnected() only for those connections. Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36488	2022-09-08 09:16:04 -07:00
Mateusz Guzik	14c9a2dbfb	net: retire PFIL_FWD It is now unused and not having it allows further clean ups. Reviewed by: cy, glebius, kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36452	2022-09-07 10:04:31 +00:00
Mateusz Guzik	223a73a1c4	net: remove stale altq_input reference Code setting it was removed in: commit `325fab802e` Author: Eric van Gyzen <vangyzen@FreeBSD.org> Date: Tue Dec 4 23:46:43 2018 +0000 altq: remove ALTQ3_COMPAT code Reviewed by: glebius, kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D36471	2022-09-07 10:03:12 +00:00
Gleb Smirnoff	aa74cc6d6f	divert(4): do not depend on ipfw(4) Although originally socket was intended to use with ipfw(4) only, now it also can be used with pf(4). On a kernel without packet filters, it still can be used to inject traffic.	2022-09-06 20:54:57 -07:00
Gleb Smirnoff	999c9fd733	divert(4): don't check for CSUM_SCTP without INET This compiles, but actually is a dead code. Noticed by: bz Fixes: `e72c522858`	2022-09-06 20:54:57 -07:00
Gleb Smirnoff	0773b44e82	tcp: tcp6_connect() requires net epoch PR: 262663 Reported & tested by: dch MFC after: 2 weeks	2022-09-05 10:19:11 -07:00
Gordon Bergling	347b1991b0	netdump(4): Correct a typo in source code comment - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:59:29 +02:00
Gordon Bergling	c3679af313	tcp_rack: Correct some typos in source code comments - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:58:13 +02:00
Gordon Bergling	893f36b7f1	netinet: Correct a typo in source code comment - s/occured/occurred/ MFC after: 3 days	2022-09-04 12:57:12 +02:00
Gordon Bergling	d07a501876	tcp_hpts: Correct some typos in source code comments - s/occured/occurred/ - s/the the/the/ MFC after: 3 days	2022-09-04 12:47:49 +02:00
Gordon Bergling	fa52f9dc9a	tcp_rack: Fix two typos in source code comments - s/overriden/overridden/ MFC after: 3 days	2022-09-03 15:05:42 +02:00
Gleb Smirnoff	74ed2e8ab2	raw ip: fix regression with multicast and RSVP With `61f7427f02` raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p. Reviewed by: karels Reported by: karels Differential revision: https://reviews.freebsd.org/D36429 Fixes: `61f7427f02`	2022-09-02 12:17:09 -07:00
Richard Scheffenegger	4012ef7754	tcp: Functional implementation of Accurate ECN The AccECN handshake and TCP header flags are supported, no support yet for the AccECN option. This minimalistic implementation is sufficient to support DCTCP while dramatically cutting the number of ACKs, and provide ECN response from the receiver to the CC modules. Reviewed By: #transport, #manpages, rrs, pauamma Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D21011	2022-08-31 15:05:53 +02:00
Richard Scheffenegger	c21b7b55be	tcp: finish SACK loss recovery on sudden lack of SACK blocks While a receiver should continue sending SACK blocks for the duration of a SACK loss recovery, if for some reason the TCP options no longer contain these SACK blocks, but we already started maintaining the Scoreboard, keep on handling incoming ACKs (without SACK) as belonging to the SACK recovery. Reported by: thj Reviewed by: tuexen, #transport MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D36046	2022-08-31 14:49:47 +02:00
Gleb Smirnoff	e72c522858	divert(4): make it compilable and working without INET Differential revision: https://reviews.freebsd.org/D36383	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	f1fb051716	divert(4): maintain own cb database and stop using inpcb KPI Here go cons of using inpcb for divert: - divert(4) uses only 16 bits (local port) out of struct inpcb, which is 424 bytes today. - The inpcb KPI isn't able to provide hashing for divert(4), thus it uses global inpcb list for lookups. - divert(4) uses INET-specific part of the KPI, making INET a requirement for IPDIVERT. Maintain our own very simple hash lookup database instead. It has mutex protection for write and epoch protection for lookups. Since now so->so_pcb no longer points to struct inpcb, don't initialize protosw methods to methods that belong to PF_INET. Also, drop support for setting options on a divert socket. My review of software in base and ports confirms that this has no use and unlikely worked before. Differential revision: https://reviews.freebsd.org/D36382	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	2b1c72171e	divert(4): provide statistics Instead of incrementing pretty random counters in the IP statistics, create divert socket statistics structure. Export via netstat(1). Differential revision: https://reviews.freebsd.org/D36381	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	61f7427f02	protosw: cleanup protocols that existed merely to provide pr_input Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same to demultiplex incoming IPv4 datagrams (later copied to IPv6). This story ended with `78b1fc05b2`. These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch packets in ip_input(), they would also be returned by pffindproto() if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw sockets to work correctly, all the entries were pointing at raw_usrreq differentiating only in the value of pr_protocol. With `78b1fc05b2` all these entries are no longer needed, as ip_protox is independent of protosw. Any socket syscall requesting SOCK_RAW type would end up with rip_protosw. And this protosw has its pr_protocol set to 0, allowing to mark socket with any protocol. For IPv6 raw socket the change required two small fixes: o Validate user provided protocol value o Always use protocol number stored in inp in rip6_attach, instead of protosw value, which is now always 0. Differential revision: https://reviews.freebsd.org/D36380	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	8624f4347e	divert: declare PF_DIVERT domain and stop abusing PF_INET The divert(4) is not a protocol of IPv4. It is a socket to intercept packets from ipfw(4) to userland and re-inject them back. It can divert and re-inject IPv4 and IPv6 packets today, but potentially it is not limited to these two protocols. The IPPROTO_DIVERT does not belong to known IP protocols, it doesn't even fit into u_char. I guess, the implementation of divert(4) was done the way it is done basically because it was easier to do it this way, back when protocols for sockets were intertwined with IP protocols and domains were statically compiled in. Moving divert(4) out of inetsw accomplished two important things: 1) IPDIVERT is getting much closer to be not dependent on INET. This will be finalized in following changes. 2) Now divert socket no longer aliases with raw IPv4 socket. Domain/proto selection code won't need a hack for SOCK_RAW and multiple entries in inetsw implementing different flavors of raw socket can merge into one without requirement of raw IPv4 being the last member of dom_protosw. Differential revision: https://reviews.freebsd.org/D36379	2022-08-30 15:09:21 -07:00
Gleb Smirnoff	c00605751e	tcp: remove a dead code leftover from T/TCP, that doesn't have any value today.	2022-08-29 19:30:12 -07:00
Gleb Smirnoff	8fc8063849	divert: merge div_output() into div_send() No functional change intended.	2022-08-29 19:15:01 -07:00
Gleb Smirnoff	c414347bc5	mbufs: isolate max_linkhdr and max_protohdr handling in the mbuf code o Statically initialize max_linkhdr to default value without relying on domain(9) code doing that. o Statically initialize max_protohdr to a sane value, without relying on TCP being always compiled in. o Retire max_datalen. Set, but not used. o Don't make the domain(9) system responsible in validating these values and updating max_hdr. Instead provide KPI max_linkhdr_grow() and max_protohdr_grow(). o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from TCP. Those are the only protocols today that may want to grow. Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D36376	2022-08-29 19:14:25 -07:00
Alexander V. Chernikov	7b3440fc30	Revert "routing: install prefix and loopback routes using new nhop-based KPI." Temporarily revert the commit to unblock testing. This reverts commit `a1b59379db`.	2022-08-29 16:20:42 +00:00
Alexander V. Chernikov	a1b59379db	routing: install prefix and loopback routes using new nhop-based KPI. Construct the desired nexthops directly instead of using the "translation" layer in form of filling rt_addrinfo data. Simplify V_rt_add_addr_allfibs handling by using recently-added rib_copy_route() to propagate the routes to the non-primary address fibs. MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D36166	2022-08-29 10:07:58 +00:00
Michael Tuexen	c624b9a549	tcp: fix stats counter for SYN_RCVD state when TCP-FO is used Reviewed by: glebius Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D36384	2022-08-28 18:45:59 +02:00
Randall Stewart	62ce18fc9a	tcp: Rack rwnd collapse. Currently when the peer collapses its rwnd, we mark packets to be retransmitted and use the must_retran flags like we do when a PMTU collapses to retransmit the collapsed packets. However this causes a problem with some middle boxes that play with the rwnd to control flow. As soon as the rwnd increases we start resending which may be not even a rtt.. and in fact the peer may have gotten the packets. Which means we gratuitously retransmit packets we should not. The fix here is to make sure that a rack time has passed before retransmitting the packets. This makes sure that the rwnd collapse was real and the packets do need retransmission. Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D35166	2022-08-23 09:17:05 -04:00
Randall Stewart	4e0ce82b53	TCP Lro has a loss of timestamp precision and reorders packets. A while back Hans optimized the LRO code. This is great but one optimization he did degrades the timestamp precision so that all flushed LRO entries end up with the same LRO timestamp if there is not a hardware timestamp. The intent of the LRO timestamp is to get as close to the time that the packet arrived as possible. Without the LRO queuing this works out fine since a binuptime is taken and then the rx_common code is called. But when you go through the queue path you end up not updating the M_LRO_TSTMP fields. Another issue in the LRO code is several places that cause packet reordering. In general TCP can handle reordering but it can cause extra un-needed retransmission as well as other oddities. We will fix all of the reordering problems. Let's fix this so that we restore the precision to the timestamp. Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D36043	2022-08-23 09:12:31 -04:00
Gleb Smirnoff	6498153665	ip_reass: don't drain all vnets on a vnet destroy	2022-08-21 07:44:58 -07:00
Gleb Smirnoff	8338690a0a	ip_reass: provide sysctl MIB returning IP fragment TTL For now it is read-only, but eventually the cycle that goes over all fragments should be refactored and this MIB should also become read/write. This MIB will allow SNMP daemons to implement MIB-II ipReasmTimeout MIB in a straightforward way. Right now net-snmp compilation is broken by `1922eb3e9c`. The base system bsnmpd is not broken just because it ignored PR_SLOWTIMO, and thus always returned incorrectly doubled value for ipReasmTimeout.	2022-08-20 13:39:12 -07:00
Gleb Smirnoff	e7d02be19d	protosw: refactor protosw and domain static declaration and load o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see `7b187005d1`), and later reiterated in 2006 by rwatson@ (see `6fbb9cf860`). o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ `dff3237ee5`). o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code. o Refactor pf_proto_[un]register() into protosw_[un]register(). o Isolate pr_*_notsupp() methods into uipc_domain.c Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232	2022-08-17 11:50:32 -07:00

7487 commits