opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-03-30 22:45:13 -04:00

Author	SHA1	Message	Date
Gleb Smirnoff	40b1ae9e00	Add a new feature for optimizing ipfw rulesets - substitution of the action argument with the value obtained from table lookup. The feature is now applicable only to "pipe", "queue", "divert", "tee", "netgraph" and "ngtee" rules. An example usage: ipfw pipe 1000 config bw 1000Kbyte/s ipfw pipe 4000 config bw 4000Kbyte/s ipfw table 1 add x.x.x.x 1000 ipfw table 1 add x.x.x.y 4000 ipfw pipe tablearg ip from table(1) to any In the example above the rule will throw different packets to different pipes. TODO: - Support "skipto" action, but without searching all rules. - Improve parser, so that it warns about bad rules. These are: - "tablearg" argument to action, but no "table" in the rule. All traffic will be blocked. - "tablearg" argument to action, but "table" searches for entry with a specific value. All traffic will be blocked. - "tablearg" argument to action, and two "table" looks - for src and for dst. The last lookup will match.	2005-12-13 12:16:03 +00:00
Gleb Smirnoff	bbce982bd5	When we drop packet due to no space in output interface output queue, also increase the ifp->if_snd.ifq_drops. PR: 72440 Submitted by: ikob	2005-12-06 11:16:11 +00:00
Gleb Smirnoff	95d1f36f82	Optimize parallel processing of ipfw(4) rulesets eliminating the locking of the radix lookup tables. Since several rnh_lookup() can run in parallel on the same table, we can piggyback on the shared locking provided by ipfw(4). However, the single entry cache in the ip_fw_table can't be used lockless, so it is removed. This pessimizes two cases: processing of bursts of similar packets and matching one packet against the same table several times during one ipfw_chk() lookup. To optimize the processing of similar packet bursts administrator should use stateful firewall. To optimize the second problem a solution will be provided soon. Details: o Since we piggyback on the ipfw(4) locking, and the latter is per-chain, the tables are moved from the global declaration to the struct ip_fw_chain. o The struct ip_fw_table is shrunk to one entry and thus vanished. o All table manipulating functions are extended to accept the struct ip_fw_chain * argument. o All table modifying functions use IPFW_WLOCK_ASSERT().	2005-12-06 10:45:49 +00:00
Ruslan Ermilov	f4e9888107	Fix -Wundef.	2005-12-04 02:12:43 +00:00
Hajimu UMEMOTO	8846bbf3ce	obey opt_inet6.h and opt_ipsec.h in kernel build directory. Requested by: hrs	2005-11-29 17:56:11 +00:00
Gleb Smirnoff	b090e4ce1f	Garbage-collect now unused struct _ipfw_insn_pipe and flush_pipe_ptrs(), thus removing a few XXXes. Document the ABI breakage in UPDATING.	2005-11-29 08:59:41 +00:00
Gleb Smirnoff	99b41b34fb	First step in removing welding between ipfw(4) and dummynet. o Do not use ipfw_insn_pipe->pipe_ptr in locate_flowset(). The _ipfw_insn_pipe isn't touched by this commit to preserve ABI compatibility. o To optimize the lookup of the pipe/flowset in locate_flowset() introduce hashes for pipes and queues: - To preserve ABI compatibility utilize the place of global list pointer for SLIST_ENTRY. - Introduce locate_flowset(queue nr) and locate_pipe(pipe nr). o Rework all the dummynet code to deal with the hashes, not global lists. Also did some style(9) changes in the code blocks that were touched by this sweep: - Be conservative about flowset and pipe variable names on stack, use "fs" and "pipe" everywhere. - Cleanup whitespaces. - Sort variables. - Give variables more meaningful names. - Uppercase and dots in comments. - ENOMEM when malloc(9) failed.	2005-11-29 00:11:01 +00:00
Ruslan Ermilov	fc1eaecf4a	Fix prototype.	2005-11-24 14:17:35 +00:00
Paul Saab	d0a14f55c3	Fix for a bug that causes SACK scoreboard corruption when the limit on holes per connection is reached. Reported by: Patrik Roos Submitted by: Mohan Srinivasan Reviewed by: Raja Mukerji, Noritoshi Demizu	2005-11-21 19:22:10 +00:00
Andre Oppermann	22f2c8b5db	Remove 'ipprintfs' which were protected under DIAGNOSTIC. It doesn't have any knob to enable it from userland and could only be enabled by either setting it to 1 at compile time or through the kernel debugger. In the future it may be brought back as KTR tracing points. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-19 17:04:52 +00:00
Andre Oppermann	c444cdded2	Move MAX_IPOPTLEN and struct ipoption back into ip_var.h as userland programs depend on it. Pointed out by: le Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-19 14:01:32 +00:00
Andre Oppermann	ef39adf007	Consolidate all IP Options handling functions into ip_options.[ch] and include ip_options.h into all files making use of IP Options functions. From ip_input.c rev 1.306: ip_dooptions(struct mbuf m, int pass) save_rte(m, option, dst) ip_srcroute(m0) ip_stripoptions(m, mopt) From ip_output.c rev 1.249: ip_insertoptions(m, opt, phlen) ip_optcopy(ip, jp) ip_pcbopts(struct inpcb inp, int optname, struct mbuf *m) No functional changes in this commit. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 20:12:40 +00:00
Andre Oppermann	147f74d176	Purge layer specific mbuf flags on layer crossings to avoid confusing upper or lower layers. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 16:23:26 +00:00
Andre Oppermann	e86ebebc52	Rework icmp_error() to deal with truncated IP packets from ip_forward() when doing extended quoting in error messages. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 14:48:42 +00:00
Andre Oppermann	780b2f698c	In ip_forward() copy as much into the temporary error mbuf as we have free space in it. Allocate correct mbuf from the beginning. This allows icmp_error() to quote the entire TCP header in error messages. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-18 14:44:48 +00:00
Gleb Smirnoff	218837618a	MFOpenBSD 1.62: Prevent backup CARP hosts from replying to arp requests, fixes strangeness with some layer-3 switches. From Bill Marquette. Tested by: Kazuaki Oda <kaakun highway.ne.jp>	2005-11-17 12:56:40 +00:00
Ruslan Ermilov	433aaf04cb	Unbreak for !INET6 case.	2005-11-14 12:50:23 +00:00
Ruslan Ermilov	4a0d6638b3	- Store pointer to the link-level address right in "struct ifnet" rather than in ifindex_table[]; all (except one) accesses are through ifp anyway. IF_LLADDR() works faster, and all (except one) ifaddr_byindex() users were converted to use ifp->if_addr. - Stop storing a (pointer to) Ethernet address in "struct arpcom", and drop the IFP2ENADDR() macro; all users have been converted to use IF_LLADDR() instead.	2005-11-11 16:04:59 +00:00
SUZUKI Shinsuke	d9a989231e	fixed a bug that uRPF does not work properly for an IPv6 packet bound for the sending machine itself (this is a bug introduced due to a change in ip6_input.c:Rev.1.83) Pointed out by: Sean McNeil and J.R.Oldroyd MFC after: 3 days	2005-11-10 22:10:39 +00:00
Ruslan Ermilov	303989a2f3	Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.	2005-11-09 13:29:16 +00:00
Andrew Thompson	4e7e0183e1	Move the cloned interface list management in to if_clone. For some drivers the softc lists and associated mutex are now unused so these have been removed. Calling if_clone_detach() will now destroy all the cloned interfaces for the driver and in most cases is all that's needed to unload. Idea by: brooks Reviewed by: brooks	2005-11-08 20:08:34 +00:00
Gleb Smirnoff	e1ff74c58d	Rework ARP retransmission algorithm so that ARP requests are retransmitted without suppression, while there is demand for such ARP entry. As before, retransmission is rate limited to one packet per second. Details: - Remove net.link.ether.inet.host_down_time - Do not set/clear RTF_REJECT flag on route, to avoid rt_check() returning error. We will generate error ourselves. - Return EWOULDBLOCK on first arp_maxtries failed requests, and return EHOSTDOWN/EHOSTUNREACH on further requests. - Retransmit ARP request always, independently from return code. Ratelimit to 1 pps.	2005-11-08 12:05:57 +00:00
Andre Oppermann	34333b16cd	Retire MT_HEADER mbuf type and change its users to use MT_DATA. Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag. Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-11-02 13:46:32 +00:00
Robert Watson	5bb84bc84b	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.	2005-10-31 15:41:29 +00:00
Robert Watson	d374e81efd	Push the assignment of a new or updated so_qlimit from solisten() following the protocol pru_listen() call to solisten_proto(), so that it occurs under the socket lock acquisition that also sets SO_ACCEPTCONN. This requires passing the new backlog parameter to the protocol, which also allows the protocol to be aware of changes in queue limit should it wish to do something about the new queue limit. This continues a move towards the socket layer acting as a library for the protocol. Bump __FreeBSD_version due to a change in the in-kernel protocol interface. This change has been tested with IPv4 and UNIX domain sockets, but not other protocols.	2005-10-30 19:44:40 +00:00
Gleb Smirnoff	f3d30eb20d	First fill in structure with valid values, and only then attach it to the global list. Reviewed by: rwatson	2005-10-28 20:29:42 +00:00
Yaroslav Tykhiy	9f4abef9a3	Since carp(4) interfaces presently are kinda fake yet possess IP addresses, mark them with LOOPBACK so that routing daemons take them easy for link-state routing protocols. Reviewed by: glebius	2005-10-26 05:57:35 +00:00
Max Laier	1e4b360655	Fix build after in6_joingroup change. It remains unclear if DAD breaks CARP or not.	2005-10-22 14:54:02 +00:00
Gleb Smirnoff	bfb26eecfb	In in_addprefix() compare not only route addresses, but their masks, too. This fixes problem when connected prefixes overlap. Obtained from: OpenBSD (rev. 1.40 by claudio); [ I came to this fix myself, and then found out that OpenBSD had already fixed it the same way.]	2005-10-22 14:50:27 +00:00
SUZUKI Shinsuke	743eee666f	sync with KAME regarding NDP - introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners - supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt> - better prefix lifetime management - more spec-conformant DAD advertisement - updated RFC/internet-draft revisions Obtained from: KAME Reviewed by: ume, gnn MFC after: 2 month	2005-10-21 16:23:01 +00:00
Robert Watson	a65e12b09d	Convert if (tp->t_state == TCPS_LISTEN) panic() into a KASSERT. MFC after: 2 weeks	2005-10-19 09:37:52 +00:00
Andrew Thompson	febd0759f3	Change the reference counting to count the number of cloned interfaces for each cloner. This ensures that ifc->ifc_units is not prematurely freed in if_clone_detach() before the clones are destroyed, resulting in memory modified after free. This could be triggered with if_vlan. Assert that all cloners have been destroyed when freeing the memory. Change all simple cloners to destroy their clones with ifc_simple_destroy() on module unload so the reference count is properly updated. This also cleans up the interface destroy routines and allows future optimisation. Discussed with: brooks, pjd, -current Reviewed by: brooks	2005-10-12 19:52:16 +00:00
Maxim Konovalov	d46ff6bd1e	o INP_ONESBCAST is inpcb.inp_vflag flag not inp_flags. The confusion with IP_PORTRANGE_HIGH leads to the incorrect checksum calculation. PR: kern/87306 Submitted by: Rickard Lind Reviewed by: bms MFC after: 2 weeks	2005-10-12 18:13:25 +00:00
Philip Paeps	7691747aac	Unbreak the net.inet6.tcp6.getcred sysctl. This makes inetd/auth work again in IPv6 setups. Pointy hat to: ume/KAME	2005-10-12 09:24:18 +00:00
Andrew Thompson	f69453ca8b	When bridging is enabled and an ARP request is received on a member interface, the arp code will search all local interfaces for a match. This triggers a kernel log if the bridge has been assigned an address. arp: ac:de:48:18:83:3d is using my IP address 192.168.0.142! bridge0: flags=8041<UP,RUNNING,MULTICAST> mtu 1500 inet 192.168.0.142 netmask 0xffffff00 ether ac:de:48:18:83:3d Silence this warning for 6.0 to stop unnecessary bug reports, the code will need to be reworked. Approved by: mlaier (mentor) MFC after: 3 days	2005-10-04 19:50:02 +00:00
Andre Oppermann	1fd7af262a	Correct brainfart in SO_BINTIME test. Pointed out by: nate Pointy hat to: andre	2005-10-04 18:19:21 +00:00
Andre Oppermann	e5fbf72cd8	Make SO_BINTIME timestamps available on raw_ip sockets. Sponsored by: TCP/IP Optimization Fundraise 2005	2005-10-04 18:07:11 +00:00
Robert Watson	c48b03fb69	Unlock Giant symmetrically with respect to lock acquire order as that's generally nicer. Spotted by: johan MFC after: 1 week	2005-10-03 11:34:29 +00:00
Robert Watson	1fa9efeffb	Acquire Giant conditionally in in_addmulti() and in_delmulti() based on whether the interface being accessed is IFF_NEEDSGIANT or not. This avoids lock order reversals when calling into the interface ioctl handler, which could potentially lead to deadlock. The long term solution is to eliminate non-MPSAFE network drivers. Discussed with: jhb MFC after: 1 week	2005-10-03 11:09:39 +00:00
Maxim Konovalov	ac827533df	o Teach sysctl_drop() how to deal with the sockets in TIME_WAIT state. This is a special case because tcp_twstart() destroys a tcp control block via tcp_discardcb() so we cannot call tcp_drop(struct *tcpcb) on such connections. Use tcp_twclose() instead. MFC after: 5 days	2005-10-02 08:43:57 +00:00
Max Laier	b6de9e91bd	Remove bridge(4) from the tree. if_bridge(4) is a full functional replacement and has additional features which make it superior. Discussed on: -arch Reviewed by: thompsa X-MFC-after: never (RELENG_6 as transition period)	2005-09-27 18:10:43 +00:00
Andre Oppermann	b2828ad291	Implement IP_DONTFRAG IP socket option enabling the Don't Fragment flag on IP packets. Currently this option is only respected on udp and raw ip sockets. On tcp sockets the DF flag is controlled by the path MTU discovery option. Sending a packet larger than the MTU size of the egress interface returns an EMSGSIZE error. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005	2005-09-26 20:25:16 +00:00
Andre Oppermann	fe53256dc2	Use monotonic 'time_uptime' instead of 'time_second' as timebase for rt->rt_rmx.rmx_expire.	2005-09-19 22:54:55 +00:00
Andre Oppermann	e6b9152d20	Use monotonic 'time_uptime' instead of 'time_second' as timebase for timeouts.	2005-09-19 22:31:45 +00:00
Robert Watson	b1c53bc9c0	Take a first cut at cleaning up ifnet removal and multicast socket panics, which occur when stale ifnet pointers are left in struct moptions hung off of inpcbs: - Add in_ifdetach(), which matches in6_ifdetach(), and allows the protocol to perform early tear-down on the interface early in if_detach(). - Annotate that if_detach() needs careful consideration. - Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR -- this is not the place to detect interface removal! This also removes what is basically a nasty (and now unnecessary) hack. - Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP IPv4 sockets. It is now possible to run the msocket_ifnet_remove regression test using HEAD without panicking. MFC after: 3 days	2005-09-18 17:36:28 +00:00
Andre Oppermann	db1240661f	Do not ignore all other TCP options (eg. timestamp, window scaling) when responding to TCP SYN packets with TCP_MD5 enabled and set. PR: kern/82963 Submitted by: <demizu at dd.iij4u.or.jp> MFC after: 3 days	2005-09-14 15:06:22 +00:00
Bjoern A. Zeeb	75398603ad	Fix panic when kernel compiled without INET6 by rejecting IPv6 opcodes which are behind #if(n)def INET6 now. PR: kern/85826 MFC after: 3 days	2005-09-14 07:53:54 +00:00
Andre Oppermann	ffabe3dce8	In tcp_ctlinput() do not swap ip->ip_len a second time. It has been done in icmp_input() already. This fixes the ICMP_UNREACH_NEEDFRAG case where no MTU was proposed in the ICMP reply. PR: kern/81813 Submitted by: Vitezslav Novy <vita at fio.cz> MFC after: 3 days	2005-09-10 07:43:29 +00:00
Gleb Smirnoff	a20e25385c	- Do not hold route entry lock, when calling arprequest(). One such call was introduced by me in 1.139, the other one was present before. - Do all manipulations with rtentry and la before dropping the lock. - Copy interface address from route into local variable before dropping the lock. Supply this copy as argument to arprequest() LORs fixed: http://sources.zabbadoz.net/freebsd/lor/003.html http://sources.zabbadoz.net/freebsd/lor/037.html http://sources.zabbadoz.net/freebsd/lor/061.html http://sources.zabbadoz.net/freebsd/lor/062.html http://sources.zabbadoz.net/freebsd/lor/064.html http://sources.zabbadoz.net/freebsd/lor/068.html http://sources.zabbadoz.net/freebsd/lor/071.html http://sources.zabbadoz.net/freebsd/lor/074.html http://sources.zabbadoz.net/freebsd/lor/077.html http://sources.zabbadoz.net/freebsd/lor/093.html http://sources.zabbadoz.net/freebsd/lor/135.html http://sources.zabbadoz.net/freebsd/lor/140.html http://sources.zabbadoz.net/freebsd/lor/142.html http://sources.zabbadoz.net/freebsd/lor/145.html http://sources.zabbadoz.net/freebsd/lor/152.html http://sources.zabbadoz.net/freebsd/lor/158.html	2005-09-09 10:06:27 +00:00
Gleb Smirnoff	5d40d65b5a	When a carp(4) interface is being destroyed and is in promiscuous mode, first the interface is detached from parent and then bpfdetach() is called. If the interface was the last carp(4) interface attached to parent, then the mutex on parent is destroyed. When bpfdetach() calls if_setflags() we panic on destroyed mutex. To prevent the above scenario, clear pointer to parent, when we detach ourselves from parent.	2005-09-09 08:41:39 +00:00
Sam Leffler	245c31ccaf	clear lock on error in O_LIMIT case of install_state Submitted by: Ted Unangst MFC after: 3 days	2005-09-04 17:33:40 +00:00
Andre Oppermann	e0aec68255	Use the correct mbuf type for MGET().	2005-08-30 16:35:27 +00:00
Gleb Smirnoff	e3ea67a077	Add newline to debugging printf. PR: kern/85271 Submitted by: Simon Morgan	2005-08-26 15:27:18 +00:00
Gleb Smirnoff	360856f60e	- Refuse hashsize of 0, since it is invalid. - Use defined constant instead of 512.	2005-08-25 13:57:00 +00:00
Gleb Smirnoff	510b360fc0	When we have a published ARP entry for some IP address, reply to ARP requests only on the network to which this IP address belongs. Before this change we replied on all interfaces. This could lead to an IP address conflict with the host we are doing ARP proxy for. PR: kern/75634 Reviewed by: andre	2005-08-25 13:25:57 +00:00
Paul Saab	4d3b134633	Remove a KASSERT in the sack path that fails because of an interaction between sack and a bug in the "bad retransmit recovery" logic. This is a workaround, the underlying bug will be fixed later. Submitted by: Mohan Srinivasan, Noritoshi Demizu	2005-08-24 02:48:45 +00:00
Paul Saab	b24de0e665	Fix up the comment for MAX_SACK_BLKS. Submitted by: Noritoshi Demizu	2005-08-24 02:47:16 +00:00
Andre Oppermann	ef8fd90476	Remove unnecessary IPSEC includes. MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005	2005-08-23 14:42:40 +00:00
Andre Oppermann	23655387e9	o Fix a logic error when not doing mbuf cluster allocation. o Change an old panic() to a clean function exit. MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005	2005-08-22 22:13:41 +00:00
Andre Oppermann	936cd18dad	Add socketoption IP_MINTTL. May be used to set the minimum acceptable TTL a packet must have when received on a socket. All packets with a lower TTL are silently dropped. Works on already connected/connecting and listening sockets for RAW/UDP/TCP. This option is only really useful when set to 255 preventing packets from outside the directly connected networks reaching local listeners on sockets. Allows userland implementation of 'The Generalized TTL Security Mechanism (GTSM)' according to RFC3682. Examples of such use include the Cisco IOS BGP implementation command "neighbor ttl-security". MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005	2005-08-22 16:13:08 +00:00
Andre Oppermann	6b773dff30	Always quote the entire TCP header when responding and allocate an mbuf cluster if needed. Fixes the TCP issues raised in I-D draft-gont-icmp-payload-00.txt. This aids in-the-wild debugging a lot and allows the receiver to do more elaborate checks on the validity of the response. MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005	2005-08-22 14:12:18 +00:00
Andre Oppermann	d56ea155bd	Handle pure layer 2 broad- and multicasts properly and simplify related checks. PR: kern/85052 Submitted by: Dmitrij Tejblum <tejblum at yandex-team.ru> MFC after: 3 days	2005-08-22 12:06:26 +00:00
Andre Oppermann	bb10780f9f	Commit correct version of the change and note the name of the new sysctl: net.inet.icmp.quotelen and defaults to 8 bytes. Pointy hat to: andre	2005-08-21 15:18:00 +00:00
Andre Oppermann	e875dfb826	Add a sysctl to change the length of the quotation of the original packet in an ICMP reply. The minimum of 8 bytes is internally enforced. The maximum quotation is the remaining space in the reply mbuf. This option is added in response to the issues raised in I-D draft-gont-icmp-payload-00.txt. MFC after: 2 weeks Sponsored by: TCP/IP Optimizations Fundraise 2005	2005-08-21 15:09:07 +00:00
Andre Oppermann	a0866c8d4e	Add an option to have ICMP replies to non-local packets generated with the IP address the packet came through in. This is useful for routers to show in traceroutes the actual path a packet has taken instead of the possibly different return path. The new sysctl is named net.inet.icmp.reply_from_interface and defaults to off. MFC after: 2 weeks	2005-08-21 12:29:39 +00:00
Gleb Smirnoff	1ae954096e	In order to support CARP interfaces kernel was taught to handle more than one interface in one subnet. However, some userland apps rely on the belief that this configuration is impossible. Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch is on, then kernel will refuse to add an additional interface to already connected subnet unless the interface is CARP. Default value is off. PR: bin/82306 In collaboration with: mlaier	2005-08-18 10:34:30 +00:00
Bjoern A. Zeeb	bd2e5495d1	Fix broken build of rev. 1.108 in case of no INET6 and IPFIREWALL compiled into kernel. Spotted and tested by: Michal Mertl <mime at traveller.cz>	2005-08-14 18:20:33 +00:00
Bjoern A. Zeeb	9066356ba1	* Add dynamic sysctl for net.inet6.ip6.fw. * Correct handling of IPv6 Extension Headers. * Add unreach6 code. * Add logging for IPv6. Submitted by: sysctl handling derived from patch from ume needed for ip6fw Obtained from: is_icmp6_query and send_reject6 derived from similar functions of netinet6,ip6fw Reviewed by: ume, gnn; silence on ipfw@ Test setup provided by: CK Software GmbH MFC after: 6 days	2005-08-13 11:02:34 +00:00
Craig Rodrigues	eee9fe3078	Add NATM_LOCK() and NATM_UNLOCK() in places where npcb_add() and npcb_free() are called, in order to eliminate witness panics. This was overlooked in removal of GIANT from ATM. Reviewed by: rwatson	2005-08-12 02:38:20 +00:00
Gleb Smirnoff	1ed7bf1e3b	o Fix a race between three threads: output path, incoming ARP packet and route request adding/removing ARP entries. The root of the problem is that struct llinfo_arp was accessed without any locks. To close race we will use locking provided by rtentry, that references this llinfo_arp: - Make arplookup() return a locked rtentry. - In arpresolve() hold the lock provided by rt_check()/arplookup() until the end of function, covering all accesses to the rtentry itself and llinfo_arp it refers to. - In in_arpinput() do not drop lock provided by arplookup() during first part of the function. - Simplify logic in the first part of in_arpinput(), removing one level of indentation. - In the second part of in_arpinput() hold rtentry lock while copying address. o Fix a condition when route entry is destroyed, while another thread is contested on its lock: - When storing a pointer to rtentry in llinfo_arp list, always add a reference to this rtentry, to prevent rtentry being destroyed via RTM_DELETE request. - Remove this reference when removing entry from llinfo_arp list. o Further cleanup of arptimer(): - Inline arptfree() into arptimer(). - Use official queue(3) way to pass LIST. - Hold rtentry lock while reading its structure. - Do not check that sdl_family is AF_LINK, but assert this. Reviewed by: sam Stress test: http://www.holm.cc/stress/log/cons141.html Stress test: http://people.freebsd.org/~pho/stress/log/cons144.html	2005-08-11 08:25:48 +00:00
David E. O'Brien	c11ba30c9a	Remove public declarations of variables that were forgotten when they were made static.	2005-08-10 07:10:02 +00:00
David E. O'Brien	31793d594b	Match IPv6 and use a static struct pr_usrreqs nousrreqs.	2005-08-10 06:41:04 +00:00
Robert Watson	a2dc1f5021	Add helper function ip_findmoptions(), which accepts an inpcb, and attempts to atomically return either an existing set of IP multicast options for the PCB, or a newly allocated set with default values. The inpcb is returned locked. This function may sleep. Call ip_moptions() to acquire a reference to a PCB's socket options, and perform the update of the options while holding the PCB lock. Release the lock before returning. Remove garbage collection of multicast options when values return to the default, as this complicates locking substantially. Most applications allocate a socket either to be multicast, or not, and don't tend to keep around sockets that have previously been used for multicast, then used for unicast. This closes a number of race conditions involving multiple threads or processes modifying the IP multicast state of a socket simultaneously. MFC after: 7 days	2005-08-09 17:19:21 +00:00
Robert Watson	13f4c340ae	Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field. Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so. Reviewed by: pjd, bz MFC after: 7 days	2005-08-09 10:20:02 +00:00
Gleb Smirnoff	9bd8ca3014	In preparation for fixing races in ARP (and probably in other L2/L3 mappings) make rt_check() return a locked rtentry.	2005-08-09 08:39:56 +00:00
Robert Watson	dd5a318ba3	Introduce in_multi_mtx, which will protect IPv4-layer multicast address lists, as well as accessor macros. For now, this is a recursive mutex due to code sequences where IPv4 multicast calls into IGMP calls into ip_output(), which then tests for a multicast forwarding case. For support macros in in_var.h to check multicast address lists, assert that in_multi_mtx is held. Acquire in_multi_mtx around iteration over the IPv4 multicast address lists, such as in ip_input() and ip_output(). Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses, as well as over the manipulation of ifnet multicast address lists in order to keep the two layers in sync. Lock down accesses to IPv4 multicast addresses in IGMP, or assert the lock when performing IGMP join/leave events. Eliminate spl's associated with IPv4 multicast addresses, portions of IGMP that weren't previously expunged by IGMP locking. Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded lock order in WITNESS, in that order. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 10 days	2005-08-03 19:29:47 +00:00
Robert Watson	bccb41014a	Modify network protocol consumers of the ifnet multicast address lists to lock if_addr_mtx. Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 1 week	2005-08-02 23:51:22 +00:00
Hajimu UMEMOTO	4dad226e45	recover the line which was wrongly disappeared during scope cleanup. tcpdrop(8) should work for IPv6, again.	2005-08-01 12:08:49 +00:00
Bjoern A. Zeeb	9e669156d4	Add support for IPv6 over GRE [1]. PR kern/80340 includes the FreeBSD specific ip_newid() changes NetBSD does not have. Correct handling of non AF_INET packets passed to bpf [2]. PR: kern/80340[1], NetBSD PRs 29150[1], 30844[2] Obtained from: NetBSD ip_gre.c rev. 1.34,1.35, if_gre.c rev. 1.56 Submitted by: Gert Doering <gert at greenie.muc.de>[2] MFC after: 4 days	2005-08-01 08:14:21 +00:00
Hajimu UMEMOTO	c85ed85b1c	include scope6_var.h for in6_clearscope().	2005-07-26 00:19:58 +00:00
Hajimu UMEMOTO	29da8af658	include netinet6/scope6_var.h.	2005-07-25 12:36:43 +00:00
Hajimu UMEMOTO	a1f7e5f8ee	scope cleanup. with this change - most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly. - similarly, most of the kernel code will not care about link-local scoped addresses as a special case. - scope boundary check will be stricter. For example, the current BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application do: s = socket(AF_INET6); bind(s, "::1"); sendto(s, some_global_IPv6_addr); This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the BSD kernel cannot reject this attempt. Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp> Obtained from: KAME	2005-07-25 12:31:43 +00:00
Giorgos Keramidas	a09ad79379	Misc spelling and/or English fixes in comments. Reviewed by: glebius, andre	2005-07-23 00:59:13 +00:00
Hajimu UMEMOTO	6c4eaa873f	move RFC3542 related definitions into ip6.h. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Reviewed by: mlaier Obtained from: KAME	2005-07-20 10:30:52 +00:00
Hajimu UMEMOTO	77b6f9ed40	add missing RFC3542 definition. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME	2005-07-20 09:17:41 +00:00
Hajimu UMEMOTO	18b35df8fe	update comments: - RFC2292bis -> RFC3542 - typo fixes Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME	2005-07-20 08:59:45 +00:00
Robert Watson	de35559f82	Remove no-op spl references in in_pcb.c, since in_pcb locking has been basically complete for several years now. Update one spl comment to reference the locking strategy. MFC after: 3 days	2005-07-19 12:24:27 +00:00
Robert Watson	f59a9ebf10	Remove no-op spl's and most comment references to spls, as TCP locking is believed to be basically done (modulo any remaining bugs). MFC after: 3 days	2005-07-19 12:21:26 +00:00
Robert Watson	b77634d046	Remove spl() calls from ip_slowtimo(), as IP fragment queue locking was merged several years ago. Submitted by: gnn MFC after: 1 day	2005-07-19 12:14:22 +00:00
Max Laier	6de8d9dc52	Export pfsyncstats via sysctl "net.inet.pfsync" in order to print them with netstat (separate commit). Requested by: glebius MFC after: 1 week	2005-07-14 22:22:51 +00:00
Robert Watson	3c308b091f	Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which is redundant with respect to existing mbuf copy label routines. Expose a new mac_copy_mbuf() routine at the top end of the Framework and use that; use the existing mpo_copy_mbuf_label() routine on the bottom end. Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl)	2005-07-05 23:39:51 +00:00
Paul Saab	d758711729	Fix for a bug in newreno partial ack handling where if a large amount of data is partial acked, snd_cwnd underflows, causing a burst. Found, Submitted by: Noritoshi Demizu Approved by: re	2005-07-05 19:23:02 +00:00
Max Laier	b4373150d9	Remove ambiguity from hlen. IPv4 is now indicated by is_ipv4 and we need a proper hlen value for IPv6 to implement O_REJECT and O_LOG. Reviewed by: glebius, brooks, gnn Approved by: re (scottl)	2005-07-03 15:42:22 +00:00
Andrew Thompson	2fcb030ad5	Check the alignment of the IP header before passing the packet up to the packet filter. This would cause a panic on architectures that require strict alignment such as sparc64 (tier1) and ia64/ppc (tier2). This adds two new macros that check the alignment, these are compile time dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where alignment isn't needed so the cost is avoided. IP_HDR_ALIGNED_P() IP6_HDR_ALIGNED_P() Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment is checked for ipfw and dummynet too. PR: ia64/81284 Obtained from: NetBSD Approved by: re (dwhite), mlaier (mentor)	2005-07-02 23:13:31 +00:00
Paul Saab	482ac96888	Fix for a bug in the change that defers sack option processing until after PAWS checks. The symptom of this is an inconsistency in the cached sack state, caused by the fact that the sack scoreboard was not being updated for an ACK handled in the header prediction path. Found by: Andrey Chernov. Submitted by: Noritoshi Demizu, Raja Mukerji. Approved by: re	2005-07-01 22:54:18 +00:00
Paul Saab	69e0362019	Fix for a SACK crash caused by a bug in tcp_reass(). tcp_reass() does not clear tlen and frees the mbuf (leaving th pointing at freed memory), if the data segment is a complete duplicate. This change works around that bug. A fix for the tcp_reass() bug will appear later (that bug is benign for now, as neither th nor tlen is referenced in tcp_input() after the call to tcp_reass()). Found by: Pawel Jakub Dawidek. Submitted by: Raja Mukerji, Noritoshi Demizu. Approved by: re	2005-07-01 22:52:46 +00:00
Gleb Smirnoff	a196a3c8aa	When doing ARP load balancing source IP is taken in network byte order, so residue of division for all hosts on net is the same, and thus only one VHID answers. Convert source IP to host byte order. Reviewed by: mlaier Approved by: re (scottl)	2005-07-01 08:22:13 +00:00
Simon L. B. Nielsen	0a389eab22	Fix ipfw packet matching errors with address tables. The ipfw tables lookup code caches the result of the last query. The kernel may process multiple packets concurrently, performing several concurrent table lookups. Due to insufficient locking, a cached result can become corrupted, which could cause some addresses to be incorrectly matched against a lookup table. Submitted by: ru Reviewed by: csjp, mlaier Security: CAN-2005-2019 Security: FreeBSD-SA-05:13.ipfw Correct bzip2 permission race condition vulnerability. Obtained from: Steve Grubb via RedHat Security: CAN-2005-0953 Security: FreeBSD-SA-05:14.bzip2 Approved by: obrien Correct TCP connection stall denial of service vulnerability. A TCP packet with the SYN flag set is accepted for established connections, allowing an attacker to overwrite certain TCP options. Submitted by: Noritoshi Demizu Reviewed by: andre, Mohan Srinivasan Security: CAN-2005-2068 Security: FreeBSD-SA-05:15.tcp Approved by: re (security blanket), cperciva	2005-06-29 21:36:49 +00:00
Paul Saab	5a53ca1627	- Postpone SACK option processing until after PAWS checks. SACK option processing is now done in the ACK processing case. - Merge tcp_sack_option() and tcp_del_sackholes() into a new function called tcp_sack_doack(). - Test (SEG.ACK < SND.MAX) before processing the ACK. Submitted by: Noritoshi Demizu Reviewed by: Mohan Srinivasan, Raja Mukerji Approved by: re	2005-06-27 22:27:42 +00:00
Poul-Henning Kamp	dca9c930da	Libalias incorrectly applies proxy rules to the global divert socket: it should only look for existing translation entries, not create new ones (no matter how it got the idea). Approved by: re(scottl)	2005-06-27 22:21:42 +00:00
Gleb Smirnoff	59dde15e82	Disable checksum processing in LibAlias, when it works as a kernel module. LibAlias is not aware of checksum offloading, so the caller should provide checksum calculation. (The only current consumer is ng_nat(4)). When TCP packet internals have been changed and it requires checksum recalculation, a cookie is set in th_x2 field of TCP packet, to inform caller that it needs to recalculate checksum. This ugly hack would be removed when LibAlias is made more kernel friendly. Incremental checksum updates are left as is, since they don't conflict with offloading. Approved by: re (scottl)	2005-06-27 07:36:02 +00:00
David Malone	01399f34a5	Fix some long standing bugs in writing to the BPF device attached to a DLT_NULL interface. In particular: 1) Consistently use type u_int32_t for the header of a DLT_NULL device - it continues to represent the address family as always. 2) In the DLT_NULL case get bpf_movein to store the u_int32_t in a sockaddr rather than in the mbuf, to be consistent with all the DLT types. 3) Consequently fix a bug in bpf_movein/bpfwrite which only permitted packets up to 4 bytes less than the MTU to be written. 4) Fix all DLT_NULL devices to have the code required to allow writing to their bpf devices. 5) Move the code to allow writing to if_lo from if_simloop to looutput, because it only applies to DLT_NULL devices but was being applied to other devices that use if_simloop possibly incorrectly. PR: 82157 Submitted by: Matthew Luckie <mjl@luckie.org.nz> Approved by: re (scottl)	2005-06-26 18:11:11 +00:00
Stephan Uphoff	68d376254c	Fix a timer ticks wrap around bug for minmssoverload processing. Approved by: re (scottl,dwhite) MFC after: 4 weeks	2005-06-25 22:24:45 +00:00
Warner Losh	d980b05275	Add back missing copyright and license statement. This is identical to the statement in ip_mroute.h, as well as being the same as what OpenBSD has done with this file. It matches the copyright in NetBSD's 1.1 through 1.14 versions of the file as well, which they subsequently added back. It appears to have been lost in the 4.4-lite1 import for FreeBSD 2.0, but where and why I've not investigated further. OpenBSD had the same problem. NetBSD had a copyright notice until Multicast 3.5 was integrated verbatim back in 1995. This appears to be the version that made it into 4.4-lite1. Approved by: re (scottl) MFC after: 3 days	2005-06-23 18:42:58 +00:00
Paul Saab	9004ded9df	Fix for a bug in tcp_sack_option() causing crashes. Submitted by: Noritoshi Demizu, Mohan Srinivasan. Approved by: re (scottl blanket SACK)	2005-06-23 00:18:54 +00:00
Bjoern A. Zeeb	67df9f3896	Fix IP(v6) over IP tunneling most likely broken with ifnet changes. Reviewed by: gnn Approved by: re (dwhite), rwatson (mentor)	2005-06-20 08:39:30 +00:00
Gleb Smirnoff	72f2d6578c	- Don't use legacy function in a non-legacy one. This gives us possibility to compile libalias without legacy support. - Use correct way to mark variable as unused. Approved by: re (dwhite)	2005-06-20 08:31:48 +00:00
Max Laier	e4c959952b	In verify_rev_path6(): - do not use static memory as we are under a shared lock only - properly rtfree routes allocated with rtalloc - rename to verify_path6() - implement the full functionality of the IPv4 version Also make O_ANTISPOOF work with IPv6. Reviewed by: gnn Approved by: re (blanket)	2005-06-16 14:55:58 +00:00
Max Laier	ad7abe197d	Fix indentation in INET6 section in preparation of more serious work. Approved by: re (blanket ip6fw removal)	2005-06-16 13:20:36 +00:00
Max Laier	cf21d53cbf	When doing matching based on dst_ip/src_ip make sure we are really looking on an IPv4 packet as these variables are uninitialized if not. This used to allow arbitrary IPv6 packets depending on the value in the uninitialized variables. Some opcodes (most notably O_REJECT) do not support IPv6 at all right now. Reviewed by: brooks, glebius Security: IPFW might pass IPv6 packets depending on stack contents. Approved by: re (blanket)	2005-06-12 16:27:10 +00:00
Brooks Davis	fc74a9f93a	Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam	2005-06-10 16:49:24 +00:00
Brian Feldman	b34d56f1ef	Modify send_pkt() to return the generated packet and have the caller do the subsequent ip_output() in IPFW. In ipfw_tick(), the keep-alive packets must be generated from the data that resides under the stateful lock, but they must not be sent at that time, as this would cause a lock order reversal with the normal ordering (interface's lock, then locks belonging to the pfil hooks). In practice, this caused deadlocks when using IPFW and if_bridge(4) together to do stateful transparent filtering. MFC after: 1 week	2005-06-10 12:28:17 +00:00
Andrew Thompson	c8b0129238	Add dummynet(4) support to if_bridge, this code is largely based on bridge.c. This is the final piece to match bridge.c in functionality, we can now be a drop-in replacement. Approved by: mlaier (mentor)	2005-06-10 01:25:22 +00:00
Paul Saab	e912f906d0	Fix a mis-merge. Remove a redundant call to tcp_sackhole_insert Submitted by: Mohan Srinivasan	2005-06-09 17:55:29 +00:00
Paul Saab	8b9bbaaa94	Fix for a crash in tcp_sack_option() caused by hitting the limit on the number of sack holes. Reported by: Andrey Chernov Submitted by: Noritoshi Demizu Reviewed by: Raja Mukerji	2005-06-09 14:01:04 +00:00
Paul Saab	db4b83fe49	Fix for a bug in the change that walks the scoreboard backwards from the tail (in tcp_sack_option()). The bug was caused by incorrect accounting of the retransmitted bytes in the sackhint. Reported by: Kris Kennaway. Submitted by: Noritoshi Demizu.	2005-06-06 19:46:53 +00:00
Andrew Thompson	8f86751705	Add hooks into the networking layer to support if_bridge. This changes struct ifnet so a buildworld is necessary. Approved by: mlaier (mentor) Obtained from: NetBSD	2005-06-05 03:13:13 +00:00
Brian Feldman	5278d40bcc	Better explain, then actually implement the IPFW ALTQ-rule first-match policy. It may be used to provide more detailed classification of traffic without actually having to decide its fate at the time of classification. MFC after: 1 week	2005-06-04 19:04:31 +00:00
Paul Saab	9d17a7a64a	Changes to tcp_sack_option() that - Walks the scoreboard backwards from the tail to reduce the number of comparisons for each sack option received. - Introduce functions to add/remove sack scoreboard elements, making the code more readable. Submitted by: Noritoshi Demizu Reviewed by: Raja Mukerji, Mohan Srinivasan	2005-06-04 08:03:28 +00:00
Max Laier	57cd6d263b	Add support for IPv4 only rules to IPFW2 now that it supports IPv6 as well. This is the last requirement before we can retire ip6fw. Reviewed by: dwhite, brooks(earlier version) Submitted by: dwhite (manpage) Silence from: -ipfw	2005-06-03 01:10:28 +00:00
Ian Dowse	ba5da2a06f	Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface if_ioctl routine. This should fix a number of code paths through soo_ioctl() that could call into Giant-locked network drivers without first acquiring Giant.	2005-06-02 00:04:08 +00:00
Robert Watson	303939942c	When aborting tcp_attach() due to a problem allocating or attaching the tcpcb, lock the inpcb before calling in_pcbdetach() or in6_pcbdetach(), as they expect the inpcb to be passed locked. MFC after: 7 days	2005-06-01 12:14:56 +00:00
Robert Watson	e6e0b5ffd1	Assert tcbinfo lock, inpcb lock in tcp_disconnect(). Assert tcbinfo lock, inpcb lock in tcp_usrclosed(). MFC after: 7 days	2005-06-01 12:08:15 +00:00
Robert Watson	e3d5315d01	Assert tcbinfo lock in tcp_drop() due to its call of tcp_close() Assert tcbinfo lock in tcp_close() due to its call to in{,6}_detach() Assert tcbinfo lock in tcp_drop_syn_sent() due to its call to tcp_drop() MFC after: 7 days	2005-06-01 12:06:07 +00:00
Robert Watson	1e2d989d0d	Assert that tcbinfo is locked in tcp_input() before calling into tcp_drop(). MFC after: 7 days	2005-06-01 12:03:18 +00:00
Robert Watson	416738a781	Assert the tcbinfo lock whenever tcp_close() is to be called by tcp_input(). MFC after: 7 days	2005-06-01 11:49:14 +00:00
Robert Watson	7609aad7d9	Assert tcbinfo lock in tcp_attach(), as it is required; the caller (tcp_usr_attach()) currently grabs it. MFC after: 7 days	2005-06-01 11:44:43 +00:00
Robert Watson	fe6bfc3730	Commit correct version of previous commit (in_pcb.c:1.164). Use the local variables as currently named. MFC after: 7 days	2005-06-01 11:43:39 +00:00
Robert Watson	6b348152be	Assert pcbinfo lock in in_pcbdisconnect() and in_pcbdetach(), as the global pcb lists are modified. MFC after: 7 days	2005-06-01 11:39:42 +00:00
Robert Watson	3ca1570c82	Slight white space tweak. MFC after: 7 days	2005-06-01 11:38:35 +00:00
Robert Watson	277afaff66	De-spl UDP. MFC after: 3 days	2005-06-01 11:24:00 +00:00
Seigo Tanimura	29ea671b36	Let OSPFv3 go through ipfw. Some more additional checks would be desirable, though.	2005-05-28 07:46:44 +00:00
Paul Saab	808f11b768	This conforms with the terminology in M.Mathis and J.Mahdavi, "Forward Acknowledgement: Refining TCP Congestion Control" SIGCOMM'96, August 1996. Submitted by: Noritoshi Demizu, Raja Mukerji	2005-05-25 17:55:27 +00:00
Paul Saab	64b5fbaa04	Rewrite of tcp_sack_option(). Kentaro Kurahone (NetBSD) pointed out that if we sort the incoming SACK blocks, we can update the scoreboard in one pass of the scoreboard. The added overhead of sorting up to 4 sack blocks is much lower than traversing (potentially) large scoreboards multiple times. The code was updating the scoreboard with multiple passes over it (once for each sack option). The rewrite fixes that, reducing the complexity of the main loop from O(n^2) to O(n). Submitted by: Mohan Srinivasan, Noritoshi Demizu. Reviewed by: Raja Mukerji.	2005-05-23 19:22:48 +00:00
Paul Saab	2cdbfa66ee	Replace t_force with a t_flag (TF_FORCEDATA). Submitted by: Raja Mukerji. Reviewed by: Mohan, Silby, Andre Oppermann.	2005-05-21 00:38:29 +00:00
Paul Saab	4fc5324557	Introduce routines to alloc/free sack holes. This cleans up the code considerably. Submitted by: Noritoshi Demizu. Reviewed by: Raja Mukerji, Mohan Srinivasan.	2005-05-16 19:26:46 +00:00
Gleb Smirnoff	32247f8629	- When carp interface is destroyed, and it affects global preemption suppression counter, decrease the latter. [1] - Add sysctl to monitor preemption suppression. PR: kern/80972 [1] Submitted by: Frank Volf [1] MFC after: 1 week	2005-05-15 01:44:26 +00:00
Paul Saab	fdace17f81	Fix for a bug where the "nexthole" sack hint is out of sync with the real next hole to retransmit from the scoreboard, caused by a bug which did not update the "nexthole" hint in one case in tcp_sack_option(). Reported by: Daniel Eriksson Submitted by: Mohan Srinivasan	2005-05-13 18:02:02 +00:00
Gleb Smirnoff	b3cf6808ce	In div_output() explicitly set m->m_nextpkt to NULL. If divert socket is not userland, but ng_ksocket, then m->m_nextpkt may be non-NULL. In this case we would panic in sbappend.	2005-05-13 11:44:37 +00:00
Paul Saab	0077b0163f	When looking for the next hole to retransmit from the scoreboard, or to compute the total retransmitted bytes in this sack recovery episode, the scoreboard is traversed. While in sack recovery, this traversal occurs on every call to tcp_output(), every dupack and every partial ack. The scoreboard could potentially get quite large, making this traversal expensive. This change optimizes this by storing hints (for the next hole to retransmit and the total retransmitted bytes in this sack recovery episode) reducing the complexity to find these values from O(n) to constant time. The debug code that sanity checks the hints against the computed value will be removed eventually. Submitted by: Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.	2005-05-11 21:37:42 +00:00
Colin Percival	fe2eee8231	Fix two issues which were missed in FreeBSD-SA-05:08.kmem. Reported by: Uwe Doering	2005-05-07 00:41:36 +00:00
Gleb Smirnoff	cbfbc555e0	Add a workaround for 64-bit archs: store unsigned long return value in temporary variable, check it and then cast to in_addr_t.	2005-05-06 13:01:31 +00:00
Gleb Smirnoff	6293e003c9	s/DEBUG/LIBALIAS_DEBUG/, since DEBUG is defined in LINT and not supported for kernel build.	2005-05-06 11:07:49 +00:00
Colin Percival	fd94099ec2	If we are going to 1. Copy a NULL-terminated string into a fixed-length buffer, and 2. copyout that buffer to userland, we really ought to 0. Zero the entire buffer first. Security: FreeBSD-SA-05:08.kmem	2005-05-06 02:50:00 +00:00
Gleb Smirnoff	e9d5db2888	More bits for kernel version: - copy inet_aton() from libc - disable getservbyname() lookup and accept only numeric port	2005-05-05 22:00:32 +00:00
Gleb Smirnoff	75bc262006	Always include alias.h before alias_local.h	2005-05-05 21:55:17 +00:00
Gleb Smirnoff	f87fe393ce	When used in kernel define NO_FW_PUNCH, NO_LOGGING, NO_USE_SOCKETS.	2005-05-05 21:53:17 +00:00
Gleb Smirnoff	c8d3ca728f	Fix argument order for bcopy() in last commit. Noticed by: njl Pointy hat to: glebius	2005-05-05 21:40:49 +00:00
Gleb Smirnoff	efdc8fbf79	Use bcopy() instead of memmove().	2005-05-05 21:10:51 +00:00
Gleb Smirnoff	ae0440572f	Hide fflush(3) under ifdef DEBUG.	2005-05-05 21:07:34 +00:00
Gleb Smirnoff	c8564bffd2	Things required to build libalias as kernel module: - kernel module declarations and handler. - macros to map malloc(3) calls to malloc(9) ones. - malloc(9) declarations. - call finishoff() from module handler MOD_UNLOAD case instead of atexit(3). - use panic(9) instead of abort(3) - take time from time_second instead of gettimeofday(2) - define INADDR_NONE	2005-05-05 21:05:38 +00:00
Gleb Smirnoff	00fc9a5bb9	Add NO_USE_SOCKETS knob, which cuts off socket binding functionality.	2005-05-05 20:25:12 +00:00
Gleb Smirnoff	40106c140f	Add NO_LOGGING knob, which cuts off functionality of debug logging to a file.	2005-05-05 20:22:09 +00:00
Gleb Smirnoff	c649a2e033	Play with includes so that libalias can be compiled both as userland library and kernel module.	2005-05-05 19:27:32 +00:00
Andre Oppermann	9e4ca6315d	If we don't get a suggested MTU during path MTU discovery look up the packet size of the packet that generated the response, step down the MTU by one step through ip_next_mtu() and try again. Suggested by: dwmalone	2005-05-04 13:48:44 +00:00
Gleb Smirnoff	1f8f08e1c9	Cleanup IPFW2 ifdefs.	2005-05-04 13:24:37 +00:00
Gleb Smirnoff	c3c2f9a9ba	Makefile is not needed here.	2005-05-04 13:24:12 +00:00
Andre Oppermann	4c037f8d6e	Add another step of 1280 (gif(4) tunnels) to ip_next_mtu().	2005-05-04 13:23:54 +00:00
Gleb Smirnoff	a1429ad928	IPFW version 2 is the only option in HEAD and RELENG_5. Thus, cleanup unnecessary now ifdefs.	2005-05-04 13:12:52 +00:00
Andre Oppermann	c773494edd	Pass icmp_error() the MTU argument directly instead of an interface pointer. This simplifies a couple of uses and removes some XXX workarounds.	2005-05-04 13:09:19 +00:00
Robert Watson	b60d26c9b9	Remove now unused inirw variable from previous use of COMMON_END(). Reported by: csjp	2005-05-01 14:01:38 +00:00
Peter Grehan	73fddedac8	Fix typo in last commit. Approved by: rwatson	2005-05-01 13:06:05 +00:00
Robert Watson	d1401c9000	Slide unlocking of the tcbinfo lock earlier in tcp_usr_send(), as it's needed only for implicit connect cases. Under load, especially on SMP, this can greatly reduce contention on the tcbinfo lock. NB: Ambiguities about the state of so_pcb need to be resolved so that all use of the tcbinfo lock in non-implicit connection cases can be eliminated. Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp>	2005-05-01 11:11:38 +00:00
Brooks Davis	31519b13c8	Introduce a struct icmphdr which contains the type, code, and cksum fields of an ICMP packet. Use this to allow ipfw to pullup only these values since it does not use the rest of the packet and it was failing on ICMP packets because they were not long enough. struct icmp should probably be modified to use these at some point, but that will break a fair bit of code so it can wait for another day. On the off chance that adding this struct breaks something in ports, bump __FreeBSD_version. Reported by: Randy Bush <randy at psg dot com> Tested by: Randy Bush <randy at psg dot com>	2005-04-26 18:10:21 +00:00
Paul Saab	91232d6ccc	Remove some code that snuck in by accident. Submitted by: Mohan Srinivasan	2005-04-21 20:29:40 +00:00
Paul Saab	be3f3b5ead	Fix for interaction problems between TCP SACK and TCP Signature. If TCP Signatures are enabled, the maximum allowed sack blocks aren't going to fit. The fix is to compute how many sack blocks fit and tack these on last. Also on SYNs, defer padding until after the SACK PERMITTED option has been added. Found by: Mohan Srinivasan. Submitted by: Mohan Srinivasan, Noritoshi Demizu. Reviewed by: Raja Mukerji.	2005-04-21 20:26:07 +00:00
Paul Saab	97b76190eb	Undo rev 1.71 as it is the wrong change.	2005-04-21 20:24:43 +00:00
Paul Saab	a6235da61e	- Make the sack scoreboard logic use the TAILQ macros. This improves code readability and facilitates some anticipated optimizations in tcp_sack_option(). - Remove tcp_print_holes() and TCP_SACK_DEBUG. Submitted by: Raja Mukerji. Reviewed by: Mohan Srinivasan, Noritoshi Demizu.	2005-04-21 20:11:01 +00:00
Paul Saab	a3047bc036	Fix for 2 bugs related to TCP Signatures : - If the peer sends the Signature option in the SYN, use of Timestamps and Window Scaling were disabled (even if the peer supports them). - The sender must not disable signatures if the option is absent in the received SYN. (See comment in syncache_add()). Found, Submitted by: Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>. Reviewed by: Mohan Srinivasan <mohans at yahoo-inc dot com>.	2005-04-21 20:09:09 +00:00
Andre Oppermann	1aedbd9c80	Move Path MTU discovery ICMP processing from icmp_input() to tcp_ctlinput() and subject it to active tcpcb and sequence number checking. Previously any ICMP unreachable/needfrag message would cause an update to the TCP hostcache. Now only ICMP PMTU messages belonging to an active TCP session with the correct src/dst/port and sequence number will update the hostcache and complete the path MTU discovery process. Note that we don't entirely implement the recommended counter measures of Section 7.2 of the paper. However we close down the possible degradation vector from trivially easy to really complex and resource intensive. In addition we have limited the smallest acceptable MTU with net.inet.tcp.minmss sysctl for some time already, further reducing the effect of any degradation due to an attack. Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.2 MFC after: 3 days	2005-04-21 14:29:34 +00:00
Andre Oppermann	1600372b6b	Ignore ICMP Source Quench messages for TCP sessions. Source Quench is ineffective, deprecated and can be abused to degrade the performance of active TCP sessions if spoofed. Replace a bogus call to tcp_quench() in tcp_output() with the direct equivalent tcpcb variable assignment. Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1 MFC after: 3 days	2005-04-21 12:37:12 +00:00
Gleb Smirnoff	9dc1f8e41e	Remove anti-LOR bandaid, it is not needed now. Sponsored by: Rambler	2005-04-20 09:32:05 +00:00
Poul-Henning Kamp	6196e2db3e	Make DUMMYNET compile without INET6	2005-04-19 10:12:21 +00:00
Poul-Henning Kamp	d137deac11	typo	2005-04-19 10:04:38 +00:00
Poul-Henning Kamp	7292e12676	Make IPFIREWALL compile without INET6	2005-04-19 09:56:14 +00:00
Brooks Davis	8195404bed	Add IPv6 support to IPFW and Dummynet. Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)	2005-04-18 18:35:05 +00:00
Paul Saab	b7c755717c	Rewrite of tcp_update_sack_list() to make it simpler and more readable than our original OpenBSD derived version. Submitted by: Noritoshi Demizu Reviewed by: Mohan Srinivasan, Raja Mukerji	2005-04-18 18:10:56 +00:00
Brooks Davis	27a2f39bcf	Centralized finding the protocol header in IP packets in preparation for IPv6 support. The header in IPv6 is more complex than in IPv4 so we want to handle skipping over it in one location. Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)	2005-04-15 00:47:44 +00:00
Paul Saab	25e6f9ed4b	Fix for a TCP SACK bug where more than (win/2) bytes could have been in flight in SACK recovery. Found by: Noritoshi Demizu Submitted by: Mohan Srinivasan <mohans at yahoo-inc dot com> Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp> Raja Mukerji <raja at moselle dot com>	2005-04-14 20:09:52 +00:00
Paul Saab	cf09195ba5	- Tighten up the Timestamp checks to prevent a spoofed segment from setting ts_recent to an arbitrary value, stopping further communication between the two hosts. - If the Echoed Timestamp is greater than the current time, fall back to the non RFC 1323 RTT calculation. Submitted by: Raja Mukerji (raja at moselle dot com) Reviewed by: Noritoshi Demizu, Mohan Srinivasan	2005-04-10 05:24:59 +00:00
Paul Saab	e346eeff65	- If the reassembly queue limit was reached or if we couldn't allocate a reassembly queue state structure, don't update (receiver) sack report. - Similarly, if tcp_drain() is called, freeing up all items on the reassembly queue, clean the sack report. Found, Submitted by: Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp> Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com), Raja Mukerji (raja at moselle dot com).	2005-04-10 05:21:29 +00:00
Paul Saab	b962fa74b5	When the rightmost SACK block expands, rcv_lastsack should be updated. (Fix for kern/78226). Submitted by : Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp> Reviewed by : Mohan Srinivasan (mohans at yahoo-inc dot com), Raja Mukerji (raja at moselle dot com).	2005-04-10 05:20:10 +00:00
Paul Saab	da39f5b963	Remove some unused sack fields. Submitted by : Noritoshi Demizu, Mohan Srinivasan.	2005-04-10 05:19:22 +00:00
Maxim Konovalov	800af1fb81	o Nano optimize ip_reass() code path for the first fragment: do not try to reassemble the packet from the fragments queue with the only fragment, finish with the first fragment as soon as we create a queue. Spotted by: Vijay Singh o Drop the fragment if maxfragsperpacket == 0, no chances we will be able to reassemble the packet in future. Reviewed by: silby	2005-04-08 10:25:13 +00:00
Maxim Konovalov	29f2a6ec18	o Tweak the comment a bit.	2005-04-08 08:43:21 +00:00
Maxim Konovalov	e99971bf2f	o Disable random port allocation when ip.portrange.first == ip.portrange.last and there is the only port for that because: a) it is not wise; b) it leads to a panic in the random ip port allocation code. In general we need to disable ip port allocation randomization if the last - first delta is ridiculously small. PR: kern/79342 Spotted by: Anjali Kulkarni Glanced at by: silby MFC after: 2 weeks	2005-04-08 08:42:10 +00:00
Gleb Smirnoff	8351d04f34	When a packet has been reinjected into ipfw(4) after dummynet(4) processing we have a non-NULL args.rule. If the same packet later is subject to "tee" rule, its original is sent again into ipfw_chk() and it reenters at the same rule. This leads to infinite loop and frozen router. Assign args.rule to NULL, any time we are going to send packet back to ipfw_chk() after a tee rule. This is a temporary workaround, which we will leave for RELENG_5. In HEAD we are going to make divert(4) save next rule the same way as dummynet(4) does. PR: kern/79546 Submitted by: Oleg Bulyzhin Reviewed by: maxim, andre MFC after: 3 days	2005-04-06 14:00:33 +00:00
Brooks Davis	a0d17f7e98	Use ACTION_PTR(r) instead of (r->cmd + r->act_ofs). Reviewed by: md5	2005-04-06 00:26:08 +00:00
Brooks Davis	f4ff11976d	Make dummynet_flush() match its prototype.	2005-04-05 23:38:16 +00:00
Poul-Henning Kamp	a8bc22b47a	natd core dumps when -reverse switch is used because of a bug in libalias. In /usr/src/lib/libalias/alias.c, the functions LibAliasIn and LibAliasOutTry call the legacy PacketAliasIn/PacketAliasOut instead of LibAliasIn/LibAliasOut when the PKT_ALIAS_REVERSE option is set. In this case, the context variable "la" gets lost because the legacy compatibility routines expect "la" to be global. This was obviously an oversight when rewriting the PacketAlias* functions to the LibAlias* functions. The fix (as shown in the patch below) is to remove the legacy subroutine calls and replace with the new ones using the "la" struct as the first arg. Submitted by: Gil Kloepfer <fgil@kloepfer.org> Confirmed by: <nicolai@catpipe.net> PR: 76839 MFC after: 3 days	2005-04-05 13:04:35 +00:00
Gleb Smirnoff	4cb39345c0	When several carp interfaces are attached to Ethernet interface, carp_carpdev_state_locked() is called every time carp interface is attached. The first call backs up flags of the first interface, and the second call backs up them again, erasing correct values. To solve this, a carp_sc_state_locked() function is introduced. It is called when interface is attached to parent, instead of calling carp_carpdev_state_locked. carp_carpdev_state_locked() calls carp_sc_state_locked() for each sc in chain. Reported by: Yuriy N. Shkandybin, sem	2005-03-30 11:44:43 +00:00
Gleb Smirnoff	d1a4742962	- Don't free mbuf, passed to interface output method if the latter returns error. In this case mbuf has already been freed. [1] - Remove redundant declaration. PR: kern/78893 [1] Submitted by: Liang Yi [1] Reviewed by: sam MFC after: 1 day	2005-03-29 13:43:09 +00:00
Sam Leffler	812d865346	eliminate extraneous null ptr checks Noticed by: Coverity Prevent analysis tool	2005-03-29 01:10:46 +00:00
Sam Leffler	5309f84168	deal with malloc failures Noticed by: Coverity Prevent analysis tool Together with: mdodd	2005-03-26 22:20:22 +00:00
Maxim Konovalov	6ee79c59d2	o Document net.inet.ip.portrange.random* sysctls. o Correct a comment about random port allocation threshold implementation. Reviewed by: silby, ru MFC after: 3 days	2005-03-23 09:26:38 +00:00
Gleb Smirnoff	d4d2297060	ifma_protospec is a pointer. Use NULL when assigning or comparing it.	2005-03-20 14:31:45 +00:00
Gleb Smirnoff	50bb170471	Remove a workaround from previous revision. It proved to be incorrect. Add two other workarounds for carp(4) interfaces: - do not add connected route when address is assigned to carp(4) interface - do not add connected route when other interface goes down Embrace workarounds with #ifdef DEV_CARP	2005-03-20 10:27:17 +00:00
Gleb Smirnoff	ee6f227017	If vhid exists return more informative EEXIST instead of EINVAL. While here remove redundant brackets.	2005-03-18 13:41:38 +00:00
Gleb Smirnoff	9860bab349	Fix a potential crash that could occur when CARP_LOG is being used. Obtained from: OpenBSD (pat)	2005-03-18 13:18:34 +00:00
Sam Leffler	6a9909b5e6	plug resource leak Noticed by: Coverity Prevent analysis tool	2005-03-16 05:27:19 +00:00
Robert Watson	d2bc35ab29	In tcp_usr_send(), broaden coverage of the socket buffer lock in the non-OOB case so that the sbspace() check is performed under the same lock instance as the append to the send socket buffer. MFC after: 1 week	2005-03-14 22:15:14 +00:00
Gleb Smirnoff	422a115a4a	Embrace with #ifdef DEV_CARP carp-related code.	2005-03-13 11:23:22 +00:00
Gleb Smirnoff	0504a89fdd	Add antifootshooting workaround, which will make all routes "connected" to carp(4) interfaces host routes. This prevents a problem, when connected network is routed to carp(4) interface.	2005-03-10 15:26:45 +00:00
Paul Saab	e891d82b56	Add limits on the number of elements in the sack scoreboard both per-connection and globally. This eliminates potential DoS attacks where SACK scoreboard elements tie up too much memory. Submitted by: Raja Mukerji (raja at moselle dot com). Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com).	2005-03-09 23:14:10 +00:00
Gleb Smirnoff	2ef4a436e0	Make ARP not complain about wrong interface if correct interface is a carp one and address matched it. Reviewed by: brooks	2005-03-09 10:00:01 +00:00
Joe Marcus Clarke	70037e98c4	Fix a problem in the Skinny ALG where a specially crafted packet could cause a libalias application (e.g. natd, ppp, etc.) to crash. Note: Skinny support is not enabled in natd or ppp by default. Approved by: secteam (nectar) MFC after: 1 day Security: This fixes a remote DoS exploit	2005-03-03 03:06:37 +00:00
Gleb Smirnoff	b82936c5d4	Fix typo. Unbreak build. Take pointy hat.	2005-03-02 09:11:18 +00:00
Gleb Smirnoff	d220759b41	Add more locking when reading/writing to carp softc. When carp softc is attached to a parent interface we use its mutex to lock the softc. This means that in several places like carp_ioctl() we lock softc conditionaly. This should be redesigned. To avoid LORs when MII announces us a link state change, we schedule a quick callout and call carp_carpdev_state_locked() from it. Initialize callouts using NET_CALLOUT_MPSAFE. Sponsored by: Rambler Reviewed by: mlaier	2005-03-01 13:14:33 +00:00
Gleb Smirnoff	d92d54d54d	- Add carp_mtx. Use it to protect list of all carp interfaces. - In carp_send_ad_all() walk through list of all carp interfaces instead of walking through list of all interfaces. Sponsored by: Rambler Reviewed by: mlaier	2005-03-01 12:36:07 +00:00
Gleb Smirnoff	31199c8463	Use NET_CALLOUT_MPSAFE macro.	2005-03-01 12:01:17 +00:00
Gleb Smirnoff	3a84d72a78	Revert change to struct ifnet. Use ifnet pointer in softc. Embedding ifnet into smth will soon be removed. Requested by: brooks	2005-03-01 10:59:14 +00:00
Gleb Smirnoff	4358dfc32f	Remove debugging printf. Reviewed by: mlaier	2005-03-01 09:31:36 +00:00
Yaroslav Tykhiy	630481bb92	Support running carp(4) over a vlan(4) parent interface. Encouraged by: glebius	2005-02-28 16:19:11 +00:00
Gleb Smirnoff	1d0a237660	Remove unused field from carp softc. OK'ed by: mcbride@OpenBSD	2005-02-28 11:57:03 +00:00
Gleb Smirnoff	3e07def4cd	Fix tcpdump(8) on carp(4) interface: - Use our loop DLT type, not OpenBSD. [1] - The fields that are converted to network byte order are not 32-bit fields but 16-bit fields, so htons should be used instead of htonl. [1] - Secondly, ip_input changes ip->ip_len into its value without the ip-header length. So, restore the length to make bpf happy. [1] - Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't understand uint8_t af identifier. Submitted by: Frank Volf [1]	2005-02-28 11:54:36 +00:00
Paul Saab	8291294024	If the receiver sends an ack that is out of [snd_una, snd_max], ignore the sack options in that segment. Else we'd end up corrupting the scoreboard. Found by: Raja Mukerji (raja at moselle dot com) Submitted by: Mohan Srinivasan	2005-02-27 20:39:04 +00:00
Max Laier	a4e5390551	Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be static as they are used elsewhere.	2005-02-27 11:32:26 +00:00
Gleb Smirnoff	e8c34a71eb	Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet. Obtained from: OpenBSD	2005-02-26 13:55:07 +00:00
Gleb Smirnoff	5c1f0f6de5	Staticize local functions.	2005-02-26 10:33:14 +00:00
Gleb Smirnoff	88bf82a62e	New lines when logging.	2005-02-25 11:26:39 +00:00
Gleb Smirnoff	947b7cf3c6	Embrace macros with do {} while (0) Submitted by: maxim	2005-02-25 10:49:47 +00:00
Gleb Smirnoff	39aeaa0eb5	Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4. Sponsored by: Rambler	2005-02-25 10:12:11 +00:00
Gleb Smirnoff	1e9e65729b	Improve logging: - Simplify CARP_LOG() and make it work (we don't have addlog in FreeBSD). - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when net.inet.carp.log > 1 - Use CARP_DEBUG to log state changes of carp interfaces. After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc argument. Remove it. Sponsored by: Rambler	2005-02-25 10:09:44 +00:00
Gleb Smirnoff	6fba4c0bae	Fix problem when master comes up with one interface down, and preempts mastering on all other interfaces: - call carp_carpdev_state() on initialize instead of just setting to INIT - in carp_carpdev_state() check that interface is UP, instead of checking that it is not DOWN, because a rebooted machine may have interface in UNKNOWN state. Sponsored by: Rambler Obtained from: OpenBSD (partially)	2005-02-24 09:05:28 +00:00
Sam Leffler	db77984c5b	fix potential invalid index into ip_protox array Noticed by: Coverity Prevent analysis tool	2005-02-23 00:38:12 +00:00
Maxime Henrion	2368737719	Unbreak CARP build on 64-bit architectures. Tested on: sparc64	2005-02-23 00:20:33 +00:00
Andre Oppermann	099dd0430b	Bring back the full packet destination manipulation for 'ipfw fwd' with the kernel compile time option: options IPFIREWALL_FORWARD_EXTENDED This option has to be specified in addition to IPFIREWALL_FORWARD. With this option even packets targeted for an IP address local to the host can be redirected. All restrictions to ensure proper behaviour for locally generated packets are turned off. Firewall rules have to be carefully crafted to make sure that things like PMTU discovery do not break. Document the two kernel options. PR: kern/71910 PR: kern/73129 MFC after: 1 week	2005-02-22 17:40:40 +00:00
Gleb Smirnoff	67df421496	Remove promisc counter from parent interface in carp_clone_destroy(), so that parent interface is not left in promiscuous mode after carp interface is destroyed. This is not perfect, since promisc counter is added when carp interface is assigned an IP address. However, when address is removed parent interface is still in promiscuous mode. Only removal of carp interface removes promisc from parent. Same way in OpenBSD. Sponsored by: Rambler	2005-02-22 16:24:55 +00:00
Gleb Smirnoff	a97719482d	Add CARP (Common Address Redundancy Protocol), which allows multiple hosts to share an IP address, providing high availability and load balancing. Original work on CARP done by Michael Shalayeff, with many additions by Marco Pfatschbacher and Ryan McBride. FreeBSD port done solely by Max Laier. Patch by: mlaier Obtained from: OpenBSD (mickey, mcbride)	2005-02-22 13:04:05 +00:00
Gleb Smirnoff	797127a9bf	We can make code simpler after last change. Noticed by: Andrew Thompson	2005-02-22 08:35:24 +00:00
Gleb Smirnoff	3a1757b9c0	In in_pcbconnect_setup() jailed sockets are treated specially: if local address is not supplied, then jail IP is chosen and in_pcbbind() is called. Since udp_output() does not save local addr after call to in_pcbconnect_setup(), in_pcbbind() is called for each packet, and this is incorrect. So, we shall treat jailed sockets specially in udp_output(), we will save their local address. This fixes a long standing bug with broken sendto() system call in jails. PR: kern/26506 Reviewed by: rwatson MFC after: 2 weeks	2005-02-22 07:50:02 +00:00
Gleb Smirnoff	914d092f5d	In in_pcbconnect_setup() remove a check that route points at loopback interface. Nobody have explained me sense of this check. It breaks connect() system call to a destination address which is loopback routed (e.g. blackholed). Reviewed by: silence on net@ MFC after: 2 weeks	2005-02-22 07:39:15 +00:00
Robert Watson	0daccb9c94	In the current world order, solisten() implements the state transition of a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set. This change does the following: - Pushes the socket state transition from the socket layer solisten() to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto(). - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely required elsewhere in the socket/protocol code. Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn	2005-02-21 21:58:17 +00:00
Paul Saab	7643c37cf2	Remove 2 (SACK) fields from the tcpcb. These are only used by a function that is called from tcp_input(), so they oughta be passed on the stack instead of stuck in the tcpcb. Submitted by: Mohan Srinivasan	2005-02-17 23:04:56 +00:00
Paul Saab	7776346f83	Fix for a SACK (receiver) bug where incorrect SACK blocks are reported to the sender - in the case where the sender sends data outside the window (as WinXP does :(). Reported by: Sam Jensen <sam at wand dot net dot nz> Submitted by: Mohan Srinivasan	2005-02-16 01:46:17 +00:00
Paul Saab	8db456bf17	- Retransmit just one segment on initiation of SACK recovery. Remove the SACK "initburst" sysctl. - Fix bugs in SACK dupack and partialack handling that can cause large bursts while in SACK recovery. Submitted by: Mohan Srinivasan	2005-02-14 21:01:08 +00:00
Maxim Konovalov	9945c0e21f	o Add handling of an IPv4-mapped IPv6 address. o Use SYSCTL_IN() macro instead of direct call of copyin(9). Submitted by: ume o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where most of tcp sysctls live. o There are net.inet[6].tcp[6].getcred sysctls already, no needs in a separate struct tcp_ident_mapping. Suggested by: ume	2005-02-14 07:37:51 +00:00
Gleb Smirnoff	1af305441d	Jump to common action checks after doing specific ones. This fixes adding of divert rules, which I broke in previous commit. Pointy hat to: glebius	2005-02-06 11:13:59 +00:00
Maxim Konovalov	212a79b010	o Implement net.inet.tcp.drop sysctl and userland part, tcpdrop(8) utility: The tcpdrop command drops the TCP connection specified by the local address laddr, port lport and the foreign address faddr, port fport. Obtained from: OpenBSD Reviewed by: rwatson (locking), ru (man page), -current MFC after: 1 month	2005-02-06 10:47:12 +00:00
Gleb Smirnoff	670742a102	Add a ng_ipfw node, implementing a quick and simple interface between ipfw(4) and netgraph(4) facilities. Reviewed by: andre, brooks, julian	2005-02-05 12:06:33 +00:00
Hajimu UMEMOTO	6d0a982bdf	teach scope of IPv6 address to net.inet6.tcp6.getcred. MFC after: 1 week	2005-02-04 14:43:05 +00:00
Robert Watson	06456da2c6	Update an additional reference to the rate of ISN tick callouts that was missed in tcp_subr.c:1.216: projected_offset must also reflect how often the tcp_isn_tick() callout will fire. MFC after: 2 weeks Submitted by: silby	2005-01-31 01:35:01 +00:00
Christian S.J. Peron	0ba04c87b3	Change the state allocator from using regular malloc to using a UMA zone instead. This should eliminate a bit of the locking overhead associated with malloc and reduce the memory consumption associated with each new state. Reviewed by: rwatson, andre Silence on: ipfw@ MFC after: 1 week	2005-01-31 00:48:39 +00:00
Robert Watson	54082796aa	Have tcp_isn_tick() fire 100 times a second, rather than HZ times a second; since the default hz has changed to 1000 times a second, this resulted in unnecessary work being performed. MFC after: 2 weeks Discussed with: phk, cperciva General head nod: silby	2005-01-30 23:30:28 +00:00
Robert Watson	024105493d	Prefer (NULL) spelling of (0) for pointers. MFC after: 3 days	2005-01-30 19:29:47 +00:00
Robert Watson	77c16eed7c	Remove clause three from tcp_syncache.c license per permission of McAfee. Update copyright to McAfee from NETA.	2005-01-30 19:28:27 +00:00
Alan Cox	7258e9687b	Correctly move the packet header in ip_insertoptions(). Reported by: Anupam Chanda Reviewed by: sam@ MFC after: 2 weeks	2005-01-23 19:43:46 +00:00
Ruslan Ermilov	24a0682c64	Sort sections.	2005-01-20 09:17:07 +00:00
Gleb Smirnoff	28935658c4	- Reduce number of arguments passed to dummynet_io(), we already have cookie in struct ip_fw_args itself. - Remove redundant &= 0xffff from dummynet_io().	2005-01-16 11:13:18 +00:00
Gleb Smirnoff	6c69a7c30b	o Clean up interface between ip_fw_chk() and its callers: - ip_fw_chk() returns action as function return value. Field retval is removed from args structure. Action is not a flag any more. It is one of integer constants. - Any action-specific cookies are returned either in new "cookie" field in args structure (dummynet, future netgraph glue), or in mbuf tag attached to packet (divert, tee, some future action). o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}() to a switch structure, so that the functions are more readable, and future actions can be added with fewer modifications. Approved by: andre MFC after: 2 months	2005-01-14 09:00:46 +00:00
Paul Saab	8d03f2b53b	Fix a TCP SACK related crash resulting from incorrect computation of len in tcp_output(), in the case where the FIN has already been transmitted. The mis-computation of len is because of a gcc optimization issue, which this change works around. Submitted by: Mohan Srinivasan	2005-01-12 21:40:51 +00:00
Brian Somers	2a4cd52421	include "alias.h", not <alias.h> MFC after: 3 days	2005-01-10 10:54:06 +00:00
Warner Losh	c398230b64	/* -> /*- for license, minor formatting changes	2005-01-07 01:45:51 +00:00
Mike Silbersack	a69968ee4e	Add a sysctl (net.inet.tcp.insecure_rst) which allows one to specify that the RFC 793 specification for accepting RST packets should be followed. When followed, this makes one vulnerable to the attacks described in "slipping in the window", but it may be necessary in some odd circumstances.	2005-01-03 07:08:37 +00:00
Mike Silbersack	5f311da2cc	Port randomization leads to extremely fast port reuse at high connection rates, which is causing problems for some users. To retain the security advantage of random ports and ensure correct operation for high connection rate users, disable port randomization during periods of high connection rates. Whenever the connection rate exceeds randomcps (10 by default), randomization will be disabled for randomtime (45 by default) seconds. These thresholds may be tuned via sysctl. Many thanks to Igor Sysoev, who proved the necessity of this change and tested many preliminary versions of the patch. MFC After: 20 seconds	2005-01-02 01:50:57 +00:00
Robert Watson	74d4630b71	Remove an errant blank line apparently introduced in ip_output.c:1.194.	2004-12-25 22:59:42 +00:00
Robert Watson	42cf3289c3	In the dropafterack case of tcp_input(), it's OK to release the TCP pcbinfo lock before calling tcp_output(), as holding just the inpcb lock is sufficient to prevent garbage collection.	2004-12-25 22:26:13 +00:00
Robert Watson	e0bef1cb35	Revert parts of tcp_input.c:1.255 associated with the header predicted cases for tcp_input(): While it is true that the pcbinfo lock provides a pseudo-reference to inpcbs, both the inpcb and pcbinfo locks are required to free an un-referenced inpcb. As such, we can release the pcbinfo lock as long as the inpcb remains locked with the confidence that it will not be garbage-collected. This leads to a less conservative locking strategy that should reduce contention on the TCP pcbinfo lock. Discussed with: sam	2004-12-25 22:23:13 +00:00
Robert Watson	452d9f5b1c	Attempt to consistently use () around return values in calls to return() in newer code (sysctl, ISN, timewait).	2004-12-23 01:34:26 +00:00
Robert Watson	06da46b241	Remove an XXXRW comment relating to whether or not the TCP timers are MPSAFE: they are now believed to be. Correct a typo in a second comment. MFC after: 2 weeks	2004-12-23 01:27:13 +00:00
Robert Watson	db0aae38b6	Remove the now unused tcp_canceltimers() function. tcpcb timers are now stopped as part of tcp_discardcb(). MFC after: 2 weeks	2004-12-23 01:25:59 +00:00
Robert Watson	950ab1e470	Remove an annotation of a minor race relating to the update of multiple MIB entries using sysctl in short order, which might result in unexpected values for tcp_maxidle being generated by tcp_slowtimo. In practice, this will not happen, or at least, doesn't require an explicit comment. MFC after: 2 weeks	2004-12-23 01:21:54 +00:00
Gleb Smirnoff	5e5da86597	In certain cases ip_output() can free our route, so check for its presence before RTFREE(). Noticed by: ru	2004-12-10 07:51:14 +00:00
Gleb Smirnoff	d2a09f901a	Revert last change. Andre: First lets get major new features into the kernel in a clean and nice way, and then start optimizing. In this case we don't have any obfuscation that makes later profiling and/or optimizing difficult in any way. Requested by: csjp, sam	2004-12-10 07:47:17 +00:00
Christian S.J. Peron	fbf2edb6e4	This commit adds a shared locking mechanism very similar to the mechanism used by pfil. This shared locking mechanism will remove a nasty lock order reversal which occurs when ucred based rules are used which results in hard locks while mpsafenet=1. So this removes the debug.mpsafenet=0 requirement when using ucred based rules with IPFW. It should be noted that this locking mechanism does not guarantee fairness between read and write locks, and that it will favor firewall chain readers over writers. This seemed acceptable since write operations to firewall chains protected by this lock tend to be less frequent than reads. Reviewed by: andre, rwatson Tested by: myself, seanc Silence on: ipfw@ MFC after: 1 month	2004-12-10 02:17:18 +00:00
Gleb Smirnoff	f5a19d3909	Check that DUMMYNET_LOADED before seeking dummynet m_tag. Reviewed by: andre MFC after: 1 week	2004-12-09 16:41:47 +00:00
Max Laier	067a8bab8a	More fixing of multiple addresses in the same prefix. This time do not try to arp resolve "secondary" local addresses. Found and submitted by: ru With additions from: OpenBSD (rev. 1.47) Reviewed by: ru	2004-12-09 00:12:41 +00:00
Ruslan Ermilov	5cae05ad33	Time out routes created by redirect.	2004-12-06 22:27:22 +00:00
Gleb Smirnoff	98335aa976	- Make route cacheing optional, configurable via IFF_LINK0 flag. - Turn it off by default. Requested by: many Reviewed by: andre Approved by: julian (mentor) MFC after: 3 days	2004-12-06 19:02:43 +00:00
Robert Watson	79a9e59c89	Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields in the tcptw undergo non-atomic read-modify-writes. MFC after: 2 weeks	2004-12-05 22:47:29 +00:00
Robert Watson	b9155d92b2	Assert inpcb lock in: tcpip_fillheaders() tcp_discardcb() tcp_close() tcp_notify() tcp_new_isn() tcp_xmit_bandwidth_limit() Fix a locking comment in tcp_twstart(): the pcbinfo will be locked (and is asserted). MFC after: 2 weeks	2004-12-05 22:27:53 +00:00
Robert Watson	6fbed4af22	Minor grammar fix in comment.	2004-12-05 22:20:59 +00:00
Robert Watson	89924e5865	Pass the inpcb reference into ip_getmoptions() rather than just the inp->inp_moptions pointer, so that ip_getmoptions() can perform necessary locking when doing non-atomic reads. Lock the inpcb by default to copy any data to local variables, then unlock before performing sooptcopyout(). MFC after: 2 weeks	2004-12-05 22:08:37 +00:00
Robert Watson	92c71ab30b	Define INP_UNLOCK_ASSERT() to assert that an inpcb is unlocked. MFC after: 2 weeks	2004-12-05 22:07:14 +00:00
Robert Watson	5c918b56d8	Push the inpcb argument into ip_setmoptions() when setting IP multicast socket options, so that it is available for locking.	2004-12-05 21:38:33 +00:00
Robert Watson	993d9505d4	Start working through inpcb locking for ip_ctloutput() by cleaning up modifications to the inpcb IP options mbuf: - Lock the inpcb before passing it into ip_pcbopts() in order to prevent simultaneous reads and read-modify-writes that could result in races. - Pass the inpcb reference into ip_pcbopts() instead of the option chain pointer in the inpcb. - Assert the inpcb lock in ip_pcbopts(). - Convert one or two uses of a pointer as a boolean or an integer comparison to a comparison with NULL for readability.	2004-12-05 19:11:09 +00:00
Paul Saab	7d5ed1ceea	Fixes a bug in SACK causing us to send data beyond the receive window. Found by: Pawel Worach and Daniel Hartmeier Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-11-29 18:47:27 +00:00
Robert Watson	2be3bf2244	Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify- write of various time/rtt-related fields in the tcpcb.	2004-11-28 11:06:22 +00:00
Robert Watson	18ad5842c5	Expand coverage of the receive socket buffer lock when handling urgent pointer updates: test available space while holding the socket buffer mutex, and continue to hold until the pointer update has been performed. MFC after: 2 weeks	2004-11-28 11:01:31 +00:00
Robert Watson	c8443a1dc0	Do export the advertised receive window via the tcpi_rcv_space field of struct tcp_info.	2004-11-27 20:20:11 +00:00
Robert Watson	b8af5dfa81	Implement parts of the TCP_INFO socket option as found in Linux 2.6. This socket option allows processes to query a TCP socket for some low level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcpinfo' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it difficult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivalent on FreeBSD. I've included __'d entries in the structure to make it easier to figure out what is and isn't omitted. This API/ABI should be considered unstable for the time being.	2004-11-26 18:58:46 +00:00
Mike Silbersack	6a220ed80a	Fix a problem where our TCP stack would ignore RST packets if the receive window was 0 bytes in size. This may have been the cause of unsolved "connection not closing" reports over the years. Thanks to Michiel Boland for providing the fix and providing a concise test program for the problem. Submitted by: Michiel Boland MFC after: 2 weeks	2004-11-25 19:04:20 +00:00
Robert Watson	de30ea131f	In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the contents of the tcpcb are read and modified in volume. In tcp_input(), replace th comparison with 0 with a comparison with NULL. At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in tcp_input(), assert 'headlocked'. Try to improve consistency between various assertions regarding headlocked to be more informative. MFC after: 2 weeks	2004-11-23 23:41:20 +00:00
Robert Watson	cce83ffb5a	tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks	2004-11-23 17:21:30 +00:00
Robert Watson	b42ff86e73	De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely. Annotate that tcp_canceltimers() is currently unused. De-spl tcp_timer_delack(). De-spl tcp_timer_2msl(). MFC after: 2 weeks	2004-11-23 16:45:07 +00:00
Robert Watson	7258e91f0f	Assert the inpcb lock in tcp_twstart(), which does both read-modify-write on the tcpcb, but also calls into tcp_close() and tcp_twrespond(). Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug. Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock. Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the inpcb lock. MFC after: 2 weeks	2004-11-23 16:23:13 +00:00
Robert Watson	8263bab34d	Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(), and tcp_drop(), due to read-modify-write of TCP state variables. MFC after: 2 weeks	2004-11-23 16:06:15 +00:00
Robert Watson	8438db0f59	Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock protects access to the ISN state variables. Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping. Staticize internal ISN variables since they're not used outside of tcp_subr.c. MFC after: 2 weeks	2004-11-23 15:59:43 +00:00
Robert Watson	ca127a3e80	Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock. MFC after: 1 week	2004-11-22 13:16:27 +00:00
Robert Watson	98734750b4	s/send/sent/ in comment describing TCPS_SYN_RECEIVED.	2004-11-21 14:38:04 +00:00
Gleb Smirnoff	c1384b5ae2	- Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets. - Remove div_disconnect() method, since it shouldn't be called now. - Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now. Reviewed by: rwatson, maxim Approved by: julian (mentor) MT5 after: 1 week MT4 after: 1 month	2004-11-18 13:49:18 +00:00
Max Laier	9a6a6eeba2	Fix host route addition for more than one address to a loopback interface after allowing more than one address with the same prefix. Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru> Submitted by: ru (also NetBSD rev. 1.83) Pointyhat to: mlaier	2004-11-17 23:14:03 +00:00
Max Laier	81d96ce8a4	Merge copyright notices. Requested by: njl	2004-11-13 17:05:40 +00:00
Gleb Smirnoff	ea0bd57615	Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice: - in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed] - refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2] A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly. Discussed with: mlaier [1] Silence from: green [2] Reviewed by: maxim Approved by: julian (mentor) MFC after: 1 week	2004-11-12 22:17:42 +00:00
Max Laier	48321abefe	Change the way we automatically add prefix routes when adding a new address. This makes it possible to have more than one address with the same prefix. The first address added is used for the route. On deletion of an address with IFA_ROUTE set, we try to find a "fallback" address and hand over the route if possible. I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to in_ifscrub as it must be considered KAPI as it is not static in in.c. I will clean this after the MFC. Discussed on: arch, net Tested by: many testers of the CARP patches Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it> Obtained from: WIDE via OpenBSD MFC after: 1 month	2004-11-12 20:53:51 +00:00
Poul-Henning Kamp	e21e4c19c9	Add missing '=' Spotted by: obrien	2004-11-11 19:02:01 +00:00
Andre Oppermann	5e7b233055	Fix a double-free in the 'hlen > m->m_len' sanity check. Bug report by: <james@towardex.com> MFC after: 2 weeks	2004-11-09 09:40:32 +00:00
SUZUKI Shinsuke	3d54848fc2	support TCP-MD5(IPv4) in KAME-IPSEC, too. MFC after: 3 weeks	2004-11-08 18:49:51 +00:00
Poul-Henning Kamp	756d52a195	Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.	2004-11-08 14:44:54 +00:00
Robert Watson	d6915262af	Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseudo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held. MFC after: 3 weeks	2004-11-07 19:19:35 +00:00
Andre Oppermann	e9a4cd2426	Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check. Bug report by: <james@towardex.com> MFC after: 2 weeks	2004-11-06 10:47:36 +00:00
Poul-Henning Kamp	c83c1318f5	Hide udp_in6 behind #ifdef INET6	2004-11-04 07:14:03 +00:00
Bruce M Simpson	38f061057b	When performing IP fast forwarding, immediately drop traffic which is destined for a blackhole route. This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case. Submitted by: james at towardex dot com Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/> MFC after: 3 weeks	2004-11-04 02:14:38 +00:00
Robert Watson	d4b509bd7f	Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While file in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded. - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects. - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append(). - Re-annotate comments to reflect updates. With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth. Discovered by: kris (Bug Magnet) MFC after: 3 weeks	2004-11-04 01:25:23 +00:00
Andre Oppermann	c94c54e4df	Remove RFC1644 T/TCP support from the TCP side of the network stack. A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality. Discussed on: -arch	2004-11-02 22:22:22 +00:00
Robert Watson	ab5c14d828	Correct a bug in TCP SACK that could result in wedging of the TCP stack under high load: only set function state to loop and continue sending if there is no data left to send. RELENG_5_3 candidate. Feet provided: Peter Losher <Peter underscore Losher at isc dot org> Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx> Submitted by: mohan <mohans at yahoo-inc dot com>	2004-10-30 12:02:50 +00:00
Robert Watson	c427483381	Add a matching tunable for net.inet.tcp.sack.enable sysctl.	2004-10-26 08:59:09 +00:00
Bruce M Simpson	d6fa5d2806	Check that rt_mask(rt) is non-NULL before dereferencing it, in the RTM_ADD case, thus avoiding a panic. Submitted by: Iasen Kostov	2004-10-26 03:31:58 +00:00
Andre Oppermann	84bb6a2e75	IPDIVERT is a module now and tell the other parts of the kernel about it. IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.	2004-10-25 20:02:34 +00:00
Ruslan Ermilov	a35d88931c	For variables that are only checked with defined(), don't provide any fake value.	2004-10-24 15:33:08 +00:00
Andre Oppermann	cd109b0d82	Shave 40 unused bytes from struct tcpcb.	2004-10-22 19:55:04 +00:00
Andre Oppermann	21dcc96f4a	When printing the initialization string and IPDIVERT is not compiled into the kernel refer to it as "loadable" instead of "disabled".	2004-10-22 19:18:06 +00:00
Andre Oppermann	24fc79b0a4	Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload. Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.	2004-10-22 19:12:01 +00:00
Andre Oppermann	57bbe2e1ab	Destroy the UMA zone on unload.	2004-10-19 22:51:20 +00:00
Andre Oppermann	2de1a9eb6e	Slightly extend the locking during unload to fully cover the protocol deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.	2004-10-19 22:08:13 +00:00
Robert Watson	279128e295	Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.	2004-10-19 21:35:42 +00:00
Andre Oppermann	72584fd2c0	Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.	2004-10-19 21:14:57 +00:00
Andre Oppermann	969bb53e80	Properly declare the "net.inet" sysctl subtree.	2004-10-19 21:06:14 +00:00
Andre Oppermann	539be79a9d	Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space.	2004-10-19 20:59:01 +00:00
Andre Oppermann	dff3237ee5	Make use of the PROTO_SPACER functionality for dynamically loadable protocols in inetsw[] and define initially eight spacer slots. Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.	2004-10-19 15:58:22 +00:00
Andre Oppermann	de38924dc0	Support for dynamically loadable and unloadable IP protocols in the ipmux. With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets. The functions ipproto_[un]register() allow to dynamically adjust the ip_protox[] array mux in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow to overwrite an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.	2004-10-19 15:45:57 +00:00
Andre Oppermann	1cf15713ed	Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules.	2004-10-19 14:34:13 +00:00
Andre Oppermann	de1c2ac4bf	Make comments more clear. Change the order of one if() statement to check the more likely variable first.	2004-10-19 14:31:56 +00:00
Robert Watson	81158452be	Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-18 22:19:43 +00:00
Robert Watson	6b8e5a9862	Don't release the udbinfo lock until after the last use of UDP inpcb in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. RELENG_5 candidate.	2004-10-12 20:03:56 +00:00
Robert Watson	00fcf9d12d	Modify the thrilling "%D is using my IP address %s!" message so that it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitute a false positive as a report of misconfiguration.	2004-10-12 17:10:40 +00:00
Robert Watson	6c67b8b695	When the access control on creating raw sockets was modified so that processes in jail could create raw sockets, additional access control checks were added to raw IP sockets to limit the ways in which those sockets could be used. Specifically, only the socket option IP_HDRINCL was permitted in rip_ctloutput(). Other socket options were protected by a call to suser(). This change was required to prevent processes in a Jail from modifying system properties such as multicast routing and firewall rule sets. However, it also introduced a regression: processes that create a raw socket with root privilege, but then downgraded credential (i.e., a daemon giving up root, or a setuid process switching back to the real uid) could no longer issue other unprivileged generic IP socket option operations, such as IP_TOS, IP_TTL, and the multicast group membership options, which prevented multicast routing daemons (and some other tools) from operating correctly. This change pushes the access control decision down to the granularity of individual socket options, rather than all socket options, on raw IP sockets. When rip_ctloutput() doesn't implement an option, it will now pass the request directly to in_control() without an access control check. This should restore the functionality of the generic IP socket options for raw sockets in the above-described scenarios, which may be confirmed with the ipsockopt regression test. RELENG_5 candidate. Reviewed by: csjp	2004-10-12 16:47:25 +00:00
Robert Watson	cf2942b67c	Acquire the send socket buffer lock around tcp_output() activities reaching into the socket buffer. This prevents a number of potential races, including dereferencing of sb_mb while unlocked leading to a NULL pointer deref (how I found it). Potentially this might also explain other "odd" TCP behavior on SMP boxes (although haven't seen it reported). RELENG_5 candidate.	2004-10-09 16:48:51 +00:00
Robert Watson	fcf4e3a168	When running with debug.mpsafenet=0, initialize IP multicast routing callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an assertion regarding Giant if they enter other parts of the stack from the callout. MFC after: 3 days Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >	2004-10-07 14:13:35 +00:00
Paul Saab	a55db2b6e6	- Estimate the amount of data in flight in sack recovery and use it to control the packets injected while in sack recovery (for both retransmissions and new data). - Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c. - Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery. Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>	2004-10-05 18:36:24 +00:00
Brian Feldman	c99ee9e042	Add support to IPFW for matching by TCP data length.	2004-10-03 00:47:15 +00:00
Brian Feldman	6daf7ebd28	Add support to IPFW for classification based on "diverted" status (that is, input via a divert socket).	2004-10-03 00:26:35 +00:00
Brian Feldman	974dfe3084	Add to IPFW the ability to do ALTQ classification/tagging.	2004-10-03 00:17:46 +00:00
Brian Feldman	88ef2880c1	Validate the action pointer to be within the rule size, so that trying to add corrupt ipfw rules would not potentially panic the system or worse.	2004-09-30 17:42:00 +00:00
Max Laier	d6a8d58875	Add an additional struct inpcb * argument to pfil(9) in order to enable passing along socket information. This is required to work around a LOR with the socket code which results in an easily reproducible hard lockup with debug.mpsafenet=1. This commit does not fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a separate (later) commit. This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in the foreseeable future. Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days LOR IDs: 14 - 17 (not fixed yet)	2004-09-29 04:54:33 +00:00
Robert Watson	48ac555d83	Assign so_pcb to NULL rather than 0 as it's a pointer. Spotted by: dwhite	2004-09-29 04:01:13 +00:00
Maxim Konovalov	4bc37f9836	o Turn net.inet.ip.check_interface sysctl off by default. When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to 0 (accidentally?) in rev. 1.130.2.20 in RELENG_4 only. Along with the fact that this knob is not documented, it breaks POLA, especially in bridge environments. OK'ed by: andre Reviewed by: -current	2004-09-24 12:18:40 +00:00
Andre Oppermann	db09bef308	Fix an out of bounds write during the initialization of the PF_INET protocol family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is larger than IPPROTO_MAX and was initializing memory beyond the array. Catch all these kinds of errors by ignoring protocols that are higher than IPPROTO_MAX or 0 (zero). Add more comments to ip_init().	2004-09-16 18:33:39 +00:00
Andre Oppermann	76ff6dcf46	Clarify some comments for the M_FASTFWD_OURS case in ip_input().	2004-09-15 20:17:03 +00:00
Andre Oppermann	e098266191	Remove the last two global variables that are used to store packet state while it travels through the IP stack. This wasn't much of a problem because IP source routing is disabled by default but when enabled together with SMP and preemption it would have very likely cross-corrupted the IP options in transit. The IP source route options of a packet are now stored in a mtag instead of the global variable.	2004-09-15 20:13:26 +00:00
Andre Oppermann	bda337d05e	Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled into the kernel. Return EINVAL instead.	2004-09-13 19:27:23 +00:00
Andre Oppermann	f91248c1ad	If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don't prepend an already existing tag again. Instead unlink it and prepend it again to have it as the first tag in the chain. PR: kern/71380	2004-09-13 19:20:14 +00:00
Andre Oppermann	f4fca2d8d3	Make comments more clear for the packet changed cases after pfil hooks.	2004-09-13 17:09:06 +00:00
Andre Oppermann	eedc0a7535	Fix ip_input() fallback for the destination modified cases (from the packet filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS tagged packets to have ip_len and ip_off in host byte order instead of network byte order. PR: kern/71652 Submitted by: mlaier (patch)	2004-09-13 17:01:53 +00:00
Andre Oppermann	7c0102f575	Make 'ipfw tee' behave as intended and designed. A tee'd packet is copied and sent to the DIVERT socket while the original packet continues with the next rule. Unlike a normally diverted packet no IP reassembly attempts are made on tee'd packets and they are passed upwards totally unmodified. Note: This will not be MFC'd to 4.x because of major infrastructure changes. PR: kern/64240 (and many others collapsed into that one)	2004-09-13 16:46:05 +00:00
Gleb Smirnoff	324398687f	Check flag do_bridge always, even if kernel was compiled without BRIDGE support. This makes dynamic bridge.ko working. Reviewed by: sam Approved by: julian (mentor) MFC after: 1 week	2004-09-09 12:34:07 +00:00
John-Mark Gurney	cb459254a2	revert comment from rev1.158 now that rev1.225 backed it out.. MFC after: 3 days	2004-09-06 15:48:38 +00:00
Gleb Smirnoff	f46a6aac29	Recover normal behavior: return EINVAL to attempt to add a divert rule when module is built without IPDIVERT. Silence from: andre Approved by: julian (mentor)	2004-09-05 20:06:50 +00:00
John-Mark Gurney	b5d47ff592	fix up socket/ip layer violation... don't assume/know that SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...	2004-09-05 02:34:12 +00:00
Andre Oppermann	3161f583ca	Apply error and success logic consistently to the function netisr_queue() and its users. netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported. Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets. PR: kern/70988 Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 3 days	2004-08-27 18:33:08 +00:00
Andre Oppermann	a9c92b54a9	In the case the destination of a packet was changed by the packet filter to point to a local IP address; and the packet was sourced from this host we fill in the m_pkthdr.rcvif with a pointer to the loopback interface. Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface. Submitted by: brooks	2004-08-27 15:39:34 +00:00
Andre Oppermann	319c4c256a	Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion.	2004-08-27 15:32:28 +00:00
Andre Oppermann	c21fd23260	Always compile PFIL_HOOKS into the kernel and remove the associated kernel compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack. If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active. Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.	2004-08-27 15:16:24 +00:00
Ruslan Ermilov	9bfe6d472a	Revert the last change to sys/modules/ipfw/Makefile and fix a standalone module build in a better way. Silence from: andre MFC after: 3 days	2004-08-26 14:18:30 +00:00
Pawel Jakub Dawidek	a7f3feff1b	Allocate memory when dumping pipes with M_WAITOK flag. On a system with a huge number of pipes, M_NOWAIT fails almost always, because of memory fragmentation. My fix is different than the patch proposed by Pawel Malachowski, because in FreeBSD 5.x we cannot sleep while holding dummynet mutex (in 4.x there is no such lock). My fix is also ugly, but there is no easy way to prepare a nice and clean fix. PR: kern/46557 Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> Reviewed by: mlaier	2004-08-25 09:31:30 +00:00
Max Laier	ca7a789aa6	Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel. Previously the early drop was disabled unconditionally for ALTQ-enabled kernels. This should give some benefit for the normal gateway + LAN-server case with a busy LAN leg and an ALTQ managed uplink. Reviewed and style help from: cperciva, pjd	2004-08-22 16:42:28 +00:00
Robert Watson	392e840716	When sliding the m_data pointer forward, update m_pkthdr.len as well as m_len, or the pkthdr length will be inconsistent with the actual length of data in the mbuf chain. The symptom of this occurring was "out of data" warnings from in_cksum_skip() on large UDP packets sent via the loopback interface. Foot shot: green	2004-08-22 01:32:48 +00:00
Christian S.J. Peron	5090559b7f	When a prison is given the ability to create raw sockets (when the security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged access to jails is given out, it is possible for prison root to manipulate various network parameters which affect the host environment. This commit plugs a number of security holes associated with the use of raw sockets and prisons. This commit makes the following changes: - Add a comment to rtioctl warning developers that if they add any ioctl commands, they should use super-user checks where necessary, as it is possible for PRISON root to make it this far in execution. - Add super-user checks for the execution of the SIOCGETVIFCNT and SIOCGETSGCNT IP multicast ioctl commands. - Add a super-user check to rip_ctloutput(). If the calling cred is PRISON root, make sure the socket option name is IP_HDRINCL, otherwise deny the request. Although this patch corrects a number of security problems associated with raw sockets and prisons, the warning in jail(8) should still apply, and by default we should keep the default value of security.jail.allow_raw_sockets MIB to 0 (or disabled) until we are certain that we have tracked down all the problems. Looking forward, we will probably want to eliminate the references to curthread. This may be a MFC candidate for RELENG_5. Reviewed by: rwatson Approved by: bmilekic (mentor)	2004-08-21 17:38:57 +00:00
Robert Watson	e6ccd70936	When prepending space onto outgoing UDP datagram payloads to hold the UDP/IP header, make sure that space is also allocated for the link layer header. If an mbuf must be allocated to hold the UDP/IP header (very likely), then this will avoid an additional mbuf allocation at the link layer. This trick is also used by TCP and other protocols to avoid extra calls to the mbuf allocator in the ethernet (and related) output routines.	2004-08-21 16:14:04 +00:00
Andre Oppermann	ce63226177	Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning up its remains. Do not terminate 'if' lines with ';'. Spotted by: claudio@OpenBSD.ORG (sitting 3m from my desk) Pointy hat to: andre	2004-08-20 00:36:55 +00:00
Andre Oppermann	70222723f3	When unloading ipfw module use callout_drain() to make absolutely sure that all callouts are stopped and finished. Move it before IPFW_LOCK() to avoid deadlocking when draining callouts.	2004-08-19 23:31:40 +00:00
Andre Oppermann	6f2d4ea6f8	For IPv6 access pointer to tcpcb only after we have checked it is valid. Found by: Coverity's automated analysis (via Ted Unangst)	2004-08-19 20:16:17 +00:00
Andre Oppermann	50ab727669	Give a useful error message if someone tries to compile IPFIREWALL into the kernel without specifying PFIL_HOOKS as well.	2004-08-19 18:38:23 +00:00
Andre Oppermann	9108601915	Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building the ipfw KLD. For IPFIREWALL_FORWARD this does not have any side effects. If the module has it but not the kernel it just doesn't do anything. For IPDIVERT the KLD will fail to load if the kernel doesn't have IPDIVERT compiled in too. However this is the least disturbing behaviour. The user can just recompile either module or the kernel to match the other one. The access to the machine is not denied if ipfw refuses to load.	2004-08-19 17:59:26 +00:00
Andre Oppermann	e4c97eff8e	Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts and to be able to disable ipfw if it was compiled directly into the kernel.	2004-08-19 17:38:47 +00:00
Robert Watson	5c32ea6517	Push down pcbinfo and inpcb locking from udp_send() into udp_output(). This provides greater context for the locking and allows us to avoid locking the pcbinfo structure if no binding operations will take place (i.e., already bound, connected, and no explicit sendto() address).	2004-08-19 01:13:10 +00:00
Robert Watson	4c2bb15a89	In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock.	2004-08-19 01:11:17 +00:00
Robert Watson	0f48e25b63	Fix build of ip_input.c with "options IPSEC" -- the "pass:" label is used with both FAST_IPSEC and IPSEC, but was defined for only FAST_IPSEC.	2004-08-18 03:11:04 +00:00
Peter Wemm	1e5cc10dc2	Make the kernel compile again if you are not using PFIL_HOOKS	2004-08-18 00:37:46 +00:00
Andre Oppermann	9b932e9e04	Convert ipfw to use PFIL_HOOKS. This change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different. However there are many changes to how ipfw and its add-ons are handled: In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler. IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output(). ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and then directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL.
ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it. DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection. BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS. More detailed changes to the code: conf/files Add netinet/ip_fw_pfil.c. conf/options Add IPFIREWALL_FORWARD option. modules/ipfw/Makefile Add ip_fw_pfil.c. net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well. netinet/ip_divert.c Removed divert_clone() function. It is no longer used. netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed. netinet/ip_fastfwd.c Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code. netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args. netinet/ip_fw2.c (Re)moved some global variables and the module handling. netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization. netinet/ip_input.c Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code. ip_forward() no longer requires the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set. netinet/ip_output.c Removed all direct ipfw handling code and replaced it with the new 'ipfw forward' handling code. netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.)
netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active. netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags. netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here. sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed. Approved by: re (scottl)	2004-08-17 22:05:54 +00:00
Robert Watson	a4f757cd5d	White space cleanup for netinet before branch: - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>	2004-08-16 18:32:07 +00:00
David E. O'Brien	5af87d0ea1	Put the 'antispoof' opcode in the proper place in the opcode list such that it doesn't break the ipfw2 ABI.	2004-08-16 12:05:19 +00:00
David Malone	1f44b0a1b5	Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD have already done this, so I have styled the patch on their work: 1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID. 2) named the sysctl net.inet.ip.random_id 3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns. The sysctl defaults to 0 (sequential IP IDs). Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months	2004-08-14 15:32:40 +00:00
Poul-Henning Kamp	e7581f0fc2	Fix outgoing ICMP on global instance.	2004-08-14 14:21:09 +00:00
Christian S.J. Peron	31c88a3043	Add the ability to associate ipfw rules with a specific prison ID. Since the only thing truly unique about a prison is its ID, I figured this would be the most granular way of handling this. This commit makes the following changes: - Adds tokenizing and parsing for the ``jail'' command line option to the ipfw(8) userspace utility. - Append the ipfw opcode list with O_JAIL. - While I am here, add a comment informing others that if they want to add additional opcodes, they should append them to the end of the list to avoid ABI breakage. - Add ``fw_prid'' to the ipfw ucred cache structure. - When initializing ucred cache, if the process is jailed, set fw_prid to the prison ID, otherwise set it to -1. - Update man page to reflect these changes. This change was a strong motivator behind the ucred caching mechanism in ipfw. A sample usage of this new functionality could be: ipfw add count ip from any to any jail 2 It should be noted that because ucred based constraints are only implemented for TCP and UDP packets, the same applies for jail associations. Conceptual head nod by: pjd Reviewed by: rwatson Approved by: bmilekic (mentor)	2004-08-12 22:06:55 +00:00
David Malone	849112666a	In tcp6_ctlinput, lock tcbinfo around the call to syncache_unreach so that the locks held are the same as the IPv4 case. Reviewed by: rwatson	2004-08-12 18:19:36 +00:00
Andre Oppermann	9d804f818c	Fix two cases of incorrect IPQ_UNLOCK'ing in the merged ip_reass() function. The first one was going to 'dropfrag', which unlocks the IPQ, before the lock was acquired; the second one was doing an unlock and then a 'goto dropfrag', which led to a double-unlock. Tripped over by: des	2004-08-12 08:37:42 +00:00
Robert Watson	c19c5239a6	When udp_send() fails, make sure to free the control mbufs as well as the data mbuf. This was done in most error cases, but not the case where the inpcb pointer is surprisingly NULL.	2004-08-12 01:34:27 +00:00
Andre Oppermann	420a281164	Backout removal of UMA_ZONE_NOFREE flag for all zones which are established for structures with timers in them. It might be that a timer might fire even when the associated structure has already been free'd. Having type- stable storage in this case is beneficial for graceful failure handling and debugging. Discussed with: bosko, tegge, rwatson	2004-08-11 20:30:08 +00:00
Andre Oppermann	4efb805c0c	Remove the UMA_ZONE_NOFREE flag from all uma_zcreate() calls in the IP and TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.	2004-08-11 17:08:31 +00:00
Andre Oppermann	67d0b24ed1	Make use of in_localip() function and replace previous direct LIST_FOREACH loops over INADDR_HASH.	2004-08-11 12:32:10 +00:00
Andre Oppermann	2eccc90b61	Add the function in_localip() which returns 1 if an internet address is for the local host and configured on one of its interfaces.	2004-08-11 11:49:48 +00:00
Andre Oppermann	6e234ede37	Only invoke verify_path() for verrevpath and versrcreach when we have an IP packet.	2004-08-11 11:41:11 +00:00
Andre Oppermann	767981878c	Only check for local broadcast addresses if the mbuf is flagged with M_BCAST.	2004-08-11 10:49:56 +00:00
Andre Oppermann	0b17fba7bc	Consistently use NULL for pointer comparisons.	2004-08-11 10:46:15 +00:00
Andre Oppermann	de2e5d1e20	Make IP fastforwarding ALTQ-aware by adding the input traffic conditioner check and disabling the early output interface queue length check.	2004-08-11 10:42:59 +00:00
Andre Oppermann	2f6e6e9b4c	Correct the displayed bandwidth calculation for a readout via sysctl. The saved value does not have to be scaled with HZ; it is already in bytes per second. Only the multiply by eight remains to show bits per second (bps).	2004-08-11 10:12:16 +00:00
Robert Watson	27f74fd0ed	Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect() and in_pcbconnect_setup(), since these functions frob the port and address state of inpcbs.	2004-08-11 04:35:20 +00:00
Andre Oppermann	bb7c5b3055	Make a comment that IP source routing is not SMP and PREEMPTION safe.	2004-08-09 16:17:37 +00:00
Andre Oppermann	a5053398d4	Make a comment that "ipfw forward" is not SMP and PREEMPTION safe.	2004-08-09 16:16:10 +00:00
Andre Oppermann	5f9541ecbd	New ipfw option "antispoof": For incoming packets, the packet's source address is checked if it belongs to a directly connected network. If the network is directly connected, then the interface the packet came on in is compared to the interface the network is connected to. When incoming interface and directly connected interface are not the same, the packet does not match. Usage example: ipfw add deny ip from any to any not antispoof in Manpage education by: ru	2004-08-09 16:12:10 +00:00
Robert Watson	f31f65a708	Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead structures, allowing in6_pcbnotify() to lock the pcbinfo and each inpcb that it notifies of ICMPv6 events. This prevents inpcb assertions from firing when IPv6 generates and delivers event notifications for inpcbs. Reported by: kuriyama Tested by: kuriyama	2004-08-06 03:45:45 +00:00
Robert Watson	9c1df6951f	When iterating the UDP inpcb list processing an inbound broadcast or multicast packet, we don't need to acquire the inpcb mutex unless we are actually using inpcb fields other than the bound port and address. Since we hold the pcbinfo lock already, these can't change. Defer acquiring the inpcb mutex until we have a high chance of a match. This avoids about 120 mutex operations per UDP broadcast packet received on one of my work systems. Reviewed by: sam	2004-08-06 02:08:31 +00:00
Robert Watson	98aed8ca56	Now that IPv6 performs basic in6pcb and inpcb locking, enable inpcb lock assertions even if IPv6 is compiled into the kernel. Previously, inclusion of IPv6 and locking assertions would result in a rapid assertion failure as IPv6 was not properly locking inpcbs.	2004-08-04 18:27:55 +00:00
Joe Marcus Clarke	5c7e7e80cc	Fix Skinny and PPTP NAT'ing after the introduction of the {ip,tcp,udp}_next functions. Basically, the ip_next() function was used to get the PPTP and Skinny headers when tcp_next() should have been used instead. Symptoms of this included a segfault in natd when trying to process a PPTP or Skinny packet. Approved by: des	2004-08-04 15:17:08 +00:00
Andre Oppermann	81007fd4eb	o Delayed checksums are now calculated in divert_packet() for diverted packets Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.	2004-08-03 14:13:36 +00:00
Andre Oppermann	24a098ea9b	o Move the inflight sysctls to their own sub-tree under net.inet.tcp to be more consistent with the other sysctls around it.	2004-08-03 13:54:11 +00:00
Andre Oppermann	f0cada84b1	o Move all parts of the IP reassembly process into the function ip_reass() to make it fully self-contained. o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len including the IP header. o Computation of the delayed checksum is moved into divert_packet(). Reviewed by: silby	2004-08-03 12:31:38 +00:00
Jeffrey Hsu	2ff39e1543	Fix bug with tracking the previous element in a list. Found by: edrt@citiz.net Submitted by: pavlin@icir.org	2004-08-03 02:01:44 +00:00
Yaroslav Tykhiy	a4eb4405e3	Disallow a particular kind of port theft described by the following scenario: Alice is too lazy to write a server application in PF-independent manner. Therefore she knocks up the server using PF_INET6 only and allows the IPv6 socket to accept mapped IPv4 as well. An evil hacker known on IRC as cheshire_cat has an account in the same system. He starts a process listening on the same port as used by Alice's server, but in PF_INET. As a consequence, cheshire_cat will distract all IPv4 traffic supposed to go to Alice's server. Such sort of port theft was initially enabled by copying the code that implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for the implied case of the same owner for both connections. After this change, the above scenario will be impossible. In the same setting, the user who attempts to start his server last will get EADDRINUSE. Of course, using IPv4 mapped to IPv6 leads to security complications in the first place, but there is no reason to make it even more unsafe. This change doesn't apply to KAME since it affects a FreeBSD-specific part of the code. It doesn't modify the out-of-box behaviour of the TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default. MFC after: 1 month	2004-07-28 13:03:07 +00:00
Jayanth Vijayaraghavan	5d3b1b7556	Fix a bug in the sack code that was causing data to be retransmitted with the FIN bit set for all segments, if a FIN has already been sent before. The fix will allow the FIN bit to be set for only the last segment, in case it has to be retransmitted. Fix another bug that would have caused snd_nxt to be pulled by len if there was an error from ip_output. snd_nxt should not be touched during sack retransmissions.	2004-07-28 02:15:14 +00:00
Jayanth Vijayaraghavan	e9f2f80e09	Fix for a SACK bug where the very last segment retransmitted from the SACK scoreboard could result in the next (untransmitted) segment to be skipped.	2004-07-26 23:41:12 +00:00
John-Mark Gurney	0aa8ce5012	compare pointer against NULL, not 0 when inpcb is NULL, this is no longer invalid since jlemon added the tcp_twstart function... this prevents close "failing" w/ EINVAL when it really was successful... Reviewed by: jeremy (NetBSD)	2004-07-26 21:29:56 +00:00
Colin Percival	56f21b9d74	Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb	2004-07-26 07:24:04 +00:00
Andre Oppermann	55db762b76	Extend versrcreach by checking against the rt_flags for RTF_REJECT and RTF_BLACKHOLE as well. To quote the submitter: The uRPF loose-check implementation by the industry vendors, at least on Cisco and possibly Juniper, will fail the check if the route of the source address is pointed to Null0 (on Juniper, discard or reject route). What this means is, even if uRPF Loose-check finds the route, if the route is pointed to blackhole, uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode as a pseudo-packet-firewall without using any manual filtering configuration -- one can simply inject a IGP or BGP prefix with next-hop set to a static route that directs to null/discard facility. This results in uRPF Loose-check failing on all packets with source addresses that are within the range of the nullroute. Submitted by: James Jun <james@towardex.com>	2004-07-21 19:55:14 +00:00
Robert Watson	2d01d331c6	M_PREPEND() the IP header on to the front of an outgoing raw IP packet using M_DONTWAIT rather than M_WAITOK to avoid sleeping on memory while holding a mutex.	2004-07-20 20:52:30 +00:00
Jayanth Vijayaraghavan	04f0d9a0ea	Let IN_FASTRECOVERY macro decide if we are in recovery mode. Nuke sackhole_limit for now. We need to add it back to limit the total number of sack blocks in the system.	2004-07-19 22:37:33 +00:00
Jayanth Vijayaraghavan	f787edd847	Fix a potential panic in the SACK code that was causing 1) data to be sent to the right of snd_recover. 2) more data to be sent than what's in the send buffer. The fix is to postpone sack retransmit to a subsequent recovery episode if the current retransmit pointer is beyond snd_recover. Thanks to Mohan Srinivasan for helping fix the bug. Submitted by: Daniel Lang	2004-07-19 22:06:01 +00:00
David Malone	932312d60b	Fix the !INET6 build. Reported by: alc	2004-07-17 21:40:14 +00:00
David Malone	969860f3ed	The tcp syncache code was leaving the IPv6 flowlabel uninitialised for the SYN\|ACK packet and then letting in6_pcbconnect set the flowlabel later. Arrange for the syncache/syncookie code to set and recall the flow label so that the flowlabel used for the SYN\|ACK is consistent. This is done by using some of the cookie (when tcp cookies are enabled) and by stashing the flowlabel in syncache. Tested and Discovered by: Orla McGann <orly@cnri.dit.ie> Approved by: ume, silby MFC after: 1 month	2004-07-17 19:44:13 +00:00
Max Laier	c550f2206d	Define semantic of M_SKIP_FIREWALL more precisely, i.e. also pass associated icmp_error() packets. While here retire PACKET_TAG_PF_GENERATED (which served the same purpose) and use M_SKIP_FIREWALL in pf as well. This should speed up things a bit as we get rid of the tag allocations. Discussed with: juli	2004-07-17 05:10:06 +00:00
Juli Mallett	765d141c78	Make M_SKIP_FIREWALL a global (and semantic) flag, preventing anything from using M_PROTO6 and possibly shooting someone's foot, as well as allowing the firewall to be used in multiple passes, or with a packet classifier frontend, that may need to explicitly allow a certain packet. Presently this is handled in the ipfw_chk code as before, though I have run with it moved to upper layers, and possibly it should apply to ipfilter and pf as well, though this has not been investigated. Discussed with: luigi, rwatson	2004-07-17 02:40:13 +00:00
Hajimu UMEMOTO	8a59da300c	when IN6P_AUTOFLOWLABEL is set, the flowlabel is not set on outgoing tcp connections. Reported by: Orla McGann <orly@cnri.dit.ie> Reviewed by: Orla McGann <orly@cnri.dit.ie> Obtained from: KAME	2004-07-16 18:08:13 +00:00
Poul-Henning Kamp	3e019deaed	Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".	2004-07-15 08:26:07 +00:00
Stefan Farfeleder	439dfb0c35	Remove erroneous semicolons.	2004-07-13 16:06:19 +00:00
Robert Watson	7cfc690440	After each label in tcp_input(), assert the inpcbinfo and inpcb lock state that we expect.	2004-07-12 19:28:07 +00:00
Brian Somers	0ac4013324	Change the following environment variables to kernel options: bootp -> BOOTP bootp.nfsroot -> BOOTP_NFSROOT bootp.nfsv3 -> BOOTP_NFSV3 bootp.compat -> BOOTP_COMPAT bootp.wired_to -> BOOTP_WIRED_TO - i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel. Pointed out by: dwmalone	2004-07-08 22:35:36 +00:00
Brian Somers	59e1ebc9b5	Change the following kernel options to environment variables: BOOTP -> bootp BOOTP_NFSROOT -> bootp.nfsroot BOOTP_NFSV3 -> bootp.nfsv3 BOOTP_COMPAT -> bootp.compat BOOTP_WIRED_TO -> bootp.wired_to This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf: bootp="YES" bootp.nfsroot="YES" bootp.nfsv3="YES" bootp.wired_to="bge1" or even setting the variables manually from the OK prompt.	2004-07-08 13:40:33 +00:00
Dag-Erling Smørgrav	de47739e71	Push WARNS back up to 6, but define NO_WERROR; I want the warts out in the open where people can see them and hopefully fix them.	2004-07-06 12:15:24 +00:00
Dag-Erling Smørgrav	9fa0fd2682	Introduce inline {ip,udp,tcp}_next() functions which take a pointer to an {ip,udp,tcp} header and return a void * pointing to the payload (i.e. the first byte past the end of the header and any required padding). Use them consistently throughout libalias to a) reduce code duplication, b) improve code legibility, c) get rid of a bunch of alignment warnings.	2004-07-06 12:13:28 +00:00
Dag-Erling Smørgrav	e3e2c21639	Rewrite twowords() to access its argument through a char pointer and not a short pointer. The previous implementation seems to be in a gray zone of the C standard, and GCC generates incorrect code for it at -O2 or higher on some platforms.	2004-07-06 09:22:18 +00:00
Dag-Erling Smørgrav	95347a8ee0	Temporarily lower WARNS to 3 while I figure out the alignment issues on alpha.	2004-07-06 08:44:41 +00:00
Dag-Erling Smørgrav	ed01a58215	Make libalias WARNS?=6-clean. This mostly involves renaming variables named link, foo_link or link_foo to lnk, foo_lnk or lnk_foo, fixing signed / unsigned comparisons, and shoving unused function arguments under the carpet. I was hoping WARNS?=6 might reveal more serious problems, and perhaps the source of the -O2 breakage, but found no smoking gun.	2004-07-05 11:10:57 +00:00
Dag-Erling Smørgrav	ffcb611a9d	Parenthesize return values.	2004-07-05 10:55:23 +00:00
Dag-Erling Smørgrav	f311ebb4ec	Mechanical whitespace cleanup.	2004-07-05 10:53:28 +00:00
Poul-Henning Kamp	e6bbb69149	Add LibAliasOutTry() which checks a packet for a hit in the tables, but does not create a new entry if none is found.	2004-07-04 12:53:07 +00:00
Ruslan Ermilov	1a0a934547	Mechanically kill hard sentence breaks.	2004-07-02 23:52:20 +00:00
Jayanth Vijayaraghavan	a0445c2e2c	On receiving 3 duplicate acknowledgements, SACK recovery was not being entered correctly. Fix this problem by separating out the SACK and the newreno cases. Also, check if we are in FASTRECOVERY for the sack case and if so, turn off dupacks. Fix an issue where the congestion window was not being incremented by ssthresh. Thanks to Mohan Srinivasan for finding this problem.	2004-07-01 23:34:06 +00:00
Ruslan Ermilov	c9a246418d	Bumped document date. Fixed markup. Fixed examples to match the new API.	2004-07-01 17:51:48 +00:00
Poul-Henning Kamp	e3e244bff6	Rwatson, write 100 times for tomorrow: First unlock, then assign NULL to pointer.	2004-06-27 21:54:34 +00:00
Pawel Jakub Dawidek	0a44517d3a	Those are unneeded too.	2004-06-27 09:06:10 +00:00
Pawel Jakub Dawidek	46e3b1cbe7	Add two missing includes and remove two unneeded. This is quite a serious fix, because even with MAC framework compiled in, MAC entry points in those two files were simply ignored.	2004-06-27 09:03:22 +00:00
Robert Watson	1e4d7da707	Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.	2004-06-26 19:10:39 +00:00
Robert Watson	3f9d1ef905	Remove spl's from TCP protocol entry points. While not all locking is merged here yet, this will ease the merge process by bringing the locked and unlocked versions into sync.	2004-06-26 17:50:50 +00:00
Paul Saab	652178a12a	White space & spelling fixes Submitted by: Xin LI <delphij@frontfree.net>	2004-06-25 04:11:26 +00:00
Bruce M Simpson	37332f049f	Whitespace.	2004-06-25 02:29:58 +00:00
Robert Watson	5905999b2f	Broaden scope of the socket buffer lock when processing an ACK so that the read and write of sb_cc are atomic. Call sbdrop_locked() instead of sbdrop() since we already hold the socket buffer lock.	2004-06-24 03:07:27 +00:00
Robert Watson	927c5cea3f	Protect so_oobmark with SOCKBUF_LOCK(&so->so_rcv), and broaden locking in tcp_input() for TCP packets with urgent data pointers to hold the socket buffer lock across testing and updating oobmark from just protecting sb_state. Update socket locking annotations	2004-06-24 02:57:12 +00:00
Robert Watson	a138d21769	In ip_ctloutput(), acquire the inpcb lock around some of the basic inpcb flag and status updates.	2004-06-24 02:05:47 +00:00
Robert Watson	d67ec3dd48	When asserting non-Giant locks in the network stack, also assert Giant if debug.mpsafenet=0, as any points that require synchronization in the SMPng world also required it in the Giant-world: - inpcb locks (including IPv6) - inpcbinfo locks (including IPv6) - dummynet subsystem lock - ipfw2 subsystem lock	2004-06-24 02:01:48 +00:00
Robert Watson	3f11a2f374	Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify-write on buffer sizes.	2004-06-24 01:37:04 +00:00
Paul Saab	76947e3222	Move the sack sysctl's under net.inet.tcp.sack net.inet.tcp.do_sack -> net.inet.tcp.sack.enable net.inet.tcp.sackhole_limit -> net.inet.tcp.sack.sackhole_limit Requested by: wollman	2004-06-23 21:34:07 +00:00
Paul Saab	6d90faf3d8	Add support for TCP Selective Acknowledgements. The work for this originated on RELENG_4 and was ported to -CURRENT. The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there. You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit. Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)	2004-06-23 21:04:37 +00:00
Robert Watson	bb7479a613	Acquire socket lock around frobbing of socket state in divert sockets.	2004-06-22 04:00:51 +00:00
Robert Watson	ffcbc0e4c5	Prefer use of the inpcb as a MAC label source for outgoing packets sent via divert sockets, when available.	2004-06-22 03:58:50 +00:00
Robert Watson	d330008e3b	If debug.mpsafenet is set, initialize TCP callouts as CALLOUT_MPSAFE.	2004-06-20 21:44:50 +00:00
Robert Watson	1f82efb3b7	Assert the inpcb lock before letting MAC check whether we can deliver to the inpcb in tcp_input().	2004-06-20 20:17:29 +00:00
Robert Watson	1b83216eda	IP multicast code no longer needs to acquire Giant before appending an mbuf onto a socket buffer. This is left over from debug.mpsafenet affecting the forwarding/bridging plane only.	2004-06-20 20:10:05 +00:00
Robert Watson	4e397bc524	In tcp_ctloutput(), don't hold the inpcb lock over a call to ip_ctloutput(), as it may need to perform blocking memory allocations. This also improves consistency with locking relative to other points that call into ip_ctloutput(). Bumped into by: Grover Lines <grover@ceribus.net>	2004-06-18 20:22:21 +00:00
Bruce M Simpson	4f450ff9a5	Check that m->m_pkthdr.rcvif is not NULL before checking if a packet was received on a broadcast address on the input path. Under certain circumstances this could result in a panic, notably for locally-generated packets which do not have m_pkthdr.rcvif set. This is a similar situation to that which is solved by src/sys/netinet/ip_icmp.c rev 1.66. PR: kern/52935	2004-06-18 12:58:45 +00:00
Bruce M Simpson	f3e0b7ef7f	Appease GCC.	2004-06-18 09:53:58 +00:00
Bruce M Simpson	5214cb3f59	If SO_DEBUG is enabled for a TCP socket, and a received segment is encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer when registering trace records. 'ipov' is specific to IPv4, and will therefore be uninitialized. [This fandango is only necessary in the first place because of our host-byte-order IP field pessimization.] PR: kern/60856 Submitted by: Galois Zheng	2004-06-18 03:31:07 +00:00
Bruce M Simpson	da181cc144	Don't set FIN on a retransmitted segment after a FIN has been sent, unless the segment really contains the last of the data for the stream. PR: kern/34619 Obtained from: OpenBSD (tcp_output.c rev 1.47) Noticed by: Joseph Ishac Reviewed by: George Neville-Neil	2004-06-18 02:47:59 +00:00
Bruce M Simpson	27de0135ce	Ensure that dst is bzeroed before calling rtalloc_ign(), to avoid possible routing table corruption. PR: kern/40563, freebsd4/432 (KAME) Obtained from: NetBSD (in_gif.c rev 1.26.10.1) Requested by: Jean-Luc Richier	2004-06-18 02:04:07 +00:00
Max Laier	7c1fe95333	Commit pf version 3.5 and link additional files to the kernel build. Version 3.5 brings: - Atomic commits of ruleset changes (reduce the chance of ending up in an inconsistent state). - A 30% reduction in the size of state table entries. - Source-tracking (limit number of clients and states per client). - Sticky-address (the flexibility of round-robin with the benefits of source-hash). - Significant improvements to interface handling. - and many more ...	2004-06-16 23:24:02 +00:00
Max Laier	a306c902b8	Prepare for pf 3.5 import: - Remove pflog and pfsync modules. Things will change in such a fashion that there will be one module with pf+pflog that can be loaded into GENERIC without problems (which is what most people want). pfsync is no longer possible as a module. - Add multicast address for in-kernel multicast pfsync protocol. Protocol glue will follow once the import is done. - Add one more mbuf tag	2004-06-16 22:59:06 +00:00
Maxim Konovalov	ef14c36965	o connect(2): if there is no route to the destination do not pick up the first local ip address for the source ip address, return ENETUNREACH instead. Submitted by: Gleb Smirnoff Reviewed by: -current (silence)	2004-06-16 10:02:36 +00:00
Bruce M Simpson	d420fcda27	Fix build for IPSEC && !INET6 PR: kern/66125 Submitted by: Cyrille Lefevre	2004-06-16 09:35:07 +00:00
Bruce M Simpson	49b19bfc47	Reverse a patch which has no effect on -CURRENT and should probably be applied directly to -STABLE. Noticed by: iedowse Pointy hat to: bms	2004-06-16 08:50:14 +00:00
Bruce M Simpson	57ab3660ff	In ip_forward(), when calculating the MTU in effect for an IPSEC transport mode tunnel, take the per-route MTU into account, if and only if it is non-zero (as found in struct rt_metrics/rt_metrics_lite). PR: kern/42727 Obtained from: NetBSD (ip_input.c rev 1.151)	2004-06-16 08:33:09 +00:00
Bruce M Simpson	e6b0a57025	In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chain is sane, and ipsec4_getpolicybyaddr() will therefore complete. PR: kern/42727 Obtained from: KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)	2004-06-16 08:28:54 +00:00
Bruce M Simpson	34e3ccb34b	Disconnect a temporarily-connected UDP socket in out-of-mbufs case. This fixes the problem of UDP sockets getting wedged in a connected state (and bound to their destination) under heavy load. Temporary bind/connect should probably be deleted in future as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993]. Notes: - INP_LOCK() is already held in udp_output(). The connection is in effect happening at a layer lower than the socket layer, therefore in theory socket locking should not be needed. - Inlining the in_pcbdisconnect() operation buys us nothing (in the case of the current state of the code), as laddr is not part of the inpcb hash or the udbinfo hash. Therefore there should be no need to rehash after restoring laddr in the error case (this was a concern of the original author of the patch). PR: kern/41765 Requested by: gnn Submitted by: Jinmei Tatuya (with cleanups) Tested by: spray(8)	2004-06-16 05:41:00 +00:00
Robert Watson	a97719a4c5	Convert GIANT_REQUIRED to NET_ASSERT_GIANT for socket access.	2004-06-16 03:36:06 +00:00
Robert Watson	7721f5d760	Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.	2004-06-15 03:51:44 +00:00
Robert Watson	c0b99ffa02	The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.	2004-06-14 18:16:22 +00:00
Max Laier	02b199f158	Link ALTQ to the build and break with ABI for struct ifnet. Please recompile your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a separate commit. Same with the driver changes, which need case by case evaluation. __FreeBSD_version bump will follow. Tested-by: (i386)LINT	2004-06-13 17:29:10 +00:00
Doug Rabson	b8b3323469	Add a new driver to support IP over firewire. This driver is intended to conform to the rfc2734 and rfc3146 standard for IP over firewire and should eventually supersede the fwe driver. Right now the broadcast channel number is hardwired and we don't support MCAP for multicast channel allocation - more infrastructure is required in the firewire code itself to fix these problems.	2004-06-13 10:54:36 +00:00
Robert Watson	310e7ceb94	Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so): - Hold socket lock over calls to MAC entry points reading or manipulating socket labels. - Assert socket lock in MAC entry point implementations. - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.	2004-06-13 02:50:07 +00:00
Robert Watson	395a08c904	Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count: - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele(). - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree(). - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers. - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket. - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket. Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS	2004-06-12 20:47:32 +00:00
Christian S.J. Peron	d316f2cf4f	Modify ip fw so that whenever UID or GID constraints exist in a ruleset, the pcb is looked up once per ipfw_chk() activation. This is done by extracting the required information out of the PCB and caching it to the ipfw_chk() stack. This should greatly reduce PCB looking contention and speed up the processing of UID/GID based firewall rules (especially with large UID/GID rulesets). Some very basic benchmarks were taken which compare the number of in_pcblookup_hash(9) activations to the number of firewall rules containing UID/GID based constraints before and after this patch. The results can be viewed here: o http://people.freebsd.org/~csjp/ip_fw_pcb.png Reviewed by: andre, luigi, rwatson Approved by: bmilekic (mentor)	2004-06-11 22:17:14 +00:00
Robert Watson	c1d587c848	Remove unneeded Giant acquisition in divert_packet(), which is left over from debug.mpsafenet affecting only the forwarding plane. Giant is now acquired in the ithread/netisr or in the system call code.	2004-06-11 04:06:51 +00:00
Robert Watson	c14800e6ff	Lock down parallel router_info list for tracking multicast IGMP versions of various routers seen: - Introduce igmp_mtx. - Protect global variable 'router_info_head' and list fields in struct router_info with this mutex, as well as igmp_timers_are_running. - find_rti() asserts that the caller acquires igmp_mtx. - Annotate a failure to check the return value of MALLOC(..., M_NOWAIT).	2004-06-11 03:42:37 +00:00
Ruslan Ermilov	dd4d62c7d8	init_tables() must be run after sys/net/route.c:route_init().	2004-06-10 20:20:37 +00:00
Ruslan Ermilov	cd8b5ae0ae	Introduce a new feature to IPFW2: lookup tables. These are useful for handling large sparse address sets. Initial implementation by Vsevolod Lobko <seva@ip.net.ua>, refined by me. MFC after: 1 week	2004-06-09 20:10:38 +00:00
Hajimu UMEMOTO	cad1917d48	do not send icmp response if the original packet is encrypted. Obtained from: KAME MFC after: 1 week	2004-06-07 09:56:59 +00:00
Bosko Milekic	ac830b58d1	Move the locking of the pcb into raw_output(). Organize code so that m_prepend() is not called with possibility to wait while the pcb lock is held. What still needs revisiting is whether the ripcbinfo lock is really required here. Discussed with: rwatson	2004-06-03 03:15:29 +00:00
Poul-Henning Kamp	5dba30f15a	add missing #include <sys/module.h>	2004-05-30 20:27:19 +00:00
Poul-Henning Kamp	41ee9f1c69	Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>	2004-05-30 17:57:46 +00:00
Christian S.J. Peron	b5ef991561	Add a super-user check to ipfw_ctl() to make sure that the calling process is a non-prison root. The security.jail.allow_raw_sockets sysctl variable is disabled by default, however if the user enables raw sockets in prisons, prison-root should not be able to interact with firewall rule sets. Approved by: rwatson, bmilekic (mentor)	2004-05-25 15:02:12 +00:00
Yaroslav Tykhiy	4658dc8325	When checking for possible port theft, skip over a TCP inpcb unless it's in the closed or listening state (remote address == INADDR_ANY). If a TCP inpcb is in any other state, it's impossible to steal its local port or use it for port theft. And if there are both closed/listening and connected TCP inpcbs on the same localIP:port couple, the call to in_pcblookup_local() will find the former due to the design of that function. No objections raised in: -net, -arch MFC after: 1 month	2004-05-20 06:35:02 +00:00
Maxim Konovalov	a49b21371a	o Calculate a number of bytes to copy (cnt) correctly: +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ \| \| \|C\| \| \| \| \| \| \| \| IP \|N\|O\|L\|P\| \| IP \| \| IP \| \| #1 \|O\|D\|E\|T\| \| #2 \| \| #n \| \| \|P\|E\|N\|R\| \| \| \| \| +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+ ^ ^<---- cnt - (IPOPT_MINOFF - 1) ---->\| \| \| src \| +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr) \| dst +-- cp[IPOPT_OFF + 1] PR: kern/66386 Submitted by: Andrei Iltchenko MFC after: 3 weeks	2004-05-11 19:14:44 +00:00
Maxim Konovalov	d0946241ac	o IFNAMSIZ does include the trailing \0. Approved by: andre o Document net.inet.icmp.reply_src.	2004-05-07 01:24:53 +00:00
Andre Oppermann	2bde81acd6	Provide the sysctl net.inet.ip.process_options to control the processing of IP options. net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified. net.inet.ip.process_options=1 Process all IP options (default). net.inet.ip.process_options=2 Reject all packets with IP options with ICMP filter prohibited message. This sysctl affects packets destined for the local host as well as those only transiting through the host (routing). IP options do not have any legitimate purpose anymore and are only used to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP stacks. Reviewed by: sam (mentor)	2004-05-06 18:46:03 +00:00
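The three settings of net.inet.ip.process_options described above can be read as a small policy table. The following is a minimal sketch only: the enum and function names are hypothetical, and treating out-of-range values like the default (1) is an assumption, not something the commit message states.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the kernel's identifiers. */
enum opt_action {
	OPT_IGNORE,	/* 0: ignore IP options, pass packets unmodified */
	OPT_PROCESS,	/* 1: process all IP options (default) */
	OPT_REJECT	/* 2: reject with ICMP filter prohibited */
};

/*
 * Map a net.inet.ip.process_options value to the behaviour listed in
 * the commit message. Unknown values fall back to the default; that
 * fallback is an assumption of this sketch.
 */
static enum opt_action
options_policy(int process_options)
{
	switch (process_options) {
	case 0:
		return (OPT_IGNORE);
	case 2:
		return (OPT_REJECT);
	case 1:
	default:
		return (OPT_PROCESS);
	}
}
```

For example, options_policy(2) yields OPT_REJECT, which corresponds to dropping the packet and answering with an ICMP filter prohibited message.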
Robert Watson	c18b97c630	Switch to using the inpcb MAC label instead of socket MAC label when labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular: - Use the inpcb for label instead of socket in raw_append(). - Use the inpcb for label instead of socket in tcp_output(). - Use the inpcb for label instead of socket in tcp_respond(). - Use the inpcb for label instead of socket in tcp_twrespond(). - Use the inpcb for label instead of socket in syncache_respond(). While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 02:11:47 +00:00
Robert Watson	87f2bb8caf	Assert inpcb lock in udp_append(). Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 01:08:15 +00:00
Robert Watson	cbe42d48bd	Assert the inpcb lock on 'last' in udp_append(), since it's always called with it, and also requires it. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research	2004-05-04 00:10:16 +00:00
Maxim Konovalov	1a0c4873ed	o Fix misindentation in the previous commit.	2004-05-03 17:15:34 +00:00
Andre Oppermann	7652802b06	Back out a change that slipped into the previous commit for which other supporting parts have not yet been committed. Remove premature IP options ignoring option.	2004-05-03 16:07:13 +00:00
Andre Oppermann	06bb56f43c	Optimize IP fastforwarding some more: o New function ip_findroute() to reduce code duplication for the route lookup cases. (luigi) o Store ip_len in host byte order on the stack instead of using it via indirection from the mbuf. This allows deferring the host byte conversion to a later point and makes a quicker fallback to normal ip_input() processing. (luigi) o Check if route is dampened with RTF_REJECT flag and drop packet already here when ARP is unable to resolve destination address. An ICMP unreachable is sent to inform the sender. o Check if interface output queue is full and drop packet already here. No ICMP notification is sent because signalling source quench is deprecated. o Check if media_state is down (used for ethernet type interfaces) and drop the packet already here. An ICMP unreachable is sent to inform the sender. o Do not account sent packets to the interface address counters. They are only for packets with that 'ia' as source address. o Update and clarify some comments. Submitted by: luigi (most of it)	2004-05-03 13:52:47 +00:00
Darren Reed	2f3f1e6773	Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.	2004-05-02 15:10:17 +00:00
Darren Reed	7fbb130049	oops, I forgot this file in a prior commit (change was still sitting here, uncommitted): Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.	2004-05-02 15:07:37 +00:00
Darren Reed	ab884d993e	Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.	2004-05-02 06:36:30 +00:00
Bosko Milekic	5a59cefcd1	Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800	2004-04-26 19:46:52 +00:00
Mike Silbersack	80dd2a81fb	Tighten up reset handling in order to make reset attacks as difficult as possible while maintaining compatibility with the widest range of TCP stacks. The algorithm is as follows: --- For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset, all other segments will be silently dropped. For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped. --- The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check. Idea by: Darren Reed Reviewed by: truckman, jlemon, others(?)	2004-04-26 02:56:31 +00:00
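The reset-acceptance rule described above can be sketched as a single predicate. This is an illustration under stated assumptions, not the committed code: the function name and parameters are hypothetical, and the receive window is assumed to start at last_ack_sent and span rcv_wnd bytes. The comparison macros follow the usual modular style used for TCP sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modular 32-bit sequence comparisons (wrap-around safe). */
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

/*
 * Decide whether an incoming RST should reset the connection.
 * Hypothetical helper: 'established' is true for ESTABLISHED, false
 * for every other state. The window bounds are an assumption.
 */
static bool
rst_is_acceptable(bool established, uint32_t seg_seq,
    uint32_t last_ack_sent, uint32_t rcv_wnd)
{
	/* ESTABLISHED: only an exact match on last_ack_sent resets. */
	if (established)
		return (seg_seq == last_ack_sent);

	/* Other states: any RST inside the window resets. */
	return (SEQ_GEQ(seg_seq, last_ack_sent) &&
	    SEQ_LT(seg_seq, last_ack_sent + rcv_wnd));
}
```

A caller would silently drop the segment whenever the predicate returns false, which matches the "all other segments will be silently dropped" wording of the message.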
Luigi Rizzo	b2a8ac7ca5	Another small set of changes to reduce diffs with the new arp code.	2004-04-25 15:00:17 +00:00
Luigi Rizzo	491522eade	remove a stale comment on the behaviour of arpresolve	2004-04-25 14:06:23 +00:00
Luigi Rizzo	cfff63f1b8	Start the arp timer at init time. It runs so rarely that it makes no sense to wait until the first request.	2004-04-25 12:50:14 +00:00
Luigi Rizzo	cd46a114fc	This commit does two things: 1. rt_check() cleanup: rt_check() is only necessary for some address families to gain access to the corresponding arp entry, so call it only in/near the resolve() routines where it is actually used -- at the moment this is arpresolve(), nd6_storelladdr() (the call is embedded here), and atmresolve() (the call is just before atmresolve to reduce the number of changes). This change will make it a lot easier to decouple the arp table from the routing table. There is an extra call to rt_check() in if_iso88025subr.c to determine the routing info length. I have left it alone for the time being. The interface of arpresolve() and nd6_storelladdr() now changes slightly: + the 'rtentry' parameter (really a hint from the upper level layer) is now passed unchanged from _output(), so it becomes the route to the final destination and not to the gateway. + the routines will return 0 if resolution is possible, non-zero otherwise. + arpresolve() returns EWOULDBLOCK in case the mbuf is being held waiting for an arp reply -- in this case the error code is masked in the caller so the upper layer protocol will not see a failure. 2. arpcom untangling Where possible, use 'struct ifnet' instead of 'struct arpcom' variables, and use the IFP2AC macro to access arpcom fields. This mostly affects the netatalk code. === Detailed changes: === net/if_arcsubr.c rt_check() cleanup, remove a useless variable net/if_atmsubr.c rt_check() cleanup net/if_ethersubr.c rt_check() cleanup, arpcom untangling net/if_fddisubr.c rt_check() cleanup, arpcom untangling net/if_iso88025subr.c rt_check() cleanup netatalk/aarp.c arpcom untangling, remove a block of duplicated code netatalk/at_extern.h arpcom untangling netinet/if_ether.c rt_check() cleanup (change arpresolve) netinet6/nd6.c rt_check() cleanup (change nd6_storelladdr)	2004-04-25 09:24:52 +00:00
Mike Silbersack	6b2fc10b64	Wrap two long lines in the previous commit.	2004-04-23 23:29:49 +00:00
Andre Oppermann	2d166c0202	Correct an edge case in tcp_mss() where the cached path MTU from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary. Make a comment about already pre-initialized default values more clear. Reviewed by: sam	2004-04-23 22:44:59 +00:00
Andre Oppermann	22b5770b99	Add the option versrcreach to verify that a valid route to the source address of a packet exists in the routing table. The default route is ignored because it would match everything and render the check pointless. This option is very useful for routers with a complete view of the Internet (BGP) in the routing table to reject packets with spoofed or unrouteable source addresses. Example: ipfw add 1000 deny ip from any to any not versrcreach also known in Cisco-speak as: ip verify unicast source reachable-via any Reviewed by: luigi	2004-04-23 14:28:38 +00:00
Andre Oppermann	b62dccc7e5	Fix a potential race when purging expired hostcache entries. Spotted by: luigi	2004-04-23 13:54:28 +00:00
Mike Silbersack	174624e01d	Take out an unneeded variable I forgot to remove in the last commit, and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.	2004-04-22 08:34:55 +00:00
Mike Silbersack	6ac48b7409	Simplify random port allocation, and add net.inet.ip.portrange.randomized, which can be used to turn off randomized port allocation if so desired. Requested by: alfred	2004-04-22 08:32:14 +00:00
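A rough sketch of what a randomized-versus-sequential switch such as net.inet.ip.portrange.randomized could select between. Everything here is hypothetical: the function name, the use of rand(), and the omission of the "is this port already in use" check that a real allocator needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Pick the next candidate ephemeral port in [first, last].
 * With 'randomized' set, choose uniformly-ish at random; otherwise
 * step sequentially and wrap around. Collision checking is omitted.
 */
static uint16_t
pick_ephemeral_port(uint16_t first, uint16_t last, uint16_t *lastport,
    bool randomized)
{
	uint32_t count;

	assert(first <= last);
	count = (uint32_t)last - first + 1;

	if (randomized)
		*lastport = (uint16_t)(first + (uint32_t)rand() % count);
	else if (*lastport < first || *lastport >= last)
		*lastport = first;	/* start or wrap around */
	else
		(*lastport)++;
	return (*lastport);
}
```

Sequential stepping guarantees a long gap before a port is reused; the random branch gives that up in exchange for unpredictability, which is the trade-off the commit messages discuss.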
Bruce M Simpson	de9f59f850	Fix a typo in a comment.	2004-04-20 19:04:24 +00:00
Mike Silbersack	6dd946b3f7	Switch from using sequential to random ephemeral port allocation, implementation taken directly from OpenBSD. I've resisted committing this for quite some time because of concern over TIME_WAIT recycling breakage (sequential allocation ensures that there is a long time before ports are recycled), but recent testing has shown me that my fears were unwarranted.	2004-04-20 06:45:10 +00:00
Mike Silbersack	c1537ef063	Enhance our RFC1948 implementation to perform better in some pathological TIME_WAIT recycling cases I was able to generate with http testing tools. In short, as the old algorithm relied on ticks to create the time offset component of an ISN, two connections with the exact same host, port pair that were generated between timer ticks would have the exact same sequence number. As a result, the second connection would fail to pass the TIME_WAIT check on the server side, and the SYN would never be acknowledged. I've "fixed" this by adding random positive increments to the time component between clock ticks so that ISNs will always be increasing, no matter how quickly the port is recycled. Except in such contrived benchmarking situations, this problem should never come up in normal usage... until networks get faster. No MFC planned, 4.x is missing other optimizations that are needed to even create the situation in which such quick port recycling will occur.	2004-04-20 06:33:39 +00:00
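One way to picture the fix: keep a running time offset, advance it by a fixed amount per clock tick, and add a small random positive increment on every call so that two ISNs requested within the same tick still differ and increase. The sketch below is an assumption-laden illustration; the struct, the constants and the use of rand() are hypothetical and do not reflect the committed code. Sequence wrap-around is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ISN_PER_TICK	4096u	/* hypothetical advance per clock tick */
#define ISN_MAX_JITTER	1024u	/* hypothetical bound on the random step */

struct isn_clock {
	uint32_t last_tick;	/* tick value seen on the previous call */
	uint32_t offset;	/* time component added to the ISN */
};

/*
 * Return the time component for a new ISN. The value grows on every
 * call, even when 'ticks' has not changed since the last call.
 */
static uint32_t
isn_time_component(struct isn_clock *c, uint32_t ticks)
{
	if (ticks != c->last_tick) {
		c->offset += (ticks - c->last_tick) * ISN_PER_TICK;
		c->last_tick = ticks;
	}
	/* Random positive increment in [1, ISN_MAX_JITTER]. */
	c->offset += 1 + (uint32_t)rand() % ISN_MAX_JITTER;
	return (c->offset);
}
```

With a purely tick-based offset, two calls in the same tick would return the same value, which is the failure mode the message describes.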
Luigi Rizzo	ac912b2dc8	Replace Bcopy with 'the real thing' as in the rest of the file.	2004-04-18 11:45:49 +00:00
Luigi Rizzo	e6e51f0518	In an effort to simplify the routing code, try to deprecate rtalloc() in favour of rtalloc_ign(), which is what would end up being called anyways. There are 25 more instances of rtalloc() in net*/ and about 10 instances of rtalloc_ign()	2004-04-14 01:13:14 +00:00
Warner Losh	f36cfd49ad	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson	2004-04-07 20:46:16 +00:00
Ruslan Ermilov	390cdc6a76	Fixed a bug in previous revision: compute the payload checksum before we convert ip_len into a network byte order; in_delayed_cksum() still expects it in host byte order. The symptom was the ``in_cksum_skip: out of data by %d'' complaints from the kernel. To add to the previous commit log: these fixes make tcpdump(1) happy by not complaining about UDP/TCP checksum being bad for looped back IP multicast when multicast router is deactivated. Reported by: Vsevolod Lobko	2004-04-07 10:01:39 +00:00
Bruce Evans	30a4ab088a	Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h> to implement this mistake. Fixed some nearby style bugs (initialization in declaration, misformatting of this initialization, missing blank line after the declaration, and comparison of the non-boolean result of the initialization with 0 using "!". In KNF, "!" is not even used to compare booleans with 0).	2004-04-06 10:59:11 +00:00
Robert Watson	47f32f6fa6	Two missed in previous commit -- compare pointer with NULL rather than using it as a boolean.	2004-04-05 00:52:05 +00:00
Robert Watson	24459934e9	Prefer NULL to 0 when checking pointer values as integers or booleans.	2004-04-05 00:49:07 +00:00
Pawel Jakub Dawidek	52710de1cb	Fix a panic possibility caused by returning without releasing locks. It was fixed by moving problematic checks, as well as checks that don't need locking, before locks are acquired. Submitted by: Ryan Sommers <ryans@gamersimpact.com> In co-operation with: cperciva, maxim, mlaier, sam Tested by: submitter (previous patch), me (current patch) Reviewed by: cperciva, mlaier (previous patch), sam (current patch) Approved by: sam Dedicated to: enough!	2004-04-04 20:14:55 +00:00
Luigi Rizzo	f7c5baa1c6	+ arpresolve(): remove an unused argument + struct ifnet: remove unused fields, move ipv6-related field close to each other, add a pointer to l3<->l2 translation tables (arp,nd6, etc.) for future use. + struct route: remove an unused field, move close to each other some fields that might likely go away in the future	2004-04-04 06:14:55 +00:00
Daniel Eischen	ab39bc9a92	Unbreak natd. Reported and submitted by: Sean McNeil (sean at mcneil.com)	2004-04-02 17:57:57 +00:00
Dag-Erling Smørgrav	e271f829b8	Raise WARNS level to 2.	2004-03-31 21:33:55 +00:00
Dag-Erling Smørgrav	2871c50186	Deal with aliasing warnings. Reviewed by: ru Approved by: silence on the lists	2004-03-31 21:32:58 +00:00
Robert Watson	7101d752b2	Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it. Previously, Giant would be grabbed at entry to the IP local delivery code when debug.mpsafenet was set to true, as that implied Giant wouldn't be grabbed in the driver path. Now, we will use this primitive to conditionally grab Giant in the event the entire network stack isn't running MPSAFE (debug.mpsafenet == 0).	2004-03-28 23:12:19 +00:00
Pawel Jakub Dawidek	56dc72c3b6	Remove unused argument.	2004-03-28 15:48:00 +00:00
Pawel Jakub Dawidek	b0330ed929	Reduce 'td' argument to 'cred' (struct ucred) argument in those functions: - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume	2004-03-27 21:05:46 +00:00
Pawel Jakub Dawidek	6823b82399	Remove unused argument. Reviewed by: ume	2004-03-27 20:41:32 +00:00
Hajimu UMEMOTO	a5d1aae31a	Validate IPv6 socket options more carefully to avoid a panic. PR: kern/61513 Reviewed by: cperciva, nectar	2004-03-26 19:52:18 +00:00
Pawel Jakub Dawidek	8da601dfb7	Remove unused function. It was used in FreeBSD 4.x, but now we're using cr_canseesocket().	2004-03-25 15:12:12 +00:00
Ruslan Ermilov	26f16ebeb1	Untangle IP multicast routing interaction with delayed payload checksums. Compute the payload checksum for a locally originated IP multicast where God intended, in ip_mloopback(), rather than doing it in ip_output() and only when multicast router is active. This is more correct as we do not fool ip_input() that the packet has the correct payload checksum when in fact it does not (when multicast router is inactive). This is also more efficient if we don't join the multicast group we send to, thus allowing the hardware to checksum the payload.	2004-03-25 08:46:27 +00:00
Robert Watson	bdae44a844	Lock down global variables in if_gre: - Add gre_mtx to protect global softc list. - Hold gre_mtx over various list operations (insert, delete). - Centralize if_gre interface teardown in gre_destroy(), and call this from modevent unload and gre_clone_destroy(). - Export gre_mtx to ip_gre.c, which walks the gre list to look up gre interfaces during encapsulation. Add a wonking comment on how we need some sort of drain/reference count mechanism to keep gre references alive while in use and simultaneous destroy. This commit does not lockdown softc data, which follows in a future commit.	2004-03-22 16:04:43 +00:00
Matthew N. Dodd	2964fb6538	- Fix indentation lost by 'diff -b'. - Un-wrap short line.	2004-03-21 18:51:26 +00:00
Matthew N. Dodd	64bf80ce1b	Remove interface type specific code from arprequest(), and in_arpinput(). The AF_ARP case in the (*if_output)() routine will handle the interface type specific bits. Obtained from: NetBSD	2004-03-21 06:36:05 +00:00
Dag-Erling Smørgrav	f0f93429cf	Run through indent(1) so I can read the code without getting a headache. The result isn't quite knf, but it's knfer than the original, and far more consistent.	2004-03-16 21:30:41 +00:00
Matthew N. Dodd	e952fa39de	De-register.	2004-03-14 00:44:11 +00:00
Robert Watson	fe5a02c927	Lock down IP-layer encapsulation library: - Add encapmtx to protect ip_encap.c global variables (encapsulation list). - Unifdef #ifdef 0 pieces of encap_init() which was (and now really is) basically a no-op. - Lock encapmtx when walking encaptab, modifying it, comparing entries, etc. - Remove spl's. Note that currently there's no facility to make sure outstanding use of encapsulation methods on a table entry have drained before we allow a table entry to be removed. As such, it's currently the caller's responsibility to make sure that draining takes place. Reviewed by: mlaier	2004-03-10 02:48:50 +00:00
Robert Watson	846840ba95	Scrub unused variable zeroin_addr.	2004-03-10 01:01:04 +00:00
Jeffrey Hsu	a062038267	To comply with the spec, do not copy the TOS from the outer IP header to the inner IP header of the PIM Register if this is a PIM Null-Register message. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2004-03-08 07:47:27 +00:00
Jeffrey Hsu	4c9792f9d3	Include <sys/types.h> for autoconf/automake detection. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2004-03-08 07:45:32 +00:00
Max Laier	b81dae751b	Add some missing DUMMYNET_UNLOCK() in config_pipe(). Noticed by: Simon Coggins Approved by: bms(mentor)	2004-03-03 01:33:22 +00:00
Max Laier	4672d81921	Two minor follow-ups on the MT_TAG removal: ifp is now passed explicitly to ether_demux; no need to look it up again. Make mtag a global var in ip_input. Noticed by: rwatson Approved by: bms(mentor)	2004-03-02 14:37:23 +00:00
Robert Watson	6200a93f82	Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT() to NET_UNLOCK_GIANT(). While they are used in similar ways, the semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT() directly wrap mutex lock and unlock operations, whereas drop/pickup special case the handling of Giant recursion. Add a comment saying as much. Add NET_ASSERT_GIANT(), which conditionally asserts Giant based on the value of debug_mpsafenet.	2004-03-01 22:37:01 +00:00
Hajimu UMEMOTO	04d3a45241	fix -O0 compilation without INET6. Pointed out by: ru	2004-03-01 19:10:31 +00:00
Robert Watson	768bbd68cc	Remove unneeded {} originally used to hold local variables for dummynet in a code block, as the variable is now gone. Submitted by: sam	2004-02-28 19:50:43 +00:00
Robert Watson	a7b6a14aee	Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These were needed by the MAC Framework until inpcbs gained labels. Submitted by: sam	2004-02-28 15:12:20 +00:00
Max Laier	25a4adcec4	Bring eventhandler callbacks for pf. This enables pf to track dynamic address changes on interfaces (dialup) with the "on (<ifname>)"-syntax. This also brings hooks in anticipation of tracking cloned interfaces, which will be in future versions of pf. Approved by: bms(mentor)	2004-02-26 04:27:55 +00:00
Max Laier	cc5934f5af	Tweak existing header and other build infrastructure to be able to build pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile (i.e. do not connect it to any (automatic) builds - yet). Approved by: bms(mentor)	2004-02-26 03:53:54 +00:00
Don Lewis	47934cef8f	Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms	2004-02-26 00:27:04 +00:00
Max Laier	ac9d7e2618	Re-remove MT_TAGs. The problems with dummynet have been fixed now. Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam	2004-02-25 19:55:29 +00:00
Bruce Evans	0613995bd0	Fixed namespace pollution in rev.1.74. Implementation of the syncache increased <netinet/tcp_var>'s already large set of prerequisites, and this was handled badly. Just don't declare the complete syncache struct unless <netinet/pcb.h> is included before <netinet/tcp_var.h>. Approved by: jlemon (years ago, for a more invasive fix)	2004-02-25 13:03:01 +00:00
Bruce Evans	a545b1dc4d	Don't use the negatively-opaque type uma_zone_t or be chummy with <vm/uma.h>'s idempotency identifier or its misspelling.	2004-02-25 11:53:19 +00:00
Jeffrey Hsu	89c02376fc	Relax a KASSERT condition to allow for a valid corner case where the FIN on the last segment consumes an extra sequence number. Spurious panic reported by Mike Silbersack <silby@silby.com>.	2004-02-25 08:53:17 +00:00
Andre Oppermann	12e2e97051	Convert the tcp segment reassembly queue to UMA and limit the maximum amount of segments it will hold. The following tuneables and sysctls control the behaviour of the tcp segment reassembly queue: net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters). net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48). net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues. net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached. Tested by: bms, silby Reviewed by: bms, silby	2004-02-24 15:27:41 +00:00
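The interaction of the four net.inet.tcp.reass.* knobs can be sketched as simple admission accounting. This is a minimal model, not the UMA-based kernel code: the struct and function names are hypothetical, and only the defaults (1/16 of nmbclusters, 48 per queue) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four sysctls named in the message. */
struct reass_limits {
	int maxsegments;	/* global cap across all queues */
	int maxqlen;		/* cap for any single connection */
	int cursegments;	/* segments currently queued, all queues */
	int overflows;		/* times either cap was hit */
};

static void
reass_limits_init(struct reass_limits *l, int nmbclusters)
{
	l->maxsegments = nmbclusters / 16;	/* default: 1/16 of nmbclusters */
	l->maxqlen = 48;			/* default per-session limit */
	l->cursegments = 0;
	l->overflows = 0;
}

/*
 * Try to queue one more segment on a connection whose reassembly
 * queue currently holds *qlen segments. Returns false and counts an
 * overflow if the global or the per-connection limit is reached.
 */
static bool
reass_admit(struct reass_limits *l, int *qlen)
{
	if (l->cursegments >= l->maxsegments || *qlen >= l->maxqlen) {
		l->overflows++;
		return (false);
	}
	l->cursegments++;
	(*qlen)++;
	return (true);
}
```

With nmbclusters set to 1024 the global cap is 64 segments, so a single connection is stopped by the per-queue limit of 48 before the global limit comes into play.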
Pawel Jakub Dawidek	41fe0c8ad5	Fixed ucred structure leak. Approved by: scottl (mentor) PR: 54163 MFC after: 3 days	2004-02-19 14:13:21 +00:00
Max Laier	36e8826ffb	Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place. Approved by: bms(mentor)	2004-02-18 00:04:52 +00:00
Hajimu UMEMOTO	da0f40995d	IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>	2004-02-17 14:02:37 +00:00
Bruce M Simpson	88f6b0435e	Shorten the name of the socket option used to enable TCP-MD5 packet treatment. Submitted by: Vincent Jardin	2004-02-16 22:21:16 +00:00
Hajimu UMEMOTO	70dbc6cbfc	don't update outgoing ifp, if ipsec tunnel mode encapsulation was not made. Obtained from: KAME	2004-02-16 17:05:06 +00:00
Bruce M Simpson	91179f796d	Spell types consistently throughout this file. Do not use the __packed attribute, as we are often #include'd from userland without <sys/cdefs.h> in front of us, and it is not strictly necessary. Noticed by: Sascha Blank	2004-02-16 14:40:56 +00:00
Bruce M Simpson	32ff046639	Final brucification pass. Spell types consistently (u_int). Remove bogus casts. Remove unnecessary parenthesis. Submitted by: bde	2004-02-14 21:49:48 +00:00
Max Laier	97075d0c0a	Do not expose ip_dn_find_rule inline function to userland and unbreak world.	2004-02-13 22:26:36 +00:00
Max Laier	189a0ba4e7	Do not check receive interface when pfil(9) hook changed address. Approved by: bms(mentor)	2004-02-13 19:20:43 +00:00
Max Laier	1094bdca51	This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag). This is (mostly) work from: sam Silence from: -arch Approved by: bms(mentor), sam, rwatson	2004-02-13 19:14:16 +00:00
Bruce M Simpson	265ed01285	Brucification. Submitted by: bde	2004-02-13 18:21:45 +00:00
Hajimu UMEMOTO	efddf5c64d	supported IPV6_RECVPATHMTU socket option. Obtained from: KAME	2004-02-13 14:50:01 +00:00
Bruce M Simpson	b30190b542	Update the prototype for tcpsignature_apply() to reflect the spelling of the types used by m_apply()'s callback function, f, as documented in mbuf(9). Noticed by: njl	2004-02-12 20:16:09 +00:00
Bruce M Simpson	bca0e5bfc3	style(9) pass; whitespace and comments. Submitted by: njl	2004-02-12 20:12:48 +00:00
Bruce M Simpson	a0194ef1ea	Remove an unnecessary initialization that crept in from the code which verifies TCP-MD5 digests. Noticed by: njl	2004-02-12 20:08:28 +00:00
Bruce M Simpson	45d370ee8b	Fix a typo; left out preprocessor conditional for sigoff variable, which is only used by TCP_SIGNATURE code. Noticed by: Roop Nanuwa	2004-02-11 09:46:54 +00:00
Bruce M Simpson	1cfd4b5326	Initial import of RFC 2385 (TCP-MD5) digest support. This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net	2004-02-11 04:26:04 +00:00
Hajimu UMEMOTO	f073c60f73	pass pcb rather than so. it is expected that per socket policy works again.	2004-02-03 18:20:55 +00:00
Andre Oppermann	b74d89bbbb	Add sysctl net.inet.icmp.reply_src to specify the interface name used for the ICMP reply source in response to packets which are not directly addressed to us. By default continue with normal source selection. Reviewed by: bms	2004-02-02 22:53:16 +00:00
Andre Oppermann	1488eac8ec	More verbose description of the source ip address selection for ICMP replies. Reviewed by: bms	2004-02-02 22:17:09 +00:00
Poul-Henning Kamp	be8a62e821	Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.	2004-01-31 10:40:25 +00:00
Maxim Sobolev	4c83789253	Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again.	2004-01-30 09:03:01 +00:00
Ruslan Ermilov	0ca2861fc9	Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls.	2004-01-27 22:17:39 +00:00
Maxim Sobolev	7735aeb9bb	Add support for WCCPv2. It should be enabled manually using link2 ifconfig(8) flag since header for version 2 is the same but IP payload is prepended with additional 4-bytes field. Inspired by: Roman Synyuk <roman@univ.kiev.ua> MFC after: 2 weeks	2004-01-26 12:33:56 +00:00
Maxim Sobolev	6e628b8187	(whitespace-only) Kill trailing spaces.	2004-01-26 12:21:59 +00:00
Andre Oppermann	241f1e33b1	Remove leftover FREE() from changes in rev 1.50. Noticed by: Jun Kuriyama <kuriyama@imgsrc.co.jp>	2004-01-23 01:39:12 +00:00
Andre Oppermann	201d185b69	Split the overloaded variable 'win' into two for their specific purposes: recwin and sendwin. This removes a big source of confusion and makes following the code much easier. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:22:14 +00:00
Andre Oppermann	1ddba8d63e	Move the reduction by one of the syncache limit after the zone has been allocated. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:14:48 +00:00
Andre Oppermann	73080de2be	Remove an unused variable and put the sockaddr_in6 onto the stack instead of malloc'ing it. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)	2004-01-22 23:10:11 +00:00
Jeffrey Hsu	61a36e3dfc	Merge from DragonFlyBSD rev 1.10: date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6 Account for when Limited Transmit is not congestion window limited. Obtained from: DragonFlyBSD	2004-01-20 21:40:25 +00:00
Poul-Henning Kamp	5e289f9eb6	Mostly mechanical rework of libalias: Makes it possible to have multiple packet aliasing instances in a single process by moving all static and global variables into an instance structure called "struct libalias". Redefine a new API based on s/PacketAlias/LibAlias/g Add new "instance" argument to all functions in the new API. Implement old API in terms of the new API.	2004-01-17 10:52:21 +00:00
Hajimu UMEMOTO	548c676b32	do not deref freed pointer Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net> Reviewed by: itojun	2004-01-13 09:51:47 +00:00
Andre Oppermann	bed824fa90	Disable the minmssoverload connection drop by default until the detection logic is refined.	2004-01-12 15:46:04 +00:00
Don Lewis	e29ef13f6c	Check that sa_len is the appropriate value in tcp_usr_bind(), tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking to see whether the address is multicast so that the proper errno value will be returned if sa_len is incorrect. The checks are identical to the ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are called after the multicast address check passes. MFC after: 30 days	2004-01-10 08:53:00 +00:00
Andre Oppermann	1ddc17c1d5	Reduce TCP_MINMSS default to 216. The AX.25 protocol (packet radio) is frequently used with an MTU of 256 because of slow speeds and a high packet loss rate.	2004-01-09 14:14:10 +00:00
Andre Oppermann	53369ac9bb	Limiters and sanity checks for TCP MSS (maximum segment size) resource exhaustion attacks. For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU. The resource exhaustion works in two ways: o during tcp connection setup the advertized local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets. o the local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW server where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is reset and dropped. MITRE CVE: CAN-2004-0002 Reviewed by: sam (mentor) MFC after: 1 day	2004-01-08 17:40:07 +00:00
Andre Oppermann	bf87c82ebb	If path mtu discovery is enabled set the DF bit in all cases we send packets on a tcp connection. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)	2004-01-08 11:17:11 +00:00
Andre Oppermann	e0f630ea7a	Do not set the ip_id to zero when DF is set on packet and restore the general pre-randomid behaviour. Setting the ip_id to zero causes several problems with packet reassembly when a device along the path removes the DF bit for some reason. Other BSD and Linux have found and fixed the same issues. PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)	2004-01-08 11:13:40 +00:00
Andre Oppermann	dba7bc6a65	Enable the following TCP options by default to give it more exposure: rfc3042 Limited retransmit rfc3390 Increasing TCP's initial congestion Window inflight TCP inflight bandwidth limiting All my production servers have them enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance. Reviewed by: sam (mentor)	2004-01-06 23:29:46 +00:00
Andre Oppermann	87c3bd2755	According to RFC1812 we have to ignore ICMP redirects when we are acting as router (ipforwarding enabled). This doesn't fix the problem that host routes from ICMP redirects are never removed from the kernel routing table but removes the problem for machines doing packet forwarding. Reviewed by: sam (mentor)	2004-01-06 23:20:07 +00:00
Ruslan Ermilov	3b95e1346a	Document the net.inet.ip.subnets_are_local sysctl.	2003-12-30 16:05:03 +00:00
Maxim Sobolev	73d7ddbc56	Sync with NetBSD: if_gre.c rev.1.41-1.49 o Spell output with two ts. o Remove assigned-to but not used variable. o fix grammatical error in a diagnostic message. o u_short -> u_int16_t. o gi_len is ip_len, so it has to be network byteorder. if_gre.h rev.1.11-1.13 o prototype must not have variable name. o u_short -> u_int16_t. o Spell address with two d's. ip_gre.c rev.1.22-1.29 o KNF - return is not a function. o The "osrc" variable in gre_mobile_input() is only ever set but not referenced; remove it. o correct (false) assumptions on mbuf chain. not sure if it really helps, but anyways, it is necessary to perform m_pullup. o correct arg to m_pullup (need to count IP header size as well). o remove redundant adjustment of m->m_pkthdr.len. o clear m_flags just for safety. o tabify. o u_short -> u_int16_t. MFC after: 2 weeks	2003-12-30 11:41:43 +00:00
Sam Leffler	437ffe1823	o eliminate widespread on-stack mbuf use for bpf by introducing a new bpf_mtap2 routine that does the right thing for an mbuf and a variable-length chunk of data that should be prepended. o while we're sweeping the drivers, use u_int32_t uniformly when prepending the address family (several places were assuming sizeof(int) was 4) o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated mbufs have been eliminated; this may better be moved to the bpf routines Reviewed by: arch@ and several others	2003-12-28 03:56:00 +00:00
Maxim Konovalov	fad1d65260	o Fix a comment: softticks lives in sys/kern/kern_timeout.c. PR: kern/60613 Submitted by: Gleb Smirnoff MFC after: 3 days	2003-12-27 14:08:53 +00:00
Hajimu UMEMOTO	8b8a0cef40	NULL is not 0. Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>	2003-12-24 18:22:04 +00:00
Ruslan Ermilov	3117579171	I didn't notice it right away, but check the right length too.	2003-12-23 14:08:50 +00:00
Ruslan Ermilov	78e2d2bd28	Fix a problem introduced in revision 1.84: m_pullup() does not necessarily return the same mbuf chain so we need to recompute mtod() consumers after pulling up.	2003-12-23 13:33:23 +00:00
Peter Wemm	a89ec05e3e	Catch a few places where NULL (pointer) was used where 0 (integer) was expected.	2003-12-23 02:36:43 +00:00
Sam Leffler	ededbec187	o move mutex init/destroy logic to the module load/unload hooks; otherwise they are initialized twice when the code is statically configured in the kernel because the module load method gets invoked before the user application calls ip_mrouter_init o add a mutex to synchronize the module init/done operations; this sort of was done using the value of ip_mroute but X_ip_mrouter_done sets it to NULL very early on which can lead to a race against ip_mrouter_init--using the additional mutex means this is safe now o don't call ip_mrouter_reset from ip_mrouter_init; this now happens once at module load and X_ip_mrouter_done does the appropriate cleanup work to insure the data structures are in a consistent state so that a subsequent init operation inherits good state Reviewed by: juli	2003-12-20 18:32:48 +00:00
John Baldwin	a5b061f9d2	Fix some becuase -> because typos. Reported by: Marco Wertejuk <wertejuk@mwcis.com>	2003-12-17 16:12:01 +00:00
Robert Watson	2d92ec9858	Switch TCP over to using the inpcb label when responding in timed wait, rather than the socket label. This avoids reaching up to the socket layer during connection close, which requires locking changes. To do this, introduce MAC Framework entry point mac_create_mbuf_from_inpcb(), which is called from tcp_twrespond() instead of calling mac_create_mbuf_from_socket() or mac_create_mbuf_netlayer(). Introduce MAC Policy entry point mpo_create_mbuf_from_inpcb(), and implementations for various policies, which generally just copy label data from the inpcb to the mbuf. Assert the inpcb lock in the entry point since we require consistency for the inpcb label reference. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-12-17 14:55:11 +00:00
Maxim Konovalov	1c86761b2a	o IN_MULTICAST wants an address in host byte order. PR: kern/60304 Submitted by: demon MFC after: 1 week	2003-12-16 18:21:47 +00:00
Maksim Yevmenkin	a6a66f5c4c	Do not panic when flushing dummynet firewall rules Reviewed by: andre Approved by: re (scottl)	2003-12-06 09:01:25 +00:00
Andre Oppermann	f5bd8e9aff	Swap destination and source arguments of two bcopy() calls. Before committing the initial tcp_hostcache I changed them from memcpy() to conform with FreeBSD style without realizing the difference in argument definition. This fixes hostcache operation for IPv6 (in general and explicitly IPv6 path mtu discovery) and T/TCP (RFC1644). Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp> Approved by: re (rwatson)	2003-12-02 21:25:12 +00:00
Sam Leffler	d559f5c3d8	Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate code is compiled in to support the O_IPSEC operator. Previously no support was included and ipsec rules were always matching. Note that we do not return an error when an ipsec rule is added and the kernel does not have IPsec support compiled in; this is done intentionally but we may want to revisit this (document this in the man page). PR: 58899 Submitted by: Bjoern A. Zeeb Approved by: re (rwatson)	2003-12-02 00:23:45 +00:00
Andre Oppermann	cd6c4060c8	Fix an optimization where I made an ifdef'd out section too broad. When the hostcache bucket limit is reached the last bucket wasn't removed from the bucket row but inserted a few lines later at the bucket row head again. This leads to an infinite loop when the same bucket row is accessed the next time for a lookup/insert or purge action. Tested by: imp, Matt Smith Approved by: re (rwatson)	2003-11-28 16:33:03 +00:00
Andre Oppermann	623f556031	Fix verify_rev_path() function. The author of this function tried to cut corners which completely broke down when the routing table locking was introduced. Reviewed by: sam (mentor) Approved by: re (rwatson)	2003-11-27 09:40:13 +00:00
Andre Oppermann	0cfbbe3bde	Make sure all uses of stack allocated struct route's are properly zeroed. Doing a bzero on the entire struct route is not more expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst. Reviewed by: sam (mentor) Approved by: re (scottl)	2003-11-26 20:31:13 +00:00
Sam Leffler	5bd311a566	Split the "inp" mutex class into separate classes for each of divert, raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)	2003-11-26 01:40:44 +00:00
Andre Oppermann	943ae30252	Restructure a too broad ifdef which was disabling the setting of the tcp flightsize sysctl value for local networks in the !INET6 case. Approved by: re (scottl)	2003-11-25 20:58:59 +00:00
Sam Leffler	6714d7c751	Correct a problem where ipfw-generated packets were being returned for ipfw processing w/o an indication the packets were generated by ipfw--and so should not be processed (this manifested itself as a LOR.) The flag bit in the mbuf that was used to mark the packets was not listed in M_COPYFLAGS so if a packet had a header prepended (as done by IPsec) the flag was lost. Correct this by defining a new M_PROTO6 flag and use it to mark packets that need this processing. Reviewed by: bms Approved by: re (rwatson) MFC after: 2 weeks	2003-11-24 03:57:03 +00:00
Sam Leffler	6a3ca7514d	Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines potentially transmit packets that may enter KAME IPsec w/o Giant if the callouts are marked MPSAFE. Reviewed by: ume Approved by: re (rwatson)	2003-11-23 18:13:41 +00:00
Thomas Moestl	1f831750b5	bzero() the sockaddr used for the destination address for rtalloc_ign() in in_pcbconnect_setup() before it is filled out. Otherwise, stack junk would be left in sin_zero, which could cause host routes to be ignored because they failed the comparison in rn_match(). This should fix the wrong source address selection for connect() to 127.0.0.1, among other things. Reviewed by: sam Approved by: re (rwatson)	2003-11-23 03:02:00 +00:00
Andre Oppermann	97d8d152c2	Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 20:07:39 +00:00
Andre Oppermann	26d02ca7ba	Remove RTF_PRCLONING from routing table and adjust users of it accordingly. The define is left intact for ABI compatibility with userland. This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)	2003-11-20 19:47:31 +00:00
Maxim Konovalov	dbf7b38125	Fix an arguments order in check_uidgid() call. PR: kern/59314 Submitted by: Andrey V. Shytov Approved by: re (rwatson, jhb)	2003-11-20 10:28:33 +00:00
Robert Watson	a557af222b	Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-18 00:39:07 +00:00
Olivier Houchard	8c8268cb4f	In rip_abort(), unlock the inpcb if we didn't detach it, or we may recurse on the lock before destroying the mutex. Submitted by: sam	2003-11-17 19:21:53 +00:00
Brian Feldman	633461295a	Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but do not have mh_nextpkt initialized. Sometimes what's there is "1", and the ip_input() code pukes trying to m_free() it, rendering divert sockets and such broken. This really underscores the need to get rid of MT_TAG. Reviewed by: rwatson	2003-11-17 03:17:49 +00:00
Andre Oppermann	be7e82e44a	Make two casts correct for all types of 64bit platforms. Explained by: bde	2003-11-16 12:50:33 +00:00
Andre Oppermann	df903fee84	Correct a cast to make it compile on 64bit platforms (noticed by tinderbox) and remove two unnecessary variable initializations. Make the introduction comment more clear with regard to which parts of the packet are touched. Requested by: luigi	2003-11-15 17:03:37 +00:00
Andre Oppermann	c76ff7084f	Make ipstealth global as we need it in ip_fastforward too.	2003-11-15 01:45:56 +00:00
Andre Oppermann	02c1c7070e	Remove the global one-level rtcache variable and associated complex locking and rework ip_rtaddr() to do its own rtlookup. Adapt all its callers to this and make ip_output() callable with NULL rt pointer. Reviewed by: sam (mentor)	2003-11-14 21:48:57 +00:00
Andre Oppermann	9188b4a169	Introduce ip_fastforward and remove ip_flow. Short description of ip_fastforward: o adds full direct process-to-completion IPv4 forwarding code o handles ip fragmentation incl. hw support (ip_flow did not) o sends icmp needfrag to source if DF is set (ip_flow did not) o supports ipfw and ipfilter (ip_flow did not) o supports divert, ipfw fwd and ipfilter nat (ip_flow did not) o returns anything it can't handle back to normal ip_input Enable with sysctl -w net.inet.ip.fastforwarding=1 Reviewed by: sam (mentor)	2003-11-14 21:02:22 +00:00
Sam Leffler	f7bbe2c0f1	add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb) Supported by: FreeBSD Foundation	2003-11-13 05:18:23 +00:00
Sam Leffler	1b73ca0bf1	o reorder some locking asserts to reflect the order of the locks o correct a read-lock assert in in_pcblookup_local that should be a write-lock assert (since time wait close cleanups may alter state) Supported by: FreeBSD Foundation	2003-11-13 05:16:56 +00:00
Andre Oppermann	16d6c90f5d	Move global variables for icmp_input() to its stack. With SMP or preemption two CPUs can be in the same function at the same time and clobber each other's variables. Remove register declaration from local variables. Reviewed by: sam (mentor)	2003-11-13 00:32:13 +00:00
Andre Oppermann	2683ceb661	Do not fragment a packet with hardware assistance if it has the DF bit set. Reviewed by: sam (mentor)	2003-11-12 23:35:40 +00:00
Bruce M Simpson	83453a06de	Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path. This switch toggles between strict multicast delivery, and traditional multicast delivery. The traditional (default) behaviour is to deliver multicast datagrams to all sockets which are members of that group, regardless of the network interface where the datagrams were received. The strict behaviour is to deliver multicast datagrams received on a particular interface only to sockets whose membership is bound to that interface. Note that as a matter of course, multicast consumers specifying INADDR_ANY for their interface get joined on the interface where the default route happens to be bound. This switch has no effect if the interface which the consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING. The original patch has been cleaned up somewhat from that submitted. It has been tested on a multihomed machine with multiple QuickTime RTP streams running over the local switch, which doesn't do IGMP snooping. PR: kern/58359 Submitted by: William A. Carrel Reviewed by: rwatson MFC after: 1 week	2003-11-12 20:17:11 +00:00
Andre Oppermann	122aad88d5	dropwithreset is not needed in this case as tcp_drop() is already notifying the other side. Before we were sending two RST packets.	2003-11-12 19:38:01 +00:00
Robert Watson	eca8a663d4	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-12 03:14:31 +00:00
Sam Leffler	a0bf1601a7	correct typos Pointed out by: Mike Silbersack	2003-11-11 18:16:54 +00:00
Sam Leffler	3d0b255a9a	o add missing inpcb locking in tcp_respond o replace spl's with lock assertions Supported by: FreeBSD Foundation	2003-11-11 17:54:47 +00:00
Sam Leffler	383df78dc8	use Giant-less callouts when debug_mpsafenet is non-zero Supported by: FreeBSD Foundation	2003-11-10 23:29:33 +00:00
Ian Dowse	3ab2096b80	In in_pcbconnect_setup(), don't use the cached inp->inp_route unless it is marked as RTF_UP. This appears to fix a crash that was sometimes triggered when dhclient(8) tried to send a packet after an interface had been detached. Reviewed by: sam	2003-11-10 22:45:37 +00:00
Jeffrey Hsu	1ce43e2348	Mark TCP syncache timer as not Giant-free ready yet.	2003-11-10 20:42:04 +00:00
Sam Leffler	7138d65c3f	replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF macros that expand to include assertions when the system is built with INVARIANTS Supported by: FreeBSD Foundation	2003-11-08 23:36:32 +00:00
Sam Leffler	252f24a2cf	divert socket fixups: o pickup Giant in divert_packet to protect sbappendaddr since it can be entered through MPSAFE callouts or through ip_input when mpsafenet is 1 o add missing locking on output o add locking to abort and shutdown o add a ctlinput handler to invalidate held routing table references on an ICMP redirect (may not be needed) Supported by: FreeBSD Foundation	2003-11-08 23:09:42 +00:00
Sam Leffler	8484384564	assert optional inpcb is passed in locked Supported by: FreeBSD Foundation	2003-11-08 23:03:29 +00:00
Sam Leffler	59daba27d9	add locking assertions Supported by: FreeBSD Foundation	2003-11-08 23:02:36 +00:00
Sam Leffler	3c47a187b7	assert inpcb is locked in udp_output Supported by: FreeBSD Foundation	2003-11-08 23:00:48 +00:00
Sam Leffler	c29afad673	o correct locking problem: the inpcb must be held across tcp_respond o add assertions in tcp_respond to validate inpcb locking assumptions o use local variable instead of chasing pointers in tcp_respond Supported by: FreeBSD Foundation	2003-11-08 22:59:22 +00:00
Sam Leffler	2a0746208b	use local values instead of chasing pointers Supported by: FreeBSD Foundation	2003-11-08 22:57:13 +00:00
Sam Leffler	fa286d7db2	replace mtx_assert by INP_LOCK_ASSERT Supported by: FreeBSD Foundation	2003-11-08 22:55:52 +00:00
Sam Leffler	50d7c061a3	add some missing locking Supported by: FreeBSD Foundation	2003-11-08 22:53:41 +00:00
Sam Leffler	1d78192b35	the sbappendaddr call in socket_send must be protected by Giant because it can happen from an MPSAFE callout Supported by: FreeBSD Foundation	2003-11-08 22:51:18 +00:00
Sam Leffler	e3f268fc89	add locking assertions that turn into noops if INET6 is configured; this is necessary because the ipv6 code shares the in_pcb code with ipv4 but (presently) lacks proper locking Supported by: FreeBSD Foundation	2003-11-08 22:48:27 +00:00
Sam Leffler	7902224c6b	o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive Supported by: FreeBSD Foundation	2003-11-08 22:28:40 +00:00
Sam Leffler	27a940c9a2	unbreak compilation of FAST_IPSEC Supported by: FreeBSD Foundation	2003-11-08 00:34:34 +00:00
Sam Leffler	aab621f060	MFp4: reminder that random id code is not reentrant Supported by: FreeBSD Foundation	2003-11-07 23:31:29 +00:00
Sam Leffler	8f1ee3683d	Move uid/gid checking logic out of line and lock inpcb usage. This has a LOR between IPFW inpcb locks but I'm committing it now as the lesser of two evils (the other being unlocked use of in_pcblookup). Supported by: FreeBSD Foundation	2003-11-07 23:26:57 +00:00
Hajimu UMEMOTO	aef3a65eb7	use ipsec_getnhist() instead of obsoleted ipsec_gethist(). Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net> Reviewed by: Ari Suutari <ari@suutari.iki.fi> (ipfw@)	2003-11-07 20:25:47 +00:00
Sam Leffler	ad67584665	Fix locking of the ip forwarding cache. We were holding a reference to a routing table entry w/o bumping the reference count or locking against the entry being free'd. This caused major havoc (for some reason it appeared most frequently for folks running natd). Fix is to bump the reference count whenever we copy the route cache contents into a private copy so the entry cannot be reclaimed out from under us. This is a short term fix as the forthcoming routing table changes will eliminate this cache entirely. Supported by: FreeBSD Foundation	2003-11-07 01:47:52 +00:00
Hajimu UMEMOTO	0f9ade718d	- cleanup SP refcnt issue. - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented incompletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec__policy -> ipsec__pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy )pcb_sp, spp (struct secpolicy ), sp (struct secpolicy ) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME	2003-11-04 16:02:05 +00:00
Robert Watson	3de758d3e3	Note that when ip_output() is called from ip_forward(), it will already have its options inserted, so the opt argument to ip_output() must be NULL.	2003-11-03 18:03:05 +00:00
Robert Watson	eecfe773aa	Remove comment about desire for eventual explicit labeling of ICMP header copy made on input path: this is now handled differently. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-11-03 18:01:38 +00:00
Sam Leffler	04df2fbbb8	Remove bogus RTFREE that was added in rev 1.47. The rmx code operates directly on the radix tree and does not hold any routing table references. This fixes the reference counting problems that manifested themselves as a panic during unmount of filesystems that were mounted by NFS over an interface that had been removed. Supported by: FreeBSD Foundation	2003-11-03 06:11:44 +00:00
Sam Leffler	9ce7877897	Correct rev 1.56 which (incorrectly) reversed the test used to decide if in_pcbpurgeif0 should be invoked. Supported by: FreeBSD Foundation	2003-11-03 03:22:39 +00:00
Mike Silbersack	4bd4fa3fe6	Add an additional check to the tcp_twrecycleable function; I had previously only considered the send sequence space. Unfortunately, some OSes (windows) still use a random positive increments scheme for their syn-ack ISNs, so I must consider receive sequence space as well. The value of 250000 bytes / second for Microsoft's ISN rate of increase was determined by testing with an XP machine.	2003-11-02 07:47:03 +00:00
Mike Silbersack	96af9ea52b	- Add a new function tcp_twrecycleable, which tells us if the ISN which we will generate for a given ip/port tuple has advanced far enough for the time_wait socket in question to be safely recycled. - Have in_pcblookup_local use tcp_twrecycleable to determine if time_wait sockets which are hogging local ports can be safely freed. This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.	2003-11-01 07:30:08 +00:00
Brooks Davis	9bf40ede4a	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)	2003-10-31 18:32:15 +00:00
Sam Leffler	9c63e9dbd7	Overhaul routing table entry cleanup by introducing a new rtexpunge routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference). Supported by: FreeBSD Foundation	2003-10-30 23:02:51 +00:00
Sam Leffler	d0402f1b73	Potential fix for races shutting down callouts when unloading the module. Previously we grabbed the mutex used by the callouts, then stopped the callout with callout_stop, but if the callout was already active and blocked by the mutex then it would continue later and reference the mutex after it was destroyed. Instead stop the callout first then lock. Supported by: FreeBSD Foundation	2003-10-29 19:15:00 +00:00
Sam Leffler	3520e9d61d	o add locking to protect routing table refcnt manipulations o add some more debugging help for figuring out why folks are getting complaints about releasing routing table entries with a zero refcnt o fix comment that talked about spl's o remove duplicate define of DUMMYNET_DEBUG Supported by: FreeBSD Foundation	2003-10-29 19:03:58 +00:00
Hajimu UMEMOTO	59dfcba4aa	add ECN support in layer-3. - implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c. make ip{,6}_ecn_egress() return integer to tell the caller that this packet should be dropped. - handle ECN at fragment reassembly in ip_input.c and frag6.c. Obtained from: KAME	2003-10-29 15:07:04 +00:00
Hajimu UMEMOTO	11de19f44d	ip6_savecontrol() argument is redundant	2003-10-29 12:52:28 +00:00
Sam Leffler	9c855a36c1	Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels. Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage". Pointed out by: Jonathan Stone Reviewed by: Robert Watson	2003-10-29 05:40:07 +00:00
Sam Leffler	395bb18680	speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)	2003-10-28 05:47:40 +00:00
Hajimu UMEMOTO	618d51bbdc	revert following unwanted changes: - __packed to __attribute__((__packed__) - uintN_t back to u_intN_t Reported by: bde	2003-10-25 10:57:08 +00:00
Hajimu UMEMOTO	16cd67e933	correct namespace pollution. Submitted by: bde	2003-10-25 09:37:10 +00:00
Hajimu UMEMOTO	c302f5bc07	remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{} according to rfc2292bis. Obtained from: KAME	2003-10-24 20:37:05 +00:00
Hajimu UMEMOTO	5434eaa208	correct tab and order.	2003-10-24 19:51:49 +00:00
Hajimu UMEMOTO	f95d46333d	Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542 (aka RFC2292bis). Though I believe this commit doesn't break backward compatibility against existing binaries, it breaks backward compatibility of API. Now, the applications which use Advanced Sockets API such as telnet, ping6, mld6query and traceroute6 use RFC3542 API. Obtained from: KAME	2003-10-24 18:26:30 +00:00
Mike Silbersack	0709c23335	Reduce the number of tcp time_wait structs to maxsockets / 5; this ensures that at most 20% of sockets can be in time_wait at one time, ensuring that time_wait sockets do not starve real connections from inpcb structures. No implementation change is needed, jlemon already implemented a nice LRU-ish algorithm for tcp_tw structure recycling. This should reduce the need for sysadmins to lower the default msl on busy servers.	2003-10-24 05:44:14 +00:00
Sam Leffler	ac6b0748be	o restructure initialization code so data structures are setup when loaded as a module o cleanup data structures on module unload when no application has been started (i.e. kldload, kldunload w/o mrtd) o remove extraneous unlocks immediately prior to destroying them Supported by: FreeBSD Foundation	2003-10-24 00:09:18 +00:00
Mike Silbersack	184dcdc7c8	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.	2003-10-21 18:28:36 +00:00
Hajimu UMEMOTO	b339980338	enclose IPv6 part with ifdef INET6. Obtained from: KAME	2003-10-20 16:19:01 +00:00
Hajimu UMEMOTO	31b3783c8d	correct linkmtu handling. Obtained from: KAME	2003-10-20 15:27:48 +00:00
Hajimu UMEMOTO	31b1bfe1b0	- add dom_if{attach,detach} framework. - transition to use ifp->if_afdata. Obtained from: KAME	2003-10-17 15:46:31 +00:00
Sam Leffler	f51f805f7e	pfil hooks can modify packet contents so check if the destination address has been changed when PFIL_HOOKS is enabled and, if it has, arrange for the proper action by ip*_forward. Supported by: FreeBSD Foundation Submitted by: Pyun YongHyeon	2003-10-16 16:25:25 +00:00
Sam Leffler	b15694110f	Drop dummynet lock when calling back into the network stack to deliver packets. This eliminates a LOR with Giant that caused outbound pipes to fail. Supported by: FreeBSD Foundation	2003-10-16 16:21:25 +00:00
Kirk McKusick	b03587f06a	Malloc buckets of size 128 have been having their 64-byte offset trashed after being freed. This has caused several panics including kern/42277 related to soft updates. Jim Kuhn tracked the problem down to ipfw limit rule processing. In the expiry of dynamic rules, it is possible for an O_LIMIT_PARENT rule to be removed when it still has live children. When the children eventually do expire, a pointer to the (long gone) parent is dereferenced and a count decremented. Since this memory can, and is, allocated for other purposes (in the case of kern/42277 an inodedep structure), chaos ensues. The offset in question in inodedep is the offset of the 16 bit count field in the ipfw2 ipfw_dyn_rule. Submitted by: Jim Kuhn <jkuhn@sandvine.com> Reviewed by: "Evgueni V. Gavrilov" <aquatique@rusunix.org> Reviewed by: Ben Pfountz <netprince@vt.edu> MFC after: 1 week	2003-10-16 02:00:12 +00:00
Sam Leffler	b35a1e5d66	purge extraneous ';'s Supported by: FreeBSD Foundation Noticed by: bde	2003-10-15 18:19:28 +00:00
Sam Leffler	929b31ddab	Lock ip forwarding route cache. While we're at it, remove the global variable ipforward_rt by introducing an ip_forward_cacheinval() call to use to invalidate the cache. Supported by: FreeBSD Foundation	2003-10-14 19:19:12 +00:00
Sam Leffler	888c2a3c4e	remove dangling ';'s that were harmless Supported by: FreeBSD Foundation	2003-10-14 18:45:50 +00:00
Hajimu UMEMOTO	06cd0a3f97	- fix typo in comment. - style. Obtained from: KAME	2003-10-07 17:46:18 +00:00
Hajimu UMEMOTO	1ae02d474a	nuke unused ICMPV6CTL_NAMES and KEYCTL_NAMES macros.	2003-10-07 15:14:33 +00:00
Hajimu UMEMOTO	8c99329e89	return(code) -> return (code) Obtained from: KAME	2003-10-07 15:02:29 +00:00
Sam Leffler	d1dd20be6e	Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself. Other/related changes: o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts Notes: 1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested. Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)	2003-10-04 03:44:50 +00:00
Sam Leffler	87002f0dc1	hookup ctlinput for fast ipsec versions of esp+ah protocols Supported by: FreeBSD Foundation	2003-10-03 22:06:36 +00:00
Sam Leffler	12394d06d8	place some kernel-specific data structures under #ifdef _KERNEL Sponsored by: FreeBSD Foundation	2003-10-03 20:58:56 +00:00
Bruce M Simpson	c3b52d6499	Shorten 'bad gateway' AF_LINK message. Submitted by: green	2003-10-03 17:22:14 +00:00
Bruce M Simpson	beb2ced8ac	Make arp_rtrequest()'s 'bad gateway' messages slightly more informative, to aid me in tracking down LLINFO inconsistencies in the routing table. Discussed with: fenner	2003-10-03 17:21:17 +00:00
Bruce M Simpson	b75bead1f2	Only delete the route if arplookup() tried to create it. Do not delete RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed when an RTF_GENMASK route exists in the table. Add a more verbose comment about exactly what this code does. Submitted by: ru	2003-10-03 09:19:23 +00:00
Ruslan Ermilov	deb62e2887	By popular demand, added the "static ARP" per-interface option.	2003-10-01 08:32:37 +00:00
Hajimu UMEMOTO	5c6ebad8f6	add /CONSTCOND/ to reduce diffs against latest KAME. Obtained from: KAME	2003-09-25 13:40:06 +00:00
Bruce M Simpson	85cc199400	Fix a logic error in the check to see if arplookup() should free the route. Noticed by: Mike Hogsett Reviewed by: ru	2003-09-24 20:52:25 +00:00
Sam Leffler	134ea22494	o update PFIL_HOOKS support to current API used by netbsd o revamp IPv4+IPv6+bridge usage to match API changes o remove pfil_head instances from protosw entries (no longer used) o add locking o bump FreeBSD version for 3rd party modules Heavy lifting by: "Max Laier" <max@love2party.net> Supported by: FreeBSD Foundation Obtained from: NetBSD (bits of pfil.h and pfil.c)	2003-09-23 17:54:04 +00:00
Bruce M Simpson	fedf1d01a2	Fix a bug in arplookup(), whereby a hostile party on a locally attached network could exhaust kernel memory, and cause a system panic, by sending a flood of spoofed ARP requests. Approved by: jake (mentor) Reported by: Apple Product Security <product-security@apple.com>	2003-09-23 16:39:31 +00:00
Joe Marcus Clarke	68f1756b2a	Grrr...add the Skinny alias code forgotten in the last commit.	2003-09-23 07:42:33 +00:00
Joe Marcus Clarke	b07fbc17e9	Add Cisco Skinny Station protocol support to libalias, natd, and ppp. Skinny is the protocol used by Cisco IP phones to talk to Cisco Call Managers. With this code, one can use a Cisco IP phone behind a FreeBSD NAT gateway. Currently, having the Call Manager behind the NAT gateway is not supported. More information on enabling Skinny support in libalias, natd, and ppp can be found in those applications' manpages. PR: 55843 Reviewed by: ru Approved by: ru MFC after: 30 days	2003-09-23 07:41:55 +00:00
Sam Leffler	598345da4b	Bandaid locking change: mark static rule mutex recursive so re-entry when sending an ICMP packet doesn't cause a panic. A better solution is needed; possibly deferring the transmit to a dedicated thread. Observed by: "Aaron Wohl" <freebsd@soith.com>	2003-09-17 22:06:47 +00:00
Sam Leffler	f34f3a7097	shuffle code so we don't "continue" and miss a needed unlock operation Observed by: Wiktor Niesiobedzki <w@evip.pl>	2003-09-17 21:13:16 +00:00
Sam Leffler	293941a556	Add locking. o change timeout to MPSAFE callout o restructure rule deletion to deal with locking requirements o replace static buffer used for ipfw control operations with malloc'd storage Sponsored by: FreeBSD Foundation	2003-09-17 00:56:50 +00:00
Sam Leffler	91176902bc	Minor fixups + add locking. o change time to MPSAFE callout o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable by net.inet.ip.dummynet.debug o make boot-time printf dependent on bootverbose Sponsored by: FreeBSD Foundation	2003-09-17 00:54:04 +00:00
Ruslan Ermilov	78f94aa951	Fix a bunch of off-by-one errors in the range checking code.	2003-09-11 21:40:21 +00:00
Ruslan Ermilov	8e75a37bb0	Fixed -Wpointer-arith warning. Submitted by: Stefan Farfeleder PR: bin/56653	2003-09-09 23:50:57 +00:00
Ruslan Ermilov	fe08efe680	mdoc(7): Use the new feature of the .In macro.	2003-09-08 19:57:22 +00:00
Sam Leffler	468cf6f61a	Add locking. Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and fixing numerous problems. Sponsored by: FreeBSD Foundation Reviewed by: Pavlin Radoslavov <pavlin@icir.org>	2003-09-06 04:53:43 +00:00
Sam Leffler	2fad1e931e	lock ip fragment queues Submitted by: Robert Watson <rwatson@freebsd.org> Obtained from: BSD/OS	2003-09-05 00:10:33 +00:00
Sam Leffler	26f91065e7	o add locking o move the global divsrc socket address to a local variable instead of locking it Sponsored by: FreeBSD Foundation	2003-09-05 00:00:51 +00:00
Bruce M Simpson	8a538743b5	PR: kern/56343 Reviewed by: tjr Approved by: jake (mentor)	2003-09-03 02:19:29 +00:00
Mike Silbersack	3390d47670	Implement MBUF_STRESS_TEST mark II. Changes from the original implementation: - Fragmentation is handled by the function m_fragment, which can be called from wherever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use. - m_fragment works slightly differently from the old fragmentation code in that it allocates a separate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test. - Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.	2003-09-01 05:55:37 +00:00
Sam Leffler	638ed548b7	add locking NB: There is a known LOR on the forwarding path; this needs to be resolved together with a similar issue in the bridge. For the moment it is believed to be benign. Sponsored by: FreeBSD Foundation	2003-09-01 05:12:36 +00:00
Sam Leffler	611ceef62a	remove warning about use of old divert sockets; this was marked for removal before 5.2 Reviewed by: silence on -net and -arch	2003-09-01 04:27:34 +00:00
Sam Leffler	3b6dd5a9d0	add locking Sponsored by: FreeBSD Foundation	2003-09-01 04:23:48 +00:00
Robert Watson	f19389746e	Remove redundant initialization of rti; SLIST_FOREACH does that for us.	2003-08-28 22:15:05 +00:00
Robert Watson	6b48911b00	M_PREPEND() with an argument of M_TRYWAIT can fail, meaning the returned mbuf can be NULL. Check for NULL in rip_output() when prepending an IP header. This prevents mbuf exhaustion from causing a local kernel panic when sending raw IP packets. PR: kern/55886 Reported by: Pawel Malachowski <pawmal-posting@freebsd.lublin.pl> MFC after: 3 days	2003-08-26 14:11:48 +00:00
Jeffrey Hsu	578c5e1212	Remove redundant bzero. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-24 08:27:57 +00:00
Robert Watson	baee0c3e66	Introduce two new MAC Framework and MAC policy entry points: mac_reflect_mbuf_icmp() mac_reflect_mbuf_tcp() These entry points permit MAC policies to do "update in place" changes to the labels on ICMP and TCP mbuf headers when an ICMP or TCP response is generated to a packet outside of the context of an existing socket. For example, in respond to a ping or a RST packet to a SYN on a closed port. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-08-21 18:39:16 +00:00
Robert Watson	b8ecbcd287	Before digging into IGMP locking, do a whitespace and prototype cleanup: prefer tabs to 8 spaces, focus on consistent indentation, prefer modern C function prototypes. Not all the way to style(9), but substantially closer.	2003-08-20 17:32:17 +00:00
Robert Watson	6c4b2ad305	Move from a custom-crafted singly-linked list to the SLIST_* macros from queue(3). Improve vertical compactness by using a IGMP_PRINTF() macro rather than #ifdefing IGMP_DEBUG a large number of debugging printfs. Reviewed by: mdodd (SLIST changes)	2003-08-20 17:09:01 +00:00
Bruce M Simpson	8afa230470	Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp. Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)	2003-08-20 14:46:40 +00:00
Sam Leffler	c06eb4e293	Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.	2003-08-19 17:51:11 +00:00
Jeffrey Hsu	9ba208b413	* Bug fix in bw_meter_process(): the periodically processed bins of bw_meter entries were processed up to one second ahead. After an inappropriate rescheduling of some of the bw_meter entries, the upcalls weren't delivered. * pim_register_prepare() uses the appropriate sw_csum flag to call ip_fragment() so the IP checksum is computed properly. * Modify pim_register_prepare() to take care of IP packets that don't need fragmentation. * Add-back in_delayed_cksum() to encap_send(), because it seems it should be there. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-19 17:22:51 +00:00
Sam Leffler	53b57cd1ab	add missing unlock when in_pcballoc returns an error	2003-08-19 17:11:46 +00:00
David E. O'Brien	4f4a104ee8	style.Makefile(5)	2003-08-18 15:25:39 +00:00
Gordon Tetlow	41d8423f71	Stage 3 of dynamic root support. Make all the libraries needed to run binaries in /bin and /sbin installed in /lib. Only the versioned files reside in /lib, the .so symlink continues to live /usr/lib so the toolchain doesn't need to be modified.	2003-08-17 08:28:46 +00:00
Hartmut Brandt	a9ca5bdbd0	The syncache has made use of TCPDEBUG problematic, because the SYN segments are lost for the application. This broke, for example, ports/benchmarks/dbs which needs the SYN segment to filter the contents of the trace buffer for the connection it is interested in. This patch makes the SYN segments available again. Unfortunately they are now associated with the listening socket instead of the new one, so a change to applications is required, but without this patch it wouldn't work altogether. PR: kern/45966	2003-08-13 10:20:57 +00:00
Hartmut Brandt	91f467d592	The tcp_trace call needs the length of the header. Unfortunately the code has rotten a bit so that the header length is not correct at the point when tcp_trace is called. Temporarily compute the correct value before the call and restore the old value after. This makes ports/benchmarks/dbs to almost work. This is a NOP unless you compile with TCPDEBUG.	2003-08-13 08:50:42 +00:00
Hartmut Brandt	3c653157a5	A number of patches in the last years have created new return paths in tcp_input that leave the function before hitting the tcp_trace function call for the TCPDEBUG option. This has made TCPDEBUG mostly useless (and tools like ports/benchmarks/dbs not working). Add tcp_trace calls to the return paths that could be identified in this maze. This is a NOP unless you compile with TCPDEBUG.	2003-08-13 08:46:54 +00:00
Hartmut Brandt	b24521d779	Change the code that enables/disables the ATM channel to use the new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels for IP over ATM, but also CBR, VBR and ABR. Change the format of the link layer address to specify the channel characteristics. The old format is still supported and opens UBR channels.	2003-08-12 14:20:32 +00:00
Jeffrey Hsu	59ca77f4a1	New PIM header files. Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-07 18:17:43 +00:00
Jeffrey Hsu	1e78ac216e	1. Basic PIM kernel support Disabled by default. To enable it, the new "options PIM" must be added to the kernel configuration file (in addition to MROUTING): options MROUTING # Multicast routing options PIM # Protocol Independent Multicast 2. Add support for advanced multicast API setup/configuration and extensibility. 3. Add support for kernel-level PIM Register encapsulation. Disabled by default. Can be enabled by the advanced multicast API. 4. Implement a mechanism for "multicast bandwidth monitoring and upcalls". Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-07 18:16:59 +00:00
John Baldwin	8b149b5131	Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)	2003-08-07 15:04:27 +00:00
Hartmut Brandt	20e57b1045	Ups. I forgot this one in the SIOCATMENA/SIOCATMDIS removal commit. This change allows one to specify almost the complete traffic parameters for IPoverATM channels through the routing table. Up to now we used 4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed. If the address is longer, however, the 5th byte is interpreted as the traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the parameters for this traffic class: UBR: 0 byte or 3 byte PCR CBR: 3 byte PCR VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM, 1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF A script to generate the corresponding 'route add' arguments will follow soon.	2003-08-06 15:56:37 +00:00
Jeffrey Hsu	1b6002ec30	* makes mfc[MFCTBLSIZ] and vif[MAXVIFS] tables accessible via sysctl: - sysctlbyname("net.inet.ip.mfctable", ...) - sysctlbyname("net.inet.ip.viftable", ...) This change is needed so netstat can use sysctlbyname() to read the data from those tables. Otherwise, in some cases "netstat -g" may fail to report the multicast forwarding information (e.g., if we run a multicast router on PicoBSD). * Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast routing daemon, set properly "im->im_vif" to the receiving incoming interface of the packet that triggered that upcall rather than to the expected incoming interface of that packet. * Bug fix: add missing increment of counter "mrtstat.mrts_upcalls" * Few formatting nits (e.g., replace extra spaces with TABs) Submitted by: Pavlin Radoslavov <pavlin@icir.org>	2003-08-05 17:01:33 +00:00
Hartmut Brandt	7e3d4432af	When adding a channel for INET failed at the device level (ioctl) the code used to call rtrequest(RTM_DELETE, ...). This is a problem, because the function that just has called us (route_output) is not really happy with the route it just is creating being ripped out from under it. Unfortunately we also cannot return an error from ifa_rtrequest. Therefore mark the route just as RTF_REJECT.	2003-08-05 14:59:06 +00:00
Hartmut Brandt	5246b4ff88	Make this file to conform more to style(9) before really touching it.	2003-08-05 13:58:04 +00:00
Maxim Konovalov	e1bd2f381a	o Fix a typo in previous commit.	2003-07-31 10:24:36 +00:00
Maxim Konovalov	853af3f3f0	o Do not overwrite saved interrupt priority level by alloc_hash(), use a separate variable. o Restore interrupt priority level before return (no-op in HEAD). Spotted by: Don Bowman <don@sandvine.com> MFC after: 5 days	2003-07-25 09:59:16 +00:00
Sam Leffler	1f76a5e218	add IPSEC_FILTERGIF support for FAST_IPSEC PR: kern/51922 Submitted by: Eric Masson <e-masson@kisoft-services.com> MFC after: 1 week	2003-07-22 18:58:34 +00:00
Mike Silbersack	7dc7f0311e	Minor fix to the MBUF_STRESS_TEST code so that it keeps pkthdr.len consistent at all times. (Some debugging code I'm working on is tripped otherwise.) MFC after: 3 days	2003-07-19 05:50:32 +00:00
Robert Watson	83503a9227	Add a comment above rip_ctloutput() documenting that the privilege check for raw IP system management operations is often (although not always) implicit due to the namespacing of raw IP sockets. I.e., you have to have privilege to get a raw IP socket, so much of the management code sitting on raw IP sockets assumes that any requests on the socket should be granted privilege. Obtained from: TrustedBSD Project Product of: France	2003-07-18 16:10:36 +00:00
Jeffrey Hsu	a12569ec4f	Drop Giant around syncache timer processing.	2003-07-17 11:19:25 +00:00
Luigi Rizzo	4805529cf8	Allow set 31 to be used for rules other than 65535. Set 31 is still special because rules belonging to it are not deleted by the "ipfw flush" command, but must be deleted explicitly with "ipfw delete set 31" or by individual rule numbers. This implement a flexible form of "persistent rules" which you might want to have available even after an "ipfw flush". Note that this change does not violate POLA, because you could not use set 31 in a ruleset before this change. sbin/ipfw changes to allow manipulation of set 31 will follow shortly. Suggested by: Paul Richards	2003-07-15 23:07:34 +00:00
Jeffrey Hsu	9d11646de7	Unify the "send high" and "recover" variables as specified in the latest rev of the spec. Use an explicit flag for Fast Recovery. [1] Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2] Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]	2003-07-15 21:49:53 +00:00
Luigi Rizzo	72e02d4dac	Implement comments embedded into ipfw2 instructions. Since we already had 'O_NOP' instructions which always match, all I needed to do is allow the NOP command to have arbitrary length (i.e. move its label in a different part of the switch() which validates instructions). The kernel must know nothing about comments, everything else is done in userland (which will be described in the upcoming ipfw2.c commit).	2003-07-12 05:54:17 +00:00
Luigi Rizzo	7a1dfbc0d3	Merge the handlers of O_IP_SRC_MASK and O_IP_DST_MASK opcodes, and support matching a list of addr/mask pairs so one can write more efficient rulesets which were not possible before e.g. add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16 The change is fully backward compatible. ipfw2 and manpage commit to follow. MFC after: 3 days	2003-07-08 07:44:42 +00:00
Luigi Rizzo	c3e5b9f154	Implement the 'ipsec' option to match packets coming out of an ipsec tunnel. Should work with both regular and fast ipsec (mutually exclusive). See manpage for more details. Submitted by: Ari Suutari (ari.suutari@syncrontech.com) Revised by: sam MFC after: 1 week	2003-07-04 21:42:32 +00:00
Luigi Rizzo	f030c1518d	Correct some comments, add opcode O_IPSEC to match packets coming out of an ipsec tunnel.	2003-07-04 21:39:51 +00:00
Luigi Rizzo	5d3b4c2480	Remove a stale comment, fix indentation.	2003-06-28 14:23:22 +00:00
Luigi Rizzo	b5f3c4cff3	whitespace fix	2003-06-28 14:16:53 +00:00
Luigi Rizzo	9c1cfc8650	remove unused file (ipfw2 is the default in RELENG_5 and above; the old ipfw1 has been unused and unmaintained for a long time).	2003-06-24 07:12:11 +00:00
Luigi Rizzo	ec4270c021	Fix typo in a (commented out) debugging string. Spotted by: diff	2003-06-23 21:38:21 +00:00
Luigi Rizzo	67ab48d1ae	Remove whitespace at end of line.	2003-06-23 21:18:56 +00:00
Luigi Rizzo	44c884e134	Add support for multiple values and ranges for the "iplen", "ipttl", "ipid" options. This feature has been requested by several users. On passing, fix some minor bugs in the parser. This change is fully backward compatible so if you have an old /sbin/ipfw and a new kernel you are not in trouble (but you need to update /sbin/ipfw if you want to use the new features). Document the changes in the manpage. Now you can write things like ipfw add skipto 1000 iplen 0-500 which some people were asking to give preferential treatment to short packets. The 'MFC after' is just set as a reminder, because I still need to merge the Alpha/Sparc64 fixes for ipfw2 (which unfortunately change the size of certain kernel structures; not that it matters a lot since ipfw2 is entirely optional and not the default...) PR: bin/48015 MFC after: 1 week	2003-06-22 17:33:19 +00:00
Mike Silbersack	fcaf9f9146	Map icmp time exceeded responses to EHOSTUNREACH rather than 0 (no error); this makes connect act more sensibly in these cases. PR: 50839 Submitted by: Barney Wolff <barney@pit.databus.com> Patch delayed by laziness of: silby MFC after: 1 week	2003-06-17 06:21:08 +00:00
Ruslan Ermilov	ada24e690c	In the PKT_ALIAS_PROXY_ONLY mode, make sure to preserve the original source IP address, as promised in the manual page. Spotted by: Vaclav Petricek	2003-06-13 21:54:01 +00:00
Ruslan Ermilov	9c88dc8855	Removed a couple of .Xo/.Xc that are leftovers of the "ninth-argument limit" mdoc(7) atavism.	2003-06-13 21:39:22 +00:00
Ruslan Ermilov	7176089886	Clarify that original address and port when doing transparent proxying are _destination_ address and port.	2003-06-13 21:36:24 +00:00
Ruslan Ermilov	61de149d30	Added myself to the AUTHORS section.	2003-06-13 21:32:01 +00:00
Philippe Charnier	9703a107f2	The .Fn function	2003-06-08 09:53:08 +00:00
Robert Watson	042bbfa3b5	When setting fragment queue pointers to NULL, or comparing them with NULL, use NULL rather than 0 to improve readability.	2003-06-06 19:32:48 +00:00
Jeffrey Hsu	f058535deb	Compensate for decreasing the minimum retransmit timeout. Reviewed by: jlemon	2003-06-04 10:03:55 +00:00
Bernd Walter	330462a315	Change handling to support strong alignment architectures such as alpha and sparc64. PR: alpha/50658 Submitted by: rizzo Tested on: alpha	2003-06-04 01:17:37 +00:00
Kelly Yancey	ed7ea0e1ab	Account for packets processed at layer-2 (i.e. net.link.ether.ipfw=1). MFC after: 2 weeks	2003-06-02 23:54:09 +00:00
Ruslan Ermilov	234dfc904a	A new API function PacketAliasRedirectDynamic() can be used to mark a fully specified static link as dynamic; i.e. make it a one-time link.	2003-06-01 23:15:00 +00:00
Ruslan Ermilov	f1a529f3da	Make the PacketAliasSetAddress() function call optional. If it is not called, and no static rules match an outgoing packet, the latter retains its source IP address. This is in support of the "static NAT only" mode.	2003-06-01 22:49:59 +00:00
Poul-Henning Kamp	4df05d61bd	Remove unused variables. Found by: FlexeLint	2003-06-01 09:20:38 +00:00
Poul-Henning Kamp	e4d2978dd8	Add /* FALLTHROUGH */ Found by: FlexeLint	2003-05-31 19:07:22 +00:00
Garrett Wollman	6e49b1fe55	Don't generate an ip_id for packets with the DF bit set; ip_id is only meaningful for fragments. Also don't bother to byte-swap the ip_id when we do generate it; it is only used at the receiver as a nonce. I tried several different permutations of this code with no measurable difference to each other or to the unmodified version, so I've settled on the one for which gcc seems to generate the best code. (If anyone cares to microoptimize this differently for an architecture where it actually matters, feel free.) Suggested by: Steve Bellovin's paper in IMW'02	2003-05-31 17:55:21 +00:00
Robert Watson	430c635447	Correct a bug introduced with reduced TCP state handling; make sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to. Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works around this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this proved to complicate things too much. Approved by: re (scottl) Reviewed by: hsu Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-07 05:26:27 +00:00
Robert Watson	688fe1d954	Trim a call to mac_create_mbuf_from_mbuf() since m_tag meta-data copying for mbuf headers now works properly in m_dup_pkthdr(), so we don't need to do an explicit copy. Approved by: re (jhb) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-05-06 20:34:04 +00:00
Matthew N. Dodd	e97c58c8cf	Add definitions for IN6ADDR_LINKLOCAL_ALLMDNS_INIT and INADDR_ALLMDNS_GROUP.	2003-04-29 22:03:46 +00:00
Matthew N. Dodd	4957466b8e	IP_RECVTTL socket option. Reviewed by: Stuart Cheshire <cheshire@apple.com>	2003-04-29 21:36:18 +00:00
Alexander Kabaev	104a9b7e3e	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>	2003-04-29 13:36:06 +00:00
David E. O'Brien	152385d122	Explicitly declare 'int' parameters.	2003-04-21 16:27:46 +00:00
David E. O'Brien	bfd738788b	style.Makefile(5)	2003-04-20 18:38:59 +00:00
Mike Silbersack	53dcc544a8	Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended to include more than just frag tests.	2003-04-12 06:11:46 +00:00
Robert Watson	cacd79e2c9	Remove a potential panic condition introduced by reduced TCP wait state. Those changes attempted to work around the changed invariant that inp->in_socket was sometimes now NULL, but the logic wasn't quite right, meaning that inp->in_socket would be dereferenced by cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC were in use. Attempt to clarify and correct the logic. Note: the work-around originally introduced with the reduced TCP wait state handling to use cr_cansee() instead of cr_canseesocket() in this case isn't really right, although it "Does the right thing" for most of the cases in the base system. We'll need to address this at some point in the future. Pointed out by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-04-10 20:33:10 +00:00
Dag-Erling Smørgrav	fe58453891	Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand- rolled versions wherever applicable. Submitted by: Hiten Pandya <hiten@unixdaemons.com>	2003-04-08 14:25:47 +00:00
Dag-Erling Smørgrav	212059bd83	Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage.	2003-04-04 12:14:00 +00:00
Matthew N. Dodd	2c56e246fa	Back out support for RFC3514. RFC3514 poses an unacceptable risk to compliant systems.	2003-04-02 20:14:44 +00:00
Matthew N. Dodd	4f6425f7ae	- Use the correct constant define. - Add a missing break.	2003-04-02 18:02:58 +00:00
Matthew N. Dodd	8faf6df9b3	Sync constant define with NetBSD. Requested by: Tom Spindler <dogcow@babymeat.com>	2003-04-02 10:28:47 +00:00
Jeffrey Hsu	48d2549c3e	Observe conservation of packets when entering Fast Recovery while doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit. Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>	2003-04-01 21:16:46 +00:00
Matthew N. Dodd	09139a4537	Implement support for RFC 3514 (The Security Flag in the IPv4 Header). (See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt) This fulfills the host requirements for userland support by way of the setsockopt() IP_EVIL_INTENT message. There are three sysctl tunables provided to govern system behavior. net.inet.ip.rfc3514: Enables support for rfc3514. As this is an Informational RFC and support is not yet widespread this option is disabled by default. net.inet.ip.hear_no_evil If set the host will discard all received evil packets. net.inet.ip.speak_no_evil If set the host will discard all transmitted evil packets. The IP statistics counter 'ips_evil' (available via 'netstat') provides information on the number of 'evil' packets received. For reference, the '-E' option to 'ping' has been provided to demonstrate and test the implementation.	2003-04-01 08:21:44 +00:00
Maxim Konovalov	7778283b40	Fix indentation.	2003-03-27 15:00:10 +00:00
Maxim Konovalov	be1e4c5162	o Protect set_fs_param() by splimp(9). Quote from kern/37573: There is an obvious race in netinet/ip_dummynet.c:config_pipe(). Interrupts are not blocked when changing the params of an existing pipe. The specific crash observed: ... -> config_pipe -> set_fs_parms -> config_red malloc a new w_q_lookup table but take an interrupt before initializing it, interrupt handler does: ... -> dummynet_io -> red_drops red_drops dereferences the uninitialized (zeroed) w_q_lookup table. o Flush accumulated credits for idle pipes. o Flush accumulated credits when changing pipe characteristics. o Change dn_flow_queue.numbytes type to unsigned long. Overlapping dn_flow_queue->numbytes in ready_event() leads to numbytes becoming negative and the SET_TICKS() macro returning a very big value. heap_insert() overlaps dn_key again and inserts a queue to a ready heap with a sched_time that points to the past. That leads to an "infinity" loop. PR: kern/33234, kern/37573, misc/42459, kern/43133, kern/44045, kern/48099 Submitted by: Mike Hibler <mike@cs.utah.edu> (kern/37573) MFC after: 6 weeks	2003-03-27 14:56:36 +00:00
Robert Watson	5e7ce4785f	Modify the mac_init_ipq() MAC Framework entry point to accept an additional flags argument to indicate blocking disposition, and pass in M_NOWAIT from the IP reassembly code to indicate that blocking is not OK when labeling a new IP fragment reassembly queue. This should eliminate some of the WITNESS warnings that have started popping up since fine-grained IP stack locking started going in; if memory allocation fails, the creation of the fragment queue will be aborted. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-03-26 15:12:03 +00:00
Maxime Henrion	511e01e2d6	Try to make the MBUF_FRAG_TEST code work better. - Don't try to fragment the packet if it's smaller than mbuf_frag_size. - Preserve the size of the mbuf chain which is modified by m_split(). - Check that m_split() didn't return NULL. - Make it so we don't end up with two M_PKTHDR mbuf in the chain. - Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole chain and not just the first mbuf. - Fix a nearby style bug and rework the logic of the loops so that it's more clear. This is still not quite right, because we're clearly abusing m_split() to do something it was not designed for, but at least it works now. We should probably move this code into a m_fragment() function when it's correct.	2003-03-25 23:49:14 +00:00
Mike Silbersack	9d9edc5693	Add the MBUF_FRAG_TEST option. When compiled in, this option allows you to tell ip_output to fragment all outgoing packets into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes. This is an excellent way to test if network drivers can properly handle long mbuf chains being passed to them. net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation) so that you can at least boot before your network driver dies. :)	2003-03-25 05:45:05 +00:00
Maxime Henrion	aecfcdb824	Use __packed instead of __attribute__((__packed__)).	2003-03-22 00:25:14 +00:00
Matthew N. Dodd	57842a38fd	Add a sysctl node allowing the specification of an address mask to use when replying to ICMP Address Mask Request packets.	2003-03-21 15:43:06 +00:00
Matthew N. Dodd	21150298bb	Add comments regarding the ICMP timestamp fields.	2003-03-21 15:28:10 +00:00
Crist J. Clark	010dabb047	Add a 'verrevpath' option that verifies the interface that a packet comes in on is the same interface that we would route out of to get to the packet's source address. Essentially automates an anti-spoofing check using the information in the routing table. Experimental. The usage and rule format for the feature may still be subject to change.	2003-03-15 01:13:00 +00:00
Jeffrey Hsu	7792ea2700	Greatly simplify the unlocking logic by holding the TCP protocol lock until after FIN_WAIT_2 processing. Helped with debugging: Doug Barton	2003-03-13 11:46:57 +00:00
Jeffrey Hsu	da3a8a1a4f	Add support for RFC 3390, which allows for a variable-sized initial congestion window.	2003-03-13 01:43:45 +00:00
Jeffrey Hsu	582a954b00	Implement the Limited Transmit algorithm (RFC 3042).	2003-03-12 20:27:28 +00:00
Sam Leffler	4a692a1fc2	correct two more flag misuses; m_tag* use malloc flags	2003-03-12 14:45:22 +00:00
Jonathan Lemon	a3b6edc353	Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure. Sponsored by: DARPA, NAI Labs	2003-03-08 22:07:52 +00:00
Jonathan Lemon	607b0b0cc9	Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs	2003-03-08 22:06:20 +00:00
Peter Wemm	3c6b084e96	Finish driving a stake through the heart of netns and the associated ifdefs scattered around the place - it's dead, Jim! The SMB stuff had stolen AF_NS, make it official.	2003-03-05 19:24:24 +00:00
Jonathan Lemon	1cafed3941	Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off. Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs	2003-03-04 23:19:55 +00:00
Dag-Erling Smørgrav	521f364b80	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).	2003-03-02 16:54:40 +00:00
Jonathan Lemon	272c5dfe93	In timewait state, if the incoming segment is a pure in-sequence ack that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states. Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG> Sponsored by: DARPA, NAI Labs	2003-02-26 18:20:41 +00:00
Jonathan Lemon	ef6b48deb9	The TCP protocol lock may still be held if the reassembly queue dropped FIN. Detect this case and drop the lock accordingly. Sponsored by: DARPA, NAI Labs	2003-02-26 13:55:13 +00:00
Mike Silbersack	a75a485d62	Fix a condition so that ip reassembly queues are emptied immediately when maxfragpackets is dropped to 0. Noticed by: bmah	2003-02-26 07:28:35 +00:00
Robert Watson	9327ee33bf	When generating a TCP response to a connection, not only test if the tcpcb is NULL, but also its connected inpcb, since we now allow elements of a TCP connection to hang around after other state, such as the socket, has been recycled. Tested by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2003-02-25 14:08:41 +00:00
Maxim Konovalov	b36f5b3735	style(9): join lines.	2003-02-25 11:53:11 +00:00
Maxim Konovalov	99e8617d24	Ip reassembly queue structure has ipq_nfrags now. Count a number of dropped ip fragments precisely. Reviewed by: silby	2003-02-25 11:49:01 +00:00
Jeffrey Hsu	edf02ff15d	Hold the TCP protocol lock while modifying the connection hash table.	2003-02-25 01:32:03 +00:00
Mike Silbersack	af9c7d06d5	Fix a comment which didn't match the new cookie behavior. Submitted by: Scott Renfro <scott@renfro.org> MFC after: 1 day	2003-02-24 03:15:48 +00:00
Jeffrey Hsu	11a20fb8b6	tcp_twstart() needs to be called with the TCP protocol lock held to avoid a race condition with the TCP timer routines.	2003-02-24 00:52:03 +00:00
Jeffrey Hsu	2fbef91887	Pass the right function to callout_reset() for a compressed TIME-WAIT control block.	2003-02-24 00:48:12 +00:00
Mike Silbersack	a432399c56	Improve the security and performance of syncookies: Security improvements: - Increase the size of each syncookie secret from 32 to 128 bits in order to make brute force attacks on the secrets much more difficult. - Always return the lowest order dword from the MD5 hash; this allows us to expose 2 more bits of the cookie and makes ACK floods which seek to guess the cookie value more difficult. Performance improvements: - Increase the lifetime of each syncookie from 4 seconds to 16 seconds. This increases the usefulness of syncookies during an attack. - From Yahoo!: Reduce the number of calls to MD5Update; this results in a ~17% increase in cookie generation time here. Reviewed by: hsu, jayanth, jlemon, nectar MFC After: 15 seconds	2003-02-23 19:04:23 +00:00
Jonathan Lemon	f243998be5	Yesterday just wasn't my day. Remove testing delta that crept into the diff. Pointy hat provided by: sam	2003-02-23 15:40:36 +00:00
Sam Leffler	14dd6717f8	Add a new config option IPSEC_FILTERGIF to control whether or not packets coming out of a GIF tunnel are re-processed by ipfw, et. al. By default they are not reprocessed. With the option they are. This reverts 1.214. Prior to that change packets were not re-processed. After they were which caused problems because packets do not have distinguishing characteristics (like a special network if) that allows them to be filtered specially. This is really a stopgap measure designed for immediate MFC so that 4.8 has consistent handling to what was in 4.7. PR: 48159 Reviewed by: Guido van Rooij <guido@gvr.org> MFC after: 1 day	2003-02-23 00:47:06 +00:00
Jonathan Lemon	a14c749f04	Check to see if the TF_DELACK flag is set before returning from tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior Tested by: maxim Sponsored by: DARPA, NAI Labs	2003-02-22 21:54:57 +00:00
Mike Silbersack	375386e284	Add the ability to limit the number of IP fragments allowed per packet, and enable it by default, with a limit of 16. At the same time, tweak maxfragpackets downward so that in the worst possible case, IP reassembly can use only 1/2 of all mbuf clusters. MFC after: 3 days Reviewed by: hsu Liked by: bmah	2003-02-22 06:41:47 +00:00
Poul-Henning Kamp	d25ecb917b	- m = m_gethdr(M_NOWAIT, MT_HEADER); + m = m_gethdr(M_DONTWAIT, MT_HEADER); 'nuff said.	2003-02-21 23:17:12 +00:00
Crist J. Clark	b0d226932e	The ancient and outdated concept of "privileged ports" in UNIX-type OSes has probably caused more problems than it ever solved. Allow the user to retire the old behavior by specifying their own privileged range with, net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1 net.inet.ip.portrange.reservedlo default = 0 Now you can run that webserver without ever needing root at all. Or just imagine, an ftpd that can really drop privileges, rather than just set the euid, and still do PORT data transfers from 20/tcp. Two edge cases to note, # sysctl net.inet.ip.portrange.reservedhigh=0 Opens all ports to everyone, and, # sysctl net.inet.ip.portrange.reservedhigh=65535 Locks all network activity to root only (which could actually have been achieved before with ipfw(8), but is somewhat more complicated). For those who stick to the old religion that 0-1023 belong to root and root alone, don't touch the knobs (or even lock them by raising securelevel(8)), and nothing changes.	2003-02-21 05:28:27 +00:00
Jonathan Lemon	8608c4c1f9	Remove unused variables in the IPSEC case. Submitted by: Lars Eggert <larse@ISI.EDU>	2003-02-20 18:22:21 +00:00
Jonathan Lemon	ffae8c5a7e	Unbreak non-IPV6 compilation. Caught by: phk Sponsored by: DARPA, NAI Labs	2003-02-19 23:43:04 +00:00
Jonathan Lemon	340c35de6a	Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs	2003-02-19 22:32:43 +00:00
Jonathan Lemon	7990938421	Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs	2003-02-19 22:18:06 +00:00
Jonathan Lemon	414462252a	Correct comments.	2003-02-19 21:33:46 +00:00
Jonathan Lemon	3bfd6421c2	Clean up delayed acks and T/TCP interactions: - delay acks for T/TCP regardless of delack setting - fix bug where a single pass through tcp_input might not delay acks - use callout_active() instead of callout_pending() Sponsored by: DARPA, NAI Labs	2003-02-19 21:18:23 +00:00
Warner Losh	a163d034fa	Back out M_* changes, per decision of the TRB. Approved by: trb	2003-02-19 05:47:46 +00:00
Maxim Konovalov	b52d5ea3d2	o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket cr_uid. Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its own macro for a similar purpose that is why ipfw2 in RELENG_4 processes uid rules correctly. I will MFC the diff for code consistency. Reported by: Oleg Baranov <ol@csa.ru> Reviewed by: luigi MFC after: 1 month	2003-02-17 13:39:57 +00:00
Jeffrey Hsu	4b40c56c28	Take advantage of pre-existing lock-free synchronization and type stable memory to avoid acquiring SMP locks during expensive copyout process.	2003-02-15 02:37:57 +00:00
Jeffrey Hsu	85e8b24343	The protocol lock is always held in the dropafterack case, so we don't need to check for it at runtime.	2003-02-13 22:14:22 +00:00
Jeffrey Hsu	3dc7ebf9ff	in_pcbnotifyall() requires an exclusive protocol lock for notify functions which modify the connection list, namely, tcp_notify().	2003-02-12 23:55:07 +00:00
Jeffrey Hsu	6d45d64a8f	Properly document that syncache timer processing requires an exclusive TCP protocol lock.	2003-02-12 00:42:12 +00:00
Seigo Tanimura	cd6c2a8874	s/IPSSEC/IPSEC/	2003-02-11 10:51:56 +00:00
Jeffrey Hsu	24652ff6e1	Get cosmetic changes out of the way before I add routing table SMP locks.	2003-02-10 22:01:34 +00:00
Orion Hodson	022695f82a	Avoid multiply for preemptive arp calculation since it hits every ethernet packet sent. Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>	2003-02-08 15:05:15 +00:00
Orion Hodson	73224fb019	MFS 1.64.2.22: Re-enable non pre-emptive ARP requests. Submitted by: "Diomidis Spinellis" <dds@aueb.gr> PR: kern/46116	2003-02-04 05:28:08 +00:00
Crist J. Clark	39eb27a4a9	Add the TCP flags to the log message whenever log_in_vain is 1, not just when set to 2. PR: kern/43348 MFC after: 5 days	2003-02-02 22:06:56 +00:00
Mike Silbersack	ecf44c01f4	Move a comment and optimize the frag timeout code a slight bit. Submitted by: maxim MFC with: The previous two revisions	2003-02-01 05:59:51 +00:00
Sam Leffler	9359ad861e	FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from ipsec4_process_packet; they happen when a packet is dropped because an SA acquire is initiated Submitted by: Doug Ambrisko <ambrisko@verniernetworks.com>	2003-01-30 05:45:45 +00:00
Sam Leffler	28a34902c4	remove the restriction on building a kernel with FAST_IPSEC and INET6; you still don't want to use the two together, but it's ok to have them in the same kernel (the problem that initiated this bandaid has long since been fixed)	2003-01-30 05:43:08 +00:00
Mike Silbersack	d4d5315c23	Fix a bug with syncookies; previously, the syncache's MSS size was not initialized until after a syncookie was generated. As a result, all connections resulting from a returned cookie would end up using a MSS of ~512 bytes. Now larger packets will be used where possible. MFC after: 5 days	2003-01-29 03:49:49 +00:00
Poul-Henning Kamp	4ee6e70ef3	Check bounds for index before dereferencing memory past end of array. Found by: FlexeLint	2003-01-28 22:44:12 +00:00
Jeffrey Hsu	93f798891a	Avoid lock order reversal by expanding the scope of the AF_INET radix tree lock to cover the ARP data structures.	2003-01-28 20:22:19 +00:00
Mike Silbersack	ac64c8668b	A few fixes to rev 1.221 - Honor the previous behavior of maxfragpackets = 0 or -1 - Take a better stab at fragment statistics - Move / correct a comment Suggested by: maxim@ MFC after: 7 days	2003-01-28 03:39:39 +00:00
Mike Silbersack	402062e80c	Merge the best parts of maxfragpackets and maxnipq together. (Both functions implemented approximately the same limits on fragment memory usage, but in different fashions.) End user visible changes: - Fragment reassembly queues are freed in a FIFO manner when maxfragpackets has been reached, rather than all reassembly stopping. MFC after: 5 days	2003-01-26 01:44:05 +00:00
Alfred Perlstein	44956c9863	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.	2003-01-21 08:56:16 +00:00
Maxim Konovalov	2adf7582da	De-anonymity a couple of messages I missed in a previous sweep. Move one of them under DEB macro. Noticed by: Wiktor Niesiobedzki <w@evip.pl>	2003-01-20 13:03:34 +00:00
Maxim Konovalov	8ec22a9363	If the first action is O_LOG adjust a pointer to the real one, unbreaks skipto + log rules. Reported by: Wiktor Niesiobedzki <w@evip.pl> MFC after: 1 week	2003-01-20 11:58:34 +00:00
Jeffrey Hsu	314e5a3daf	Optimize away call to bzero() in the common case by directly checking if a connection has any cached TAO information.	2003-01-18 19:03:26 +00:00
Jeffrey Hsu	f5c5746047	Fix long-standing bug predating FreeBSD where calling connect() twice on a raw ip socket will crash the system with a null-dereference.	2003-01-18 01:10:55 +00:00
Jeffrey Hsu	c996428c32	SMP locking for ARP.	2003-01-17 07:59:35 +00:00
Matthew Dillon	fe41ca530c	Introduce the ability to flag a sysctl for operation at secure level 2 or 3 in addition to secure level 1. The mask supports up to a secure level of 8 but only add defines through CTLFLAG_SECURE3 for now. As per the missif in the log entry for 1.11 of ip_fw2.c which added the secure flag to the IPFW sysctl's in the first place, change the secure level requirement from 1 to 3 now that we have support for it. Reviewed by: imp With Design Suggestions by: imp	2003-01-14 19:35:33 +00:00
Jeffrey Hsu	cb942153c8	Fix NewReno. Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>	2003-01-13 11:01:20 +00:00
Thomas Moestl	a9a7a91220	Clear the target hardware address field when generating an ARP request. Reviewed by: nectar MFC after: 1 week	2003-01-10 00:04:53 +00:00
Jeffrey Hsu	b21bf9a59b	Validate inp before de-referencing it. Submitted by: pb	2003-01-05 07:56:24 +00:00
Jens Schweikhardt	9d5abbddbf	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.	2003-01-01 18:49:04 +00:00
Sam Leffler	9967cafc49	Correct mbuf packet header propagation. Previously, packet headers were sometimes propagated using M_COPY_PKTHDR which actually did something between a "move" and a "copy" operation. This is replaced by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it from the source mbuf) and m_dup_pkthdr which copies the packet header contents including any m_tag chain. This corrects numerous problems whereby mbuf tags could be lost during packet manipulations. These changes also introduce arguments to m_tag_copy and m_tag_copy_chain to specify if the tag copy work should potentially block. This introduces an incompatibility with openbsd which we may want to revisit. Note that move/dup of packet headers does not handle target mbufs that have a cluster bound to them. We may want to support this; for now we watch for it with an assert. Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG\|M_LASTFRAG. Supported by: Vernier Networks Reviewed by: Robert Watson <rwatson@FreeBSD.org>	2002-12-30 20:22:40 +00:00
Matthew Dillon	07fd333df3	Remove the PAWS ack-on-ack debugging printf(). Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of order / reverse-time-indexed packet should be acknowledged as specified in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994) simply acknowledged the segment unconditionally, which is incorrect, and was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN in addition to (tlen != 0), which may or may not be correct, but the worst that ought to happen should be a retry by the sender.	2002-12-30 19:31:04 +00:00
Sam Leffler	069f35d328	correct style bogons	2002-12-30 18:45:31 +00:00
Ian Dowse	ed1a13b18f	Bridged packets are supplied to the firewall with their IP header in network byte order, but icmp_error() expects the IP header to be in host order and the code here did not perform the necessary swapping for the bridged case. This bug causes an "icmp_error: bad length" panic when certain length IP packets (e.g. ip_len == 0x100) are rejected by the firewall with an ICMP response. MFC after: 3 days	2002-12-27 17:43:25 +00:00
Jeffrey Hsu	abe239cfe2	Validate inp to prevent a use after free.	2002-12-24 21:00:31 +00:00
Maxim Konovalov	f4ef616f98	o De-anonymity dummynet(4) and ipfw(4) messages, prepend them by 'dummynet: ' and 'ipfw: ' prefixes. PR: kern/41609	2002-12-24 13:45:24 +00:00
Jeffrey Hsu	956b0b653c	SMP locking for radix nodes.	2002-12-24 03:03:39 +00:00
Pierre Beyssac	1ba7727b9e	Remove forgotten INP_UNLOCK(inp) in my previous commit. Reported by: hsu	2002-12-22 13:04:08 +00:00
Pierre Beyssac	87cd4001b5	In syncache_timer(), don't attempt to lock the inpcb structure associated with the syncache entry: in case tcp_close() has been called on the corresponding listening socket, the lock has been destroyed as a side effect of in_pcbdetach(), causing a panic when we attempt to lock on it. Reviewed by: hsu	2002-12-21 19:59:47 +00:00
Sam Leffler	00f21882a0	replace the special-purpose rate-limiting code with the general facility just added; this tries to maintain the same behaviour vis a vis printing the rate-limiting messages but needs tweaking	2002-12-21 00:08:20 +00:00
Jeffrey Hsu	9a39fc9d73	Eliminate a goto. Fix some line breaks.	2002-12-20 11:24:02 +00:00
Jeffrey Hsu	540e8b7e31	Unravel a nested conditional. Remove an unneeded local variable.	2002-12-20 11:16:52 +00:00
Jeffrey Hsu	f320a1bfd2	Expand scope of TCP protocol lock to cover syncache data structures.	2002-12-20 00:24:19 +00:00
Bosko Milekic	86fea6be59	o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}. o Fix a bpf_compat issue where malloc() was defined to just call bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT flag (and only one of those two). Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)	2002-12-19 22:58:27 +00:00
Jeffrey Hsu	19fc74fb60	Lock up ifaddr reference counts.	2002-12-18 11:46:59 +00:00
Poul-Henning Kamp	11aee0b4b0	Remove unused and incorrectly maintained variable "in_interfaces"	2002-12-17 19:30:04 +00:00
Matthew Dillon	967adce8df	Fix syntax in last commit.	2002-12-17 00:24:48 +00:00
Maxim Konovalov	616fa7460c	o Trim EOL whitespaces. MFC after: 1 week	2002-12-15 10:24:36 +00:00
Maxim Konovalov	21ef23ab3f	o s/if_name[16]/if_name[IFNAMSIZ]/ Reviewed by: luigi MFC after: 1 week	2002-12-15 10:23:02 +00:00
Maxim Konovalov	2713a5bebb	o M_DONTWAIT is mbuf(9) flag: malloc(M_DONTWAIT) -> malloc(M_NOWAIT). The bug does not affect anything because M_NOWAIT == M_DONTWAIT. Reviewed by: luigi MFC after: 1 week	2002-12-15 10:21:30 +00:00
Maxim Konovalov	83b75b7621	o Fix byte order logging issue: sa.sin_port is already in host byte order. PR: kern/45964 Submitted by: Sascha Blank <sblank@tiscali.de> Reviewed by: luigi MFC after: 1 week	2002-12-15 09:44:02 +00:00
Matthew Dillon	d7ff8ef62a	Change tcp.inflight_min from 1024 to a production default of 6144. Create a sysctl for the stabilization value for the bandwidth delay product (inflight) algorithm and document it. MFC after: 3 days	2002-12-14 21:00:17 +00:00
Matthew Dillon	1ab4789dc2	Bruce forwarded this tidbit from an analysis Van Jacobson did on an apparent ack-on-ack problem with FreeBSD. Prof. Jacobson noticed a case in our TCP stack which would acknowledge a received ack-only packet, which is not legal in TCP. Submitted by: Van Jacobson <van@packetdesign.com>, bmah@packetdesign.com (Bruce A. Mah) MFC after: 7 days	2002-12-14 07:31:51 +00:00
Maxim Sobolev	16199bf2d3	MFS: recognize gre packets used in the WCCP protocol. Approved by: re	2002-12-07 14:22:05 +00:00
Luigi Rizzo	97850a5dd9	Move fw_one_pass from ip_fw2.c to ip_input.c so that neither bridge.c nor if_ethersubr.c depend on IPFIREWALL. Restore the use of fw_one_pass in if_ethersubr.c ipfw.8 will be updated with a separate commit. Approved by: re	2002-11-20 19:07:27 +00:00
Luigi Rizzo	032dcc7680	Back out some style changes. They are not urgent, I will put them back in after 5.0 is out. Requested by: sam Approved by: re	2002-11-20 19:00:54 +00:00
Luigi Rizzo	b375c9ec2c	Back out the ip_fragment() code -- it is not urgent to have it in now, I will put it back in in a better form after 5.0 is out. Requested by: sam, rwatson, luigi (on second thought) Approved by: re	2002-11-20 18:56:25 +00:00
Mike Silbersack	df285b3d1d	Add a sysctl to control the generation of source quench packets, and set it to 0 by default. Partially obtained from: NetBSD Suggested by: David Gilbert MFC after: 5 days	2002-11-19 17:06:06 +00:00
Luigi Rizzo	9b77fbf0a2	Fix function headers and remove 'register' variable declarations.	2002-11-17 17:04:19 +00:00
Luigi Rizzo	3e372e140c	Move the ip_fragment code from ip_output() to a separate function, so that it can be reused elsewhere (there is a number of places where it can be useful). This also trims some 200 lines from the body of ip_output(), which helps readability a bit. (This change was discussed a few weeks ago on the mailing lists, Julian agreed, silence from others. It is not a functional change, so i expect it to be ok to commit it now but i am happy to back it out if there are objections). While at it, fix some function headers and replace m_copy() with m_copypacket() where applicable. MFC after: 1 week	2002-11-17 16:30:44 +00:00
Luigi Rizzo	20fab86349	Minor documentation changes and indentation fix. Replace m_copy() with m_copypacket() where applicable. While at it, fix some function headers and remove 'register' from variable declarations.	2002-11-17 16:13:08 +00:00
Luigi Rizzo	4e8fe3210d	Cleanup some of the comments, and reformat long lines. Replace m_copy() with m_copypacket() where applicable. Replace "if (a.s_addr ...)" with "if (a.s_addr != INADDR_ANY ...)" to make it clear what the code means. While at it, fix some function headers and remove 'register' from variable declarations. MFC after: 3 days	2002-11-17 16:02:17 +00:00
Luigi Rizzo	bbb4330b61	Massive cleanup of the ip_mroute code. No functional changes, but: + the mrouting module now should behave the same as the compiled-in version (it did not before, some of the rsvp code was not loaded properly); + netinet/ip_mroute.c is now truly optional; + removed some redundant/unused code; + changed many instances of '0' to NULL and INADDR_ANY as appropriate; + removed several static variables to make the code more SMP-friendly; + fixed some minor bugs in the mrouting code (mostly, incorrect return values from functions). This commit is also a prerequisite to the addition of support for PIM, which i would like to put in before DP2 (it does not change any of the existing APIs, anyways). Note, in the process we found out that some device drivers fail to properly handle changes in IFF_ALLMULTI, leading to interesting behaviour when a multicast router is started. This bug is not corrected by this commit, and will be fixed with a separate commit. Detailed changes: -------------------- netinet/ip_mroute.c all the above. conf/files make ip_mroute.c optional net/route.c fix mrt_ioctl hook netinet/ip_input.c fix ip_mforward hook, move rsvp_input() here together with other rsvp code, and a couple of indentation fixes. netinet/ip_output.c fix ip_mforward and ip_mcast_src hooks netinet/ip_var.h rsvp function hooks netinet/raw_ip.c hooks for mrouting and rsvp functions, plus interface cleanup. netinet/ip_mroute.h remove an unused and optional field from a struct Most of the code is from Pavlin Radoslavov and the XORP project Reviewed by: sam MFC after: 1 week	2002-11-15 22:53:53 +00:00
Sam Leffler	eec3a0b17f	track changes to not strip the Ethernet header from input packets Reviewed by: many Approved by: re	2002-11-14 23:46:04 +00:00
Sam Leffler	ccb2acfe1b	track bpf changes Reviewed by: many Approved by: re	2002-11-14 23:45:13 +00:00
Maxim Konovalov	8ef1565d2b	Due to memory alignment, sizeof(struct ipfw_flow_id) is bigger than the actual size of the ipfw_flow_id structure, and bcmp(3) may fail to compare them properly. Compare members of these structures instead. PR: kern/44078 Submitted by: Oleg Bulyzhin <oleg@rinet.ru> Reviewed by: luigi MFC after: 2 weeks	2002-11-13 11:31:44 +00:00
Jeffrey Hsu	e1e1b6e892	Turn off duplicate lock checking for inp locks because udp_input() intentionally locks two inp records simultaneously.	2002-11-12 20:44:38 +00:00
Sam Leffler	6f0d017cf4	a better solution to building FAST_IPSEC w/o INET6 Submitted by: Jeffrey Hsu <hsu@FreeBSD.org>	2002-11-10 17:17:32 +00:00
Alfred Perlstein	29f194457c	Fix instances of macros with improperly parenthesized arguments. Verified by: md5	2002-11-09 12:55:07 +00:00
Sam Leffler	9c0a8ace11	temporarily disallow FAST_IPSEC and INET6 to avoid potential panics; will correct this before 5.0 release	2002-11-08 23:50:32 +00:00
Sam Leffler	e8539d32f0	FAST_IPSEC fixups: o fix #ifdef typo o must use "bounce functions" when dispatched from the protosw table don't know how this stuff was missed in my testing; must've committed the wrong bits Pointy hat: sam Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>	2002-11-08 23:37:50 +00:00
Sam Leffler	58fcadfc0f	fixup FAST_IPSEC build w/o INET6	2002-11-08 23:33:59 +00:00
Sam Leffler	ab94ca3cec	correct fast ipsec logic: compare destination ip address against the contents of the SA, not the SP Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>	2002-11-08 23:11:02 +00:00
John Baldwin	2d4e26522d	Cast a ptrdiff_t to an int to printf.	2002-11-08 14:52:26 +00:00
Jeff Roberson	1645d0903e	- Consistently update snd_wl1, snd_wl2, and rcv_up in the header prediction code. Previously, 2GB worth of header predicted data could leave these variables too far out of sequence which would cause problems after receiving a packet that did not match the header prediction. Submitted by: Bill Baumann <bbaumann@isilon.com> Sponsored by: Isilon Systems, Inc. Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com	2002-10-31 23:24:13 +00:00
Jeffrey Hsu	30613f5610	Don't need to check if SO_OOBINLINE is defined. Don't need to protect isipv6 conditional with INET6. Fix leading indentation in 2 lines.	2002-10-30 08:32:19 +00:00
Bill Fenner	4d3ffc9841	Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers. This allows socket() to return an error when the kernel is not built with IPDIVERT, and doesn't prevent future applications from using the "borrowed" IP protocol number. The sysctl net.inet.raw.olddiverterror controls whether opening a socket with the "borrowed" IP protocol fails with an accompanying kernel printf; this code should last only a couple of releases. Approved by: re	2002-10-29 16:46:13 +00:00
Maxim Konovalov	a98d88ad3e	Lower a priority of "session drop" messages. Requested by: Eugene Grosbein <eugen@kuzbass.ru> MFC after: 3 days	2002-10-29 08:53:14 +00:00
Maxime Henrion	d28e8b3a0d	Oops, forgot to commit this file. This is part of the fix for ipfw2 panics on sparc64.	2002-10-24 22:32:13 +00:00
Maxime Henrion	7c697970f4	Fix ipfw2 panics on 64-bit platforms. Quoting luigi: In order to make the userland code fully 64-bit clean it may be necessary to commit other changes that may or may not cause a minor change in the ABI. Reviewed by: luigi	2002-10-24 18:04:44 +00:00
Luigi Rizzo	18f13da2be	src and dst address were erroneously swapped in SRC_SET and DST_SET commands. Use the correct one. Also affects ipfw2 in -stable.	2002-10-24 18:01:53 +00:00
Maxime Henrion	56e77afa59	Fix kernel build on sparc64 in the IPDIVERT case.	2002-10-24 09:58:50 +00:00
Ian Dowse	efac726eeb	Unbreak the automatic remapping of an INADDR_ANY destination address to the primary local IP address when doing a TCP connect(). The tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr) modifying the passed-in sockaddr, and I failed to notice this in the recent change that added in_pcbconnect_setup(). As a result, tcp_connect() was ending up using the unmodified sockaddr address instead of the munged version. There are two cases to handle: if in_pcbconnect_setup() succeeds, then the PCB has already been updated with the correct destination address as we pass it pointers to inp_faddr and inp_fport directly. If in_pcbconnect_setup() fails due to an existing but dead connection, then copy the destination address from the old connection.	2002-10-24 02:02:34 +00:00
Maxim Konovalov	ba3a9d459c	Kill EOL spaces. Approved by: luigi MFC after: 1 week	2002-10-23 10:07:55 +00:00
Maxim Konovalov	6b6874b20c	Use syslog for messages about dropped sessions, do not flood a console. Suggested by: Eugene Grosbein <eugen@kuzbass.ru> Approved by: luigi MFC after: 1 week	2002-10-23 10:05:19 +00:00
SUZUKI Shinsuke	2754d95d85	fixed a kernel crash by "ifconfig stf0 inet 1.2.3.4" MFC after: 1 week	2002-10-22 22:50:38 +00:00
Ian Dowse	c557ae16ce	Implement a new IP_SENDSRCADDR ancillary message type that permits a server process bound to a wildcard UDP socket to select the IP address from which outgoing packets are sent on a per-datagram basis. When combined with IP_RECVDSTADDR, such a server process can guarantee to reply to an incoming request using the same source IP address as the destination IP address of the request, without having to open one socket per server IP address. Discussed on: -net Approved by: re	2002-10-21 20:40:02 +00:00
Ian Dowse	90162a4e87	Remove the "temporary connection" hack in udp_output(). In order to send datagrams from an unconnected socket, we used to first block input, then connect the socket to the sendmsg/sendto destination, send the datagram, and finally disconnect the socket and unblock input. We now use in_pcbconnect_setup() to check if a connect() would have succeeded, but we never record the connection in the PCB (local anonymous port allocation is still recorded, though). The result from in_pcbconnect_setup() authorises the sending of the datagram and selects the local address and port to use, so we just construct the header and call ip_output(). Discussed on: -net Approved by: re	2002-10-21 20:10:05 +00:00
Ian Dowse	5200e00e72	Replace in_pcbladdr() with a more generic inner subroutine for in_pcbconnect() called in_pcbconnect_setup(). This version performs all of the functions of in_pcbconnect() except for the final committing of changes to the PCB. In the case of an EADDRINUSE error it can also provide to the caller the PCB of the duplicate connection, avoiding an extra in_pcblookup_hash() lookup in tcp_connect(). This change will allow the "temporary connect" hack in udp_output() to be removed and is part of the preparation for adding the IP_SENDSRCADDR control message. Discussed on: -net Approved by: re	2002-10-21 13:55:50 +00:00
Poul-Henning Kamp	53be11f680	Fix two instances of variant struct definitions in sys/netinet: Remove the never completed _IP_VHL version, it has not caught on anywhere and it would make us incompatible with other BSD netstacks to retain this version. Add a CTASSERT protecting sizeof(struct ip) == 20. Don't let the size of struct ipq depend on the IPDIVERT option. This is a functional no-op commit. Approved by: re	2002-10-20 22:52:07 +00:00
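The size check mentioned in the commit above uses the kernel's `CTASSERT` macro. A rough userland equivalent, assuming a C11 compiler, is `_Static_assert`. The structure below is a simplified, hypothetical sketch of a 20-byte IPv4 header and does not reproduce the real `struct ip` (which splits the first byte into two 4-bit bitfields).

```c
#include <stdint.h>

/* Simplified sketch of a fixed IPv4 header; NOT the real struct ip. */
struct ip_hdr_sketch {
	uint8_t  ver_hlen;	/* version and header length, one byte */
	uint8_t  tos;		/* type of service */
	uint16_t len;		/* total length */
	uint16_t id;		/* identification */
	uint16_t off;		/* flags and fragment offset */
	uint8_t  ttl;		/* time to live */
	uint8_t  proto;		/* protocol */
	uint16_t sum;		/* header checksum */
	uint32_t src;		/* source address */
	uint32_t dst;		/* destination address */
};

/* Fail the build, rather than misbehave at run time, if the layout drifts. */
_Static_assert(sizeof(struct ip_hdr_sketch) == 20,
    "IPv4 header without options must be 20 bytes");
```

Because the check runs at compile time, a change that accidentally alters the structure size is caught before any packet is parsed with the wrong layout.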
Robert Watson	c740509854	When a packet is multicast encapsulated, give labeled policies the opportunity to preserve the label. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-10-20 21:59:00 +00:00
Ian Dowse	4b932371f4	Split out most of the logic from in_pcbbind() into a new function called in_pcbbind_setup() that does everything except commit the changes to the PCB. There should be no functional change here, but in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR control message implementation to check or allocate the source address and port. Discussed on: -net Approved by: re	2002-10-20 21:44:31 +00:00
Maxime Henrion	d7f4d27a7a	Several malloc() calls were passing the M_DONTWAIT flag which is an mbuf allocation flag. Use the correct M_NOWAIT malloc() flag. Fortunately, both were defined to 1, so this commit is a no-op.	2002-10-19 11:31:50 +00:00
Hajimu UMEMOTO	b6e2845324	last arg of in6?_gif_output() is not used any more. Obtained from: KAME MFC after: 3 weeks	2002-10-17 17:47:55 +00:00
Alfred Perlstein	dde2897f82	de-__P().	2002-10-16 22:27:27 +00:00
Hajimu UMEMOTO	ab94625826	use encapcheck. Obtained from: KAME MFC after: 3 weeks	2002-10-16 20:16:49 +00:00
Hajimu UMEMOTO	9426aedf7f	- after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly. - set IFF_UP on SIOCSIFADDR. be consistent with others. - set if_addrlen explicitly (just in case) - multi destination mode is long gone. - missing break statement - add gif_set_tunnel(), so that we can set tunnel address from within the kernel at ease. - encap_attach/detach dynamically on ioctls - move encap_attach() to dedicated function in in*_gif.c Obtained from: KAME MFC after: 3 weeks	2002-10-16 19:49:37 +00:00
Matthew Dillon	abac41a659	Fix oops in my last commit, I was calculating a new length but then not using it. (The code is already correct in -stable). Found by: silby	2002-10-16 19:16:33 +00:00
Guido van Rooij	2f591ab8fe	Get rid of checking for ip sec history. It is true that packets are not supposed to be checked by the firewall rules twice. However, because the various ipsec handlers never call ip_input(), this never happens anyway. This fixes the situation where a gif tunnel is encrypted with IPsec. In such a case, after IPsec processing, the unencrypted contents from the GIF tunnel are fed back to the ipintrq and subsequently handled by ip_input(). Yet, since there still is IPSec history attached, the packets coming out from the gif device are never fed into the filtering code. This fix was sent to Itojun, and he pointed towards http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction. This patch actually implements what is stated there (specifically: Packet came from tunnel devices (gif(4) and ipip(4)) will still go through ipf(4). You may need to identify these packets by using interface name directive in ipf.conf(5)). Reviewed by: rwatson MFC after: 3 weeks	2002-10-16 09:01:48 +00:00
Sam Leffler	9b65723081	correct PCB locking in broadcast/multicast case that was exposed by change to use udp_append Reviewed by: hsu	2002-10-16 02:33:28 +00:00
Sam Leffler	b9234fafa0	Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implementation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks	2002-10-16 02:25:05 +00:00
Sam Leffler	5d84645305	Replace aux mbufs with packet tags: o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month	2002-10-16 01:54:46 +00:00
Sean Chittenden	927a76bb5e	Increase the max dummynet hash size from 1024 to 65536. Default is still 1024. Silence on: -net, -ipfw 4weeks+ Reviewed by: dd Approved by: knu (mentor) MFC after: 3 weeks	2002-10-12 07:45:23 +00:00
Matthew Dillon	c8d50f2414	turn off debugging by default if bandwidth delay product limiting is turned on (it is already off in -stable).	2002-10-10 21:41:30 +00:00
Matthew Dillon	28257b5ccc	Update various comments mainly related to retransmit/FIN that I documented while working on a previous bug. Fix a PERSIST bug. Properly account for a FIN sent during a PERSIST. MFC after: 7 days	2002-10-10 19:21:50 +00:00
Maxim Konovalov	a5428e3a9a	Fix IPOPT_TS processing: do not overwrite IP address by timestamp. PR: misc/42121 Submitted by: Praveen Khurjekar <praveen@codito.com> Reviewed by: silence on -net MFC after: 1 month	2002-10-10 12:03:36 +00:00
Maxim Sobolev	748bb23dcc	Since bpf is no longer an optional component, remove associated ifdef's. Submitted by: don't quite remember - the name of the sender disappeared with the rest of my inbox. :(	2002-10-02 09:38:17 +00:00
Mike Barcroft	c0ec31f93e	Include <sys/cdefs.h> so the visibility conditionals are available. (This should have been included with the previous revision.)	2002-10-02 04:22:34 +00:00
Mike Barcroft	0cd4a9031e	Use visibility conditionals. Only TCP_NODELAY ends up being defined in the standards case.	2002-10-02 04:19:47 +00:00
Matthew Dillon	a84db8f49e	Guido found another bug. There is a situation with timestamped TCP packets where FreeBSD will send DATA+FIN and a W2K box will ack just the DATA portion. If this occurs after FreeBSD has done a (NewReno) fast-retransmit and is recovering it (dupacks > threshold) it triggers a case in tcp_newreno_partial_ack() (tcp_newreno() in stable) where tcp_output() is called with the expectation that the retransmit timer will be reloaded. But tcp_output() falls through and returns without doing anything, causing the persist timer to be loaded instead. This causes the connection to hang until W2K gives up. This occurs because in the case where only the FIN must be acked, the 'len' calculation in tcp_output() will be 0, a lot of checks will be skipped, and the FIN check will also be skipped because it is designed to handle FIN retransmits, not forced transmits from tcp_newreno(). The solution is to simply set TF_ACKNOW before calling tcp_output() to absolutely guarantee that it will run the send code and reset the retransmit timer. TF_ACKNOW is already used for this purpose in other cases. For some unknown reason this patch also seems to greatly reduce the number of duplicate acks received when Guido runs his tests over a lossy network. It is quite possible that there are other tcp_newreno{_partial_ack()} cases which were not generating the expected output which this patch also fixes. X-MFC after: Will be MFC'd after the freeze is over	2002-09-30 18:55:45 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Peter Wemm	224af215a6	Zap now-unused SHLIB_MINOR	2002-09-28 00:25:32 +00:00
Maxim Konovalov	cb7641e85b	Slightly rearrange a code in rev. 1.164: o Move len initialization closer to place of its first usage. o Compare len with 0 to improve readability. o Explicitly zero out phlen in ip_insertoptions() in failure case. Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks	2002-09-23 08:56:24 +00:00
Alfred Perlstein	ebc82cbbf0	s/__attribute__((__packed__))/__packed/g	2002-09-23 06:25:08 +00:00
Mike Silbersack	c1c36a2c68	Fix issue where shutdown(socket, SHUT_RD) was effectively ignored for TCP sockets. NetBSD PR: 18185 Submitted by: Sean Boudreau <seanb@qnx.com> MFC after: 3 days	2002-09-22 02:54:07 +00:00
Poul-Henning Kamp	a5554bf05b	Use m_fixhdr() rather than roll our own.	2002-09-18 19:43:01 +00:00
Matthew Dillon	fa55172bc0	Guido reported an interesting bug where an FTP connection between a Windows 2000 box and a FreeBSD box could stall. The problem turned out to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK causing FreeBSD to calculate an insane SRTT and RTT, resulting in a maximal retransmit timeout (60 seconds). If there is any packet loss on the connection for the first six or so packets the retransmit case may be hit (the window will still be too small for fast-retransmit), causing a 60+ second pause. The W2K box gives up and closes the connection. This commit works around the W2K bug. 15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8] 15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF) Bug reported by: Guido van Rooij <guido@gvr.org>	2002-09-17 22:21:37 +00:00
Maxim Sobolev	563a9b6ecb	Remove __RCSID(). Submitted by: bde	2002-09-17 11:31:41 +00:00
Maxim Konovalov	1cf4349926	Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak ip fragments reassembling for loopback interface. Discussed with: bde, jlemon Reviewed by: silence on -net MFC after: 2 weeks	2002-09-17 11:20:02 +00:00
Maxim Konovalov	e079ba8d93	In rare cases when there is no room for ip options ip_insertoptions() can fail and corrupt a header length. Initialize len and check what ip_insertoptions() returns. Reviewed by: archie, silence on -net MFC after: 5 days	2002-09-17 11:13:04 +00:00
Jennifer Yang	4a03a8a8c7	Temporary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead of pcbhead. It is on the way.	2002-09-17 03:19:43 +00:00
Maxim Sobolev	2b82e3b367	Remove superfluous break.	2002-09-10 09:18:33 +00:00
Maxim Sobolev	565bb857d0	Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach(). MFC after: 28 days (along with other if_gre changes)	2002-09-09 09:36:47 +00:00
Maxim Sobolev	c23d234cce	Reduce namespace pollution by staticizing everything, which doesn't need to be visible from outside of the module.	2002-09-06 18:16:03 +00:00
Maxim Sobolev	8e96e13e6a	Add a new gre(4) driver, which could be used to create GRE (RFC1701) and MOBILE (RFC2004) IP tunnels. Obtained from: NetBSD	2002-09-06 17:12:50 +00:00
Bruce Evans	40545cf5fc	Fixed namespace pollution in uma changes: - use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite. - don't include <sys/uma.h>. Namespace pollution makes "opaque" types like uma_zone_t perfectly non-opaque. Such types should never be used (see style(9)). Fixed subsequently grown dependencies of this header on its own pollution: - include <sys/_mutex.h> and its prerequisite <sys/_lock.h> instead of depending on namespace pollution 2 layers deep in <sys/uma.h>.	2002-09-05 19:48:52 +00:00
Bruce Evans	c74af4fac1	Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 4 layers deep in <netinet/in_pcb.h>. Removed unused includes. Sorted includes.	2002-09-05 15:33:30 +00:00
Maxim Sobolev	386fefa3a0	Add in_hosteq() and in_nullhost() macros to make life of developers porting NetBSD code a little bit easier. Obtained from: NetBSD	2002-09-04 09:55:50 +00:00
Darren Reed	1851791868	some ipfilter files that accidentally got imported here	2002-08-29 13:27:26 +00:00
Darren Reed	070700595d	This commit was generated by cvs2svn to compensate for changes in r102514, which included commits to RCS files with non-trunk default branches.	2002-08-28 13:26:01 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Crist J. Clark	784d7650f7	Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and firewall logging on and off when at elevated securelevel(8). It would be nice to be able to only lock these at securelevel >= 3, like rules are, but there is no such functionality at present. I don't see reason to be adding features to securelevel(8) with MAC being merged into 5.0. PR: kern/39396 Reviewed by: luigi MFC after: 1 week	2002-08-25 03:50:29 +00:00
Matthew Dillon	4f1e1f32b6	Correct bug in t_bw_rtttime rollover, #undef USERTT	2002-08-24 17:22:44 +00:00
Archie Cobbs	4a6a94d8d8	Replace (ab)uses of "NULL" where "0" is really meant.	2002-08-22 21:24:01 +00:00
Mike Barcroft	abbd890233	o Merge <machine/ansi.h> and <machine/types.h> into a new header called <machine/_types.h>. o <machine/ansi.h> will continue to live so it can define MD clock macros, which are only MD because of gratuitous differences between architectures. o Change all headers to make use of this. This mainly involves changing: #ifdef _BSD_FOO_T_ typedef _BSD_FOO_T_ foo_t; #undef _BSD_FOO_T_ #endif to: #ifndef _FOO_T_DECLARED typedef __foo_t foo_t; #define _FOO_T_DECLARED #endif Concept by: bde Reviewed by: jake, obrien	2002-08-21 16:20:02 +00:00
Don Lewis	26ef6ac4df	Create new functions in_sockaddr(), in6_sockaddr(), and in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked. Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept(). Reviewed by: suz	2002-08-21 11:57:12 +00:00
Juli Mallett	ded7008a07	Enclose IPv6 addresses in brackets when they are displayed in printable form with a TCP/UDP port separated by a colon. This is for the log_in_vain facility. Pointed out by: Edward J. M. Brocklesby Reviewed by: ume MFC after: 2 weeks	2002-08-19 19:47:13 +00:00
Luigi Rizzo	306fe283a1	Raise limit for port lists to 30 entries/ranges. Remove a duplicate "logging" message, and identify the firewall as ipfw2 in the boot message.	2002-08-19 04:45:01 +00:00
Matthew Dillon	1fcc99b5de	Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week	2002-08-17 18:26:02 +00:00
Jeffrey Hsu	c068736a61	Cosmetic-only changes for readability. Reviewed by: (early form passed by) bde Approved by: itojun (from core@kame.net)	2002-08-17 02:05:25 +00:00
Luigi Rizzo	99e5e64504	sys/netinet/ip_fw2.c: Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops for firewall-generated packets (the constant has to go in sys/mbuf.h). Better comments on keepalive generation, and enforce dyn_rst_lifetime and dyn_fin_lifetime to be less than dyn_keepalive_period. Enforce limits (up to 64k) on the number of dynamic buckets, and retry allocation with smaller sizes. Raise default number of dynamic rules to 4096. Improved handling of set of rules -- now you can atomically enable/disable multiple sets, move rules from one set to another, and swap sets. sbin/ipfw/ipfw2.c: userland support for "noerror" pipe attribute. userland support for sets of rules. minor improvements on rule parsing and printing. sbin/ipfw/ipfw.8: more documentation on ipfw2 extensions, differences from ipfw1 (so we can use the same manpage for both), stateful rules, and some additional examples. Feedback and more examples needed here.	2002-08-16 10:31:47 +00:00
Alfred Perlstein	e88894d39a	make the strings for tcptimers, tanames and prurequests const to silence warnings.	2002-08-16 09:07:59 +00:00
Robert Watson	365433d9b8	Code formatting sync to trustedbsd_mac: don't perform an assignment in an if clause.	2002-08-15 22:04:31 +00:00
Robert Watson	fb95b5d3c3	Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 18:51:27 +00:00
Jeffrey Hsu	b5addd8564	Reset dupack count in header prediction. Follow-on to rev 1.39. Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon	2002-08-15 17:13:18 +00:00
Luigi Rizzo	4bbf3b8b3a	Kernel support for a dummynet option: When a pipe or queue has the "noerror" attribute, do not report drops to the caller (ip_output() and friends). (2 lines to implement it, 2 lines to document it.) This will let you simulate losses on the sender side as if they happened in the middle of the network, i.e. with no explicit feedback to the sender. manpage and ipfw2.c changes to follow shortly, together with other ipfw2 changes. Requested by: silby MFC after: 3 days	2002-08-15 16:53:43 +00:00
Robert Watson	ecd3e8ff5a	It's now sufficient to rely on a nested include of _label.h to make sure all structures in ip_var.h are defined, so remove include of mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:45 +00:00
Robert Watson	9daf40feaa	Perform a nested include of _label.h if #ifdef _KERNEL. This will satisfy consumers of ip_var.h that need a complete definition of struct ipq and don't include mac.h. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:34:02 +00:00
Robert Watson	3b6aad64bf	Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which is no longer present. Pointed out by: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-15 14:27:46 +00:00
Poul-Henning Kamp	ae89fdaba7	remove spurious printf	2002-08-13 19:13:23 +00:00
Jennifer Yang	3d6ade3a03	Assert that the inpcb lock is held when calling tcp_output(). Approved by: hsu	2002-08-12 03:22:46 +00:00
Luigi Rizzo	43405724ec	One bugfix and one new feature. The bugfix (ipfw2.c) makes the handling of port numbers with a dash in the name, e.g. ftp-data, consistent with old ipfw: use \\ before the - to consider it as part of the name and not a range separator. The new feature (all this description will go in the manpage): each rule now belongs to one of 32 different sets, which can be optionally specified in the following form: ipfw add 100 set 23 allow ip from any to any If "set N" is not specified, the rule belongs to set 0. Individual sets can be disabled, enabled, and deleted with the commands: ipfw disable set N ipfw enable set N ipfw delete set N Enabling/disabling of a set is atomic. Rules belonging to a disabled set are skipped during packet matching, and they are not listed unless you use the '-S' flag in the show/list commands. Note that dynamic rules, once created, are always active until they expire or their parent rule is deleted. Set 31 is reserved for the default rule and cannot be disabled. All sets are enabled by default. The enable/disable status of the sets can be shown with the command ipfw show sets Hopefully, this feature will make life easier to those who want to have atomic ruleset addition/deletion/tests. Examples: To add a set of rules atomically: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 To delete a set of rules atomically ipfw disable set 18 ipfw delete set 18 ipfw enable set 18 To test a ruleset and disable it and regain control if something goes wrong: ipfw disable set 18 ipfw add ... set 18 ... # repeat as needed ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18 here if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g. if you cannot access your box, the ruleset will be disabled after the sleep terminates. 
I think there is only one more thing that one might want, namely a command to assign all rules in set X to set Y, so one can test a ruleset using the above mechanisms, and once it is considered acceptable, make it part of an existing ruleset.	2002-08-10 04:37:32 +00:00
Mike Silbersack	a9ce5e05b5	Handle PMTU discovery in syn-ack packets slightly differently; rely on syncache flags instead of directly accessing the route entry. MFC after: 3 days	2002-08-05 22:34:15 +00:00
Luigi Rizzo	1cbd978e96	bugfix: move check for udp_blackhole before the one for icmp_bandlim. MFC after: 3 days	2002-08-04 20:50:13 +00:00
Luigi Rizzo	ea779ff36c	Fix handling of packets which matched an "ipfw fwd" rule on the input side.	2002-08-03 14:59:45 +00:00
Robert Watson	e316463a86	When preserving the IP header in extra mbuf in the IP forwarding case, also preserve the MAC label. Note that this mbuf allocation is fairly non-optimal, but not my fault. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-02 20:45:27 +00:00
Robert Watson	09a555cbf9	Work to fix LINT build. Reported by: phk	2002-08-02 18:08:14 +00:00
Robert Watson	bdb3fa1832	Introduce support for Mandatory Access Control and extensible kernel access control. Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 21:37:34 +00:00
Robert Watson	d00e44fb4a	Document the undocumented assumption that at least one of the PCB pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:54:43 +00:00
Robert Watson	0070e096d7	Introduce support for Mandatory Access Control and extensible kernel access control. Add support for labeling most out-going ICMP messages using an appropriate MAC entry point. Currently, we do not explicitly label packet reflect (timestamp, echo request) ICMP events, implicitly using the originating packet label since the mbuf is reused. This will be made explicit at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-08-01 03:53:04 +00:00
Robert Watson	c488362e1a	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 19:06:49 +00:00
Robert Watson	4ea889c666	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the raw IP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check the socket and mbuf labels before permitting delivery to a socket, permitting MAC policies to selectively allow delivery of raw IP mbufs to various raw IP sockets that may be open. Restructure the policy checking code to compose IPsec and MAC results in a more readable manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 18:30:34 +00:00
Robert Watson	4ed84624a2	Introduce support for Mandatory Access Control and extensible kernel access control. When fragmenting an IP datagram, invoke an appropriate MAC entry point so that MAC labels may be copied (...) to the individual IP fragment mbufs by MAC policies. When IP options are inserted into an IP datagram when leaving a host, preserve the label if we need to reallocate the mbuf for alignment or size reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:21:01 +00:00
Robert Watson	36b0360b37	Introduce support for Mandatory Access Control and extensible kernel access control. Instrument the code managing IP fragment reassembly queues (struct ipq) to invoke appropriate MAC entry points to maintain a MAC label on each queue. Permit MAC policies to associate information with a queue based on the mbuf that caused it to be created, update that information based on further mbufs accepted by the queue, influence the decision making process by which mbufs are accepted to the queue, and set the label of the mbuf holding the reassembled datagram following reassembly completion. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 17:17:51 +00:00
Robert Watson	0ec4b12334	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an IGMP message, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the target interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:46:56 +00:00
Robert Watson	19527d3e22	Introduce support for Mandatory Access Control and extensible kernel access control. When generating an ARP query, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:45:16 +00:00
Robert Watson	d3990b06e1	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the MAC framework to label mbufs created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 16:42:47 +00:00
Robert Watson	549e4c9e4e	Introduce support for Mandatory Access Control and extensible kernel access control. Label IP fragment reassembly queues, permitting security features to be maintained on those objects. ipq_label will be used to manage the reassembly of fragments into IP datagrams using security properties. This permits policies to deny the reassembly of fragments, as well as influence the resulting label of a datagram following reassembly. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-30 23:09:20 +00:00
Maxim Konovalov	d46a53126c	Use a common way to release locks before exit. Reviewed by: hsu	2002-07-29 09:01:39 +00:00
Don Lewis	5c38b6dbce	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
Hajimu UMEMOTO	66ef17c4b6	make setsockopt(IPV6_V6ONLY, 0) actually work for tcp6. MFC after: 1 week	2002-07-25 18:10:04 +00:00
Hajimu UMEMOTO	eccb7001ee	cleanup usage of ip6_mapped_addr_on and ip6_v6only. now, ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week	2002-07-25 17:40:45 +00:00
Luigi Rizzo	be1826c354	Only log things if net.inet.ip.fw.verbose is set	2002-07-24 02:41:19 +00:00
Ruslan Ermilov	61a875d706	Don't forget to recalculate the IP checksum of the original IP datagram embedded into ICMP error message. Spotted by: tcpdump 3.7.1 (-vvv) MFC after: 3 days	2002-07-23 00:16:19 +00:00
Ruslan Ermilov	88c39af35f	Don't shrink socket buffers in tcp_mss(), application might have already configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows. PR: kern/11966 MFC after: 1 week	2002-07-22 22:31:09 +00:00
Hajimu UMEMOTO	854d3b19a2	do not refer to IN6P_BINDV6ONLY anymore. Obtained from: KAME MFC after: 1 week	2002-07-22 15:51:02 +00:00
John Polstra	8ea8a6804b	Fix overflows in intermediate calculations in sysctl_msec_to_ticks(). At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative. MFC after: 3 days	2002-07-20 23:48:59 +00:00
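The overflow described in the entry above can be sketched as follows. This is a minimal illustration, not the code from sysctl_msec_to_ticks(): the function names are hypothetical, `int` is assumed to be 32 bits wide, and a keepidle of two hours (7200 seconds) is assumed for the example values. At hz = 1000 that is 7,200,000 ticks, and multiplying by 1000 before dividing exceeds INT_MAX; at hz = 100 the same product still fits.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when ticks * 1000 does not fit in an int. */
static bool naive_product_overflows(int ticks)
{
    return (int64_t)ticks * 1000 > INT_MAX;
}

/* Convert ticks to milliseconds, widening to 64 bits before multiplying. */
static int ticks_to_msec(int ticks, int hz)
{
    return (int)((int64_t)ticks * 1000 / hz);
}

/* Convert milliseconds to ticks with the same widening. */
static int msec_to_ticks(int msec, int hz)
{
    return (int)((int64_t)msec * hz / 1000);
}
```

Widening only the intermediate product keeps the interface in plain `int` while avoiding the wraparound; the final result fits because the division brings it back into range.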
Robert Watson	69dac2ea47	Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel data structures pick up security and synchronization primitives, it becomes increasingly desirable not to arbitrarily export them via include files to userland, as the userland applications pick up new #include dependencies. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-20 22:46:20 +00:00
Matthew Dillon	d65bf08af3	Add the tcps_sndrexmitbad statistic, keep track of late acks that caused unnecessary retransmissions.	2002-07-19 18:29:38 +00:00
Matthew Dillon	701bec5a38	Introduce two new sysctl's: net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the original code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debateable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.	2002-07-18 19:06:12 +00:00
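The interaction between the RTO minimum and the slop described in the entry above can be sketched as below. This is an illustration under stated assumptions, not the kernel code: the function name and the millisecond units are hypothetical, and the order of operations (add the slop first, then clamp to the minimum and maximum) is an assumption made for the sketch.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: add a fixed slop to the computed RTO, then clamp
 * the result to [min_ms, max_ms]. All values are in milliseconds.
 */
static uint32_t rto_with_slop(uint32_t computed_ms, uint32_t min_ms,
                              uint32_t slop_ms, uint32_t max_ms)
{
    uint32_t rto = computed_ms + slop_ms;

    if (rto < min_ms)
        rto = min_ms;
    if (rto > max_ms)
        rto = max_ms;
    return rto;
}
```

With a 200 ms slop, a computed RTO of 1500 ms becomes 1700 ms, so timeouts above one second also carry headroom for stack overhead, which is the gap the commit message points out in the original code.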
Luigi Rizzo	90780c4b05	Move IPFW2 definition before including ip_fw.h Make indentation of new parts consistent with the style used for this file.	2002-07-18 05:18:41 +00:00
Matthew Dillon	22fd54d461	I don't know how the minimum retransmit timeout managed to get set to one second but it badly breaks throughput on networks with minor packet loss. Complaints by: at least two people tracked down to this. MFC after: 3 days	2002-07-17 23:32:03 +00:00
Luigi Rizzo	318aa87b59	Fix a panic when doing "ipfw add pipe 1 log ..." Also synchronize ip_dummynet.c with the version in RELENG_4 to ease MFC's.	2002-07-17 07:21:42 +00:00
Luigi Rizzo	a8c102a2ec	Implement keepalives for dynamic rules, so they will not expire just because you leave your session idle. Also, put in a fix for 64-bit architectures (to be revised). In detail: ip_fw.h * Reorder fields in struct ip_fw to avoid alignment problems on 64-bit machines. This only masks the problem, I am still not sure whether I am doing something wrong in the code or there is a problem elsewhere (e.g. different alignment of structures between userland and kernel because of pragmas etc.) * added fields in dyn_rule to store ack numbers, so we can generate keepalives when the dynamic rule is about to expire ip_fw2.c * use a local function, send_pkt(), to generate TCP RST for Reset rules; * save about 250 bytes by cleaning up the various snprintf() in ipfw_log() ... * ... and use twice as many bytes to implement keepalives (this seems to be working, but I have not tested it extensively). Keepalives are generated once every 5 seconds for the last 20 seconds of the lifetime of a dynamic rule for an established TCP flow. The packets are sent to both sides, so if at least one of the endpoints is responding, the timeout is refreshed and the rule will not expire. You can disable this feature with sysctl net.inet.ip.fw.dyn_keepalive=0 (the default is 1, to have them enabled). MFC after: 1 day (just kidding... I will supply an updated version of ipfw2 for RELENG_4 tomorrow).	2002-07-14 23:47:18 +00:00
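The keepalive schedule described in the entry above (one probe every 5 seconds during the last 20 seconds of a dynamic rule's lifetime) can be sketched as a simple predicate. This is a hypothetical model of the timing only: the names and the per-second countdown are assumptions for illustration, and the actual ip_fw2.c logic may be structured differently.

```c
#include <stdbool.h>

enum {
    KEEPALIVE_WINDOW = 20, /* seconds before expiry in which probes are sent */
    KEEPALIVE_PERIOD = 5   /* seconds between probes */
};

/* Hypothetical: should a keepalive go out with this many seconds left? */
static bool keepalive_due(int seconds_left)
{
    return seconds_left > 0 &&
        seconds_left <= KEEPALIVE_WINDOW &&
        seconds_left % KEEPALIVE_PERIOD == 0;
}

/* Count the probes sent if the rule idles through its whole lifetime. */
static int count_keepalives(int lifetime)
{
    int sent = 0;

    for (int left = lifetime; left > 0; left--)
        if (keepalive_due(left))
            sent++;
    return sent;
}
```

In this model an idle rule gets four chances to be refreshed before it expires; any reply from either endpoint would reset the countdown.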
Luigi Rizzo	3956b02345	Avoid dereferencing a null pointer in ro_rt. This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30). So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.	2002-07-12 22:08:47 +00:00
Don Lewis	2d20c83f93	Back out the previous change, since it looks like locking udbinfo provides sufficient protection.	2002-07-12 09:55:48 +00:00
Don Lewis	bb1dd7a45a	Lock inp while we're accessing it.	2002-07-12 08:05:22 +00:00
Don Lewis	0e1eebb846	Defer calling SYSCTL_OUT() until after the locks have been released.	2002-07-11 23:18:43 +00:00
Don Lewis	142b2bd644	Reduce the nesting level of a code block that doesn't need to be in an else clause.	2002-07-11 23:13:31 +00:00
Luigi Rizzo	c7ea683135	Change one variable to make it easier to switch between ipfw and ipfw2	2002-07-09 06:53:38 +00:00
Luigi Rizzo	b3063f064c	Fix a bug caused by dereferencing an invalid pointer when no punch_fw was used. Fix another couple of bugs which prevented rules from being installed properly. On passing, use IPFW2 instead of NEW_IPFW to compile the new code, and slightly simplify the instruction generation code.	2002-07-08 22:57:35 +00:00
Luigi Rizzo	d63b346ab1	No functional changes, but: Following Darren's suggestion, make Dijkstra happy and rewrite the ipfw_chk() main loop removing a lot of goto's and using instead a variable to store match status. Add a lot of comments to explain what instructions are supposed to do and how -- this should ease auditing of the code and make people more confident with it. In terms of code size: the entire file takes about 12700 bytes of text, about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!) for ipfw_log().	2002-07-08 22:46:01 +00:00
Luigi Rizzo	7d4d3e9051	Remove one unused command name.	2002-07-08 22:39:19 +00:00
Luigi Rizzo	5185195169	Forgot to update one field name in one of the latest commits.	2002-07-08 22:37:55 +00:00
Luigi Rizzo	5e43aef891	Implement the last 2-3 missing instructions for ipfw, now it should support all the instructions of the old ipfw. Fix some bugs in the user interface, /sbin/ipfw. Please check this code against your rulesets, so i can fix the remaining bugs (if any, i think they will be mostly in /sbin/ipfw). Once we have done a bit of testing, this code is ready to be MFC'ed, together with a bunch of other changes (glue to ipfw, and also the removal of some global variables) which have been in -current for a couple of weeks now. MFC after: 7 days	2002-07-05 22:43:06 +00:00
Brian Somers	27cc91fbf8	Remove trailing whitespace	2002-07-01 11:19:40 +00:00
Jesper Skriver	eb538bfd64	Extend the effect of the sysctl net.inet.tcp.icmp_may_rst so that, if we receive an ICMP "time to live exceeded in transit" (type 11, code 0) for a TCP connection in SYN-SENT state, close the connection. MFC after: 2 weeks	2002-06-30 20:07:21 +00:00
Jonathan Lemon	0080a004d7	One possible code path for syncache_respond() is: syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B) Which winds up deleting a different entry from the syncache. Handle this by not utilizing the next entry in the timer chain until after syncache_respond() completes. The case of A == B should not be possible. Problem found by: Don Bowman <don@sandvine.com>	2002-06-28 19:12:38 +00:00
Doug Rabson	24f8fd9fd1	Fix warning. Reviewed by: luigi	2002-06-28 08:36:26 +00:00
Luigi Rizzo	9758b77ff1	The new ipfw code. This code makes use of variable-size kernel representation of rules (exactly the same concept of BPF instructions, as used in the BSDI's firewall), which makes firewall operation a lot faster, and the code more readable and easier to extend and debug. The interface with the rest of the system is unchanged, as witnessed by this commit. The only extra kernel files that I am touching are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In userland I only had to touch those programs which manipulate the internal representation of firewall rules. The code is almost entirely new (and I believe I have written the vast majority of those sections which were taken from the former ip_fw.c), so rather than modifying the old ip_fw.c I decided to create a new file, sys/netinet/ip_fw2.c . Same for the user interface, which is in sbin/ipfw/ipfw2.c (it still compiles to /sbin/ipfw). The old files are still there, and will be removed in due time. I have not renamed the header file because it would have required a one-line change to a number of kernel files. In terms of user interface, the new "ipfw" is supposed to accept the old syntax for ipfw rules (and produce the same output with "ipfw show"). Only a couple of the old options (out of some 30 of them) have not been implemented, but they will be soon. On the other hand, the new code has some very powerful extensions. First, you can put "or" connectives between match fields (and soon also between options), and write things like ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any This should make rulesets slightly more compact (and lines longer!), by condensing 2 or more of the old rules into single ones. 
Also, as an example of how easy the rules can be extended, I have implemented an 'address set' match pattern, where you can specify an IP address in a format like this: 10.20.30.0/26{18,44,33,22,9} which will match the set of hosts listed in braces belonging to the subnet 10.20.30.0/26 . The match is done using a bitmap, so it is essentially a constant time operation requiring a handful of CPU instructions (and a very small amount of memory -- for a full /24 subnet, the instruction only consumes 40 bytes). Again, in this commit I have focused on functionality and tried to minimize changes to the other parts of the system. Some performance improvement can be achieved with minor changes to the interface of ip_fw_chk_t. This will be done later when this code is settled. The code is meant to compile unmodified on RELENG_4 (once the PACKET_TAG_* changes have been merged), for this reason you will see #ifdef __FreeBSD_version in a couple of places. This should minimize errors when (hopefully soon) it will be time to do the MFC.	2002-06-27 23:02:18 +00:00
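The bitmap-based 'address set' match described in the entry above can be sketched as follows. This is a minimal illustration, not the ip_fw2.c instruction layout: the struct and function names are hypothetical, addresses are assumed to be in host byte order, and the prefix length is assumed to be between 24 and 32 so that at most 256 hosts are covered.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address set: a base network plus one bit per host. */
struct addr_set {
    uint32_t base;    /* network address, host byte order */
    uint32_t size;    /* number of host slots, at most 256 */
    uint32_t bits[8]; /* 256-bit membership bitmap */
};

/* Initialise an empty set for base/prefix_len (24 <= prefix_len <= 32). */
static void addr_set_init(struct addr_set *s, uint32_t base, unsigned prefix_len)
{
    memset(s, 0, sizeof(*s));
    s->base = base;
    s->size = 1u << (32 - prefix_len);
}

/* Mark one host offset within the subnet as a member. */
static void addr_set_add(struct addr_set *s, uint32_t host)
{
    if (host < s->size)
        s->bits[host / 32] |= 1u << (host % 32);
}

/* Constant-time test: one subtraction, one range check, one bit probe. */
static bool addr_set_match(const struct addr_set *s, uint32_t addr)
{
    uint32_t off = addr - s->base; /* wraps to a large value if addr < base */

    return off < s->size && ((s->bits[off / 32] >> (off % 32)) & 1u);
}
```

For 10.20.30.0/26{18,44,33,22,9} one would initialise the set with base 0x0A141E00 and prefix length 26, then add the five host numbers. The unsigned subtraction makes addresses below the base fail the range check without a separate comparison.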
Maxime Henrion	7627c6cbcc	Warning fixes for 64 bits platforms. With this last fix, I can build a GENERIC sparc64 kernel with -Werror. Reviewed by: luigi	2002-06-27 11:02:06 +00:00
Luigi Rizzo	713a6ea063	Just a comment on some additional consistency checks that could be added here.	2002-06-26 21:00:53 +00:00
Kenneth D. Merry	98cb733c67	At long last, commit the zero copy sockets code. MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. 
(uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. 
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.	2002-06-26 03:37:47 +00:00
Jeffrey Hsu	6fd22caf91	Avoid unlocking the inp twice if badport_bandlim() returns -1. Reported by: jlemon	2002-06-24 22:25:00 +00:00
Jeffrey Hsu	f14e4cfe33	Style bug: fix 4 space indentations that should have been tabs. Submitted by: jlemon	2002-06-24 16:47:02 +00:00
Luigi Rizzo	f10e85d797	Slightly restructure the #ifdef INET6 sections to make the code more readable. Remove the six "register" attributes from variables in tcp_output(), the compiler surely knows well how to allocate them.	2002-06-23 21:25:36 +00:00
Luigi Rizzo	410bb1bfe2	Move two global variables to automatic variables within the only function where they are used (they are used with TCPDEBUG only).	2002-06-23 21:22:56 +00:00
Luigi Rizzo	4d2e36928d	Move some global variables in more appropriate places. Add XXX comments to mark places which need to be taken care of if we want to remove this part of the kernel from Giant. Add a comment on a potential performance problem with ip_forward()	2002-06-23 20:48:26 +00:00
Luigi Rizzo	51aed12e52	fix bad indentation and whitespace resulting from cut&paste	2002-06-23 09:15:43 +00:00
Luigi Rizzo	dfd1ae2f86	fix indentation of a comment	2002-06-23 09:14:24 +00:00
Luigi Rizzo	a5924d6100	fix a typo in a comment	2002-06-23 09:13:46 +00:00
Luigi Rizzo	ec3057db9e	Remove ip_fw_fwd_addr (forgotten in previous commit) remove some extra whitespace.	2002-06-23 09:03:42 +00:00
Luigi Rizzo	2b25acc158	Remove (almost all) global variables that were used to hold packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now. The variables removed by this change are: ip_divert_cookie used by divert sockets ip_fw_fwd_addr used for transparent ip redirection last_pkt used by dynamic pipes in dummynet Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output(). On passing, remove a bug in divert handling of fragmented packets. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide. Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). On passing, use the same mechanism for dummynet annotations and for divert/forward annotations. option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code. NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed. * I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor style cleanup will likely be necessary. * this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack. * despite the number of files touched, there are absolutely no API's or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code). MFC after: 10 days	2002-06-22 11:51:02 +00:00
Jeffrey Hsu	2ded288c88	Fix logic which resulted in missing a call to INP_UNLOCK(). Submitted by: jlemon, mux	2002-06-21 22:54:16 +00:00
Jeffrey Hsu	2d40081d1f	TCP notify functions can change the pcb list.	2002-06-21 22:52:48 +00:00
Peter Wemm	532cf61bcf	Solve the 'unregistered netisr 18' information notice with a sledgehammer. Register the ISR early, but do not actually kick off the timer until we see some activity. This still saves us from running the arp timers on a system with no network cards.	2002-06-20 01:27:40 +00:00
Seigo Tanimura	03e4918190	Remove so*_locked(), which were backed out by mistake.	2002-06-18 07:42:02 +00:00
Jeffrey Hsu	3ce144ea88	Notify functions can destroy the pcb, so they have to return an indication of whether this happened so the calling function knows whether or not to unlock the pcb. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)	2002-06-14 08:35:21 +00:00
Mike Silbersack	eb5afeba22	Re-commit w/fix: Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks This time, make sure that ipv4 specific code (aka all of the above) is only run in the ipv4 case.	2002-06-14 03:08:05 +00:00
Mike Silbersack	70d2b17029	Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :) Pointy-hat to: silby	2002-06-14 02:43:20 +00:00
Mike Silbersack	21c3b2fc69	Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks	2002-06-14 02:36:34 +00:00
Jeffrey Hsu	9c68f33a9d	Because we're holding an exclusive write lock on the head, references to the new inp cannot leak out even though it has been placed on the head list.	2002-06-13 23:14:58 +00:00
Jeffrey Hsu	61ffc0b1a6	The UDP head was unlocked too early in one unicast case. Submitted by: bug reported by arr	2002-06-12 15:21:41 +00:00
Jeffrey Hsu	73dca2078d	Fix logic which resulted in missing a call to INP_UNLOCK().	2002-06-12 03:11:06 +00:00
Jeffrey Hsu	3cfcc388ea	Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK. Submitted by: tegge, jlemon Prefer LIST_FOREACH macro. Submitted by: jlemon	2002-06-12 03:08:08 +00:00
Jeffrey Hsu	7a9378e7f5	Remember to initialize the control block head mutex.	2002-06-11 10:58:57 +00:00
Jeffrey Hsu	3d9baf34c0	Fix typo. Submitted by: Kyunghwan Kim <redjade@atropos.snu.ac.kr>	2002-06-11 10:56:49 +00:00
Jeffrey Hsu	e98d6424af	Every array elt is initialized in the following loop, so remove unnecessary M_ZERO.	2002-06-10 23:48:37 +00:00
Jeffrey Hsu	f76fcf6d4c	Lock up inpcb. Submitted by: Jennifer Yang <yangjihui@yahoo.com>	2002-06-10 20:05:46 +00:00
Seigo Tanimura	4cc20ab1f0	Back out my last commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu	2002-05-31 11:52:35 +00:00
Garrett Wollman	c7c5d95d56	Avoid unintentional trigraph.	2002-05-30 20:53:45 +00:00
Andrew R. Reiter	db40007d42	- Change the newly turned INVARIANTS #ifdef blocks (they were changed from DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code readability.	2002-05-21 18:52:24 +00:00
Andrew R. Reiter	4cb674c960	- Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity checks.	2002-05-20 22:05:13 +00:00
Andrew R. Reiter	1e404e4e86	- Turn a DIAGNOSTIC into an INVARIANTS since it's a sanity check. Use proper ``if'' statement style.	2002-05-20 22:04:19 +00:00
Andrew R. Reiter	e16f6e6200	- Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line through the #endif is really a sanity check. Reviewed by: jake	2002-05-20 21:50:39 +00:00
Seigo Tanimura	243917fe3b	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each member in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred	2002-05-20 05:41:09 +00:00
Kelly Yancey	c3a2190cdc	Reset token-ring source routing control field on receipt of ethernet frame without source routing information. This restores the behaviour in this scenario to that of prior to my last commit.	2002-05-15 01:03:32 +00:00
Robert Watson	f83c7ad731	Modify the arguments to syncache_socket() to include the mbuf (m) that results in the syncache entry being turned into a socket. While it's not used in the main tree, this is required in the MAC tree so that labels can be propagated from the mbuf to the socket. This is also useful if you're doing things like transparent IP connection hijacking and you want to use the syncache/cookie mechanism, but we won't go there. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-05-14 18:57:55 +00:00
Luigi Rizzo	4b9840932d	Add ipfw hooks to ether_demux() and ether_output_frame(). Ipfw processing of frames at layer 2 can be enabled by the sysctl variable net.link.ether.ipfw=1. Consider this feature experimental, because right now, the firewall is invoked in the places indicated below, and controlled by the sysctl variables listed on the right. As a consequence, a packet can be filtered from 1 to 4 times depending on the path it follows, which might make a ruleset a bit hard to follow. I will add an ipfw option to tell if we want a given rule to apply to ether_demux() and ether_output_frame(), but we have run out of flags in the struct ip_fw so i need to think a bit on how to implement this.
                to upper layers
          |                       |
          +----------->-----------+
          ^                       V
      [ip_input]             [ip_output]            net.inet.ip.fw.enable=1
          |                       |
          ^                       V
     [ether_demux]      [ether_output_frame]        net.link.ether.ipfw=1
          |                       |
          +->- [bdg_forward]-->---+                 net.link.ether.bridge_ipfw=1
          ^                       V
          |                       |
                  to devices	2002-05-13 10:37:19 +00:00
Luigi Rizzo	2f8707ca5d	Remove custom definitions (IP_FW_TCPF_SYN etc.) of TCP header flags which are the same as the original ones (TH_SYN etc.)	2002-05-13 10:21:13 +00:00
Luigi Rizzo	201efb1913	Add code to match MAC header fields (at the moment supported on bridged packets only, soon to come also for packets on ordinary ether_input() and ether_output() paths). The syntax is ipfw add <action> MAC dst src type where dst and src can be "any" or a MAC address optionally followed by a mask, e.g. 10:20:30:40:50 10:20:30:40:50/32 10:20:30:40:50&ff:ff:ff:f0:ff:0f and type can be a single ethernet type, a range, or a type followed by a mask (values are always in hexadecimal) e.g. 0800 0800-0806 0800/8 0800&03ff Note, I am still uncertain on what is the best format for inputting these values, having the values in hexadecimal is convenient in most cases but can be confusing sometimes. Suggestions welcome. Implement suggestion from PR 37778 to allow "not me" on destination and source IP. The code in the PR was slightly wrong and interfered with the normal handling of IP addresses. This version hopefully is correct. Minor cleanup of the code, in some places moving the indentation to 4 spaces because the code was becoming too deep. Eventually, in a separate commit, I will move the whole file to 4 space indent.	2002-05-12 20:43:50 +00:00
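The masked MAC address comparison described in the entry above can be sketched as below. This is an illustration under assumptions, not the ipfw implementation: the function names are hypothetical, a MAC address is taken to be six octets, and a "/N" suffix is interpreted as a mask with the N most significant bits set.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN 6

/* Hypothetical: build a mask with the top `bits` bits set (0..48). */
static void mac_mask_from_bits(unsigned bits, uint8_t mask[MAC_LEN])
{
    for (int i = 0; i < MAC_LEN; i++) {
        if (bits >= 8) {
            mask[i] = 0xff;
            bits -= 8;
        } else {
            mask[i] = (uint8_t)(0xff << (8 - bits));
            bits = 0;
        }
    }
}

/* Compare two addresses only on the bits selected by the mask. */
static bool mac_match(const uint8_t addr[MAC_LEN],
                      const uint8_t want[MAC_LEN],
                      const uint8_t mask[MAC_LEN])
{
    for (int i = 0; i < MAC_LEN; i++)
        if ((addr[i] & mask[i]) != (want[i] & mask[i]))
            return false;
    return true;
}
```

An explicit mask such as the "&ff:ff:ff:f0:ff:0f" form would be passed to mac_match() directly, while the "/32" form would first go through mac_mask_from_bits(32, ...).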
Dima Dorfman	11612afabe	s/demon/daemon/	2002-05-12 00:22:38 +00:00
Mike Barcroft	9828cf071d	Remove some duplicate types that should have been removed as part of the rearranging in the previous revision. Pointy hat to: cvs update (merging), mike (for not noticing)	2002-05-11 23:28:51 +00:00
Luigi Rizzo	d60315bef5	Cleanup the interface to ip_fw_chk, two of the input arguments were totally useless and have been removed. ip_input.c, ip_output.c: Properly initialize the "ip" pointer in case the firewall does an m_pullup() on the packet. Remove some debugging code forgotten long ago. ip_fw.[ch], bridge.c: Prepare the grounds for matching MAC header fields in bridged packets, so we can have 'etherfw' functionality without a lot of kernel and userland bloat.	2002-05-09 10:34:57 +00:00
Kelly Yancey	42fdfc126a	Move ISO88025 source routing information into sockaddr_dl's sdl_data field. This returns the sdl_data field to a variable-length field. More importantly, this prevents an easily reproducible data-corruption bug when the interface name plus the hardware address exceed the sdl_data field's original 12 byte limit. However, token-ring interfaces may still overflow the new sdl_data field's 46 byte limit if the interface name exceeds 6 characters (since 6 characters for interface name plus 6 for hardware address plus 34 for source routing = the size of sdl_data). Further refinements could overcome this limitation but would break binary compatibility; this commit only addresses fixing the bug for commonly occurring cases without breaking binary compatibility with the intention that the functionality can be MFC'ed to -stable. See message ID's (both sent to -arch): 20020421013332.F87395-100000@gateway.posi.net 20020430181359.G11009-300000@gateway.posi.net for a more thorough description of the bug addressed and how to reproduce it. Approved by: silence on -arch and -net Sponsored by: NTT Multimedia Communications Labs MFC after: 1 week	2002-05-07 22:14:06 +00:00
Hajimu UMEMOTO	8117063142	Revised MLD-related definitions - Used mld_xxx and MLD_xxx instead of mld6_xxx and MLD6_xxx according to the official definitions in rfc2292bis (macro definitions for backward compatibility were provided) - Changed the first member of mld_hdr{} from mld_hdr to mld_icmp6_hdr to avoid name space conflict in C++ This change makes ports/net/pchar compilable again under -CURRENT. Obtained from: KAME	2002-05-06 16:28:25 +00:00
Luigi Rizzo	43d11e8453	Indentation and comments cleanup, no functional change. MFC after: 3 days	2002-05-05 21:27:47 +00:00
Alfred Perlstein	f132072368	Redo the sigio locking. Turn the sigio sx into a mutex. Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed. In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.	2002-05-01 20:44:46 +00:00
Alfred Perlstein	59017610b2	Fix some edge cases where bad string handling could occur. Submitted by: ps	2002-05-01 08:29:41 +00:00
Alfred Perlstein	ef1047305e	cleanup: fix line wraps, add some comments, fix macro definitions, fix for(;;) loops.	2002-05-01 08:08:24 +00:00
Crist J. Clark	0f56b10c4b	Enlighten those who read the FINE POINTS of the documentation a bit more on how ipfw(8) deals with tiny fragments. While we're at it, add a quick log message to even let people know we dropped a packet. (Note that the second FINE POINT is somewhat redundant given the first, but since the code is there, leave the docs for it.) MFC after: 1 day	2002-05-01 06:29:16 +00:00
Seigo Tanimura	960ed29c4b	Revert the change of #includes in sys/filedesc.h and sys/socketvar.h. Requested by: bde Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h. While I am here, sort include files alphabetically, where possible.	2002-04-30 01:54:54 +00:00
Seigo Tanimura	d48d4b2501	Add a global sx sigio_lock to protect the pointer to the sigio object of a socket. This avoids lock order reversal caused by locking a process in pgsigio(). sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now require sigio_lock to be locked. Provide sowwakeup_locked(), soisconnected_locked(), and so on in case where we have to modify a socket and wake up a process atomically.	2002-04-27 08:24:29 +00:00
Mike Barcroft	58631bbe0e	Rearrange <netinet/in.h> so that it is easier to conditionalize sections for various standards. Conditionalize sections for various standards. Use standards conforming spelling for types in the sockaddr_in structure.	2002-04-24 01:26:11 +00:00
Mike Barcroft	c6e43821cd	Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h> and <sys/socket.h>. Previously, sa_family_t was only typedef'd in <sys/socket.h>.	2002-04-20 02:24:35 +00:00
SUZUKI Shinsuke	88ff5695c1	just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128) Reviewed by: ume MFC after: 1 week	2002-04-19 04:46:24 +00:00
SUZUKI Shinsuke	f361efa0be	initialize local variable explicitly Reviewed by: ume Obtained from: Fujitsu guys MFC after: 1 week	2002-04-11 02:14:21 +00:00
Mike Silbersack	898568d8ab	Remove some ISN generation code which has been unused since the syncache went in. MFC after: 3 days	2002-04-10 22:12:01 +00:00
Mike Silbersack	f2697d4d75	Totally nuke IPPORT_USERRESERVED, it is no longer used anywhere, update remaining comments to reflect new ephemeral port range. Reminded by: Maxim Konovalov <maxim@macomnet.ru> MFC after: 3 days	2002-04-10 19:30:58 +00:00
Mike Barcroft	13c3fcc238	Unconditionalize the definition of INET_ADDRSTRLEN and INET6_ADDRSTRLEN. Doing this helps expose bogus redefinitions in 3rd party software.	2002-04-10 11:59:02 +00:00
Brian Somers	6ce6e2be71	Remove the code that masks an EEXIST returned from rtinit() when calling ioctl(SIOC[AS]IFADDR). This allows the following: ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00 ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias but would (given the above) reject this: ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias due to the conflicting netmasks. I would assert that it's wrong to mask the EEXIST returned from rtinit() as in the above scenario, the deletion of the 1.2.3.25 address will leave the 1.2.3.27 address as unroutable as it was in the first place. Offered for review on: -arch, -net Discussed with: stephen macmanus <stephenm@bayarea.net> MFC after: 3 weeks	2002-04-10 01:42:44 +00:00
Brian Somers	5a43847d1c	Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255. This change allows bootp to work with more than one interface, at the expense of some rather ``wrong'' looking code. I plan to MFC this in place of luigi's recent #ifdef BOOTP stuff that was committed to this file in -stable, as that's slightly more wrong than this is. Offered for review on: -arch, -net MFC after: 2 weeks	2002-04-10 01:42:32 +00:00
John Baldwin	ad278afdf0	Change the first argument of prison_xinpcb() to be a thread pointer instead of a proc pointer so that prison_xinpcb() can use td_ucred.	2002-04-09 20:04:10 +00:00
Mike Silbersack	c3b2fe55ba	Update comments to reflect the recent ephemeral port range change. Noticed by: ru MFC After: 1 day	2002-04-09 18:01:26 +00:00
Matthew N. Dodd	b0c570df3f	Retire this copy; it now lives in sys/net/fddi.h.	2002-04-05 19:24:38 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Mike Barcroft	8822d3fb83	o Implement <sys/_types.h>, a new header for storing types that are MI, not required to be a fixed size, and used in multiple headers. This will grow in time, as more things move here from <sys/types.h> and <machine/ansi.h>. o Add missing type definitions (uint16_t and uint32_t) to <arpa/inet.h> and <netinet/in.h>. o Reduce pollution in <sys/types.h> by using `#if _FOO_T_DECLARED' widgets to avoid including <sys/stdint.h>. o Add some missing type definitions to <unistd.h> and note the ones that still need to be added. o Make use of <sys/_types.h> primitives in <grp.h> and <sys/types.h>. Reviewed by: bde	2002-04-01 08:12:25 +00:00
Bruce Evans	c1cd65bae8	Fixed some style bugs in the removal of __P(()). Continuation lines were not outdented to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting.	2002-03-24 10:19:10 +00:00
Robert Watson	29dc1288b0	Merge from TrustedBSD MAC branch: Move the network code from using cr_cansee() to check whether a socket is visible to a requesting credential to using a new function, cr_canseesocket(), which accepts a subject credential and object socket. Implement cr_canseesocket() so that it does a prison check, a uid check, and add a comment where shortly a MAC hook will go. This will allow MAC policies to separately instrument the visibility of sockets from the visibility of processes. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-03-22 19:57:41 +00:00
Ruslan Ermilov	e3f406b3c1	Prevent icmp_reflect() from calling ip_output() with a NULL route pointer which will then result in the allocated route's reference count never being decremented. Just flood ping the localhost and watch refcnt of the 127.0.0.1 route with netstat(1). Submitted by: jayanth Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed ip_output() to be called with a NULL route pointer. The previous paragraph shows why this was a bad idea in the first place. MFC after: 0 days	2002-03-22 16:45:54 +00:00
Mike Silbersack	9e5a5ed4c5	Change the ephemeral port range from 1024-5000 to 49152-65535. This increases the number of concurrent outgoing connections from ~4000 to ~16000. Other OSes (Solaris, OS X, NetBSD) and many other NAT products have already made this change without ill effects, so we should not run into any problems. MFC after: 1 week	2002-03-22 03:28:11 +00:00

3455 commits