opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-07-12 18:51:57 -04:00

Author	SHA1	Message	Date
Randall Stewart	d55b0b1b09	- Bug fix managing congestion parameter on immediate retransmission by handover event (fast mobility code) - Fixed problem of mobility code which is caused by remaining parameters in the deleted primary destination. - Add a missing lock. When a peer sends an INIT, and while we are processing it to send an INIT-ACK the socket is closed, we did not hold a lock to keep the socket from going away. Add protection for this case. - Fix so that arwnd always uses the minimal rwnd if the user has set the socket buffer smaller. Found this when the test org decided to see what happens when you set in a rwnd of 10 bytes (which is not allowed per RFC .. 4k is minimum). - Fixes so a cookie-echo ootb will NOT cause an abort to be sent. This was happening in a MPI collision case. - Examined all panics and unless there was no recovery, moved any that were not already to INVARIANTS. Approved by: re@freebsd.org (gnn)	2007-10-01 03:22:29 +00:00
Maxim Konovalov	eeb36ca3d5	o For dynamic rules log a parent rule number. Prefix a log message by 'ipfw: '. PR: kern/115755 Submitted by: sem Approved by: re (gnn) MFC after: 4 weeks	2007-09-29 15:01:41 +00:00
Konstantin Belousov	586b4a0e50	Revert rev. 1.94. After recent tcp backouts, tcp_close() may return NULL. Check the return value of tcp_close() being NULL before dereferencing it in #ifdef TCPDEBUG block. Reviewed by: rwatson Approved by: re (gnn)	2007-09-24 14:46:27 +00:00
Mike Silbersack	e2f2059f68	Two changes: - Reintegrate the ANSI C function declaration change from tcp_timer.c rev 1.92 - Reorganize the tcpcb structure so that it has a single pointer to the "tcp_timer" structure which contains all of the tcp timer callouts. This change means that when the single tcp timer change is reintegrated, tcpcb will not change in size, and therefore the ABI between netstat and the kernel will not change. Neither of these changes should have any functional impact. Reviewed by: bmah, rrs Approved by: re (bmah)	2007-09-24 05:26:24 +00:00
Christian S.J. Peron	bc60490a88	Certain consumers of rtalloc like gif(4) and if_stf(4) lookup the route and once they are done with it, call rtfree(). rtfree() should only be used when we are certain we hold the last reference to the route. This bug results in console messages like the following: rtfree: 0xc40f7000 has 1 refs This patch switches the rtfree() to use RTFREE_LOCKED() instead, which should handle the reference counting on the route better. Approved by: re@ (gnn) Reviewed by: bms Reported by: many via net@ and current@ Tested by: many	2007-09-23 17:50:17 +00:00
Randall Stewart	baf3da661c	- fix (global) address handling in the presence of duplicates, the last interface should own the address, but the current code fumbles the handoff. This fixes that. - move address related debugs to PCB4 and add additional ones to help in debugging address problems. Approved by: re@freebsd.org (K Smith)	2007-09-21 04:19:33 +00:00
Randall Stewart	c99efcf633	- The address lock is changed to a rwlock. This also involves macro changes to have a RLOCK and a WLOCK and placing the correct version within the code. - The INP-INFO lock is changed to a rwlock. - When sctp_shutdown() is called on Mac OS X, the socket lock is held. So call sctp_chunk_output with SCTP_SO_LOCKED and not SCTP_SO_NOT_LOCKED. - Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X. - u_int64_t -> uint64_t - add missing addr unlock for error return path Approved by: re@freebsd.org (K Smith)	2007-09-18 15:16:39 +00:00
Randall Stewart	0dc12c958a	- For the 1-to-1 model, fix an off by one error that allowed an extra connection over the backlog (by one) Approved by: re@freebsd.org (B. Mah)	2007-09-16 23:03:38 +00:00
Randall Stewart	3232788ef2	- Get rid of unused constants for sysctl variables. - Fix panic from mutex unlock on freed lock when ASCONF-ACK aborts an assoc - Fix panic from addr lock recursion when ASCONFs are queued in the front states - ASCONFs "queued" in the front states should really be bundled after the COOKIE-ACK, not in front of it - Fix issue with addresses deleted in the front states from being sent with ASCONF(DELETE)-- replaced sctp_asconf_queue_add_sa() with delete specific function - Comment change in sctp.h the drafts are now RFCs Approved by: re@freebsd.org (B Mah)	2007-09-15 19:07:42 +00:00
Randall Stewart	b27a6b7d73	- DF bit was on for COOKIE-ECHO chunks. This is incorrect and should be OFF letting IP fragment large cookie-echos. - Rename sysctl variable logging to log_level. - Fix description of sysctl variable stats. - Add sysctl variable log to make sctp_log readable via sysctl mechanism (this is by compile switch and targets non KTR platforms or when someone wants to do performance wise tracing). - Removed debug code Approved by: re@freebsd.org (B Mah)	2007-09-13 14:43:54 +00:00
Randall Stewart	04ee05e815	- Incorrect error EAGAIN returned for invalid send on a locked stream (using EEOR mode). Changed to EINVAL (in sctp_output.c) - Static analysis comments added - fix in mobility code to return a value (static analysis found). - sctp6_notify function made visible instead of static (this is needed for Panda). Approved by: re@freebsd.org (B Mah)	2007-09-13 10:36:43 +00:00
Randall Stewart	19cf67115c	- Removed debug code and more C++ style comments in the mobility code in sctp_asconf.c Approved by: re@freebsd.org (B Mah)	2007-09-10 21:01:56 +00:00
Randall Stewart	b7a446b8b7	- Added some comments to tell where the htcp code comes from. - Fix a LOR on Mac OS X: Do not hold an stcb lock when calling soisconnected for a socket which has the SS_INCOMP bit set on so_state. - fix a comment to be non c++ style. Approved by: re@freebsd.org (B Mah)	2007-09-10 17:06:25 +00:00
Ken Smith	a258946554	Make sure that either inp is NULL or we have obtained a lock on it before jumping to dropunlock to avoid a panic. While here move the calls to ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain the lock on inp. Original patch to avoid panic: pjd Review of locking adjustments: gnn, sam Approved by: re (rwatson)	2007-09-10 14:49:32 +00:00
Robert Watson	f5514f084e	Further UDPv4 cleanup: - Resort includes a bit. - Correct typos and wording problems in comments. - Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables. - Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist. Approved by: re (kensmith)	2007-09-10 14:22:15 +00:00
Randall Stewart	851b7298b3	- send call has a reference to uio->uio_resid in the recent send code, but uio may be NULL on sendfile calls. Change to use sndlen variable. - EMSGSIZE is not being returned in non-blocking mode and needs a small tweak to look if the msg would ever fit when returning EWOULDBLOCK. - FWD-TSN has a bug in stream processing which could cause a panic. This is a follow on to the codenomicon fix. - PDAPI level 1 and 2 do not work unless the reader gets his returned buffer full. Fix so we can break out when at level 1 or 2. - Fix fast-handoff features to copy across properly on accepted sockets - Fix sctp_peeloff() system call when no true system call exists to screen arguments for errors. In cases where a real system call exists the system call itself does this. - Fix raddr leak in recent add-ip code change for bundled asconfs (even when non-bundled asconfs are received) - Make sure ipi_addr lock is held when walking global addr list. Need to change this lock type to a rwlock(). - Add don't wake flag on both input and output when the socket is closing. - When deleting an address verify the interface is correct before allowing the delete to process. This protects panda and unnumbered. - Clean up old sysctl stuff and get rid of the old Open/Net BSD structures. - Add a function to watch the ranges in the sysctl sets. - When appending in the reassembly queue, validate that the assoc has not gone to about to be freed. If so (in the middle) abort out. Note this especially effects MAC I think due to the lock/unlock they do (or with LOCK testing in place). - Netstat patch to get rid of warnings. - Make sure that no data gets queued to inactive/unconfirmed destinations. This especially effect CMT but also makes a impact on regular SCTP as well. - During init collision when we detect seq number out of sync we need to treat it like Case C and discard the cookie (no invarient needed here). - Atomic access to the random store. 
- When we declare a vtag good, we need to shove it into the time wait hash to prevent further use. When the tag is put into the assoc hash, we need to remove it from the twait hash (where it will surely be). This prevents duplicate tag assignments. - Move decr-ref count to better protect sysctl out of data. - ltrace error corrections in sctp6_usrreq.c - Add hook for interface up/down to be sent to us. - Make sysctl() exported structures independent of processor architecture. - Fix route and src addr cache clearing for delete address case. - Make sure address marked SCTP_DEL_IP_ADDRESS is never selected as src addr. - in icmp handling fixed so we actually look at the icmp codes to figure out what to do. - Modified mobility code. Reception of DELETE IP ADDRESS for a primary destination and SET PRIMARY for a new primary destination is used for retransmission trigger to the new primary destination. Also, in this case, destination of chunks in send_queue are changed to the new primary destination. - Fix so that we disallow sending by mbuf to ever have EEOR mode set upon it. Approved by: re@freebsd.org (B Mah)	2007-09-08 17:48:46 +00:00
Randall Stewart	ceaad40ae7	- Locking compatibility changes. This involves adding additional flags to many function calls. The flags only get used in BSD when we compile with lock testing. These flags allow apple to escape the "giant" lock it holds on the socket and have more fine-grained locking in the NKE. It also allows us to test (with witness) the locking used by apple via a compile switch (manually applied). Approved by: re@freebsd.org(B Mah)	2007-09-08 11:35:11 +00:00
Robert Watson	85d9437250	Back out tcp_timer.c:1.93 and associated changes that reimplemented the many TCP timers as a single timer, but retain the API changes necessary to reintroduce this change. This will back out the source of at least two reported problems: lock leaks in certain timer edge cases, and TCP timers continuing to fire after a connection has closed (a bug previously fixed and then reintroduced with the timer rewrite). In a follow-up commit, some minor restylings and comment changes performed after the TCP timer rewrite will be reapplied, and a further change to allow the TCP timer rewrite to be added back without disturbing the ABI. The new design is believed to be a good thing, but the outstanding issues are leading to significant stability/correctness problems that are holding up 7.0. This patch was generated by silby, but is being committed by proxy due to poor network connectivity for silby this week. Approved by: re (kensmith) Submitted by: silby Tested by: rwatson, kris Problems reported by: peter, kris, others	2007-09-07 09:19:22 +00:00
Brian Feldman	598fa04675	Repair ALTQ-tagging rules in IPFW which got broken in the last PF import. The PF mbuf-tagging support routines changed to link the allocated tags into the provided mbuf themselves, so the left-over m_tag_prepend() was trying to add a bogus (usually NULL) tag. Reviewed by: mlaier Approved by: re	2007-08-29 19:34:28 +00:00
Randall Stewart	2afb3e849f	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. 
- When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
Maxim Konovalov	4a296ec798	o Fix bug I introduced in the previous commit (ipfw set extension): pack a set number correctly. Submitted by: oleg o Plug a memory leak. Submitted by: oleg and Andrey V. Elsukov Approved by: re (kensmith) MFC after: 1 week	2007-08-26 18:38:31 +00:00
Randall Stewart	c4739e2f47	- Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. - Fix sctp_lower_sosend to send shutdown chunk for mbuf send case when sndlen = 0 and sinfoflag = SCTP_EOF - Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data, So that it does not send the "null" data mbuf out and cause it to get freed twice. - Fix so auto-asconf sysctl actually effect the socket's asconf state. - Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets. - Memset bug in sctp_output.c (arguments were reversed) submitted found and reported by Dave Jones (davej@codemonkey.org.uk). - PD-API point needs to be invoked >= not just > to conform to socket api draft this fixes sctp_indata.c in the two places need to be >=. - move M_NOTIFICATION to use M_PROTO5. - PEER_ADDR_PARAMS did not fail properly if you specify an address that is not in the association with a valid assoc_id. This meant you got or set the stcb level values instead of the destination you thought you were going to get/set. Now validate if the stcb is non-null and the net is NULL that the sa_family is set and the address is unspecified otherwise return an error. - The thread based iterator could crash if associations were freed at the exact time it was running. rework the worker thread to use the increment/decrement to prevent this and no longer use the markers that the timer based iterator uses. - Fix the memleak in sctp_add_addr_to_vrf() for the case when it is detected that ifa is already pointing to a ifn. - Fix it so that if someone is so insane that they drop the send window below the minimal add mark, they still can send. - Changed all state for associations to use mask safe macro. - During front states in association freeing in sctp_inpcbfree, we had a locking problem where locks were not in place where they should have been. 
- Free association calls were not testing the return value in sctp_inpcb_free() properly... others should be cast void returns where we don't care about the return value. - If a reference count is held on an assoc, even from the "force free" we should not do the actual free.. but instead let the timer free it. - When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED flag is set, we must NOT process the packet but handle it like ootb. This is because while freeing an assoc we release the locks to get all the higher order locks so we can purge all the hash tables. This leaves a hole if a packet comes in just at that point. Now sctp_common_input_processing() will call the ootb code in such a case. - Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes it so we don't have a conflict (I think this is a Coverity change). We made this change AFTER some conversation and looking to make sure that M_PROTO5 does not have a problem between SCTP and the 802.11 stuff (which is the only other place it's used). - Fixed lock order reversal and missing atomic protection around locked_tcb during association lookup and the 1-2-1 model. - Added debug to source address selection. - V6 output must always do checksum even for loopback. - Remove more locks around inp that are not needed for an atomically added/subtracted ref count. - slight optimization in the way we zero the array in sctp_sack_check() - It was possible to respond to an ABORT() with bad checksum with a PKT-DROP. This led to a PKT-DROP/ABORT war. Add code to NOT send a PKT-DROP to any ABORT(). - Add an option for local logging (useful for macintosh or when you need better performing during debugging). Note no commands are here to get the log info, you must just use kgdb. - The timer code needs to be aware of if it needs to call sctp_sack_check() to slide the maps and adjust the cum-ack. This is because it may be out of sync cum-ack wise. - Added threshold management logging. 
- If the user picked just the right size, that just filled the send window minus one mtu, we would enter a forever loop not copying and at the same time not blocking. Change from < to <= solves this. - Sysctl added to control the fragment interleave level which defaults to 1. - My rwnd control was not being used to control the rwnd properly (we did not add and subtract to it :-() this is now fixed so we handle small messages (1 byte etc) better to bring our rwnd down more slowly. Approved by: re@freebsd.org (Bruce Mah)	2007-08-24 00:53:53 +00:00
Randall Stewart	2dad8a55be	- Remove extra comment for 7.0 (no GIANT here). - Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock. - Fix panic that may occur when freeing an assoc that has partial delivery in progress (may dereference null socket pointer when queuing partial delivery aborted notification) - Some spacing and comment fixes. - Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. Approved by: re@freebsd.org (Bruce Mah)	2007-08-16 01:51:22 +00:00
Qing Li	8cb5ba02d8	Use the sequence number comparison macro to compare projected_offset against isn_offset to account for wrap around. Reviewed by: gnn, kmacy, silby Submitted by: yusheng.huang@bluecoat.com Approved by: re MFC: 3 days	2007-08-16 01:35:55 +00:00
Christian S.J. Peron	b244c8ad14	Over the past couple of years, there have been a number of reports relating the use of divert sockets to dead locks. A number of LORs have been reported between divert and a number of other network subsystems including: IPSEC, Pfil, multicast, ipfw and others. Other dead locks could occur because of recursive entry into the IP stack. This change should take care of most if not all of these issues. A summary of the changes follow: - We disallow multicast operations on divert sockets. It really doesn't make semantic sense to allow this, since typically you would set multicast parameters on multicast end points. NOTE: As a part of this change, we actually dis-allow multicast options on any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family - We check to see if there are any socket options that have been specified on the socket, and if there was (which is very un-common and also probably doesnt make sense to support) we duplicate the mbuf carrying the options. - We then drop the INP/INFO locks over the call to ip_output(). It should be noted that since we no longer support multicast operations on divert sockets and we have duplicated any socket options, we no longer need the reference to the pcb to be coherent. - Finally, we replaced the call to ip_input() to use netisr queuing. This should remove the recursive entry into the IP stack from divert. By dropping the locks over the call to ip_output() we eliminate all the lock ordering issues above. By switching over to netisr on the inbound path, we can no longer recursively enter the ip_input() code via divert. I have tested this change by using the following command: ipfwpcap -r 8000 - \| tcpdump -r - -nn -v This should exercise the input and re-injection (outbound) path, which is very similar to the work load performed by natd(8). Additionally, I have run some ospf daemons which have a heavy reliance on raw sockets and multicast. 
Approved by: re@ (kensmith) MFC after: 1 month LOR: 163 LOR: 181 LOR: 202 LOR: 203 Discussed with: julian, andre et al (on freebsd-net) In collaboration with: bms [1], rwatson [2] [1] bms helped out with the multicast decisions [2] rwatson submitted the original netisr patches and came up with some of the original ideas on how to combat this issue.	2007-08-06 22:06:36 +00:00
Randall Stewart	63981c2b40	- change number assignments for SHA225-512 (match artisync for bakeoff.. using the next sequential ones) - In cookie processing 1-2-1, we did not increment the stcb refcnt before releasing the tcb lock. We need to do this to keep the tcb from being freed by an abort or ?? unlikely but worth doing. Also get rid of unneeded INP_WLOCK. - extra receive info included the rcvinfo which killed the padding/alignment. We now redefine all the fields properly so they both align properly to 128 bytes. - A peeled off socket would not close without an error due to its misguided idea that sctp_disconnect() was not supported on it. This fixes it so it goes through the proper path. - When an assoc was being deleted after abort (via a timer) a small race condition exists where we might take a packet for the old assoc (since we are waiting for a cleanup timer). This state especially happens in mac. We now add a state in the asoc so these can properly handle the packet as OOTB. Approved by: re@freebsd.org(Ken Smith)	2007-08-06 15:46:46 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Bjoern A. Zeeb	cc977adc71	Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL. Also rename the related functions in a similar way. There are no functional changes. For a packet coming in with IPsec tunnel mode, the default is to only call into the firewall with the "outer" IP header and payload. With this option turned on, in addition to the "outer" parts, the "inner" IP header and payload are passed to the firewall too when going through ip_input() the second time. The option was never only related to a gif(4) tunnel within an IPsec tunnel and thus the name was very misleading. Discussed at: BSDCan 2007 Best new name suggested by: rwatson Reviewed by: rwatson Approved by: re (bmah)	2007-08-05 16:16:15 +00:00
Peter Wemm	c4a184bdc4	Change TCPTV_MIN to be independent of HZ. While it was documented to be in ticks "for algorithm stability" when originally committed, it turns out that it has a significant impact in timing out connections. When we changed HZ from 100 to 1000, this had a big effect on reducing the time before dropping connections. To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet and establish a reliable round-trip-time (ie: type a few commands). Then unplug the ethernet and press a key. Time how long it takes to drop the connection. The old behavior (with hz=100) caused the connection to typically drop between 90 and 110 seconds of getting no response. Now boot with kern.hz=1000 (default). The same test causes the ssh session to drop after just 9-10 seconds. This is a big deal on a wifi connection. With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30. Note how it behaves the same as when HZ was 100. Also, note that when booting with hz=100, net.inet.tcp.rexmit_min used to be 30. This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should always be about 30. If you set hz to Really Slow(TM), there is a safety feature to prevent a value of 0 being used. This may be revised in the future, but for the time being, it restores the old, pre-hz=1000 behavior, which is significantly less annoying. As a workaround, to avoid rebooting or rebuilding a kernel, you can run "sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30" to /etc/sysctl.conf. This is safe to run from 6.0 onwards. Approved by: re (rwatson) Reviewed by: andre, silby	2007-07-31 22:11:55 +00:00
Dag-Erling Smørgrav	218cbbea9a	Make tcpstates[] static, and make sure TCPSTATES is defined before <netinet/tcp_fsm.h> is included into any compilation unit that needs tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG conditionals. This allows kernels both with and without TCPDEBUG to build, and unbreaks the tinderbox. Approved by: re (rwatson)	2007-07-30 11:06:42 +00:00
Bruce A. Mah	e251d2f4f6	Fix a typo in a log message: s/Reveived/Received/. Approved by: re (rwatson)	2007-07-29 20:13:22 +00:00
Matt Jacob	24face5416	Fix compilation problems- tcpstates is only available if TCPDEBUG is set. Approved by: re (in spirit)	2007-07-29 01:31:33 +00:00
Mike Silbersack	e3020cfd3c	Fix a panic introduced in rev 1.126. Approved by: re (rwatson)	2007-07-28 20:13:40 +00:00
Andre Oppermann	773673c133	Provide a sysctl to toggle reporting of TCP debug logging: sys.net.inet.tcp.log_debug = 1 It defaults to enabled for the moment and is to be turned off for the next release like other diagnostics from development branches. It is important to note that sysctl sys.net.inet.tcp.log_in_vain uses the same logging function as log_debug. Enabling of the former also causes the latter to engage, but not vice versa. Use consistent terminology in tcp log messages: "ignored" means a segment contains invalid flags/information and is dropped without changing state or issuing a reply. "rejected" means a segment contains invalid flags/information but is causing a reply (usually RST) and may cause a state change. Approved by: re (rwatson)	2007-07-28 12:20:39 +00:00
Andre Oppermann	cdaf208d09	o Move setting/resetting logic of syncache timer from macro SYNCACHE_TIMEOUT to new function syncache_timeout(). o Fix inverted timeout callout engagement logic to actually enable the timer for the bucket row. Before SYN\|ACK was not retransmitted. o Simplify SYN\|ACK retransmit timeout backoff calculation. o Improve logging of retransmit and timeout events. o Reset timeout when duplicate SYN arrives. o Add comments. o Rearrange SYN cookie statistics counting. Bug found by: silby Submitted by: silby (different version) Approved by: re (rwatson)	2007-07-28 12:02:05 +00:00
Andre Oppermann	19bc77c549	o Move all detailed checks for RST in LISTEN state from tcp_input() to syncache_rst(). o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before a RST for a connection in syncache did not properly free the entry. o Add more detailed logging. Approved by: re (rwatson)	2007-07-28 11:51:44 +00:00
Robert Watson	c6b2899785	Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed. The once over: bz Approved by: re (kensmith)	2007-07-28 07:31:30 +00:00
Mike Silbersack	c325962b47	Export the contents of the syncache to netstat. Approved by: re (kensmith) MFC after: 2 weeks	2007-07-27 00:57:06 +00:00
Andre Oppermann	564aab1fe6	Fix comments in tcp_do_segment(). Approved by: re (kensmith)	2007-07-25 18:48:24 +00:00
Randall Stewart	1b649582bb	- take out a needless panic under invariants for sctp_output.c - Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than SCTP_SMALL_IOVEC_SIZE - re-add back inpcb_bind local address check bypass capability - Fix it so sctp_opt_info is independent of assoc_id position. - Fix cookie life set to use MSEC_TO_TICKS() macro. - asconf changes o More comment changes/clarifications related to the old local address "not" list which is now an explicit restricted list. o Rename some functions for clarity: - sctp_add/del_local_addr_assoc to xxx_local_addr_restricted() - asconf related iterator functions to sctp_asconf_iterator_xxx() o Fix bug when the same address is deleted and added (and removed from the asconf queue) where the ifa is "freed" twice refcount wise, possibly freeing it completely. o Fix bug in output where the first ASCONF would not go out after the last address is changed (e.g. only goes out when retransmitted). o Fix bug where multiple ASCONFs can be bundled in the same packet with the same serial numbers. o Fix asconf stcb iterator to not send ASCONF until after all work queue entries have been processed. o Change behavior so that when the last address is deleted (auto asconf on a bound all endpoint) no action is taken until an address is added; at that time, an ASCONF add+delete is sent (if the assoc is still up). o Fix local address counting so that address scoping is taken into account. o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending of ASCONF (after an RTO). The default now is to send ASCONF immediately (except for the case of changing/deleting the last usable address). Approved by: re(ken smith)@freebsd.org	2007-07-24 20:06:02 +00:00
Randall Stewart	52be287ebb	- remove duplicate code from sctp_asconf.c - remove duplicate #include <sys/priv.h> that is not under #ifdef FreeBSD version to allow compile on 6.1 - static analysis changes per the cisco SA tool including: o some SA_IGNORE comments o some checks for NULL before unlock. o type corrections int -> size_t - Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this we pass a NULL in to bind on implicit assoc setup and crash :-( Approved by: re@freebsd.org(Ken Smith)	2007-07-21 21:41:32 +00:00
Robert Watson	08af97b790	Attempt to improve feature parity between UDPv4 and UDPv6 by merging UDPv4 features to UDPv6: - Add MAC checks on delivery and MAC labeling on transmit. - Check for (and reject) datagrams with destination port 0. - For multicast delivery, check the source port only if the socket being considered as a destination has been connected. - Implement UDP blackholing based on net.inet.udp.blackhole. - Add a new ICMPv6 unreachable reply rate limiting category for failed delivery attempts and implement rate limiting for UDPv6 (submitted by bz). Approved by: re (kensmith) Reviewed by: bz	2007-07-19 22:34:25 +00:00
Randall Stewart	18e198d3a3	- added pre-checks to the bindx call. - use proper tick gathering macro instead of ticks directly. - Placed reasonable boundaries on sets that a user can do that are converted to ticks from ms. - Fix CMT_PF to always check to be sure CMT is on. - Fix ticks use of CMT_PF. - put back code to allow asconfs to be queued while INITs are in flight and before the assoc is established. - During window probes, an ack'd packet might be left with the window probe mark on it causing it to be retransmitted. Change so that the flight decrease macro clears the window_probe mark. - Additional logging flight size/reading and ASOC LOG. This is only enabled if you manually insert things into opt_sctp.h since its a set of debug code only. - Found an interesting SMP race in the way data was appended which could cause a reader to lose a part of a message, had to reorder when we marked the message was complete to after the data was appended. - bug in ADD-IP for the subset bound socket case when the peer has only one address - fix ASCONF implicit success/error handling case - proper support of jails in Freebsd 6> - copy out the timeval for the 64 bit sparc world on cookie-echo alignment error crashes without this). Approved by: re(Ken Smith)	2007-07-17 20:58:26 +00:00
Randall Stewart	b54d3a6c48	- Modular congestion control, with RFC2581 being the default. - CMT_PF states added (w/sysctl to turn the PF version on) - sctp_input.c had a missing incr of cookie case when the auth was bad. This meant a free was called without an increment to refcnt, added increment like rest of code. - There was a case, unlikely, when the scope of the destination changed (this is a TSNH case). In that case, it would not free the alloc'ed asoc (in sctp_input.c). - When listed addresses found a colliding cookie/Init, then the collided upon tcb was not unlocked in sctp_pcb.c - Add error checking on arguments of sctp_sendx(3) to prevent it from referencing a NULL pointer. - Fix an error return of sctp_sendx(3), it was returning ENOMEM not -1. - Get assoc id was changed to use the sanctified socket api method for getting an assoc id (PEER_ADDR_INFO instead of PEER_ADDR_PARAMS). - Fix it so a peeled off socket will get a proper error return if it tries to send to a different address than it is connected to. - Fix so that select_a_stream can avoid an endless loop that could hang a caller. - time_entered (state set time) was not being set in all cases to the time we went established. Approved by: re(ken smith)	2007-07-14 09:36:28 +00:00
Robert Watson	43bbb6aa10	Further cleanup of UDPv4: - Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present. - Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable. - Move udp_append() above udp_input() to match the function order in udp6_usrreq.c. Approved by: re (kensmith)	2007-07-10 09:30:46 +00:00
Bruce M Simpson	d90b8675c2	Fix a regression in IPv4 multicast join path (IP_ADD_MEMBERSHIP). With the in_mcast.c code, if an interface for an IPv4 multicast join was not specified, and a route did not exist for the specified group in the unicast forwarding tables, the join would be rejected with the error EADDRNOTAVAIL. This change restores the old behaviour whereby if no interface is specified, and no route exists for the group destination, the IPv4 address list is walked to find a non-loopback, multicast-capable interface to satisfy the join request. This should resolve problems with starting multicast services during system boot or when a default forwarding entry does not exist. Approved by: re (rwatson)	2007-07-09 10:36:47 +00:00
Robert Watson	bd84d20457	Minor UDPv4 cleanup: capitalize comment, move statistics update after mbuf free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after. Approved by: re (kensmith)	2007-07-07 09:46:34 +00:00
Peter Wemm	477d44c467	Fix a second warning, introduced by my last "fix". I committed the wrong diff from the wrong machine. Pointy hat to: peter Approved by: re (rwatson - blanket, several days ago)	2007-07-05 06:04:46 +00:00
Peter Wemm	9fb5d4c064	Fix cast-qualifiers warning when INET6 is not present Approved by: re (rwatson)	2007-07-05 05:55:57 +00:00
Max Laier	60ee384760	Link pf 4.1 to the build: - move ftp-proxy from libexec to usr.sbin - add tftp-proxy - new altq mtag link Approved by: re (kensmith)	2007-07-03 12:46:08 +00:00
George V. Neville-Neil	b2630c2934	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
Randall Stewart	5bead43650	- Consolidate the code that free's chunks to actually also call the sctp_free_remote_address() function. - Assure that when we allocate a chunk the whoTo is NULL, also when we free it and place it into the cache we NULL it (that way the consolidation code will always work). - Fix a small race, when a empty data holder is left on the stream out queue, and both sides do a shutdown, the empty data holder would prevent us from sending a SHUTDOWN-ACK and at the same time we never would cleanup the empty holder (since nothing was ever in queue). We now add a utility function that a) cleans up empty holders and b) properly determines if there are still pending data chunks on the stream out wheel. Approved by: re@freebsd.org (Ken Smith)	2007-07-02 19:22:22 +00:00
Robert Watson	02dd4b5cbd	Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9) comments. Approved by: re (bmah)	2007-07-02 15:44:30 +00:00
George V. Neville-Neil	0d29af67f2	Fix a dangling netinet6 to netipsec transition for SCTP include files. Approved by: re	2007-07-01 14:18:20 +00:00
George V. Neville-Neil	2cb64cb272	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00
Randall Stewart	9ceab0faf0	- When a SCTP socket is closed, but the last data SACK is lost, we would incorrectly abort the association instead of retransmitting the SACK. Approved by: re@freebsd.org (Ken Smith)	2007-06-29 15:14:23 +00:00
Randall Stewart	97c76f10a0	- Update bindx address checking to properly screen out address per the socket api, adding port validation. We allow port 0 or the already bound port number and no others. Approved by: re@freebsd.org (Ken Smith)	2007-06-25 19:05:26 +00:00
Randall Stewart	a964e8de4c	- Fix type casts in calling sctp_m_getptr, it expects a int not an unsigned (returned by sizeof) also add cast to comparison check for size bounds. Approved by: re(bmah@freebsd.org)	2007-06-22 14:40:09 +00:00
Randall Stewart	671d309c7c	- Fix stream reset so it limits the number of streams that can be listed - Fix fwd-tsn to use proper accessor so it does not overrun mbufs - Fix stream reset error reporting to actually work (it has always been broken if the peer rejects a stream reset) - Some 64 bit friendly changes Approved by: re(bmah@freebsd.org)	2007-06-22 13:50:56 +00:00
Randall Stewart	ea1fbec59a	- Two more static analysis bugs found by cisco's tool on a subsequent run.	2007-06-18 22:36:52 +00:00
Randall Stewart	eacc51c5b6	- Fixes static issues found by cisco sa tool (missing frees and such on error legs) - align sctp_sockstore to 64 bit boundary ..	2007-06-18 21:59:15 +00:00
Maxim Konovalov	d069a5d478	o Make ipfw set more robust -- now it is possible: - to show a specific set: ipfw set 3 show - to delete rules from the set: ipfw set 9 delete 100 200 300 - to flush the set: ipfw set 4 flush - to reset rules counters in the set: ipfw set 1 zero PR: kern/113388 Submitted by: Andrey V. Elsukov Approved by: re (kensmith) MFC after: 6 weeks	2007-06-18 17:52:37 +00:00
Randall Stewart	d95ddf0251	Add additional logging level mask for packet_logging too.	2007-06-18 13:57:37 +00:00
Randall Stewart	19d8ca2eaf	- The packet log needs to copy all of the buffer not to the end.	2007-06-17 23:43:37 +00:00
Randall Stewart	75298de2a0	Back out last change to inpcb_free. Turns out we need to hold off freeing if there is data pending ... someone might do send/close. Which means we want the data to go and then close it after startup. Added comments to the code as well to note that this is done for a reason.	2007-06-17 19:27:46 +00:00
Matt Jacob	cce418d3bf	Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0) codepath.	2007-06-17 04:07:11 +00:00
Randall Stewart	e42a0f5e72	- For sctp_input/sctp6_input add announcement when a packet arrives (debug) - re-factor the packet drop in sctp_output a bit more, we don't need the trim after all, but the size calc is now corrected. - When an assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user closes, it should not matter if data is queued, the assoc should be purged. - In error leg a missing free_chunk when iph comes in NULL (should not happen but just in case).	2007-06-17 01:36:02 +00:00
Matt Jacob	27d65ef267	Replace incorrect local OFFSET_OF macro with the correct and generic offsetof macro.	2007-06-17 00:33:34 +00:00
Matt Jacob	fbdd20a1ae	Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr to nonzero you fulfill the same function as the variable 'cmp'. so you might as well zero match and test against it later. Reviewed by: timeout on review request	2007-06-17 00:31:24 +00:00
Randall Stewart	ca2cc3feac	- Better handle sending large pkt-drops. We were not trimming the data with m_adj if a large pkt arrived with a bad csum; some systems can't handle you not trimming the tail (think panda :-D)	2007-06-16 14:03:15 +00:00
Randall Stewart	48dabb921d	- Raise max range of sctp_logging sysctl so panda does not disallow us to turn on logging levels.	2007-06-16 03:28:18 +00:00
Randall Stewart	72fb6fdb41	- Matthew's changes to get inlines out, plus a few of my own to deal with the VRF inline function -> becomes a macro now. Submitted by: Matthew Jacobs	2007-06-16 00:33:47 +00:00
Matt Jacob	3c010a416c	Garbage collect some debug code that not only no longer could work but in fact probably causes random pointer dereferences. Garbage collect the tp variable too.	2007-06-15 22:54:11 +00:00
Randall Stewart	b9e7085a57	Name change SCTP_KTR_SUBSYS -> KTR_SCTP	2007-06-15 20:54:12 +00:00
Randall Stewart	0a374fd92a	Remove extraneous extern (its gotten from sctp_sysctl.h)	2007-06-15 20:23:41 +00:00
Randall Stewart	cba882dfcc	When removing a stream from the output-stream-wheel, if its the first stream we saw we must update the starting point in the wheel, else we may loop in an endless loop.	2007-06-15 19:49:13 +00:00
Randall Stewart	e1461651a4	- Update the comment lines in sctp_input.c - We need to init the INP_LOCK since otherwise for non-SMP kernels you crash when you set the TOS.	2007-06-15 19:28:58 +00:00
Bruce M Simpson	f64a3b042a	Stub out imported IGMPv3 definitions which clash with those of the XORP router; the IGMPv3 definitions will be updated at a later point in time when IGMPv3/MLDv2 support is fully merged.	2007-06-15 18:59:10 +00:00
Randall Stewart	458303da65	- Issue one, new stack reduction left packet_drop handling still thinking it had the whole chunk. This could cause a crash if a large packet drop came in. Fixed by adjusting the trunc length down to the limit. - Large sacks with lots of segments could also have same issue. Changed duplicate and segment handling to use proper get_m_ptr function to pull each block from mbuf chains.	2007-06-15 17:59:57 +00:00
Randall Stewart	22a6719709	- Add VRF id to sctp_ifa structure, needed mainly in panda but useful during deletes of ifa's in diff VRF's when applicable.	2007-06-15 03:16:48 +00:00
Randall Stewart	629b8f3e0f	KTR_GEN -> KTR_SUBSYS (for Kris).	2007-06-15 02:34:36 +00:00
Randall Stewart	80fefe0a08	- Fix so ifn's are properly deleted when the ref count goes to 0. - Fix so VRF's will clean themselves up when no references are around. - Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass normal validation checks. - turn auto-asconf off for subset bound sockets - Moves all logging to use KTR. This gets rid of most of the logging #ifdef's with a few exceptions reducing the number of config options for SCTP.	2007-06-14 22:59:04 +00:00
Randall Stewart	db4fd95b0e	- fix bindx to check addresses against socket's protocol family	2007-06-13 14:39:41 +00:00
Robert Watson	2281b8f054	Remove IPX over IP tunneling support, which allows IPX routing over IP tunnels, and was not MPSAFE. The code can be easily restored in the event that someone with an IPX over IP tunnel configuration can work with me to test patches. This removes one of five remaining consumers of NET_NEEDS_GIANT. Approved by: re (kensmith)	2007-06-13 14:01:43 +00:00
Randall Stewart	9a97252585	- Fixed cookie handling to calc an RTO when its an INIT collision case. - Fixed RTO calc to maintain a separate variable to track if a RTO calc has been done, this allows the RTO var to be doubled during initial timeouts. - Reduces the amount of stack used by process control. - Use a constant for the peer chunk overhead. - Name change to spell candidate correctly.	2007-06-13 01:31:53 +00:00
Bruce M Simpson	71498f308b	Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work. This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project. The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. 
* A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING. This work was financially supported by another FreeBSD committer. Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)	2007-06-12 16:24:56 +00:00
Randall Stewart	35918f8571	- Restructure so bindx functions are not done inline to socket option but are a separate call that can be re-used if needed. - 64 bit issues o re-arrange cookie so it is better 64 bit aligned o For wire level things we need the packed attribute.	2007-06-12 11:21:00 +00:00
Robert Watson	32f9753cfb	Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project	2007-06-12 00:12:01 +00:00
Andre Oppermann	f194524fb1	Fix a case in tcp_do_segment() where tcp_update_sack_list() would be called with an incorrect segment end value. tcp_reass() may trim segments when they overlap with already existing ones in the reassembly queue. Instead of saving the segment end value before the call to tcp_reass() compute it on the fly based on the effective segment length afterwards. This bug was not really problematic as no information got lost and the eventual SACK information computation was correct nonetheless. MFC after: 1 week	2007-06-10 21:07:21 +00:00
Andre Oppermann	e8949f7407	Fix style for comments, be more verbose and add some more.	2007-06-10 20:59:22 +00:00
Andre Oppermann	104ebb2a45	Make the handling of the tcp window explicit for the SYN_SENT case in tcp_output(). This is currently not strictly necessary but paves the way to simplify the entire SYN options handling quite a bit. Clarify comment. No change in effective behaviour with this commit. RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled.	2007-06-09 21:19:12 +00:00
Andre Oppermann	5396d0f8d8	Remove some bogosity from the SYN_SENT case in tcp_do_segment and simplify handling of the send/receive window scaling. No change in effective behaviour. RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled. Noticed by: yar	2007-06-09 21:09:49 +00:00
Andre Oppermann	b7de7d87a0	Don't send pure window updates when the peer has closed the connection and won't ever send more data.	2007-06-09 19:39:14 +00:00
Andre Oppermann	f58747375d	Handle a race condition on >2 core machines in tcp_timer() when a timer issues a shutdown and a simultaneous close on the socket happens. This race condition is inherent in the current socket/ inpcb life cycle system but can be handled well. Reported by: kris Tested by: kris (on 8-core machine)	2007-06-09 17:49:39 +00:00
Randall Stewart	2bf083e4c9	- Oops.. takes out debug printfs I accidentally left in :-(	2007-06-09 13:53:27 +00:00
Randall Stewart	d00aff5d79	- fix send_failed notification contents - Reorder send failed to be in correct order. - Fixed calculation of init-ack to be right off mbuf lengths instead of the precalculated value. This will fix one 64 bit platform issue.	2007-06-09 13:46:57 +00:00
Yaroslav Tykhiy	22b971db87	Replace a constant with an already defined symbolic name for it. Tested with: md5(1)	2007-06-08 13:43:28 +00:00
Yaroslav Tykhiy	dba3c50842	Add a sysctl for the purge run interval so that it can be tuned along with the rest of hostcache parameters. The new sysctl name is `net.inet.tcp.hostcache.prune'.	2007-06-08 13:35:51 +00:00
Randall Stewart	108df27c0b	- RTO was not being initialized to 0, thus the rtt calculation algorithm would not go through the proper initialization. - The initialization was incorrect as well, causing problems in sat networks with > 1sec RTT - Get rid of magic numbers in RTT calculations.	2007-06-08 10:57:11 +00:00
Andre Oppermann	45024be06f	In tcp_hc_insert() we may have the case where we have hit the global cache size limit but this bucket row is empty. Normally we want to recycle the oldest entry in the bucket row. If there isn't any the TAILQ_REMOVE leads to a panic by trying to remove a non-existing element. Fix this by just returning NULL and failing the insert. This is not a problem as the TCP hostcache is only advisory. Submitted by: jhb	2007-06-07 21:41:50 +00:00
Andre Oppermann	1f939165ce	Correctly print SEQ and IRS in the corresponding log message in syncache_expand().	2007-06-06 22:10:12 +00:00
Gleb Smirnoff	e9bf9fb67c	Do not leak lock in the case of EEXIST error. PR: kern/92776 Submitted by: Ed Schouten <Ed.Schouten tunix.nl>	2007-06-06 14:21:49 +00:00
Randall Stewart	5f26a41d17	- Fixes a case where doing a sysctl would leave locks held when copying out association data. - Fixes a small bug that prevented the SCTP_UNORDERED indication from going up to the app on a recv in the sinfo_flags field.	2007-06-06 00:40:41 +00:00
David Malone	041b706b2f	Despite several examples in the kernel, the third argument of sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int. Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples. In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.	2007-06-04 18:25:08 +00:00
Randall Stewart	f4c93d2405	- fix initial pcb vrf setting when the initial vrf is not the default_vrf_id - Missing lock/unlock of inp added as well in the v6 side. - IFN hash table moves to sctppcbinfo since indexes are unique across systems (including different VRFs) this makes it easier to do ifn lookups.	2007-06-02 11:05:08 +00:00
Randall Stewart	ad21a36485	- Take out the broken table-id concept. Panda Routers have a M-VRF concept that is NOT well thought out for a multi-homed transport protocol. So the useless table-id entries passed around need to be removed. - Add a event timer for the zero copy api. - Fix a bug in sctp_timer.c when searching for an alternate with the largest ssthresh (the compare was wrong).	2007-06-01 11:19:54 +00:00
Jeff Roberson	1c4bcd050a	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)	2007-06-01 01:12:45 +00:00
Robert Watson	abc7d91030	(1) In tcp_usrclosed(), tp can never become NULL, so don't test for NULL before handling the socket disconnection case. (2) Clean up surrounding comments and formatting. Found with: Coverity Prevent(tm) (1) CID: 2203	2007-05-31 12:06:02 +00:00
Randall Stewart	4c9179ad6c	- Fixed (Apple) compiler warnings in sctp_input.c, sctputil.c, sctp_output.c - Fixed a LOR in handling a cookie. Turns out create lock is applied. And if we abort processing, this causes LOR. Changed to force the timer to clean up, that way create lock is released.	2007-05-30 22:34:21 +00:00
Randall Stewart	0696e1203e	- Fix a memory overwrite when the mapping array is expanded, size of expansion was not taken into consideration. - Fix so vtag hash is 1 bigger so that it modulo's out correctly, avoids a panic when restart with right modulo happens. - do not dereference stcb when control->do_not_ref_stcb is set - Fix up packet logging to not often use a lock and also to add to options. - Fix some logging option duplication in the sctputil.h	2007-05-30 17:39:45 +00:00
Randall Stewart	3c6f353630	Adds gcc attribute to prevent inlining of a function. If it goes inline we may well blow the stack if witness and such are enabled.	2007-05-29 14:17:47 +00:00
Randall Stewart	6b4ae3566a	- Fix spelling errors in comments per Ruslan (.. thanks... )	2007-05-29 11:53:27 +00:00
Randall Stewart	207304d4b7	- Fixes so we won't try to start a timer when we hold a wq lock for the iterator. Panda uses a silly recursive lock they hold through the timer. - Add poor mans wireshark compile option.. - Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls. - sysctl now will get back the refcnt for viewing by onlookers. Reviewed by: gnn	2007-05-29 09:29:03 +00:00
Andre Oppermann	8d573cc158	Make log messages more verbose and simpler to understand for non-experts. Update comments to be more conscious, verbose and fully reflect reality.	2007-05-28 23:27:44 +00:00
Andre Oppermann	e885b205c6	Fix indentation of the syncache_expand() section in tcp_input().	2007-05-28 11:35:40 +00:00
Randall Stewart	d61a0ae066	- fixed autoclose to not allow setting on 1-2-1 model. - bounded cookie-life to 1 second minimum in socket option set. - Delayed_ack_time becomes delayed_ack per new socket api document. - Improve port number selection, we now use low/high bounds and no chance of an endless loop. Only one call to random per bind as well. - fixes so set_peer_primary pre-screens addresses to be valid to this host. - maxseg did not allow setting on an assoc basis. We needed to thus track and use an association value instead of a inp value. - Fixed ep get of HB status to report back properly. - use settings flag to tell if assoc level hb is on off not the timer.. since the timer may still run if unconf address are present. - check for crazy ENABLE/DISABLE conditions. - set and get of pmtud (fixed path mtu) not always taking into account ovh. - Getting PMTU info on stcb only needs to return PMTUD_ENABLED if any net is doing PMTU discovery. - Panic or warning fixed to not do so when a valid ip frag is taking place. - sndrcvinfo appearing in both inp and stcb was full size, instead of the non-pad version. This saves about 92 bytes from each struct by carefully converting to use the smaller version. - one-2-one model get(maxseg) would always get ep value, never the tcb's value. - The delayed ack time could be under a tick, this fixes so it bounds it to at least 1 tick for platforms whose tick is more than a ms. - Fragment interleave level set to wrong default value. - Fragment interleave could not set level 0. - Deferred stream reset was broken due to a guard check and ntohl issue. - Found two lock order reversals and fixed. - Tighten up address checking, if the user gives an address the sa_len had better be set properly. - Get asoc by assoc-id would return a locked tcb when it was asked not to if the tcb was in the restart hash. - sysctl to dig down and get more association details Reviewed by: gnn	2007-05-28 11:17:24 +00:00
Andre Oppermann	a160e6302c	Refactor and rewrite in parts the SYN handling code on listen sockets in tcp_input(): o tighten the checks on allowed TCP flags to be RFC793 and tcp-secure conform o log check failures to syslog at LOG_DEBUG level o rearrange the code flow to be easier to follow o add KASSERTs to validate assumptions of the code flow Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enable that controls the behavior on socket creation failure for an otherwise successful 3-way handshake. The socket creation can fail due to global memory shortage, listen queue limits and file descriptor limits. The sysctl allows choosing between two options to deal with this. One is to send a reset to the other endpoint to notify it about the failure (default). The other one is to ignore and treat the failure as a transient error and have the other endpoint retransmit for another try. Reviewed by: rwatson (in general)	2007-05-28 11:03:53 +00:00
Robert Watson	e487a5e2a0	Normalize spelling and grammar in TCP hostcache comments.	2007-05-27 19:39:26 +00:00
Robert Watson	c214db75f2	In tcp_timer_2msl(), tp can never become NULL, so don't check it for NULL before entering tcp_trace(). Found with: Coverity Prevent(tm) CID: 1840	2007-05-27 17:52:02 +00:00
Robert Watson	b312d4b0ba	Don't assign sp to the value of s when we're about to assign it instead to s + strlen(s). Found with: Coverity Prevent(tm) CID: 2243	2007-05-27 17:02:54 +00:00
Andre Oppermann	faedb66c2a	The printf %b list in PRINT_TH_FLAGS has to be in octal numbering. Thus convert \8 to \10 and the warnings go away. Pointed out by: sam, ru, thompsa	2007-05-25 21:28:49 +00:00
Andre Oppermann	a250f3820c	Add CWR back into the PRINT_TH_FLAGS list as gcc42 doesn't complain about \8 in a string anymore.	2007-05-23 19:16:21 +00:00
Andre Oppermann	ec05a17370	In tcp_log_addrs(): o add the hex output of the th_flags field to the example log line in comments o simplify the log line length calculation and make it less evil o correct the test for the length panic; the line isn't on the stack but malloc'ed	2007-05-23 19:07:53 +00:00
Andre Oppermann	d2ddf5d4b0	Be more restrictive with segment validity checks in syncache_expand() and log check failures to syslog at LOG_DEBUG level. Always prefill the sc->sc_ts field to use it in the checks.	2007-05-18 21:42:25 +00:00
Andre Oppermann	5df429a002	o Add syslog logging under LOG_DEBUG to various failures caused by bogus segments o Add more KASSERT()s o Update comments	2007-05-18 21:13:01 +00:00
Andre Oppermann	df541e5fc1	Add tcp_log_addrs() function to generate a standardized TCP log line for use throughout the tcp subsystem. It is IPv4 and IPv6 aware and creates a line in the following format: "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>" A "\n" is not included at the end. The caller is supposed to add further information after the standard tcp log header. The function returns a NUL terminated string which the caller has to free(s, M_TCPLOG) after use. All memory allocation is done with M_NOWAIT and the return value may be NULL in memory shortage situations. Either struct in_conninfo || (struct tcphdr && (struct ip || struct ip6_hdr)) have to be supplied. Due to ip[6].h header inclusion limitations and ordering issues the struct ip and struct ip6_hdr parameters have to be casted and passed as void * pointers. tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, void *ip6hdr) Usage example: struct ip *ip; char *tcplog; if (tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) { log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n", tcplog, __func__); free(tcplog, M_TCPLOG); }	2007-05-18 19:58:37 +00:00
John Baldwin	0ba5d2eedb	Fix statistical accounting for bytes and packets during sack retransmits. MFC after: 1 week Submitted by: mohans	2007-05-18 19:56:24 +00:00
JINMEI Tatuya	187069853c	- Disabled responding to NI queries from a global address by default as specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the feature. - Also cleaned up the code so that the semantics of the icmp6_nodeinfo flags is clearer (i.e., defined specific macro names instead of using hard-coded values). Approved by: gnn (mentor) MFC after: 1 week	2007-05-17 21:20:24 +00:00
Randall Stewart	3c503c28da	- Fixed 1-2-1 model to not worry about associd in sockopts - Fixed RTOinfo for bounding. - Fixed connect() to return ECONNREFUSED when an ABORT is received. - Added comments to direct Static Analysis not to look at some things it does not understand (comments are /* sa_ignore XXXXX */) - Bind when colliding was broken, missing not_found = 1 before checking to see if the port was in use caused endless bind loop. - Cookie life needs to be in milliseconds to conform to socket api. - Cookie life is not supposed to change if it's 0. On the assoc level set we changed it to 0, oops. - Two more static analysis issues identified by the cisco tool. Null checks needed. - An issue for sendfile(). Need to validate the correct input argument. - When sending failed due to a no route to host, we leaked the mbuf chain failing to call m_freem(). - Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined Reviewed by: gnn	2007-05-17 12:16:24 +00:00
Oleg Bulyzhin	7e17f8b864	Unbreak IPv4 kernel build.	2007-05-17 00:05:13 +00:00
Robert Watson	6751f8364e	Remove leading spaces before tabs spotted thanks to silby using kwrite to read ip_input.c.	2007-05-16 20:46:58 +00:00
Andre Oppermann	abb91d889a	Remove now unused stuff forgotten in the previous commit.	2007-05-16 17:55:22 +00:00
Andre Oppermann	2104448fe7	Move TIME_WAIT related functions and timer handling from files other than repo copied tcp_subr.c into tcp_timewait.c#1.284: tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck() tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset() tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop() tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan() This is a mechanical move with appropriate renames and making them static if used only locally. The tcp_tw_2msl_scan() cleanup function is still run from the tcp_slowtimo() in tcp_timer.c.	2007-05-16 17:14:25 +00:00
David Malone	39629c92cc	When verifying the IPv4 UDP checksum, don't overwrite the checksum value in the mbuf with the result of the calculation. Previously, if we chose to return an ICMP message, the quoted UDP checksum bytes would be different to what was sent. PR: 112471 Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz> MFC after: 3 weeks	2007-05-16 09:12:16 +00:00
Andre Oppermann	ec9c755352	Complete the (mechanical) move of the TCP reassembly and timewait functions from their original place to their own files. TCP Reassembly from tcp_input.c -> tcp_reass.c TCP Timewait from tcp_subr.c -> tcp_timewait.c	2007-05-13 22:16:13 +00:00
Andre Oppermann	57615c7e86	Drop everything that doesn't belong into this new file. It's neither functional nor connected to the build yet.	2007-05-11 21:17:53 +00:00
Andre Oppermann	1433541aa4	Drop everything that doesn't belong into this new file. It's neither functional nor connected to the build yet.	2007-05-11 21:04:57 +00:00
Andre Oppermann	0489b64c5e	Make the TCP timer callout obtain Giant if the network stack is marked as non-mpsafe. This change is to be removed when all protocols are mp-safe.	2007-05-11 20:52:47 +00:00
Andre Oppermann	504abdc6e6	Add the timestamp offset to struct tcptw so we can generate proper ACKs in TIME_WAIT state that don't get dropped by the PAWS check on the receiver.	2007-05-11 18:29:39 +00:00
Robert Watson	632bbf0f5b	Coalesce two identical UCB licenses into a single license instance with one set of copyright years. White space and comment cleanup. Export $FreeBSD$ via __FBSDID.	2007-05-11 11:21:43 +00:00
Robert Watson	b34aab2337	Minor white space and style cleanups.	2007-05-11 11:05:30 +00:00
Robert Watson	c59b9aa51b	White space and style cleanup.	2007-05-11 11:00:48 +00:00
Robert Watson	d22e451d5b	Minor white space/style normalization.	2007-05-11 10:50:31 +00:00
Robert Watson	4d41cc2fe6	Normalize style a bit: reduce pseudo-randomness of comment layout and white space. Remove 'register'.	2007-05-11 10:48:30 +00:00
Robert Watson	54d642bbe5	Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.	2007-05-11 10:20:51 +00:00
Robert Watson	169db7b25d	Remove unneeded wrappers for in_setsockaddr() and in_setpeeraddr(), which used to exist so pcbinfo locks could be acquired, but are no longer required as a result of socket/pcb reference model refinements.	2007-05-11 09:54:53 +00:00
Andre Oppermann	4b8e42baab	Fix an incorrect replace of a timer reference made during the TCP timer rewrite in rev. 1.132. This unmasked yet another bug that causes certain connections to get indefinitely stuck in LAST_ACK state.	2007-05-10 23:11:29 +00:00
Robert Watson	f2565d68a4	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.	2007-05-10 15:58:48 +00:00
Randall Stewart	ad81507eed	Two major items here: - All printf that was surrounded by #ifdef SCTP_DEBUG moves to a macro that does all of this. This removes all printfs from the code and makes the code more portable and easier to read. - Static Analysis (cisco) - found a few bugs, but mostly we add checks for NULL pointers and such to make the tool happy. We now pass the Cisco SA tools checks except for where it does not understand tailq/lists. We still need to look at the coverity tools output too (this is like the cisco SA tool) and see if it wants us to fix any other items. Hopefully this will be the last major churn in the code other than bug fixes.	2007-05-09 13:30:06 +00:00
Maxim Konovalov	d30d90dc80	o Fix style(9) bugs introduced in the last commit. Pointed out by: bde	2007-05-09 11:39:46 +00:00
Maxim Konovalov	10fe523e99	o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build. PR: kern/112517 Submitted by: vd	2007-05-09 06:09:40 +00:00
Randall Stewart	b100636770	- Copyright change, cisco's silly tool wants it to say: "Copyright (c) 2001-2007, by Cisco Systems," instead of "Copyright (c) 2001-2007, Cisco Systems," - Also fix a few stragglers that were still in 2006.	2007-05-08 17:01:12 +00:00
Randall Stewart	b0552ae214	- Get rid of the sctp_inpcb_free() "magic numbers", now they are sensible defines that tell what you are directing the function to do.	2007-05-08 15:53:03 +00:00
Randall Stewart	6e55db5445	- Static analysis fixes for cisco's commit (this is equivalent to the coverity tool.. may even be the same one.. not sure). - A bug in the way sctp_abort() and friends were setting the IP_CLOSE flag.. and NOT passing the last argument as a (,1)... so that things would get freed..	2007-05-08 14:32:53 +00:00
Randall Stewart	17205ecc85	- More macros for OS compatibility - PR-SCTP would ignore FWD-TSN's above a rwnd's worth of TSN's (1 byte msgs).. this left the peer hopelessly out of sync.. or an attacker. So now we abort the assoc. - New IFN hash, also rename hashes to match addr/ifn now that the vrf has multiple. - Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default as defined in the Socket API ID. - Export MTU information via sysctl. - Vrf's need table id's. This is default for BSD, but may be other things later when BSD fully supports VRFs. - Additional stream reset bug (caught by cisco dev-test). - Additional validations for the address in sending a message (socket api). -------- and ----- - Fix association notifications not to give the active open side false notifications. - Fix so sendfile and SENDALL will work properly (missing flag to say socket sender is done). - Fix Bug that prevented COOKIES from being retransmitted. - Break out connectx into helper sub-models so that iox routines can reuse the helpers. - When an address is added during system init (non-dynamic mode) make sure that the "defer use" flag is not set. It's compiling on XR now :-D Reviewed by: gnn	2007-05-08 00:21:05 +00:00
Robert Watson	9df79d84c1	Rather than selectively zeroing fields in the tcp_debug structure throughout tcp_trace(), zero the entire structure up front. Minor style fixes.	2007-05-07 14:05:23 +00:00
Robert Watson	6db851a281	Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr() and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().	2007-05-07 13:51:24 +00:00
Robert Watson	af1ee11d54	Minor style tweaks.	2007-05-07 13:47:39 +00:00
Robert Watson	434a0d24dd	When setting up timewait state for a TCP connection, don't hold the socket lock over a crhold() of so_cred: so_cred is constant after socket creation, so doesn't require locking to read.	2007-05-07 13:04:25 +00:00
Andre Oppermann	1a5537409f	Remove unused requested_s_scale from struct tcpcb.	2007-05-06 16:04:36 +00:00
Andre Oppermann	3529149e9a	Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a dedicated sack_enable int for this bool. Change all users accordingly.	2007-05-06 15:56:31 +00:00
Andre Oppermann	0ca3f933eb	o Remove redundant tcp reassembly check in header prediction code o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()	2007-05-06 15:41:06 +00:00
Andre Oppermann	c5ad39b910	Reorder the TCP header prediction test to check for the most volatile values first to spend less time on a fallback to normal processing.	2007-05-06 15:23:51 +00:00
Andre Oppermann	679d9708b6	Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late segments arriving for such a connection are handled directly in the TW code.	2007-05-06 15:16:05 +00:00
Andre Oppermann	37ba9d112a	Fix two comments.	2007-05-06 13:38:25 +00:00
Randall Stewart	6114cd961a	Two bugs: - Locks were not being unlocked when an invalid size chunk is sent in. - When a notification comes in, we cannot use it to look up the fragment interleave stream information since it's not on a stream.	2007-05-06 00:01:17 +00:00
Robert Watson	6087c3c29e	Add global mutex tcp_debug_mtx, which will protect global TCP debugging state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace(). Move to ANSI C function header, correct prototype types so that short TCP state is no longer promoted to int unnecessarily. Add comments. MFC after: 3 weeks	2007-05-04 23:43:18 +00:00
Robert Watson	1cd6eadfbb	Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the pcbinfo lock will be released as well, not just the pcb lock.	2007-05-04 17:45:52 +00:00
Randall Stewart	1bb552e88d	Fixes a missing unlock in the one-2-one hash table, if it was full and a collision occurred, then we would leave an inp locked. Also fixes a missing inp unlock if IPSEC was on and it failed during the attach. Bug found by Weongyo Jeong.	2007-05-04 15:19:10 +00:00
Bjoern A. Zeeb	7a92401aea	Add support for filtering on Routing Header Type 0 and Mobile IPv6 Routing Header Type 2 in addition to filter on the non-differentiated presence of any Routing Header. MFC after: 3 weeks	2007-05-04 11:15:41 +00:00
Robert Watson	7abab91135	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
Randall Stewart	d06c82f169	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave (0-2). - Codenomicon security test updates - length checks and such. - Bug in stream reset (2 actually). - setpeerprimary could unlock a null pointer, fixed. - Added a flag in the pcb so netstat can see if we are listening easier. Obtained from: (some of the Listen changes from Weongyo Jeong)	2007-05-02 12:50:13 +00:00
Robert Watson	84ca8aa609	Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().	2007-05-01 16:31:02 +00:00
Robert Watson	712fc218a0	Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.	2007-04-30 23:12:05 +00:00
Maxim Konovalov	1e2f57057d	o Kill EOLWS while I'm here.	2007-04-30 20:26:11 +00:00
Maxim Konovalov	38ec733c53	o Fix strtoul() error conditions check. PR: kern/108211 Submitted by: Yong Tang MFC after: 2 weeks	2007-04-30 20:22:11 +00:00
Andre Oppermann	9fa198bead	o Fix INP lock leak in the minttl case o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()	2007-04-23 19:41:47 +00:00
Randall Stewart	ee7f985774	Fixes cut and paste bug using wrong pointer reference.	2007-04-23 00:51:49 +00:00
Randall Stewart	58967d8d46	Moves the PCB features and flags from sctp_pcb.h to sctp.h so that netstat can access and display these values.	2007-04-22 12:12:38 +00:00
Randall Stewart	9a6142d8cd	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave.	2007-04-22 11:06:27 +00:00
Andre Oppermann	df47e4377b	o Remove unnecessary TOF_SIGLEN flag from struct tcpopt o Correctly set to->to_signature in tcp_dooptions() o Update comments	2007-04-20 15:28:01 +00:00
Andre Oppermann	7824d002c0	Add more KASSERT's.	2007-04-20 15:21:29 +00:00
Andre Oppermann	0d957bba48	o Remove unused and redundant TCP option definitions o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN	2007-04-20 15:08:09 +00:00
Andre Oppermann	4d6e713043	Remove bogus check for accept queue length and associated failure handling from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.	2007-04-20 14:34:54 +00:00
Andre Oppermann	e207f80039	Simplify syncache_expand() and clarify its semantics. Zero is returned when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)	2007-04-20 13:51:34 +00:00
Andre Oppermann	0a5df51410	Only update TCP timestamp on SYN duplication if it is present on current SYN in syncache_add(). Otherwise disable timestamps.	2007-04-20 13:36:48 +00:00
Andre Oppermann	c73f70b728	o Plug memory leak in syncache_add() on MAC label allocation failure. o Simplify code flow with 'done' goto label. o Remove mbuf argument from syncache_respond(). It doesn't make use of it.	2007-04-20 13:30:08 +00:00
Randall Stewart	f1f73e5718	- More work on making send lock contention. - Removed free-oqueue cache. - Fix counter for sq entries - Increased the amount of information retained on ASOC_TSN logging on the association. - Made it so with the ASOC_TSN logging on sending or receiving an abort we dump the log. - Went through and added invariants around some panics that needed them. - decrements went to atomic_subtract_int instead of add -1 - Removed residual count increment that threw off a strm oq count. - Tracks and complains if we don't have a LAST fragment and clean up the sp structure. - Track a new stat that counts number of abandoned msgs that happen if you close without reading. - Fix lookup of frag point to be aware of a 0 assoc-id. Reviewed by: gnn	2007-04-19 11:28:43 +00:00
Andre Oppermann	bbf4e1cb47	Make tcp_twrespond() use tcp_addoptions() instead of a home grown version.	2007-04-18 18:14:39 +00:00
Andre Oppermann	9eab54debf	When we run into the syncache entry limits syncache_add() tries to free the oldest entry in the current bucket row. The global entry limit may be smaller than the bucket rows and their limit combined however. Thus only try to free a syncache entry if we found one in this bucket row. Reported by: kris	2007-04-17 15:25:14 +00:00
Robert Watson	c9791cfb3e	Shorten text string for ip_fw2 dynamic rules zone by removing the word "zone", which is generally not present in zone names. This reduces the incidence of line-wrapping in "vmstat -z " using 80-column displays. MFC after: 3 days	2007-04-17 09:28:36 +00:00
Robert Watson	215c8d75b8	Remove unused variable tcbinfo_mtx.	2007-04-15 21:03:23 +00:00
Randall Stewart	f1d6e6dc71	Fix stupid syntax error - Pointy hat to me :-(	2007-04-15 13:03:14 +00:00
Randall Stewart	478d3f0901	- Add more comments to sctps_stats structure in sctp_uio.h - Fix bug that prevented EEOR mode from working and simplified the can_we_split code in the process. - Reduce lock contention for the tcb_send_lock. I did this especially for EEOR mode, still need to look at why I need a lock when removing from the tailq and the ->next is NOT null. A lock fixes it but it implies a bug yet exists. - Activated Andre's proposed changes to better use the mbuf infrastructure. - Fixed places that were not using the aloc macros to take advantage of the per assoc cache. - Adds ifdef fix so any logging will enable stat_logging to get the right data structures in place (suggested by Max Laier).	2007-04-15 11:58:26 +00:00
Max Laier	d0cf96b407	Fix a typo - unbreak the build.	2007-04-14 18:27:34 +00:00
Randall Stewart	c105859eee	- fix source address selection when picking an acceptable address - name change of prefered -> preferred - CMT fast recover code added. - Comment fixes in CMT. - We were not giving a reason of cant_start_asoc per socket api if we failed to get init/or/cookie to bring up an assoc. Change so we don't just give a generic "comm lost" but look at actual states of dying assoc. - change "crc32" arguments to "crc32c" to silence strict/noisy compiler warnings when crc32() is also declared - A few minor tweaks to get the portable stuff truly portable for sctp6_usrreq.c :-D - one-2-one style vrf match problem. - window recovery would leave chks marked for retran during window probes on the sent queue. This would then cause an out-of-order problem and assure that the flight size "problem" would occur. - Solves a flight size logging issue that caused rwnd overruns, flight size off as well as false retransmissions. - Macroize the up and down of flight size. - Fix a ECNE bug in its counting. - The strict_sacks options was causing aborts when window probing was active, fix to make strict sacks a bit smarter about what the next unsent TSN is. - Fixes a one-2-one wakeup bug found by Martin Kulas. - If-defed out form, Andre's copy routines pending his commit of at least m_last().. need to adjust for 6.2 as well.. since m_last won't exist. Reviewed by: gnn	2007-04-14 09:44:09 +00:00
Ruslan Ermilov	7480de4305	Make "struct tcp_timer" visible only to the kernel, and unbreak world.	2007-04-11 14:08:42 +00:00
Andre Oppermann	b8152ba793	Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)	2007-04-11 09:45:16 +00:00
Robert Watson	6493245ded	Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser checks to see whether bind() can reuse a port/address combination while it's already in use (for some definition of use).	2007-04-10 15:58:38 +00:00
Paolo Pisati	c326cd0e62	Prevent the usage of an uninitialized variable: do not accept StartMediaTx message before an OpnRcvChnAck message was received. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days Found with: Coverity Prevent(tm) CID: 498	2007-04-07 09:52:36 +00:00
Paolo Pisati	f4296f2246	Silence Coverity about an unused variable. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days CID: 538	2007-04-07 09:47:39 +00:00
Andre Oppermann	995a77176f	Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some further INP_INFO_WLOCK_ASSERT() while there.	2007-04-04 18:30:16 +00:00
Andre Oppermann	0c38fd0a7a	Move last tcpcb initialization for the inbound connection case from tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.	2007-04-04 16:13:45 +00:00
Andre Oppermann	beaa515e95	Some local and style(9) cleanups.	2007-04-04 15:30:31 +00:00
Andre Oppermann	5dd9dfefd6	Retire unused TCP_SACK_DEBUG.	2007-04-04 14:44:15 +00:00
Andre Oppermann	b728e90260	In tcp_dooptions() skip over SACK options if it is a SYN segment.	2007-04-04 14:39:49 +00:00
Alexander Kabaev	edb2e5dca3	Include string.h for non-kernel builds to get proper memcpy prototype.	2007-04-04 03:16:59 +00:00
Alexander Kabaev	d8164209b3	Include string.h for non-kernel builds to get proper strcpy, strlen prototypes.	2007-04-04 03:14:15 +00:00
Alexander Kabaev	9160afee7c	Do not assign result of (char *) cast to u_char * variable.	2007-04-04 03:10:42 +00:00
Julian Elischer	1bd69ee131	Since we switched to using monotonically increasing timestamps, they have been reported back to the userland as being in 1970. Add boot time to the timestamp to give the time in the scale of the 'current' real timescale. Not perfect if you change the time a lot but good enough to keep all the rules correct relative to each other in terms of time relative to "now".	2007-04-03 22:45:50 +00:00
Randall Stewart	bff64a4db3	- fixed several places where we did not release INP locks. - fixed a refcount bug in the new ifa structures. - use vrf's from default stcb or inp whenever possible. - Address limits raised to account for a full IP fragmented packet (1000 addresses). - flight size correcting updated to include one message only and to handle case where the peer does not cumack the next segment aka lists 1/1 in sack blocks.. - Various bad init/init-ack handling could cause a panic since we tried to unlock the destroyed mutex. Fixes so we properly exit when we need to destroy an assoc. (Found by Cisco DevTest team :D) - name rename in src-addr-selection from pass to sifa. - route structure typedef'd to allow different platforms and updated into sctp_os_bsd file. - Max retransmissions a chunk can be made added. Reviewed by: gnn	2007-04-03 11:15:32 +00:00
Randall Stewart	5e54f665f0	- Found bug in min split point bundling which caused incorrect, non-bundlable fragmentation. - Added min residual to better control split points for both how big a msg must be as well as how much needs to be left over. - With our new algo in place, we need to implicitly set "end of msg" on the sp-> structure otherwise we end up with "hung" associations. - Room reserved up front in IP header by pushing IP header to back of mbuf. - Fix so FR's peg count of retransmissions needed. - Fix so an unlucky chunk that never gets across will kill the assoc via the kill timer and send an abort too. - Fix bug in sctp_input which can result in a crash. - Do not strip off IP options anymore. - Clean up sctp_calculate_rto(). - Get rid of unused sysctl. - Fixed so we discard all M-Cast - Fixed so port check done AFTER checksum - Fixed bug in fragmentation code that prevented us from fragmenting a small complete message when we needed to. - Window probes were not marked back to unsent and flight adjusted when a sack came in with no window change or accepting of the probe data. We now fix this with having a mark on the net and the chunk so we can clear it out when the sack arrives forcing it to retran just like it was "new" this improves the handling of window probes, which were dropped by the receiver. - Tighten AUTH protocol error checks during INIT/INIT-ACK exchange	2007-03-31 11:47:30 +00:00
Bruce M Simpson	f7e083af90	Fix a bug in IPv4 address configuration exposed by refcounting. * Join the IPv4 all-hosts multicast group 224.0.0.1 once only; that is, when an IPv4 address is first configured on an interface. * Do not join it for subsequent IPv4 addresses as this violates IGMP. * Be sure to leave the group when all IPv4 addresses have been removed from the interface. * Add two DIAGNOSTIC printfs related to the issue. Further care and attention is needed in this area; it is suggested that netinet's attachment to the ifnet structure be compartmentalized and non-implicit. Bug found by: andre MFC after: 1 month	2007-03-29 21:39:22 +00:00
Andre Oppermann	1929eae1cc	When blackholing do a 'dropunlock' in the new world order to prevent the INP_INFO_LOCK from leaking. Reported by: ache Found by: rwatson	2007-03-28 12:58:13 +00:00
Robert Watson	77c78838f0	Remove stale comment about not enabling inpcb and inpcbinfo lock assertions when IPv6 is enabled. MFC after: 3 days	2007-03-28 00:50:20 +00:00
Andre Oppermann	07b64b901a	In tcp_sack_doack() remove too tight KASSERT() added in last revision. This function may be called without any TCP SACK option blocks present. Protect iteration over SACK option blocks by checking for SACK options present flag first. Bug reported by: wkoszek, keramida, Nicolas Blais	2007-03-25 23:27:26 +00:00
Robert Watson	30916a2d1d	Replace a comment about RSVP/mrouting with a different but similar comment explaining that some more locking is needed. The routing pieces are done, but there is an interlocking issue between optionally compiled code and mandatory code. Spotted by: kris	2007-03-25 21:49:50 +00:00
Maxim Konovalov	14739780bd	o Use a define for a buffer size. Prodded by: db o Add missed vars for TCPDEBUG in tcp_do_segment(). Prodded by: tinderbox	2007-03-24 22:15:02 +00:00
Andre Oppermann	302ce8d690	Split tcp_input() into its two functional parts: o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache. o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic. Change all KASSERT() messages to ("%s: ", __func__). The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made. Discussed with: rwatson	2007-03-23 20:16:50 +00:00
Andre Oppermann	4dfdffe9e2	Tidy up some code to conform better to surroundings and style(9), 0 = NULL and space/tab.	2007-03-23 19:11:22 +00:00
Andre Oppermann	fc30a25199	Bring SACK option handling in tcp_dooptions() in line with all other options and adjust users accordingly.	2007-03-23 18:33:21 +00:00
Bruce M Simpson	73ec8173eb	Purge two redundant case labels.	2007-03-23 09:43:36 +00:00
Gleb Smirnoff	1daaa65d3f	Remove global list of all llinfo_arp entries and use a callout per instance expiry of the ARP entries. Since we no longer abuse the IPv4 radix head lock, we can now enter arp_rtrequest() with a lock held on an arbitrary rt_entry. Reviewed by: bms	2007-03-22 10:37:53 +00:00
Andre Oppermann	ad3f9ab320	ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.	2007-03-21 19:37:55 +00:00
Andre Oppermann	f7608d9e7f	Match up SYSCTL declarations in style.	2007-03-21 19:34:12 +00:00
Andre Oppermann	eec9d82d8e	Subtract optlen in the maximum length check for TSO and finally avoid slightly oversized TSO mbuf chains. Submitted by: kmacy	2007-03-21 19:04:07 +00:00
Andre Oppermann	b10fbdeafa	Tidy up IPFIREWALL_FORWARD sections and comments.	2007-03-21 18:56:03 +00:00
Andre Oppermann	794235b737	Update and clarify comments in first section of tcp_input().	2007-03-21 18:52:58 +00:00
Andre Oppermann	db33b3e6a7	Tidy up the ACCEPTCONN section of tcp_input(), adjust comments and remove old dead T/TCP code.	2007-03-21 18:49:43 +00:00
Andre Oppermann	574b696407	Tidy up tcp_log_in_vain and blackhole.	2007-03-21 18:36:49 +00:00
Andre Oppermann	85c497918c	Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it doesn't impede normal operation negatively and is only a few lines of code. Its close relatives blackhole and log_in_vain aren't options either.	2007-03-21 18:25:28 +00:00
Andre Oppermann	e406f5a1c9	Remove tcp_minmssoverload DoS detection logic. The problem it tried to protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.	2007-03-21 18:05:54 +00:00
Bruce M Simpson	c7547d1aaf	Increase default size of raw IP send and receive buffers to the same as udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB) are unnecessarily fragmented. A common use case for this is OSPF link-state database synchronization during adjacency bringup on a high speed network with a large MTU. It is not possible to auto-tune this setting until a socket is bound to a given interface, and because the laddr part of the inpcb tuple may be overridden, it makes no sense to do so. Applications may request a larger socket buffer size by using the SO_SNDBUF and SO_RCVBUF socket options. Certain applications such as Quagga ospfd do not probe for interface MTU and therefore do not increase SO_SNDBUF in this use case. XORP is not affected by this problem as it preemptively uses SO_SNDBUF and SO_RCVBUF to account for any possible additional latency in XRL IPC. PR: kern/108375 Requested by: Vladimir Ivanov MFC after: 1 week	2007-03-20 13:15:20 +00:00
Randall Stewart	62c1ff9c48	- window update sacks sent incorrectly after shutdown which caused extra abort from peer. - RTT time calculation was not being done in express sack handling since it referred to an unused variable (rto_pending). Removed variable. - socket buffer high water access macro-ized.	2007-03-20 10:23:11 +00:00
Bruce M Simpson	ec002fee99	Implement reference counting for ifmultiaddr, in_multi, and in6_multi structures. Detect when ifnet instances are detached from the network stack and perform appropriate cleanup to prevent memory leaks. This has been implemented in such a way as to be backwards ABI compatible. Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti() is unable to detect interface removal by design, as it performs searches on structures which are removed with the interface. With this architectural change, the panics FreeBSD users have experienced with carp and pfsync should be resolved. Obtained from: p4 branch bms_netdev Reviewed by: andre Sponsored by: Garance A Drosehn Idea from: NetBSD MFC after: 1 month	2007-03-20 00:36:10 +00:00
Andre Oppermann	6489fe6553	Match up SYSCTL declaration style.	2007-03-19 19:00:51 +00:00
Andre Oppermann	8b8ed7a78e	Match up SYSCTL_INT declarations in style.	2007-03-19 18:42:27 +00:00
Andre Oppermann	4e02375908	Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets. Adjust tcp_output() make use of it. Tested by: gallatin	2007-03-19 18:35:13 +00:00
Randall Stewart	6a27c37636	Adds a hash table to speed local address lookup on a per VRF basis (BSD has only one VRF currently). Hash table is sized to 16 but may need to be adjusted for machines with large numbers of addresses. Reviewed by: gnn	2007-03-19 11:11:16 +00:00
Randall Stewart	132dea7d5a	- errno -> becomes error in sctp_output.c and sctputil.c - SB_CLEAR macro defined and used for sb clearing. - Fix for CMT express_sack_handling did not do proper pseudo-cumack updates. - Get rid of extraneous function that was never used ip_2_ip6_hdr() - Fixed source address selection bug (initialization problem). - Source address selection debug added.	2007-03-19 06:53:02 +00:00
Bruce M Simpson	27f8eaaf03	In IPv4 fast forwarding path, send ICMP unreachable messages for routes which have RTF_REJECT set and a zero expiry timer. PR: kern/109246 MFC after: 10 days Submitted by: Ingo Flaschberger	2007-03-18 23:05:20 +00:00
Andre Oppermann	9daba64ed5	Unbreak IPv6 after consolidation of TCP options insertion. Submitted by: tegge	2007-03-17 11:52:54 +00:00
Kip Macy	9ad2c608c2	Fix the most obvious of the bugs introduced by recent syncache changes - *ip is not initialized in the case of inet6 connection, but ip->ip_len is being changed anyway Now the question is, why does it think an ipv4 connection is an ipv6 connection? xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.	2007-03-17 06:40:09 +00:00
Robert Watson	8d0d6d112f	Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl.	2007-03-16 13:42:26 +00:00
Andre Oppermann	02a1a64357	Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005	2007-03-15 15:59:28 +00:00
Randall Stewart	42551e993f	- Sysctl's move to separate file - moved away from ifn/ifa access to sctp_ifa/sctp_ifn built and managed by the add-ip code. - cleaned up add-ip code to use the iterator - made iterator be a thread, which enables auto-asconf now. - rewrote and cleaned up source address selection (also made it use new structures). - Fixed a couple of memory leaks. - DACK now settable as to how many packets to delay as well as time. - connectx() to latest socket API, new associd arg. - Fixed issue with revoking and losing potential to send when we inflate the flight size. We now inflate the cwnd too and deflate it later when the revoked chunk is sent or acked. - Got rid of some temp debug code - src addr selection moved to a common file (sctp_output.c) - Support for simple VRF's (we have support for multi-vrf via compile switch that is scrubbed from BSD but we won't need multi-vrf until we first get VRF :-D) - Rest of mib work for address information now done - Limit number of addresses in INIT/INIT-ACK to a #def (30). Reviewed by: gnn	2007-03-15 11:27:14 +00:00
Bruce M Simpson	5c51891ef7	Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address is within the locally scoped multicast range 224.0.0.0/24.	2007-03-15 08:44:22 +00:00
Bruce M Simpson	1b7f038498	Fix IP_SENDSRCADDR semantics. * To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time. * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous. * If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup(). Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days	2007-03-08 15:26:54 +00:00
Qing Li	95ad8418dc	This patch is provided to fix a couple of deployment issues observed in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet, with delayed ack, the last_ack_sent variable has not been updated yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time. The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments. Reviewed by: silby, Mike Karels MFC after: 1 week	2007-03-07 23:21:59 +00:00
Bruce M Simpson	44c4d7b2cb	Purge an out-of-date comment.	2007-03-04 16:32:19 +00:00
Bruce M Simpson	a3fd02d88b	Fix undirected broadcast sends for the case where SO_DONTROUTE has also been set at the socket layer, in our somewhat convoluted IPv4 source selection logic in ip_output(). IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255 must always be delivered on a local link with a TTL of 1. If IP_ONESBCAST has been set at the socket layer, also perform destination interface lookup for point-to-point interfaces based on the destination address of the link; previously it was not possible to use the option with such interfaces; also, the destination/broadcast address fields map to the same field within struct ifnet, which doesn't help matters. One more valid fix going forward for these issues is to treat 255.255.255.255 as a destination in its own right in the forwarding trie. Other implementations do this. It fits with the use of multiple paths, though it then becomes necessary to specify interface preference. This hack will eventually go away when that comes to pass. Reviewed by: andre MFC after: 1 week	2007-03-01 13:29:30 +00:00
Andre Oppermann	6aa5b62315	Prevent TSO mbuf chain from overflowing a few bytes by subtracting the TCP options size before the TSO total length calculation. Bug found by: kmacy	2007-03-01 13:12:09 +00:00
Mohan Srinivasan	4a32dc299f	In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss(). The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.	2007-02-28 20:48:00 +00:00
Bruce M Simpson	85e0793497	Style: Move declaration of subsystem mutex to where other mutexes are in this file, and use macros for dealing with it.	2007-02-28 20:02:24 +00:00
Gleb Smirnoff	8bec3467b1	Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't be returned up to the caller. PR: 100172 Submitted by: "Andrew - Supernews" <andrew supernews.net> Reviewed by: rwatson, bms	2007-02-28 12:47:49 +00:00
Gleb Smirnoff	72757d9a53	Toss the code, that handles errors from ip_output(), to make it more readable: - Merge two embedded if() into one. - Introduce switch() block to handle different kinds of errors. Reviewed by: rwatson, bms	2007-02-28 12:41:49 +00:00
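The two error-handling commits above (a switch() over ip_output() errors, then EHOSTDOWN and ENETUNREACH added to the soft-error list) can be pictured with a small classifier. This is only a sketch: the function name is hypothetical, and the assumption that EHOSTUNREACH and ENETDOWN were already on the list is mine, not something the log states.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true for errors that are treated as
 * transient ("soft") and therefore not returned up to the caller.
 * EHOSTDOWN and ENETUNREACH come from the commit message; the other
 * two cases are an assumption about the pre-existing list.
 */
static bool
sketch_is_soft_error(int error)
{
	switch (error) {
	case EHOSTDOWN:
	case EHOSTUNREACH:
	case ENETDOWN:
	case ENETUNREACH:
		return true;
	default:
		return false;
	}
}
```

A caller would record a soft error for later reporting and return success, while any other error is passed up unchanged.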
Bruce M Simpson	ad3b9f70ed	Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support. Obtained from: OpenSolaris	2007-02-27 14:45:37 +00:00
Mohan Srinivasan	7c72af8770	Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby.	2007-02-26 22:25:21 +00:00
Bruce M Simpson	410052125e	Unlock a mutex which should be unlocked before returning. MFC after: 1 week	2007-02-25 14:22:03 +00:00
Bruce M Simpson	6be2e366d6	Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel. It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko, if and only if IPv6 support is enabled for loadable modules. Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).	2007-02-24 11:38:47 +00:00
Robert Watson	afdb42748d	Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week	2007-02-20 10:20:03 +00:00
Robert Watson	3329b23659	Gratuitous UDP restyling toward style(9) in 7.x.	2007-02-20 10:13:11 +00:00
Robert Watson	03dc38a48b	#ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed with minor adjustments. Submitted by: Florian C. Smeets <flo at kasimir dot com>	2007-02-18 08:57:23 +00:00
Robert Watson	497057eeea	Add "show inpcb", "show tcpcb" DDB commands, which should come in handy for debugging sblock and other network panics.	2007-02-17 21:02:38 +00:00
Robert Watson	8ca5b13f2f	Remove unused inp6_ifindex field from inpcb, as well as unused macro shortcut for it.	2007-02-16 14:09:24 +00:00
Robert Watson	1f9b46facf	Remove unused in6p_ip6_hlim macro shortcut for non-present inp_depend6.inp6_hlim field in the inpcb.	2007-02-16 13:56:06 +00:00
Randall Stewart	f42a358a6f	- Copyright updates (aka 2007) - ZONE get now also take a type cast so it does the cast like mtod does. - New macro SCTP_LIST_EMPTY, which in bsd is just LIST_EMPTY - Removal of const in some of the static hmac functions (not needed) - Store length changes to allow for new fields in auth - Auth code updated to current draft (this should be the RFC version we think). - use uint8_t instead of u_char in LOOPBACK address comparison - Some u_int32_t converted to uint32_t (in crc code) - A bug was found in the mib counts for ordered/unordered count, this was fixed (was referencing a freed mbuf). - SCTP_ASOCLOG_OF_TSNS added (code will probably disappear after my testing completes. It allows us to keep a small log on each assoc of the last 40 TSN's in/out and stream assignment. It is NOT in options and so is only good for private builds. - Some CMT changes in prep for Jana fixing his problem with reneging when CMT is enabled (Concurrent Multipath Transfer = CMT). - Some missing mib stats added. - Correction to number of open assoc's count in mib - Correction to os_bsd.h to get right sha2 macros - Add of special AUTH_04 flags so you can compile the code with the old format (in case the peer does not yet support the latest auth code). - Nonce sum was incorrectly being set in when ecn_nonce was NOT on. - LOR in listen with implicit bind found and fixed. - Moved away from using mbuf's for socket options to using just data pointers. The mbufs were used to harmonize NetBSD code since both Net and Open used this method. We have decided to move away from that and more conform to FreeBSD style (which makes more sense). - Very very nasty bug found in some of my "debug" code. The cookie_how collision case tracking had an endless loop in it if you got a second retransmission of a cookie collision case. This would lock up a CPU .. ugly.. 
- auth function goes to using size_t instead of int which conforms to socketapi better - Found the nasty bug that happens after 9 days of testing.. you get the data chunk, deliver it and due to the reference to a ch-> that every now and then has been deleted (depending on the position in the mbuf) you have an invalid ch->ch.flags.. and thus you don't advance the stream sequence number.. so you block the stream permanently. The fix is to make local variables of these guys and set them up before you have any chance of trimming the mbuf. - style fix in sctp_util.h, not sure how this got bad maybe in the last patch? (aka it may not be in the real source). - Found interesting bug when using the extended snd/rcv info where we would get an error on receiving with this. That's because it was NOT padded to the same size as the snd_rcv info. We increase (add the pad) so the two structs are the same size in sctp_uio.h - In sctp_usrreq.c one of the most common things we did for socket options was to cast the pointer and validate the size. This has been macro-ized to help make the code more readable. - in sctputil.c two things, the socketapi class found a missing flag type (the next msg is a notification) and a missing scope recovery was also fixed. Reviewed by: gnn	2007-02-12 23:24:31 +00:00
Bruce M Simpson	79760c6bdf	Use MAXTTL. Obtained from: NetBSD	2007-02-10 23:15:28 +00:00
Bruce M Simpson	7a90229b61	If the rendezvous point for a group is not specified, do not send IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon, as an optimization to mitigate the effects of high multicast forwarding load. This is an experimental change, therefore it must be explicitly enabled by setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value. The tunable may be set from the loader or from within the kernel environment when loading ip_mroute.ko as a module. Submitted by: edrt <edrt at citiz.net> See also: http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html	2007-02-10 14:48:42 +00:00
Bruce M Simpson	0948f0a28f	Build PIM by default as part of the IPv4 multicast forwarding path. Make PIM dynamically loadable by using encap_attach_func(). PIM may now be loaded into a GENERIC kernel. Tested with: ports/net/pimdd && tcpreplay && wireshark Reviewed by: Pavlin Radoslavov	2007-02-10 13:59:13 +00:00
Bruce M Simpson	f2bf119ead	Store the cached route in vifp in the normal send_packet() case. The VIFF_TUNNEL case no longer exists, therefore this field is free to use, and its use eliminates a static data member.	2007-02-08 23:05:08 +00:00
Bruce M Simpson	162c78d481	Nuke the token bucket filter code. Attempting to request rate limiting by the token bucket filter will result in EINVAL being returned. If you want to rate-limit traffic in future, use ALTQ or dummynet; this isn't a general purpose QoS engine. Preserve the now unused fields in struct vif so as to avoid having to recompile netstat(1) and other tools. Reviewed by: Pavlin Radoslavov, Bill Fenner	2007-02-08 22:58:01 +00:00
Bruce M Simpson	aab7b273bf	eliminate redundant macro MC_SEND()	2007-02-07 20:36:33 +00:00
Bruce M Simpson	78cb087e34	Remove support for IPIP tunnels in IPv4 multicast forwarding. XORP has never used them; with mrouted, their functionality may be replaced by explicitly configuring gif(4) instances and specifying them with the 'phyint' keyword. Bump __FreeBSD_version to 700030, and update UPDATING. A doc update is forthcoming. Discussed on: net Reviewed by: fenner MFC after: 3 months	2007-02-07 16:04:13 +00:00
Bruce M Simpson	64e740a352	When fast-forwarding is enabled, do not forward directed IPv4 broadcasts to locally attached broadcast networks. Note well: This relies on the layer 2 route cloning behaviour in BSD. PR: 98799 Tested by: Dmitry Sergienko MFC after: 1 week	2007-02-05 00:15:40 +00:00
Alan Cox	055867a06c	Include opt_ipdivert.h so that the message announcing ipfw correctly describes the state of IPDIVERT.	2007-02-03 22:11:53 +00:00
Bruce M Simpson	d256723b8b	In fast forwarding path, defer processing of 169.254.0.0/16 to ip_input(). See RFC 3927 section 2.7.	2007-02-03 06:46:48 +00:00
Bruce M Simpson	f8429ca2e1	In regular forwarding path, reject packets destined for 169.254.0.0/16 link-local addresses. See RFC 3927 section 2.7.	2007-02-03 06:45:51 +00:00
Bruce M Simpson	d055815799	Comply with RFC 3927, by forcing ARP replies which contain a source address within the link-local IPv4 prefix 169.254.0.0/16, to be broadcast at link layer. Reviewed by: fenner MFC after: 2 weeks	2007-02-02 20:31:44 +00:00
Bruce M Simpson	1baaf8347c	Expose smoothed RTT and RTT variance measurements to userland via socket option TCP_INFO. Note that the units used in the original Linux API are in microseconds, so use a 64-bit mantissa to convert FreeBSD's internal measurements from struct tcpcb from ticks.	2007-02-02 18:34:18 +00:00
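The unit conversion mentioned in the TCP_INFO commit above can be sketched as follows. The names are hypothetical, and the 5-bit fixed-point scaling of the smoothed RTT is an assumption about the internal representation; the point is only that the multiplication is widened to 64 bits before scaling back down.

```c
#include <stdint.h>

/* Assumed: the smoothed RTT is stored in ticks with 5 fractional bits. */
#define SKETCH_RTT_SHIFT 5

/*
 * Convert a scaled tick count to microseconds.
 * tick_usec is the length of one tick in microseconds (1000000 / hz).
 */
static uint32_t
sketch_srtt_to_usec(uint32_t srtt_scaled, uint32_t tick_usec)
{
	/* Widen first so the product cannot wrap at 32 bits. */
	uint64_t usec = (uint64_t)srtt_scaled * tick_usec;

	return (uint32_t)(usec >> SKETCH_RTT_SHIFT);
}
```

With hz = 1000 (tick_usec = 1000), a smoothed RTT of 100 ticks is stored as 3200 and converts to 100000 microseconds.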
Gleb Smirnoff	fbfdcf8735	Since rev. 1.94 of netinet/in.c, the netinet layer frees all its multicast memberships, when interface is detached. Thus, when an underlying interface is detached, we do not need to free our multicast memberships. Reviewed by: bms	2007-02-02 09:39:09 +00:00
Andre Oppermann	6741ecf595	Auto sizing TCP socket buffers. Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around. With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions. FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size. New sysctls are: net.inet.tcp.sendbuf_auto=1 (enabled) net.inet.tcp.sendbuf_inc=8192 (8K, step size) net.inet.tcp.sendbuf_max=262144 (256K, growth limit) net.inet.tcp.recvbuf_auto=1 (enabled) net.inet.tcp.recvbuf_inc=16384 (16K, step size) net.inet.tcp.recvbuf_max=262144 (256K, growth limit) Tested by: many (on HEAD and RELENG_6) Approved by: re MFC after: 1 month	2007-02-01 18:32:13 +00:00
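The throughput figures quoted in the auto-sizing commit above follow from the rule that a sender can have at most one buffer's worth of data in flight per round trip. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
#include <stdint.h>

/*
 * Upper bound on throughput, in bits per second, when at most
 * buf_bytes can be outstanding per round trip of rtt_ms milliseconds.
 */
static uint64_t
sketch_max_bps(uint64_t buf_bytes, uint64_t rtt_ms)
{
	return buf_bytes * 8 * 1000 / rtt_ms;
}
```

For the 32K default this gives 2621440 bit/s at 100 ms and 1310720 bit/s at 200 ms; for the 256K growth limit it gives 20971520 bit/s and 10485760 bit/s, which matches the "slightly more than 2Mbit/s" and "20Mbit/s" figures in the message.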
Andre Oppermann	087b55ea59	Change the way the advertized TCP window scaling is computed. Instead of upper-bounding it to the size of the initial socket buffer lower-bound it to the smallest MSS we accept. Ideally we'd use the actual MSS information here but it is not available yet. For socket buffer auto sizing to be effective we need room to grow the receive window. The window scale shift is determined at connection setup and can't be changed afterwards. The previous, original, method effectively just did a power of two roundup of the socket buffer size at connection setup severely limiting the headroom for larger socket buffers. Tested by: many (as part of the socket buffer auto sizing patch) MFC after: 1 month	2007-02-01 17:39:18 +00:00
Bruce M Simpson	1976bc4af7	Import macros IN_LINKLOCAL(), IN_PRIVATE(), IN_LOCAL_GROUP(), IN_ANY_LOCAL(). This is not a functional change. IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix. IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix. IN_LOCAL_GROUP() tests if an address falls within the statically assigned link-local multicast scope specified in RFC 2365. IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP(). As with the existing macros in the FreeBSD netinet stack, comparisons are performed in host-byte order. See also: RFC 1918, RFC 2365, RFC 3927 Obtained from: NetBSD (dyoung@) MFC after: 2 weeks	2007-01-31 14:34:47 +00:00
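The address tests described in the macro import above can be sketched as plain functions. The function names are hypothetical stand-ins for the macros, the prefixes are taken from RFC 1918, RFC 3927 and the 224.0.0.0/24 range named elsewhere in this log, and, as the commit notes, addresses are compared in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* 169.254.0.0/16, the IPv4 link-local prefix (RFC 3927). */
static bool
sketch_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* 10/8, 172.16/12 and 192.168/16, the RFC 1918 private prefixes. */
static bool
sketch_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/* 224.0.0.0/24, the link-local multicast scope. */
static bool
sketch_in_local_group(uint32_t a)
{
	return (a & 0xffffff00u) == 0xe0000000u;
}

/* Either link-local unicast or link-local multicast. */
static bool
sketch_in_any_local(uint32_t a)
{
	return sketch_in_linklocal(a) || sketch_in_local_group(a);
}
```

A caller holding an address in network byte order would convert it with ntohl() before applying any of these tests.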
Gleb Smirnoff	3cf0d02480	Make it possible that carpdetach() unlocks on return. Then, in carp_clone_destroy() we are on a safe side, we don't need to unlock the cif, that can be already non-existent at this point. Reported by: Anton Yuzhaninov <citrin rambler-co.ru>	2007-01-25 18:03:40 +00:00
Gleb Smirnoff	62dae1e917	Spacing.	2007-01-25 17:58:16 +00:00
Randall Stewart	93164cf98c	- most all includes (#include <>) migrate to the sctp_os_bsd.h file - Finally all splxx() are removed - Count error fixed in mapping array which might cause a wrong cumack generation. - Invariants around panic for case D + printf when no invariants. - one-to-one model race condition fixed by using a pre-formed connection and then completing the work so accept won't happen on a non-formed association. - Some additional paranoia checks in sctp_output. - Locks that were missing in the accept code. Approved by: gnn	2007-01-18 09:58:43 +00:00
Randall Stewart	44b7479ba2	- Macroizes the V6ONLY flag check. - Added a short time wait (not used yet) constant - Corrected the type of the crc32c table (it was unsigned long and really is a uint32_t) - Got rid of the use of MHeaders until they are truly needed by lower layers. - Fixed an initialization problem in the readq structure (ordering was off). - Found yet another collision bug when the random number generator returns two numbers on one side (during a collision) that are the same. Also added some tracking of cookies that will go away when we know that we have the last collision bug gone. - Fixed an init bug for book_size_scale, that was causing Early FR code to run when it should not. - Fixed a flight size tracking bug that was associated with Early FR but due to above bug also affected all FR's - Fixed it so Max Burst also will apply to Fast Retransmit. - Fixed a bug in the temporary logging code that allowed a static log array overflow - hashinit_flags is now used. - Two last mcopym's were converted to the macro sctp_m_copym that has always been used by all other places - macro sctp_m_copym was converted to upper case. - We now validate sinfo_flags on input (we did not before). - Fixed a bug that prevented a user from sending data and immediately shutting down with one send operation. - Moved to use hashdestroy instead of free() in our macros. - Fixed an init problem in our timed_wait vtag where we did not fully initialize our time-wait blocks. - Timer stops were re-positioned. - A pcb cleanup method was added, however this probably will not be used in BSD.. unless we make module loadable protocols - I think this fixes the mysterious timer bug.. it was an ordering of locks problem in the way we did timers. It now conforms to the timeout(9) manual (except for the _drain part, we had to do this a different way due to locks). 
- Fixed error return code so we get either CONNREUSED or CONNRESET depending on where one is in progression - Purged an unused clone macro. - Fixed a read error code issue where we were NOT getting the proper error when the connection was reset. Approved by: gnn	2007-01-15 15:12:10 +00:00
Maxim Konovalov	95ebcabed8	o Increment requests counter right before send out an ARP query actually. Otherwise the code could lead to the spurious EHOSTDOWN errors. PR: kern/107807 Submitted by: Dmitrij Tejblum MFC after: 1 month	2007-01-14 18:44:17 +00:00
Warner Losh	0befead1e0	Marking this as __packed was needed to get the alignment and offset of members right. However, it also said it was aligned(1), which meant that gcc generated really bad code. Mark this as aligned(4). This makes things a little faster on arm (a couple percent), but also saves about 30k on the size of the kernel for arm. I talked about doing this with bde, but didn't check with him before the commit, so I'm hesitant to say 'reviewed by: bde'.	2007-01-12 07:23:31 +00:00
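The effect described in the commit above can be illustrated with a hypothetical structure; the commit does not name the structure it changed, so the layout below is purely an assumption for illustration. Packing removes padding between members, and the explicit aligned(4) restores a 4-byte alignment for the object as a whole, so the compiler may use word accesses rather than byte-by-byte ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: packed member layout, 4-byte object alignment. */
struct sketch_hdr {
	uint16_t kind;	/* offset 0 */
	uint32_t seq;	/* offset 2 only because the struct is packed */
	uint16_t len;	/* offset 6 */
} __attribute__((packed, aligned(4)));
```

With GCC or Clang, offsetof(struct sketch_hdr, seq) is 2, the alignment of the type is 4, and its size is 8. With packed alone the alignment would drop to 1.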
Julian Elischer	7e170af886	Remove two lines that somehow snuck back in after testing. ip is now an argument to the function ipfw_log()	2007-01-09 21:03:07 +00:00
Maxim Konovalov	8b5b885047	o One more typo in the comment. PR: kern/107609 Submitted by: Dr. Markus Waldeck	2007-01-06 13:12:24 +00:00
Paolo Pisati	3d2fff0d3d	Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined. Reviewed: luigi	2007-01-05 12:15:31 +00:00
Paolo Pisati	61c0e134f5	Wrap ipfw nat support in a new kernel config option named "IPFIREWALL_NAT": this way nat is turned off by default and POLA is preserved. Reviewed by: rwatson	2007-01-03 11:12:54 +00:00
Julian Elischer	3b62120e87	Remove a bunch of dependencies in the IP header being the first thing in the mbuf. First moves toward being able to cope better with having layer 2 (or other encapsulation data) before the IP header in the packet being examined. More commits to come to round out this functionality. This commit should have no practical effect but clears the way for what is coming. Reviewed by: luigi, yar MFC After: 2 weeks	2007-01-02 19:57:31 +00:00
Warner Losh	6796a2d434	Fix typo in comment. Submitted by: remko	2007-01-01 00:35:34 +00:00
Warner Losh	74eb3236c7	Add comment about udp checksums being off in BSD 4.2 compatibility mode. Submitted by: Dr. Markus Waldeck PR: kern/106657	2006-12-31 21:34:53 +00:00
John Baldwin	54e3607de6	Whitespace fix and remove an extra cast.	2006-12-30 17:53:28 +00:00
Paolo Pisati	ff2f6fe80f	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
Randall Stewart	139bc87fda	a) macro-ization of all mbuf and random number access plus timers. This makes the code more portable and able to change out the mbuf or timer system used more easily ;-) b) removal of all use of pkt-hdr's until only the places we need them (before ip_output routines). c) remove a bunch of code not needed due to <b> aka worrying about pkthdr's :-) d) There was one last reorder problem it looks where if a restart occurs and we release and relock (at the point where we setup our alias vtag) we would end up possibly getting the wrong TSN in place. The code that fixed the TSN's just needed to be shifted around BEFORE the release of the lock.. also code that set the state (since this also could contribute). Approved by: gnn	2006-12-29 20:21:42 +00:00
John Baldwin	08651e1f24	Some whitespace nits and remove a few casts.	2006-12-29 14:58:18 +00:00
Paolo Pisati	ccd57eea11	o made in kernel libalias mpsafe o fixed a comment o made in kernel libalias a bit less verbose (disabled automatic logging every time a new link is added or deleted) Approved by: glebius (mentor)	2006-12-15 12:50:06 +00:00
Randall Stewart	a5d547add3	1) Fixes on a number of different collision case LOR's. 2) Fix all "magic numbers" to be constants. 3) A collision case that would generate two associations to the same peer due to a missing lock is fixed. 4) Added tracking of where timers are stopped. Approved by: gnn	2006-12-14 17:02:55 +00:00
Christian S.J. Peron	826cef3d75	Fix LOR between the syncache and inpcb locks when MAC is present in the kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed: - Hang a MAC label off the syncache object - When the syncache entry is initially created, the PCB lock is held because we extract information from it while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry. - When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels. This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf. These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on its label pointer. This should be OK since we aren't making any access control decisions within this code directly, we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create. This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized. Submitted by: andre [1] Discussed with: rwatson [1] andre submitted the tcp_syncache.c changes	2006-12-13 06:00:57 +00:00
Bjoern A. Zeeb	7d32aa0cc9	In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument. This is the "+ one more change" missed in the original commit. Noticed by: tinderbox Pointy hat to: me (#1)	2006-12-12 17:44:46 +00:00
Bjoern A. Zeeb	1d54aa3ba9	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.	2006-12-12 12:17:58 +00:00
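The interface change in the two ip6_sprintf commits above (the caller hands in the buffer instead of receiving one of several static buffers) is the same pattern the userland inet_ntop() call uses. The wrapper below is a hypothetical illustration of that pattern, not the kernel function or its signature.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/*
 * Hypothetical helper: format an IPv6 address into a buffer owned by
 * the caller. Returns buf on success, or NULL if buf is too small.
 * No static storage is involved, so concurrent callers cannot
 * overwrite each other's output.
 */
static const char *
sketch_fmt_in6(char *buf, size_t len, const struct in6_addr *addr)
{
	return inet_ntop(AF_INET6, addr, buf, (socklen_t)len);
}
```

A caller would typically declare `char buf[INET6_ADDRSTRLEN];` on its own stack and pass `sizeof(buf)` as the length.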
Bruce M Simpson	3dbee59bd4	Back out revision 1.264. Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic. Reported by: Mark Tinguely	2006-12-10 13:44:00 +00:00
Robert Watson	ece4c06484	Improve style(9) conformance of igmp.c.	2006-12-04 00:41:48 +00:00
Warner Losh	850adc0cd7	Make sure that carp_header is 36 bytes long	2006-12-01 18:37:41 +00:00
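A size guarantee like the one in the carp_header commit above is usually enforced at compile time. The sketch below uses C11 _Static_assert on a stand-in structure; the field names and layout are illustrative assumptions chosen to total 36 bytes, and the kernel's own assertion mechanism may differ.

```c
#include <stdint.h>

/* Illustrative stand-in for a 36-byte advertisement header. */
struct sketch_carp_header {
	uint8_t  ver_type;	/* version and type nibbles */
	uint8_t  vhid;		/* virtual host id */
	uint8_t  advskew;	/* advertisement skew */
	uint8_t  authlen;	/* size of counter + digest, 32-bit units */
	uint8_t  pad1;		/* reserved */
	uint8_t  advbase;	/* advertisement interval */
	uint16_t cksum;
	uint32_t counter[2];
	uint8_t  md[20];	/* digest */
};

/* Fail the build if padding or a field change alters the wire size. */
_Static_assert(sizeof(struct sketch_carp_header) == 36,
    "sketch_carp_header must be 36 bytes");
```

Because the check runs in the compiler, a layout regression on any architecture shows up as a build failure rather than as malformed packets.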
Paolo Pisati	5910c1c1b9	Make libalias.conf parsing a bit smarter. This closes PR kern/106112. While here, add mbuf's #includes I forgot in the previous commit. Approved by: gleb	2006-12-01 16:34:53 +00:00
Paolo Pisati	e876228edc	Remove m_megapullup from ng_nat and put it under libalias. Approved by: gleb	2006-12-01 16:27:11 +00:00
Robert Watson	e3fd5ffdf1	Consistently use #ifdef INET6 rather than mixing and matching with #if defined(INET6). Don't comment the end of short #ifdef blocks. Comment cleanup. Line wrap.	2006-11-30 10:54:54 +00:00
Sam Leffler	21367f630d	Change error codes returned by protocol operations when an inpcb is marked INP_DROPPED or INP_TIMEWAIT: o return ECONNRESET instead of EINVAL for close, disconnect, shutdown, rcvd, rcvoob, and send operations o return ECONNABORTED instead of EINVAL for accept These changes should reduce confusion in applications since EINVAL is normally interpreted to mean an invalid file descriptor. This change does not conflict with POSIX or other standards I checked. The return of EINVAL has always been possible but rare; it's become more common with recent changes to the socket/inpcb handling and with finer-grained locking and preemption. Note: there are other instances of EINVAL for this state that were left unchanged; they should be reviewed. Reviewed by: rwatson, andre, ru MFC after: 1 month	2006-11-22 17:16:54 +00:00
Bjoern A. Zeeb	89e7e7e32a	Add SCTP as a known upper layer protocol over v6. We are not yet aware of the protocol internals but this way SCTP traffic over v6 will not be discarded. Reported by: Peter Lei via rrs Tested by: Peter Lei <peterlei cisco.com>	2006-11-13 19:07:32 +00:00
Randall Stewart	7f34832b95	In a true restart case, the send_lock was not being acquired. This meant that when we cleanup the outbound we may have one in transit to be added with the old sequence number. This is bad since then we lose a message :( Also the report_outbound needed to have the right lock when its called which it did not.. I added the lock with of course a flag since we want to have the lock before we call it in the restart case. This also fixed the FIX ME case where, in the cookie collision case, we mark for retransmit any that were bundled with the cookie that was dropped. This also means changes to the output routine so we can assure getting the COOKIE-ACK sent BEFORE we retransmit the Data. Approved by: gnn	2006-11-11 22:44:12 +00:00
Randall Stewart	6a91f103b6	Turns out we would reset the TSN seq counter during a colliding INIT. This is fine except when we have data outstanding... we basically reset it to the previous value it was.. so then we end up assigning the same TSN to two different data chunks. This patch: 1) Finds a missing lock for when we change the stream numbers during COOKIE and INIT-ACK processing.. we were NOT locking the send_buffer.. which COULD cause problems (found by inspection looking for <2>) 2) Fixes a case during a colliding INIT where we incorrectly reset the sending Sequence thus in some cases duplicately assigning a TSN. 3) Additional enhancements to logging so we can see strm/tsn in the receiver AND new tracking to watch what the sender is doing with TSN and STRM seq's. Approved by: gnn	2006-11-11 15:59:01 +00:00
Randall Stewart	de0e935b29	This patch fixes a LOR that happens during INIT-ACK collision. We were calling select_a_tag() inside sctp_send_initate_ack(). During collision cases we have a stcb and thus a SCTP_LOCK. When we call select_a_tag it (below it) locks the INFO lock. We now 1) pre-select the nonce-tie-tags in sctputil.c during setup of a tcb. 2) In the other case where we have to select tags, we unlock after incr the ref cnt (so assoc won't go away0 and then do the tag selection followed by a relock and decr the refcnt. Approved by: gnn	2006-11-10 13:34:55 +00:00
Randall Stewart	08598d7067	Fixes an issue with handling of stream reset. When a reset comes in we need to calculate the length and therefore the number of listed streams (if any) based on the TLV type. Otherwise if we get a retran we could in theory panic by sending a notification to a user with an incorrect list and thus no memory listing the streams. Found in IOS by devtest :-) Approved by: gnn	2006-11-09 21:01:07 +00:00
Randall Stewart	03b0b02163	-Fixes first of all the getcred on IPv6 and V4. The copy's were incorrect and so was the locking. -A bug was also found that would create a race and panic when an abort arrived on a socket being read from. -Also fix the reader to get MSG_TRUNC when a partial delivery is aborted. -Also addresses a couple of coverity caught error path memory leaks and a couple of other valid complaints Approved by: gnn	2006-11-08 00:21:13 +00:00
Joe Marcus Clarke	1bc3d4c1d1	Fix TFTP NAT support by making sure the appropriate fingerprinting checks are done. Reviewed by: piso	2006-11-07 21:06:48 +00:00
Robert Watson	b96fbb37da	Convert three new suser(9) calls introduced between when the priv(9) patch was prepared and committed to priv(9) calls. Add XXX comments as, in each case, the semantics appear to differ from the TCP/UDP versions of the calls with respect to jail, and because cr_canseecred() is not used to validate the query. Obtained from: TrustedBSD Project	2006-11-06 14:54:06 +00:00
Randall Stewart	f4ad963c9f	This change tracks down the EEOR->NonEEOR mode failure to wake up on close of the sender. It basically moves the return (when the asoc has a reader/writer) further down and gets the wakeup and assoc appending (of the PD-API event) moved up before the return. It also moves the flag set right before the return so we can ensure the PD-API events are added only once. Approved by: gnn	2006-11-06 14:34:21 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Ruslan Ermilov	9274ba8a1f	Revert previous commit, and instead make the expression in rev. 1.2 match the style of this file. OK'ed by: rrs	2006-11-05 14:36:59 +00:00
Randall Stewart	50cec91936	Tons of fixes to get all the 64bit issues removed. This also moves two 16 bit int's to become 32 bit values so we do not have to use atomic_add_16. Most of the changes are %p, casts and other various nasties that were in the original code base. With this commit my machine will now do a build universe.. however I have not yet tested on a 64bit machine .. it may not work :-(	2006-11-05 13:25:18 +00:00
Ruslan Ermilov	11acae799a	Fix pointer arithmetic to be 64-bit friendly.	2006-11-04 08:45:50 +00:00
Ruslan Ermilov	e349e6b8a0	Remove bogus casts that Randall for some reason didn't borrow from my supplied patch.	2006-11-04 08:19:01 +00:00
John Birrell	5051417909	Remove a bogus cast in an attempt to fix the tinderbox builds on lots of arches.	2006-11-04 05:39:39 +00:00
Randall Stewart	562a89b562	More 64 bit pointer fun. %p changed in multiple prints the mtod() was also fixed.	2006-11-03 23:04:34 +00:00
Randall Stewart	249820a7d8	Fix two of the 64bit errors on the printfs.	2006-11-03 21:19:54 +00:00
Randall Stewart	cef8ad061a	Somehow I missed this one. The sys/cdefs.h include was out of order with respect to the __FBSDID..	2006-11-03 19:48:56 +00:00
Randall Stewart	73932c69b6	Oops... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I inserted a few to the new files.. but I failed to add the #include <sys/cdefs.h> Which causes a compile error.. sorry about that... got it now :-) Approved by: gnn	2006-11-03 17:21:53 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about comitting some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
Oleg Bulyzhin	35da9180dc	- Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70 - Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input() Reviewed by: glebius, rwatson - purge_flow_set(): - Do not leak memory while purging queues which are not bound to pipe. - style(9) cleanup MFC after: 2 months	2006-10-29 12:09:24 +00:00
Oleg Bulyzhin	c2df509a1d	- Convert net.inet.ip.dummynet.curr_time net.inet.ip.dummynet.searches net.inet.ip.dummynet.search_steps to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs. - Implement simple mechanics for dummynet(4) internal time correction. Under certain circumstances (system high load, dummynet lock contention, etc) dummynet's tick counter can be significantly slower than it should be. (I've observed up to 25% difference on one of my production servers). Since this counter is used for packet scheduling, its accuracy is vital for precise bandwidth limitation. Introduce new sysctl nodes: net.inet.ip.dummynet. tick_lost - number of ticks coalesced by taskqueue thread. tick_adjustment - number of time corrections done. tick_diff - adjusted vs non-adjusted tick counter difference tick_delta - last vs 'standard' tick difference (usec). tick_delta_sum - accumulated (and not corrected yet) time difference (usec). Reviewed by: glebius MFC after: 2 month	2006-10-27 13:05:37 +00:00
Oleg Bulyzhin	b2b05096fd	Use separate thread for servicing dummynet(4). Utilize taskqueue(9) API. Submitted by: glebius MFC after: 2 month	2006-10-27 11:16:58 +00:00
Oleg Bulyzhin	c447b19f6e	style(9) cleanup. MFC after: 2 month	2006-10-27 10:52:32 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Julian Elischer	010b65f54a	revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.	2006-10-21 00:16:31 +00:00
Julian Elischer	3df668cc38	Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.	2006-10-20 19:32:08 +00:00
Maxim Konovalov	428b67b194	o Do not do args->f_id.addr_type == 6 when there is IS_IP6_FLOW_ID() exactly for that.	2006-10-11 12:14:28 +00:00
Maxim Konovalov	f16ccf6814	o Kill a nit in the comment.	2006-10-11 12:00:53 +00:00
Maxim Konovalov	5f197ce41e	o Extend not very informative ipfw(4) message 'drop session, too many entries' by src:port and dst:port pairs. IPv6 part is non-functional as ``limit'' does not support IPv6 flows. PR: kern/103967 Submitted by: based on Bruce Campbell patch MFC after: 1 month	2006-10-11 11:52:34 +00:00
Ruslan Ermilov	cc81ddd9db	Merge the rest of my changes.	2006-10-11 07:11:56 +00:00
Paolo Pisati	f3d9aab351	Various mdoc and grammar fixes. Approved by: glebius Reviewed by: glebius, ru	2006-10-08 13:53:45 +00:00
Bjoern A. Zeeb	7002145d8e	Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2. PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days	2006-10-07 10:19:58 +00:00
Gleb Smirnoff	f7a679b200	Save space on stack moving token ring stuff to its own hack block.	2006-10-04 11:08:14 +00:00
Gleb Smirnoff	9b9a52b496	Style rev. 1.152.	2006-10-04 10:59:21 +00:00
Andre Oppermann	6a7c943c59	Remove stone-aged and irrelevant "#ifndef notdef".	2006-09-29 16:44:45 +00:00
Bruce M Simpson	910e1364b6	Nits. Submitted by: ru	2006-09-29 16:16:41 +00:00
Bruce M Simpson	2d20d32344	Push removal of mrouted down to the rest of the tree.	2006-09-29 15:45:11 +00:00
Maxim Konovalov	acc03ac6bb	o Convert w/spaces to tabs in the previous commit.	2006-09-29 06:46:31 +00:00
Mike Silbersack	d4bdcb16cc	Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5, scale it to min(ephemeral port range / 2, maxsockets / 5) so that people with large gobs of memory and/or large maxsockets settings will not exhaust their entire ephemeral port range with sockets in the TIME_WAIT state during periods of heavy load. Those who wish to tweak the size of the TIME_WAIT zone can still do so with net.inet.tcp.maxtcptw. Reviewed by: glebius, ru	2006-09-29 06:24:26 +00:00
Andre Oppermann	2c30ec0a1f	When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time. The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen. Discussed with: glebius PR: kern/25986 PR: kern/102653	2006-09-28 18:02:46 +00:00
Andre Oppermann	6a2257d911	When doing TSO correctly do the check to prevent a maximum sized IP packet from overflowing.	2006-09-28 13:59:26 +00:00
Bruce M Simpson	050596b4a0	Fix the IPv4 multicast routing detach path. On interface detach whilst the MROUTER is running, the system would panic as described in the PR. The fix in the PR is a good start, however, the other state associated with the multicast forwarding cache has to be freed in order to avoid leaking memory and other possible panics. More care and attention is needed in this area. PR: kern/82882 MFC after: 1 week	2006-09-28 12:21:08 +00:00
Bruce M Simpson	d966841427	The IPv4 code should clean up multicast group state when an interface goes away. Without this change, it leaks in_multi (and often ether_multi state) if many clonable interfaces are created and destroyed in quick succession. The concept of this fix is borrowed from KAME. Detailed information about this behaviour, as well as test cases, are available in the PR. PR: kern/78227 MFC after: 1 week	2006-09-28 10:04:07 +00:00
Paolo Pisati	7c00cc76f0	Compilation.	2006-09-27 02:08:44 +00:00
Paolo Pisati	be4f3cd0d9	Summer of Code 2005: improve libalias - part 1 of 2 With the first part of my previous Summer of Code work, we get: -made libalias modular: -support for 'particular' protocols (like ftp/irc/etcetc) is no more hardcoded inside libalias, but it's available through external modules loadable at runtime -modules are available both in kernel (/boot/kernel/alias_.ko) and user land (/lib/libalias_) -protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp, skinny and smedia -added logging support for kernel side -cleanup After a buildworld, do a 'mergemaster -i' to install the file libalias.conf in /etc or manually copy it. During startup (and after every HUP signal) user land applications running the new libalias will try to read a file in /etc called libalias.conf: that file contains the list of modules to load. User land applications affected by this commit are ppp and natd: if libalias.conf is present in /etc you won't notice any difference. The only kernel land bit affected by this commit is ng_nat: if you are using ng_nat, and it doesn't correctly handle ftp/irc/etcetc sessions anymore, remember to kldload the correspondent module (i.e. kldload alias_ftp). General information and details about the inner working are available in the libalias man page under the section 'MODULAR ARCHITECTURE (AND ipfw(4) SUPPORT)'. NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat support will be part of the next libalias-related commit. Approved by: glebius Reviewed by: glebius, ru	2006-09-26 23:26:53 +00:00
John-Mark Gurney	e16fa5ca55	fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...	2006-09-26 01:21:46 +00:00
Bruce M Simpson	13c8384424	Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set. PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week	2006-09-25 11:53:54 +00:00
Bruce M Simpson	e2fd806b36	Spleling Submitted by: pjd	2006-09-25 11:48:07 +00:00
Bruce M Simpson	07ea6709ea	Account for output IP datagrams on the ifaddr where they originated from, not the first ifaddr on the ifp. This is similar to what NetBSD does. PR: kern/72936 Submitted by: alfred Reviewed by: andre	2006-09-25 10:11:16 +00:00
John-Mark Gurney	4dc630cdd2	if min is greater than max, prefer max over min... I managed to get a retransmit timer that was going to take 19 days to trigger... Reviewed by: silby	2006-09-25 07:22:39 +00:00
John-Mark Gurney	402865f637	now that we don't automagically increase the MTU of host routes, when we copy the loopback interface, copy its mtu also.. This means that we again have large mtu support for local ip addresses...	2006-09-23 19:24:10 +00:00
Bruce M Simpson	f1edc3bde5	Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'. PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth	2006-09-23 16:26:31 +00:00
Andre Oppermann	7ff0b850a6	Make tcp_usr_send() free the passed mbufs on error in all cases as the comment to it claims. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:39:35 +00:00
John Hay	724e825a16	Handle a list of IPv6 src and dst addresses correctly, eg. ipfw add allow ip6 from any to 2000::/16,2002::/16 PR: 102422 (part 3) Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru> MFC after: 5 days	2006-09-16 10:27:05 +00:00
Andre Oppermann	31ecb34a4e	When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len from wrapping when we generate a maximally sized packet for later segmentation. Noticed by: gallatin Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-15 16:08:09 +00:00
Andrey A. Chernov	239e71c612	Add missing #ifdef INET6 (can't be compiled)	2006-09-14 10:22:35 +00:00
Andre Oppermann	67d828b162	Remove unnecessary includes and follow common ordering style.	2006-09-13 13:21:17 +00:00
Andre Oppermann	bf6d304ab2	Rewrite of TCP syncookies to remove locking requirements and to enhance functionality: - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for, and stored per, syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-13 13:08:27 +00:00
Christian S.J. Peron	d94f2a68f8	Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point exists to allow the mandatory access control policy to properly initialize mbufs generated by the firewall. An example where this might happen is keep alive packets, or ICMP error packets in response to other packets. This takes care of kernel panics associated with uninitialized mbuf labels when the firewall generates packets. [1] I modified this patch from its original version, the initial patch introduced a number of entry points which were programmatically equivalent. So I introduced only one. Instead, we should leverage mac_create_mbuf_netlayer() which is used for similar situations, an example being icmp_error(). This will minimize the impact associated with the MFC. Submitted by: mlaier [1] MFC after: 1 week This is a RELENG_6 candidate	2006-09-12 04:25:13 +00:00
Andre Oppermann	384a05bfd0	Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet. Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)	2006-09-11 19:56:10 +00:00
John-Mark Gurney	3ae2ad088e	make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.	2006-09-10 17:49:09 +00:00
Gleb Smirnoff	3e630ef9a9	Add a sysctl net.inet.tcp.nolocaltimewait that allows suppressing the creation of compressed TIME_WAIT states if both connection endpoints are local. Default is off.	2006-09-08 13:09:15 +00:00
Ruslan Ermilov	751dea2935	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).	2006-09-07 13:06:00 +00:00
Andre Oppermann	b3c0f300fb	Second step of TSO (TCP segmentation offload) support in our network stack. TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet. The length of TSO bursts is limited to TCP_MAXWIN. The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled. TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options). IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet. Tested by: Jack Vogel <jfvogel-at-gmail.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-07 12:53:01 +00:00
Ruslan Ermilov	3c89486cc7	Remove a microoptimization for i386 that was a micropessimization for amd64.	2006-09-07 09:49:08 +00:00
Andre Oppermann	233dcce118	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
Andre Oppermann	6fbfd5825f	Check inp_flags instead of inp_vflag for INP_ONESBCAST flag. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 19:04:36 +00:00
Andre Oppermann	773725a255	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 17:12:10 +00:00
Gleb Smirnoff	2c857a9be9	o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely badly under high load. For example with 40k sockets and 25k tcptw entries, connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. No purging of stale entries should be done here. Second, it is a layering violation. o Return back the tcptw purging cycle to tcp_timer_2msl_tw(), that was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed. o Disable probably necessary, but now unused tcp_twrecycleable() function. Reviewed by: ru	2006-09-06 13:56:35 +00:00
Gleb Smirnoff	c3e07bf82a	Finally fix rev. 1.256 Pointy hat to: glebius	2006-09-05 14:00:59 +00:00
Gleb Smirnoff	23ebab416c	Remove extra parenthesis in last commit. Nitpicked by: ru	2006-09-05 12:22:54 +00:00
Gleb Smirnoff	1f1f90c3a7	- Make net.inet.tcp.maxtcptw modifiable at run time. - If net.inet.tcp.maxtcptw was ever set explicitly, do not change it if kern.ipc.maxsockets is changed.	2006-09-05 12:08:47 +00:00
Thomas Quinot	d438d81581	Fix typo in comment.	2006-09-04 08:32:17 +00:00
John Hay	1c31b456b9	Recognise IPv6 PIM packets. MFC after: 1 week	2006-08-31 16:56:45 +00:00
Mohan Srinivasan	2374501ca4	Fix for a bug that causes the computation of "len" in tcp_output() to get messed up, resulting in an inconsistency between the TCP state and so_snd.	2006-08-26 17:53:19 +00:00
Julian Elischer	afad78e259	comply with style police Submitted by: ru MFC after: 1 month	2006-08-18 22:36:05 +00:00
Julian Elischer	c487be961a	Allow ipfw to forward to a destination that is specified by a table. for example: fwd tablearg ip from any to table(1) where table 1 has entries of the form: 1.1.1.0/24 10.2.3.4 208.23.2.0/24 router2 This allows trivial implementation of a secondary routing table implemented in the firewall layer. I expect more work (under discussion with Glebius) to follow this to clean up some of the messy parts of ipfw related to tables. Reviewed by: Glebius MFC after: 1 month	2006-08-17 22:49:50 +00:00
Julian Elischer	b7522c27d2	Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was in older versions of FreeBSD. This option is pointless as it is needed in just about every interesting usage of forward that I have ever seen. It doesn't make the system any safer and just wastes huge amounts of developer time when the system doesn't behave as expected when code is moved from 4.x to 6.x or 7.x Reviewed by: glebius MFC after: 1 week	2006-08-17 00:37:03 +00:00
Mohan Srinivasan	464469c713	Fixes an edge case bug in timewait handling where ticks rolling over causing the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby	2006-08-11 21:15:23 +00:00
Brooks Davis	43bc7a9c62	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entirely since all modern BSDs have if_xname variables rendering it unnecessary.	2006-08-04 21:27:40 +00:00
Oleg Bulyzhin	0e0b1bb57a	Remove useless NULL pointer check: we are using M_WAITOK flag for memory allocation. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Approved by: glebius (mentor) MFC after: 1 week	2006-08-04 10:50:51 +00:00
Robert Watson	e850475248	Move soisdisconnected() in tcp_discardcb() to one of its calling contexts, tcp_twstart(), but not to the other, tcp_detach(), as the socket is already being torn down and therefore there are no listeners. This avoids a panic if kqueue state is registered on the socket at close(), and eliminates two XXX comments. There is one case remaining in which tcp_discardcb() reaches up to the socket layer as part of the TCP host cache, which would be good to avoid. Reported by: Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>	2006-08-02 16:18:05 +00:00
Oleg Bulyzhin	9b1858ca78	Do not leak memory while flushing rules. Noticed by: yar Approved by: glebius (mentor) MFC after: 1 week	2006-08-02 14:58:51 +00:00
Robert Watson	a152f8a361	Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference. This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn	2006-07-21 17:11:15 +00:00
Stephan Uphoff	d915b28015	Fix race conditions on enumerating pcb lists by moving the initialization ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked) I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs) Reviewed by: rwatson@, mohans@ MFC after: 1 week	2006-07-18 22:34:27 +00:00
Sam Leffler	6b7330e2d4	Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code) Reviewed by: arch@	2006-07-09 06:04:01 +00:00
Max Laier	05206588f2	Make in-kernel multicast protocols for pfsync and carp work after enabling dynamic resizing of multicast membership array. Reported and testing by: Maxim Konovalov, Scott Ullrich Reminded by: thompsa MFC after: 2 weeks	2006-07-08 00:01:01 +00:00
Robert Watson	be54a5eeb3	Remove unneeded mac.h include. MFC after: 3 days	2006-07-06 13:25:01 +00:00
Oleg Bulyzhin	6372145725	Complete timebase (time_second -> time_uptime) conversion. PR: kern/94249 Reviewed by: andre (few months ago) Approved by: glebius (mentor)	2006-07-05 23:37:21 +00:00
Maxim Konovalov	764a094c3f	o Kill BUGS section as it is not valid since rev. 1.4 alias_pptp.c. Spotted by: ru.unix.bsd activists MFC after: 1 week	2006-07-04 20:39:38 +00:00
Yaroslav Tykhiy	4b97d7affd	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru	2006-06-29 19:22:05 +00:00
Yaroslav Tykhiy	ad67537233	Use TAILQ_FOREACH consistently.	2006-06-29 17:09:47 +00:00
Gleb Smirnoff	4d09f5a030	Fix URL to Bellovin's paper. Submitted by: Anton Yuzhaninov <citrin rambler-co.ru>	2006-06-29 13:38:36 +00:00
Bjoern A. Zeeb	333ad3bc40	Eliminate the offset argument from send_reject. It's not been used since FreeBSD-SA-06:04.ipfw. Adopt send_reject6 to what had been done for legacy IP: no longer send or permit sending rejects for any but the first fragment. Discussed with: oleg, csjp (some weeks ago)	2006-06-29 11:17:16 +00:00
Bjoern A. Zeeb	421d8aa603	Use INPLOOKUP_WILDCARD instead of just 1 more consistently. OKed by: rwatson (some weeks ago)	2006-06-29 10:49:49 +00:00
Pawel Jakub Dawidek	835d4b8924	- Use suser_cred(9) instead of directly checking cr_uid. - Change the order of conditions to first verify that we actually need to check for privileges and then eventually check them. Reviewed by: rwatson	2006-06-27 11:35:53 +00:00
Andre Oppermann	cc477a6347	In syncache_respond() do not reply with a MSS that is larger than what the peer announced to us but make it at least tcp_minmss in size. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 17:54:53 +00:00
Andre Oppermann	8bfb19180d	Some cleanups and janitorial work to tcp_syncache: o don't assign remote/local host/port information manually between provided struct in_conninfo and struct syncache, bcopy() it instead o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture the purpose of this field o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons o fix IPSEC error case printf's to report correct function name o in syncache_socket() only transpose enhanced tcp options parameters to struct tcpcb when the inpcb doesn't have TF_NOOPT set o in syncache_respond() reorder stack variables o in syncache_respond() remove bogus KASSERT() No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 16:14:19 +00:00
Andre Oppermann	f72167f4d1	Some cleanups and janitorial work to tcp_dooptions(): o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its usage accordingly o update the comments to the tcp_dooptions() invocation in tcp_input():after_listen to reflect reality o move the logic checking the echoed timestamp out of tcp_dooptions() to the only place that uses it next to the invocation described in the previous item o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others o add comments in to struct tcpopt.to_flags #defines No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-26 15:35:25 +00:00
Andre Oppermann	dfabcc1d29	Reverse the source/destination parameters to in[6]_pcblookup_hash() in syncache_respond() for the #ifdef MAC case. Submitted by: Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>	2006-06-26 09:43:55 +00:00
Robert Watson	b4470c1639	In tcp6_usr_attach(), return immediately if SS_ISDISCONNECTED, to avoid dereferencing an uninitialized inp variable. Submitted by: Michiel Boland <michiel at boland dot org> MFC after: 1 month	2006-06-26 09:38:08 +00:00
Andre Oppermann	a846263567	Decrement the global syncache counter in syncache_expand() when the entry is removed from the bucket. This fixes the syncache statistics.	2006-06-25 11:11:33 +00:00
Andre Oppermann	649ac0ce5f	Move the syncookie MD5 context from globals to the stack to make it MP safe.	2006-06-22 15:07:45 +00:00
Hajimu UMEMOTO	a0a59ae4af	- Pullup even when the extension header is unknown, to prevent infinite loop with net.inet6.ip6.fw.deny_unknown_exthdrs=0. - Teach ipv6 and ipencap as they appear in an IPv4/IPv6 over IPv6 tunnel. - Test the next extension header even when the routing header type is unknown with net.inet6.ip6.fw.deny_unknown_exthdrs=0. Found by: xcast-fan-club MFC after: 1 week	2006-06-22 13:22:54 +00:00
Andre Oppermann	c9f7b0ad5b	Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied memory location for already existing/initialized mutexes. With random data in the memory location this fails (ie. after a soft reboot). Reported by: brueffer, YAMAMOTO Shigeru Submitted by: YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>	2006-06-20 08:11:30 +00:00
David Malone	5e1aa27995	When we receive an out-of-window SYN for an "ESTABLISHED" connection, ACK the SYN as required by RFC793, rather than ignoring it. NetBSD have had a similar change since 1999. PR: 93236 Submitted by: Grant Edwards <grante@visi.com> MFC after: 1 month	2006-06-19 12:33:52 +00:00
Andre Oppermann	6593a94979	Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer used or needed. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-18 14:24:12 +00:00
Andre Oppermann	2f1a4ccfc1	Do not access syncache entry before it was allocated for the TF_NOOPT case in syncache_add(). Found by: Coverity Prevent CID: 1473	2006-06-18 13:03:42 +00:00
Andre Oppermann	8411d000a1	Move all syncache related structures to tcp_syncache.c. They are only used there. This unbreaks userland programs that include tcp_var.h. Discussed with: rwatson	2006-06-18 12:26:11 +00:00
Andre Oppermann	bdfbf1e203	Remove double lock acquisition in syncookie_lookup() which came from last minute conversions to macros. Pointy hat to: andre	2006-06-18 11:48:03 +00:00
Andre Oppermann	ee2e4c1d4e	Fix the !INET6 compile. Reported by: alc	2006-06-17 18:42:07 +00:00
Andre Oppermann	93f0d0c5bf	Rearrange fields in struct syncache and syncache_head to make them more cache line friendly. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:57:36 +00:00
Andre Oppermann	0c529372f0	ANSIfy and tidy up comments. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:49:11 +00:00
Andre Oppermann	351630c40d	Add locking to TCP syncache and drop the global tcpinfo lock as early as possible for the syncache_add() case. The syncache timer no longer acquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity. On a P4 the additional locks cause a slight degradation of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected. The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment. Reviewed by: rwatson Tested by: rwatson, ps (earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-06-17 17:32:38 +00:00
Oleg Bulyzhin	254c472561	Add support of 'tablearg' feature for: - 'tag' & 'untag' action parameters. - 'tagged' & 'limit' rule options. Rule examples: pipe 1 tag tablearg ip from table(1) to any allow ip from any to table(2) tagged tablearg allow tcp from table(3) to any 25 setup limit src-addr tablearg sbin/ipfw/ipfw2.c: 1) new macros GET_UINT_ARG - support of 'tablearg' keyword, argument range checking. PRINT_UINT_ARG - support of 'tablearg' keyword. 2) strtoport(): do not silently truncate/accept invalid port list expressions like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup. Approved by: glebius (mentor) MFC after: 1 month	2006-06-15 09:39:22 +00:00
Oleg Bulyzhin	58a0fab73f	install_state(): style(9) cleanup Approved by: glebius (mentor) MFC after: 1 month	2006-06-15 08:54:29 +00:00
Andrew Thompson	5feebeeb53	Enable proxy ARP answers on any of the bridged interfaces if proxy record belongs to another interface within the bridge group. PR: kern/94408 Submitted by: Eygene A. Ryabinkin MFC after: 1 month	2006-06-09 00:33:30 +00:00
Oleg Bulyzhin	458009ae93	install_state() should properly initialize 'addr_type' field of newly created flows for O_LIMIT rules. Otherwise 'ipfw -d show' is unable to display PARENT rules properly. (This bug was exposed by ipfw2.c rev.1.90) Approved by: glebius (mentor) MFC after: 2 weeks	2006-06-08 11:27:45 +00:00
Oleg Bulyzhin	d2dc1907e8	Fix following rules: pipe X (tag|altq) Y ... Approved by: glebius (mentor) MFC after: 2 weeks	2006-06-08 11:13:23 +00:00
Robert Watson	f2de87fec4	Push acquisition of pcbinfo lock out of tcp_usr_attach() into tcp_attach() after the call to soreserve(), as it doesn't require the global lock. Rearrange inpcb locking here also. MFC after: 1 month	2006-06-04 09:31:34 +00:00
Robert Watson	d8ab0ec661	When entering a timer on a tcpcb, don't continue processing if it has been dropped. This prevents a bug introduced during the socket/pcb refcounting work from occurring, in which occasionally the retransmit timer may fire after a connection has been reset, resulting in the R|A TCP packet having a source port of 0, as the port reservation has been released. While here, fix up some RUNLOCK->WUNLOCK bugs. MFC after: 1 month	2006-06-03 19:37:08 +00:00
Robert Watson	f24618aaf0	Acquire udbinfo lock after call to soreserve() rather than before, as it is not required. This simplifies error-handling, and reduces the time that this lock is held. MFC after: 1 month	2006-06-03 19:29:26 +00:00
Christian S.J. Peron	16d878cc99	Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read of the bpf peer list associated with the interface. It should be noted that the bpf code will pick up the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pick up the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pick up the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer.
This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month	2006-06-02 19:59:33 +00:00
Robert Watson	ad3a630f7e	Minor restyling and cleanup around ipport_tick(). MFC after: 1 month	2006-06-02 08:18:27 +00:00
Oleg Bulyzhin	6a7d5cb645	Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9). Since tags are kept while packet resides in kernelspace, it's possible to use other kernel facilities (like netgraph nodes) for altering those tags. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru> Approved by: glebius (mentor) Idea from: OpenBSD PF MFC after: 1 month	2006-05-24 13:09:55 +00:00
Maxim Konovalov	d45e4f9945	o In udp|rip_disconnect() acquire a socket lock before the socket state modification. To prevent races do that while holding inpcb lock. Reviewed by: rwatson	2006-05-21 19:28:46 +00:00
Maxim Konovalov	635354c446	o Add missed error check: in ip_ctloutput() sooptcopyin() returns a result but we never examine it. Reviewed by: rwatson MFC after: 2 weeks	2006-05-21 17:52:08 +00:00
Bruce M Simpson	8d7d85149e	Initialize the new members of struct ip_moptions as a defensive programming measure. Note that whilst these members are not used by the ip_output() path, we are passing an instance of struct ip_moptions here which is declared on the stack (which could be considered a bad thing). ip_output() does not consume struct ip_moptions, but in case it does in future, declare an in_multi vector on the stack too to behave more like ip_findmoptions() does.	2006-05-18 19:51:08 +00:00
Gleb Smirnoff	e5f88c4492	Since m_pullup() can return a new mbuf, change gre_input2() to return mbuf back to gre_input(). If the former returns mbuf back to the latter, then pass it to raw_input(). Coverity ID: 829	2006-05-16 11:15:22 +00:00
Gleb Smirnoff	ffb761f624	- Backout one line from 1.78. The tp can be freed by tcp_drop(). - Style next line. Coverity ID: 912	2006-05-16 10:51:26 +00:00
Maxim Konovalov	eb16472f74	o In rip_disconnect() do not call rip_abort(), just mark a socket as not connected. In soclose() case rip_detach() will kill inpcb for us later. This keeps the rawconnect regression test from panicking the system. Reviewed by: rwatson X-MFC after: with all 1st April inpcb changes	2006-05-15 09:28:57 +00:00
Max Laier	0e7185f6e7	Use only lower 64bit of src/dest (and src/dest port) for hashing of IPv6 connections and get rid of the flow_id as it is not guaranteed to be stable; some (most?) current implementations seem to just zero it out. PR: kern/88664 Reported by: jylefort Submitted by: Joost Bekkers (w/ changes) Tested by "regisr" <regisrApoboxDcom>	2006-05-14 23:42:24 +00:00
Bruce M Simpson	3548bfc964	Fix a long-standing limitation in IPv4 multicast group membership. By making the imo_membership array a dynamically allocated vector, this minimizes disruption to existing IPv4 multicast code. This change breaks the ABI for the kernel module ip_mroute.ko, and may cause a small amount of churn for folks working on the IGMPv3 merge. Previously, sockets were subject to a compile-time limitation on the number of IPv4 group memberships, which was hard-coded to 20. The imo_membership relationship, however, is 1:1 with regards to a tuple of multicast group address and interface address. Users who ran routing protocols such as OSPF ran into this limitation on machines with a large system interface tree.	2006-05-14 14:22:49 +00:00
Max Laier	656faadcb8	Remove ip6fw. Since ipfw has fully functional IPv6 support now and - in contrast to ip6fw - is properly locked, it is time to retire ip6fw.	2006-05-12 20:39:23 +00:00
Max Laier	e93187482d	Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the ipv6 processing separately. Also use pfil hook/unhook instead of keeping the check functions in pfil just to return there based on the sysctl. While here fix some whitespace on a nearby SYSCTL_ macro.	2006-05-12 04:41:27 +00:00
Max Laier	432288dcb6	Don't claim "(+ipv6)" if we didn't build with INET6.	2006-05-11 15:22:38 +00:00
Robert Watson	59b8854eee	Modify UDP to use sosend_dgram() instead of sosend(). This allows for significantly optimized UDP socket I/O when using a single UDP socket from many threads or processes that share it, by avoiding significant locking and other overhead in the general sosend() path that isn't necessary for simple datagram sockets. Specifically, this change results in a significant performance improvement for threaded name service in BIND9 under load. Suggested by: Jinmei_Tatsuya at isc dot org	2006-05-06 11:24:59 +00:00
Bjoern A. Zeeb	91b309a1c4	Make sure the ip data pointer is correct before touching it again after ipsec4_output processing else KAME IPSec using the handbook configuration with gif(4) will panic the kernel. Problem reported by: t. patterson <tp lot.org> Tested by: t. patterson <tp lot.org>	2006-05-05 07:31:03 +00:00
Robert Watson	3127286870	Only return (tw) from tcp_twclose() if reuse is passed, otherwise return NULL. In principle this shouldn't change the behavior, but avoids returning a potentially invalid/inappropriate pointer to the caller. Found with: Coverity Prevent (tm) Submitted by: pjd MFC after: 3 months	2006-05-05 06:50:23 +00:00
Pawel Jakub Dawidek	1d7d0bfe5e	/tmp/cvsTXPIwQ	2006-05-05 06:24:34 +00:00
Marcel Moolenaar	7c5a8ab212	In in_pcbdrop(), fix !INVARIANTS build.	2006-04-25 23:23:13 +00:00
Robert Watson	8e3f3b169e	Rename 'last' to 'inp' in udp_append(): the name 'last' is due to the fact that the loop through inpcb's in udp_input() tracks the last inpcb while looping. We keep that name in the calling loop but not in the delivery routine itself. MFC after: 3 months	2006-04-25 17:38:08 +00:00
Robert Watson	10702a2840	Abstract inpcb drop logic, previously just setting of INP_DROPPED in TCP, into in_pcbdrop(). Expand logic to detach the inpcb from its bound address/port so that dropping a TCP connection releases the inpcb resource reservation, which since the introduction of socket/pcb reference count updates, has been persisting until the socket closed rather than being released implicitly due to prior freeing of the inpcb on TCP drop. MFC after: 3 months	2006-04-25 11:17:35 +00:00
Robert Watson	c78cbc7b1d	Instead of calling tcp_usr_detach() from tcp_usr_abort(), break out common pcb tear-down logic into tcp_detach(), which is called from either. Invoke tcp_drop() from the tcp_usr_abort() path rather than tcp_disconnect(), as we want to drop it immediately not perform a FIN sequence. This is one reason why some people were experiencing panics in sodealloc(), as the netisr and aborting thread were simultaneously trying to tear down the socket. This bug could often be reproduced using repeated runs of the listenclose regression test. MFC after: 3 months PR: 96090 Reported by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris Tested by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris	2006-04-24 08:20:02 +00:00
Robert Watson	9106a6d6b0	Replace isn_mtx direct use with ISN_*() lock macros so that locking details/strategy can be changed without touching every use. MFC after: 3 months	2006-04-23 12:27:42 +00:00
Robert Watson	4c0e8f41f6	Introduce a new TCP mutex, isn_mtx, which protects the initial sequence number state, rather than re-using pcbinfo. This introduces some additional mutex operations during isn query, but avoids hitting the TCP pcbinfo lock out of yet another frequently firing TCP timer. MFC after: 3 months	2006-04-22 19:23:24 +00:00
Robert Watson	602cc7f12b	Assert the inpcb lock when rehashing an inpcb. Improve consistency of style around some current assertions. MFC after: 3 months	2006-04-22 19:15:20 +00:00
Robert Watson	6466b28a40	Remove pcbinfo locking from in_setsockaddr() and in_setpeeraddr(); holding the inpcb lock is sufficient to prevent races in reading the address and port, as both the inpcb lock and pcbinfo lock are required to change the address/port. Improve consistency of spelling in assertions about inp != NULL. MFC after: 3 months	2006-04-22 19:10:02 +00:00
Paul Saab	4f590175b7	Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.	2006-04-21 09:25:40 +00:00
Gleb Smirnoff	4cbb118526	Merge rev. 1.240 of ip_output.c, so that IPFIREWALL_FORWARD_EXTENDED kernel option will affect both forwarding methods - classic and fast.	2006-04-18 09:20:16 +00:00
Robert Watson	3cbe7fafa5	Modify tcp_timewait() to accept an inpcb reference, not a tcptw reference. For now, we allow the possibility that the in_ppcb pointer in the inpcb may be NULL if a timewait socket has had its tcptw structure recycled. This allows tcp_timewait() to consistently unlock the inpcb. Reported by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-09 16:59:19 +00:00
Mohan Srinivasan	1714e18e79	Eliminate debug code that catches bugs in the hinting of sack variables (tcp_sack_output_debug checks cached hints against computed values by walking the scoreboard and reports discrepancies). The sack hinting code has been stable for many months now so it is time for the debug code to go. Leaving tcp_sack_output_debug ifdef'ed out in case we need to resurrect it at a later point.	2006-04-06 17:21:16 +00:00
Robert Watson	a460ae4b4c	Don't unlock a timewait structure if the pointer is NULL in tcp_timewait(). This corrects a bug (or lack of fixing of a bug) in tcp_input.c:1.295. Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-05 08:45:59 +00:00
Mohan Srinivasan	1f65c2cd31	Certain (bad) values of sack blocks can end up corrupting the sack scoreboard. Make the checks in tcp_sack_doack() more robust to prevent this. Submitted by: Raja Mukerji (raja@mukerji.com) Reviewed by: Mohan Srinivasan	2006-04-05 00:11:04 +00:00
Gleb Smirnoff	a73b656763	Add a tunable net.inet.tcp.maxtcptw, that allows to set a limit on tcptw zone independently from setting a limit on socket zone.	2006-04-04 14:31:37 +00:00
Robert Watson	ae0e714308	Before dereferencing intotw() when INP_TIMEWAIT, check for inp_ppcb being NULL. We currently do allow this to happen, but may want to remove that possibility in the future. This case can occur when a socket is left open after TCP wraps up, and the timewait state is recycled. This will be cleaned up in the future. Found by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-04 12:26:07 +00:00
Robert Watson	cb895fb9b0	In TCP notify routines, check inpcb for INP_TIMEWAIT and INP_DROPPED. The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT checks appear to have always been required, but not been there, which is/was a bug. This avoids unconditionally casting of in_ppcb to a tcpcb, when it may be a twtcb, which may have resulted in obscure ICMP-related panics in earlier releases. MFC after: 3 months	2006-04-03 14:07:50 +00:00
Robert Watson	afa39e25c4	Change inp_ppcb from caddr_t to void *, fix/remove associated related casts. Consistently use intotw() to cast inp_ppcb pointers to struct tcptw * pointers. Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb * pointers. Don't assign tp to the results of intotcpcb() during variable declaration at the top of functions, as that is before the asserts relating to locking have been performed. Do this later in the function after appropriate assertions have run to allow that operation to be considered safe. MFC after: 3 months	2006-04-03 13:33:55 +00:00
Robert Watson	43f56a32a0	Style tweaks: convert to ANSI from K&R function prototypes. MFC after: 3 months	2006-04-03 12:59:27 +00:00
Robert Watson	2fc5ae87d0	Update comment on tcp_close() for new world order. MFC after: 3 months	2006-04-03 12:52:13 +00:00
Robert Watson	e6e65783d6	Clarify comment on handling of non-timewait TCP states in tcp_usr_detach(). MFC after: 3 months	2006-04-03 12:43:56 +00:00
Robert Watson	fa38deac65	Fix up locking surrounding tcp_drop sysctl: in the new world order, we don't free inpcbs until after the socket is closed, so we always need to unlock an inpcb after calling tcp_drop() on it. MFC after: 3 months	2006-04-03 11:57:12 +00:00
Robert Watson	3d2d3ef434	After checking for SO_ISDISCONNECTED in tcp_usr_accept(), return immediately rather than jumping to the normal output handling, which assumes we've pulled out the inpcb, which hasn't happened at this point (and isn't necessary). Return ECONNABORTED instead of EINVAL when the inpcb has entered INP_TIMEWAIT or INP_DROPPED, as this is the documented error value. This may correct the panic seen by Ganbold. MFC after: 1 month Reported by: Ganbold <ganbold at micom dot mng dot net>	2006-04-03 09:52:55 +00:00
Robert Watson	a34f6c1e1d	Correct incorrect assertion in div_bind(): inp must not be NULL here. Reported by: tegge MFC after: 3 months	2006-04-03 09:01:17 +00:00
Robert Watson	953b5606df	During reformulation of tcp_usr_detach(), the call to initiate TCP disconnect for fully connected sockets was dropped, meaning that if the socket was closed while the connection was alive, it would be leaked. Structure tcp_usr_detach() so that there are two clear parts: initiating disconnect, and reclaiming state, and reintroduce the tcp_disconnect() call in the first part. MFC after: 3 months	2006-04-02 16:42:51 +00:00
Robert Watson	34af7bae80	Properly handle an edge case previously not handled correctly: a socket can have a tcp connection that has entered time wait attached to it, in the event that shutdown() is called on the socket and the FINs properly exchange before close(). In this case we don't detach or free the inpcb, just leave the tcptw detached and freed, but we must release the inpcb lock (which we didn't previously). MFC after: 3 months	2006-04-01 23:53:25 +00:00
Robert Watson	623dce13c6	Update TCP for infrastructural changes to the socket/pcb refcount model, pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases. - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock. - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this.
- Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue. - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid. - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit. - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work. These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces. MFC after: 3 months	2006-04-01 16:36:36 +00:00
Robert Watson	14ba8add01	Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months	2006-04-01 16:20:54 +00:00
Robert Watson	4c7c478d0f	Break out in_pcbdetach() into two functions: - in_pcbdetach(), which removes the link between an inpcb and its socket. - in_pcbfree(), which frees a detached pcb. Unlike the previous in_pcbdetach(), neither of these functions will attempt to conditionally free the socket, as they are responsible only for managing in_pcb memory. Mirror these changes into in6_pcbdetach() by breaking it into in6_pcbdetach() and in6_pcbfree(). While here, eliminate undesired checks for NULL inpcb pointers in sockets, as we will now have as an invariant that sockets will always have valid so_pcb pointers. MFC after: 3 months	2006-04-01 16:04:42 +00:00
Robert Watson	bc725eafc7	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
Robert Watson	ac45e92ff2	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
Robert Watson	6882aa2c03	Define two new inpcb flags in the inp_vflag field, which for whatever reason, seems to be where new flags are getting defined: INP_DROPPED - The protocol has terminated this connection and the socket is not reusable: when the socket code enters the protocol, an error is immediately returned. This will substitute for NULLing the so_pcb socket field, helping to implement the invariant that all valid sockets have valid pcb's in TCP. INP_SOCKREF - The protocol has become the owner of the socket reference, and will need to free it when freeing the pcb, which will be used when a TCP socket is closed but still has queued data. MFC after: 1 month	2006-03-26 11:30:31 +00:00
Robert Watson	a07b8fd178	Minor style tweak: tab after #define, not space. MFC after: 1 month	2006-03-26 11:26:12 +00:00
Robert Watson	1c53f80637	Explicitly assert socket pointer is non-NULL in tcp_input() so as to provide better debugging information. Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans. MFC after: 1 month	2006-03-26 01:33:41 +00:00
Gleb Smirnoff	0fa0801895	o Introduce carp_multicast_cleanup(), which removes and frees multicast addresses from carp interface. [1] o Rewrite carpdetach(), so that it does the following things: [1] - Stops callouts. - Decrements carp_suppress_preempt, if needed. - Downs interface and sets CARP state to INIT. - Calls carp_multicast_cleanup(). - Detaches softc from carp_if and if we are the last frees the carp_if. o Use new carpdetach() in carp_clone_destroy(). o In carp_ifdetach() acquire the carp_if lock and cleanup all interfaces hanging on carp_if. [1] o Make carp_ifdetach() static and use EVENT(9) to call it from if_detach(). [2] o In carp_setrun() exit if the softc doesn't have a valid pointer to parent. [1] Obtained from: OpenBSD [1] Submitted by: Dan Lukes <dan obluda.cz> [2] PR: kern/82908 [2]	2006-03-21 14:29:48 +00:00
Giorgos Keramidas	f92e18f4fc	Add descriptions for the sysctls: net.inet.icmp.drop_redirect net.inet.icmp.log_redirect net.inet.icmp.icmplim net.inet.icmp.icmplim_output Approved & text by: andre	2006-03-20 21:44:12 +00:00
David Malone	fcd1001c63	Make net.inet.ip.portrange.reservedhigh and net.inet.ip.portrange.reservedlow apply to IPv6 as well as IPv4. We could have made new sysctls for IPv6, but that potentially makes things complicated for mapped addresses. This seems like the least confusing option and least likely to cause obscure problems in the future. This change makes the mac_portacl module useful with IPv6 apps. Reviewed by: ume MFC after: 1 month	2006-03-19 11:48:48 +00:00
Robert Watson	92c07a345e	Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.	2006-03-16 07:03:14 +00:00
Andrew Thompson	7f2d8767e0	Further refine the bridge hack in the arp code. Only do the special arp handling for interfaces which are actually in the bridge group, ignore all others. MFC after: 3 days	2006-03-07 21:40:44 +00:00
Gleb Smirnoff	e2779391fa	- Do not leak read lock in IP_FW_TABLE_GETSIZE case of ipfw_ctl(). - Acquire read (not write) lock in case of IP_FW_TABLE_LIST. In collaboration with: ru	2006-03-03 12:10:59 +00:00
Andre Oppermann	464fcfbc5c	Rework TCP window scaling (RFC1323) to properly scale the send window right from the beginning and partly clean up the differences in handling between SYN_SENT and SYN_RCVD (syncache). Further changes to this code to come. This is a first incremental step to a general overhaul and streamlining of the TCP code. PR: kern/15095 PR: kern/92690 (partly) Reviewed by: qingli (and tested with ANVL) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-02-28 23:05:59 +00:00
Qing Li	4b8e98d632	This patch fixes the problem where the current TCP code can not handle simultaneous open. Both the bug and the patch were verified using the ANVL test suite. PR: kern/74935 Submitted by: qingli (before I became committer) Reviewed by: andre MFC after: 5 days	2006-02-23 21:14:34 +00:00
Hajimu UMEMOTO	9bf40914d2	Obey opt_inet6.h in kernel build directory. Reported by: Peter Losher <plosher-keyword-freebsd.a36e57__at__plosh.net> MFC after: 3 days	2006-02-20 12:30:32 +00:00
Andre Oppermann	8e8aab7aec	Remove unneeded includes and provide more accurate description to others. Submitted by: garys PR: kern/86437	2006-02-18 17:05:00 +00:00
Andre Oppermann	da3482e0f6	Add missing TH_PUSH to the TH_FLAGS enumeration. Submitted by: Andre Albsmeier <Andre.Albsmeier-at-siemens.com> PR: kern/85203	2006-02-18 16:50:08 +00:00
Andre Oppermann	eaf80179e2	Have TCP Inflight disable itself if the RTT is below a certain threshold. Inflight doesn't make sense on a LAN as it has trouble figuring out the maximal bandwidth because of the coarse tick granularity. The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold in milliseconds below which inflight will disengage. It defaults to 10ms. Tested by: Joao Barros <joao.barros-at-gmail.com>, Rich Murphey <rich-at-whiteoaklabs.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-02-16 19:38:07 +00:00
Andre Oppermann	cf744713e8	In in_pcbconnect_setup() reduce code duplication and use ip_rtaddr() to find the outgoing interface for this connection. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 2 weeks	2006-02-16 15:45:28 +00:00
Andre Oppermann	a4684d742d	Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead of being private to tcp_timer.c. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-16 15:40:36 +00:00
Ruslan Ermilov	ea9dce1461	When sending a packet from dummynet, indicate that we're forwarding it so that ip_id etc. don't get overwritten. This fixes forwarding of fragmented IP packets through a dummynet pipe -- fragments came out with modified and different(!) ip_id's, making it impossible to reassemble a datagram at the receiver side. Submitted by: Alexander Karptsov (reworked by me) MFC after: 3 days	2006-02-14 06:36:39 +00:00
Qing Li	eee9df08bd	Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry. Reviewed by: andre, glebius MFC after: 3 days	2006-02-09 21:29:02 +00:00
Qing Li	c1fd993af9	Redo the previous fix by setting the UMA_ZONE_ZINIT bit in the syncache zone, eliminating the need to call bzero() after each syncache entry allocation. Suggested by: glebius Reviewed by: andre MFC after: 3 days	2006-02-08 23:32:57 +00:00
Qing Li	737b12e98f	Fixes a crash caused by the memory of the newly allocated syncache entry in syncache_lookup() not being cleared, which may lead to an arbitrary and bogus rtentry pointer that later gets free'd. Reviewed by: andre MFC after: 3 days	2006-02-07 19:59:46 +00:00
Oleg Bulyzhin	6edb555dbc	Fix five years old bug in ip_reass(): if we are using 'full' (i.e. including pseudo header) hardware rx checksum offloading ip_reass() fails to calculate TCP/UDP checksum for reassembled packet correctly. This also should fix recent 'NFS over UDP over bge' issue exposed by if_bge.c rev. 1.123 Reviewed by: sam (earlier version), bde Approved by: glebius (mentor) MFC after: 2 weeks	2006-02-07 11:48:10 +00:00
Hajimu UMEMOTO	d5e8a67ee9	Never select the PCB that has INP_IPV6 flag and is bound to :: if we have another PCB which is bound to 0.0.0.0. If a PCB has the INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME MFC after: 1 week	2006-02-04 07:59:17 +00:00
Gleb Smirnoff	a7908db153	Dropping the lock in the transmit_event() is not safe, because we store some pipe pointers on stack. If user reconfigures dummynet in the interlock gap, we can work with freed pipes after relock. To fix this, we decided not to send packets in transmit_event(), but fill a queue. At the end of dummynet() and dummynet_io(), after the lock is dropped, if there is something in the queue we run dummynet_send() to process the queue. In collaboration with: ru	2006-02-03 11:38:19 +00:00
Gleb Smirnoff	ce62866023	Axe unused function.	2006-02-03 10:42:28 +00:00
Christian S.J. Peron	f5cdbcf14c	Use PFIL_HOOKED macros in if_bridge and pass the right argument to rw_assert. This un-breaks the build. Submitted by: Kostik Belousov Pointy hat to: csjp	2006-02-02 16:41:20 +00:00
Christian S.J. Peron	604afec496	Somewhat re-factor the read/write locking mechanism associated with the packet filtering mechanisms to use the new rwlock(9) locking API: - Drop the variables stored in the pfil_head structure which were specific to conditions and the home rolled read/write locking mechanism. - Drop some includes which were used for condition variables - Drop the inline functions, and convert them to macros. Also, move these macros into pfil.h - Move pfil list locking macros into pfil.h as well - Rename ph_busy_count to ph_nhooks. This variable will represent the number of IN/OUT hooks registered with the pfil head structure - Define PFIL_HOOKED macro which evaluates to true if there are any hooks to be run by pfil_run_hooks - In the IP/IP6 stacks, change the ph_busy_count comparison to use the new PFIL_HOOKED macro. - Drop optimization in pfil_run_hooks which checks to see if there are any hooks to be run, and returns if not. This check is already performed by the IP stacks when they call: if (!PFIL_HOOKED(ph)) goto skip_hooks; - Drop in assertion which makes sure that the number of hooks never drops below 0 for good measure. This in theory should never happen, and if it does then there are problems somewhere - Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep - Drop variables which support home rolled read/write locking mechanism from the IPFW firewall chain structure. - Swap out the read/write firewall chain lock internal to use the rwlock(9) API instead of our home rolled version - Convert the inlined functions to macros Reviewed by: mlaier, andre, glebius Thanks to: jhb for the new locking API	2006-02-02 03:13:16 +00:00
Andre Oppermann	1dfcf0d2a3	Move the IPSEC related code blocks to their own file to unclutter and significantly improve the readability of ip_input() and ip_output() again. The resulting IPSEC hooks in ip_input() and ip_output() may be used later on for making IPSEC loadable. This move is mostly mechanical and should preserve current IPSEC behaviour as-is. Nothing shall prevent improvements in the way IPSEC interacts with the IPv4 stack. Discussed with: bz, gnn, rwatson; (earlier version)	2006-02-01 13:55:03 +00:00
Ruslan Ermilov	e46c3da737	Brain-o (use standard int types now).	2006-02-01 06:15:37 +00:00
Ruslan Ermilov	bc7eeed4c9	Fix multicast routing on 64-bit platforms. Tested on: amd64 MFC after: 3 days	2006-01-31 22:39:35 +00:00
Andrew Thompson	235073f4c0	Now that the bridge also processes Ethernet frames as itself, two arp replies will be sent if there is an address on the bridge. Exclude the bridge from the special arp handling. This has been tested with all combinations of addresses on the bridge and members. Pointed out by: Michal Mertl	2006-01-31 21:29:41 +00:00
Gleb Smirnoff	25af0bb50e	Add some initial locking to gif(4). It doesn't cover the whole driver, however IPv4-in-IPv4 tunnels are now stable on SMP. Details: - Add per-softc mutex. - Hold the mutex on output. The main problem was the rtentry, placed in softc. It could be freed by ip_output(). Meanwhile, another thread being in in_gif_output() can read and write this rtentry. Reported by: many Tested by: Alexander Shiryaev <aixp mail.ru>	2006-01-30 08:39:09 +00:00
Andrew Thompson	74948aa6f3	Back out of r1.148, it causes two arp replies to be sent with different mac addresses. One for the bridged interface with the IP address assigned but then another with the mac for the bridge itself.	2006-01-29 23:21:01 +00:00
Andre Oppermann	ab48768b20	When doing IP forwarding with [FAST_]IPSEC compiled into the kernel ip_forward() would report back a zero MTU in ICMP needfrag messages because on a IPSEC SP lookup failure no MTU got computed. Fix this by changing the logic to compute a new MTU in any case if IPSEC didn't do it. Change MTU computation logic to use egress interface MTU if available or the next smaller MTU compared to the current packet size instead of falling back to a very small fixed MTU. Fix associated comment. PR: kern/91412 MFC after: 3 days	2006-01-24 17:57:19 +00:00
Andre Oppermann	1dec73a153	In ip_mdq() compute the TV_DELTA the correct way around. PR: kern/91851 Submitted by: SAKAI Hiroaki <sakai.hiroaki-at-jp.fujitsu.com> MFC after: 3 days	2006-01-24 17:09:12 +00:00
Andre Oppermann	31343a3da2	In in_control() remove the temporary in_ifaddr structure from the ia_hash only if it actually is an AF_INET address. All other places test for sa_family == AF_INET but this one. PR: kern/92091 Submitted by: Seth Kingsley <sethk-at-meowfishies.com> MFC after: 3 days	2006-01-24 16:19:31 +00:00
Oleg Bulyzhin	44a515834f	Fix minor bug in uRPF: If net.link.ether.inet.useloopback=1 and we send broadcast packet using our own source ip address it may be rejected by uRPF rules. Same bug was fixed for IPv6 in rev. 1.115 by suz. PR: kern/76971 Approved by: glebius (mentor) MFC after: 3 days	2006-01-24 13:38:06 +00:00
Gleb Smirnoff	0b4ae859ac	Implement 'ipfw fwd laddr,port' feature for UDP. According to ipfw(8) it should work, however it never did. People expect it to work. PR: kern/90834	2006-01-24 09:08:54 +00:00
Gleb Smirnoff	1c0b0f523d	Fix build.	2006-01-23 20:10:49 +00:00
Andre Oppermann	06003a1e7c	Simplify ip_next_mtu() and make its logic more easy to see while silencing code analysis tools. Found by: Coverity Prevent(tm) Coverity ID: CID341 Sponsored by: TCP/IP Optimization Fundraise 2005	2006-01-23 17:06:32 +00:00
Robert Watson	136d4f1cf2	Convert remaining functions to ANSI C function declarations; remove 'register' where present. MFC after: 1 week	2006-01-22 01:16:25 +00:00
Robert Watson	d0c75d36b9	Convert last remaining function in ip_gre.c to ANSI C function declaration. MFC after: 1 week	2006-01-22 01:08:30 +00:00
Bjoern A. Zeeb	3f2e28fe9f	Fix stack corruptions on amd64. Vararg functions have a different calling convention than regular functions on amd64. Casting a vararg function to a regular one to match the function pointer declaration will hide the varargs from the caller and we will end up with an incorrectly setup stack. Entirely remove the varargs from these functions and change the functions to match the declaration of the function pointers. Remove the now unnecessary casts. Lots of explanations and help from: peter Reviewed by: peter PR: amd64/89261 MFC after: 6 days	2006-01-21 10:44:34 +00:00
Christian S.J. Peron	9c57c204be	- Change the return type for init_tables from void to int so we can propagate errors from rn_inithead back to the ipfw initialization function. - Check return value of rn_inithead for failure, if table allocation has failed for any reason, free up any tables we have created and return ENOMEM - In ipfw_init check the return value of init_tables and free up any mutexes or UMA zones which may have been created. - Assert that the supplied table is not NULL before attempting to dereference. This fixes panics which were a result of invalid memory accesses due to failed table allocation. This is an issue mainly because the R_Zalloc function is a malloc(M_NOWAIT) wrapper, thus making it possible for allocations to fail. Found by: Coverity Prevent (tm) Coverity ID: CID79 MFC after: 1 week	2006-01-20 05:35:27 +00:00
Christian S.J. Peron	e9186cb94b	Destroy the dynamic rule zone in the event that we fail to insert the initial default rule. MFC after: 1 week	2006-01-20 03:21:25 +00:00
Andre Oppermann	0270746230	Do not dereference the ip header pointer in the IPv6 case. This fixes a bug in the previous commit. Found by: Coverity Prevent(tm) Coverity ID: CID253 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 18:59:30 +00:00
Andre Oppermann	8f8d29f686	In in_delayed_cksum() we can't perform a m_pullup() as it may change the mbuf pointer and we don't have any way of passing it back to the callers. Instead just fail silently without updating the checksum but leaving the mbuf+chain intact. A search in our GNATS database did not turn up any match for the existing warning message when this case is encountered. Found by: Coverity Prevent(tm) Coverity ID: CID779 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 18:49:16 +00:00
Andre Oppermann	79eb490467	In syncache_expand() insert a proper syncache_free() to fix a case that currently can't be triggered. But better be safe than sorry later on. Additionally it properly silences Coverity Prevent for future tests. Found by: Coverity Prevent(tm) Coverity ID: CID802 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 18:25:03 +00:00
Andre Oppermann	39550088cf	Prevent dereferencing a NULL route pointer when trying to update the route MTU. This bug is very difficult to reach and not remotely exploitable. Found by: Coverity Prevent(tm) Coverity ID: CID162 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-01-18 15:05:05 +00:00
Andre Oppermann	5d691e6da8	Return mbuf pointer or NULL from ip_fastforward() as the mbuf pointer may have changed by m_pullup() during fastforward processing. While this is a bug it is actually never triggered in real world situations and it is not remotely exploitable. Found by: Coverity Prevent(tm) Coverity ID: CID780 Sponsored by: TCP/IP Optimization Fundraise 2005	2006-01-18 14:24:39 +00:00
Robert Watson	d248c7d7f5	Modify the IP fragment reassembly code so that it uses a new UMA zone, ipq_zone, to allocate fragment headers from, rather than using cast mbuf storage. This was one of the few remaining uses of mbuf storage for local data structures that relied on dtom(). Implement the resource limit on ipq's using UMA zone limits, but preserve current sysctl semantics using a sysctl proc. MFC after: 3 weeks	2006-01-15 18:58:21 +00:00
Robert Watson	dfa60d9354	Staticize ipqlock, since it is local to ip_input.c. MFC after: 3 days	2006-01-15 17:05:48 +00:00
George V. Neville-Neil	34f83c52e7	Check the correct TTL in both the IPv6 and IPv4 cases. Submitted by: glebius Reviewed by: gnn, bz Found with: Coverity Prevent(tm)	2006-01-14 16:39:31 +00:00
Gleb Smirnoff	ecedca7441	UMA can return NULL not only in case when our zone is full, but also in case of generic memory shortage. In the latter case we may not find an old entry. Found with: Coverity Prevent(tm)	2006-01-14 13:04:08 +00:00
Robert Watson	e5bc0aa3c3	Remove dead code: 'opts' is not used in udp_append(), only in udp_input(), so no need to assign it to NULL or conditionally free it. Found with: Coverity Prevent(tm) MFC after: 3 days	2006-01-14 11:18:32 +00:00
Andrew Thompson	54c427e0e2	Include the bridge interface itself in the special arp handling. PR: 90973 MFC after: 1 week	2006-01-12 21:05:30 +00:00
Colin Percival	9ed97bee65	Correct insecure temporary file usage in texindex. [06:01] Correct insecure temporary file usage in ee. [06:02] Correct a race condition when setting file permissions, sanitize file names by default, and fix a buffer overflow when handling files larger than 4GB in cpio. [06:03] Fix an error in the handling of IP fragments in ipfw which can cause a kernel panic. [06:04] Security: FreeBSD-SA-06:01.texindex Security: FreeBSD-SA-06:02.ee Security: FreeBSD-SA-06:03.cpio Security: FreeBSD-SA-06:04.ipfw	2006-01-11 08:02:16 +00:00
Andrew Thompson	73ff045c57	Add RFC 3378 EtherIP support. This change makes it possible to add gif interfaces to bridges, which will then send and receive IP protocol 97 packets. Packets are Ethernet frames with an EtherIP header prepended. Obtained from: NetBSD MFC after: 2 weeks	2005-12-21 21:29:45 +00:00
Xin LI	92e0a4a2a4	Use consistent indent character as other IPPROTO_* lines did.	2005-12-20 09:38:03 +00:00
George V. Neville-Neil	496f9fc522	Add protocol number for SCTP. Submitted by: Randall Stewart rrs at cisco.com MFC after: 1 week	2005-12-20 09:24:04 +00:00
Gleb Smirnoff	3939390679	Add a knob to suppress logging of attempts to modify permanent ARP entries. Submitted by: Andrew Alcheyev <buddy telenet.ru>	2005-12-18 19:11:56 +00:00
Ed Maste	bd2b686fe8	Add descriptions for sysctl -d. Approved by: glebius Silence from: rwatson (mentor)	2005-12-16 15:01:44 +00:00
Gleb Smirnoff	6e02dbdfa3	Cleanup __FreeBSD_version.	2005-12-16 13:10:32 +00:00
John Baldwin	636a309adb	Use %t (ptrdiff_t modifier) to print a couple of pointer differences rather than casting them to int.	2005-12-15 21:57:32 +00:00
Maxime Henrion	e59898ff36	Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to match the type of the variable they are exporting. Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days	2005-12-14 22:27:48 +00:00

3455 commits